SPDX-FileCopyrightText: 2024 Lady <https://www.ladys.computer/about/#lady>
SPDX-License-Identifier: CC0-1.0
 
<b>Ladys simple markup language.</b>
 
💄📝 Les·M·L is a document markup language designed with two goals in
  mind :—
 
1. It must be trivial to parse, even with limited tooling such as that

2. It must be sophisticated enough to handle longform hypertext
   documents and associated metadata.
 
It is implemented as an X·S·L·T transformation from a
  `<html:script type="text/lesml">` element into H·T·M·L
 
<i>Les·M·L</i> is an abbreviation of the phrase “Ladys Extremely Simple
  Markup Language”.
 
The first line of any 💄📝 Les·M·L document should be the string
 
Following the shebang, document metadata may be provided in the [Record
  Jar][draft-phillips-record-jar-01] format.
The body of the document begins after the last line which begins with
  the string `%%`, or after the shebang line if none exists.
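
A minimal Python sketch of this split follows.
It treats the first line as the shebang without checking what it
  contains, and it returns the metadata as raw lines grouped into records
  on `%%` separators rather than parsing Record Jar fields.
Both are simplifications, and the function names are hypothetical.

```python
from typing import List, Tuple


def split_document(source: str) -> Tuple[str, List[str], List[str]]:
    """Split Les·M·L source into (shebang line, metadata lines, body lines)."""
    lines = source.splitlines()
    if not lines:
        return "", [], []
    shebang, rest = lines[0], lines[1:]
    # Find the last line that begins with "%%"; the body starts after it.
    last = -1
    for index, line in enumerate(rest):
        if line.startswith("%%"):
            last = index
    if last < 0:
        return shebang, [], rest  # no metadata: the body follows the shebang
    return shebang, rest[:last], rest[last + 1:]


def split_records(metadata: List[str]) -> List[List[str]]:
    """Group metadata lines into records separated by "%%" lines."""
    records: List[List[str]] = [[]]
    for line in metadata:
        if line.startswith("%%"):
            records.append([])
        else:
            records[-1].append(line)
    return [record for record in records if record]
```

For example, a document whose only `%%` line is its third line yields
  one metadata line and a body that starts at the fourth line.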
 
Documents are broken into paragraphs by blank lines.
Empty paragraphs are ignored.
Non·empty paragraphs are classified as follows :—
 
- If the paragraph consists of only the following section‐break
    characters, plus any amount of white·space, then it is
    considered to be a section break (`<html:hr>`).

  The section break characters are :—
 
  | Character | Codepoint | Unicode Name |
  | --------- | --------- | ------------ |
  | `#` | `U+0023` | `NUMBER SIGN` |
  | `*` | `U+002A` | `ASTERISK` |
  | `-` | `U+002D` | `HYPHEN-MINUS` |
  | `.` | `U+002E` | `FULL STOP` |
  | `=` | `U+003D` | `EQUALS SIGN` |
  | `_` | `U+005F` | `LOW LINE` |
  | `~` | `U+007E` | `TILDE` |
  | `·` | `U+00B7` | `MIDDLE DOT` |
  | `․` | `U+2024` | `ONE DOT LEADER` |
  | `‥` | `U+2025` | `TWO DOT LEADER` |
  | `…` | `U+2026` | `HORIZONTAL ELLIPSIS` |
  | `⁂` | `U+2042` | `ASTERISM` |
  | `⋯` | `U+22EF` | `MIDLINE HORIZONTAL ELLIPSIS` |
  | `─` | `U+2500` | `BOX DRAWINGS LIGHT HORIZONTAL` |
  | `━` | `U+2501` | `BOX DRAWINGS HEAVY HORIZONTAL` |
  | `┄` | `U+2504` | `BOX DRAWINGS LIGHT TRIPLE DASH HORIZONTAL` |
  | `┅` | `U+2505` | `BOX DRAWINGS HEAVY TRIPLE DASH HORIZONTAL` |
  | `┈` | `U+2508` | `BOX DRAWINGS LIGHT QUADRUPLE DASH HORIZONTAL` |
  | `┉` | `U+2509` | `BOX DRAWINGS HEAVY QUADRUPLE DASH HORIZONTAL` |
  | `╌` | `U+254C` | `BOX DRAWINGS LIGHT DOUBLE DASH HORIZONTAL` |
  | `╍` | `U+254D` | `BOX DRAWINGS HEAVY DOUBLE DASH HORIZONTAL` |
  | `═` | `U+2550` | `BOX DRAWINGS DOUBLE HORIZONTAL` |
  | `╴` | `U+2574` | `BOX DRAWINGS LIGHT LEFT` |
  | `╶` | `U+2576` | `BOX DRAWINGS LIGHT RIGHT` |
  | `╸` | `U+2578` | `BOX DRAWINGS HEAVY LEFT` |
  | `╺` | `U+257A` | `BOX DRAWINGS HEAVY RIGHT` |
  | `☙` | `U+2619` | `REVERSED ROTATED FLORAL HEART BULLET` |
  | `❧` | `U+2767` | `ROTATED FLORAL HEART BULLET` |
  | `　` | `U+3000` | `IDEOGRAPHIC SPACE` |
  | `・` | `U+30FB` | `KATAKANA MIDDLE DOT` |
  | `＊` | `U+FF0A` | `FULLWIDTH ASTERISK` |
  | `－` | `U+FF0D` | `FULLWIDTH HYPHEN-MINUS` |
  | `．` | `U+FF0E` | `FULLWIDTH FULL STOP` |
  | `＝` | `U+FF1D` | `FULLWIDTH EQUALS SIGN` |
  | `＿` | `U+FF3F` | `FULLWIDTH LOW LINE` |
  | `～` | `U+FF5E` | `FULLWIDTH TILDE` |
 
- If every line in the paragraph begins with at least one space, then
    it is considered to be a quoted paragraph (`<html:blockquote>`).
  There is only one level of paragraph quoting; quoted paragraphs may

- Otherwise, the paragraph is unquoted.
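
The paragraph split and the classification above can be sketched in
  Python as follows, reading the list in order so that the section-break
  test runs first.
The sketch assumes that white·space means the X·M·L white·space
  characters (space, tab, carriage return, line feed), and it takes the
  section break characters from the table by codepoint.
The names `split_paragraphs` and `classify` are hypothetical.

```python
from typing import List

# Assumption: "white·space" means the X·M·L white·space characters.
XML_SPACE = " \t\r\n"

# Section break characters, listed by codepoint as in the table above.
SECTION_BREAK_CHARS = frozenset(chr(cp) for cp in (
    0x0023, 0x002A, 0x002D, 0x002E, 0x003D, 0x005F, 0x007E, 0x00B7,
    0x2024, 0x2025, 0x2026, 0x2042, 0x22EF,
    0x2500, 0x2501, 0x2504, 0x2505, 0x2508, 0x2509, 0x254C, 0x254D,
    0x2550, 0x2574, 0x2576, 0x2578, 0x257A,
    0x2619, 0x2767, 0x3000, 0x30FB,
    0xFF0A, 0xFF0D, 0xFF0E, 0xFF1D, 0xFF3F, 0xFF5E,
))


def split_paragraphs(body_lines: List[str]) -> List[List[str]]:
    """Group body lines into paragraphs, dropping empty ones."""
    paragraphs: List[List[str]] = []
    current: List[str] = []
    for line in body_lines:
        if line.strip(XML_SPACE):
            current.append(line)
        elif current:  # a blank line ends the current paragraph
            paragraphs.append(current)
            current = []
    if current:
        paragraphs.append(current)
    return paragraphs


def classify(paragraph: List[str]) -> str:
    """Return "section-break", "quoted", or "unquoted" for a paragraph."""
    visible = [c for line in paragraph for c in line if c not in XML_SPACE]
    if all(c in SECTION_BREAK_CHARS for c in visible):
        return "section-break"
    if all(line.startswith(" ") for line in paragraph):
        return "quoted"
    return "unquoted"
```

Because the section-break test comes first in this sketch, an indented
  paragraph such as `  ---` is treated as a section break rather than as
  a quoted paragraph.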
 
After this classification, each quoted or unquoted paragraph is further
  classified by type based on its first character (which must be
  followed by white·space to be recognized) :—
 
- If the paragraph begins with `⁌`, it is a chapter heading

- If the paragraph begins with `§`, it is a section heading

- If the paragraph begins with `❦`, it is a subsection heading

- If the paragraph begins with `✠`, it is a subsubsection heading
 
- If the paragraph begins with `•` or `🔢`, it is a primary unordered
    or ordered list item (`<html:li class="unordered" data-level="1">`
    or `<html:li class="ordered" data-level="1">`).

- If the paragraph begins with `◦` or `🔠`, it is a secondary unordered
    or ordered list item (`<html:li class="unordered" data-level="2">`
    or `<html:li class="ordered" data-level="2">`).
  Secondary list items are considered to be nested inside of primary
    list items which precede them.

- If the paragraph begins with `▪` or `🔡`, it is a tertiary unordered
    or ordered list item (`<html:li class="unordered" data-level="3">`
    or `<html:li class="ordered" data-level="3">`).
  Tertiary list items are considered to be nested inside of primary
    and secondary list items which precede them.

- If the paragraph begins with `⁃` or `🔣`, it is a quaternary
    unordered or ordered list item
    (`<html:li class="unordered" data-level="4">` or
    `<html:li class="ordered" data-level="4">`).
  Quaternary list items are considered to be nested inside of primary,
    secondary, and tertiary list items which precede them.
 
- If the paragraph begins with `※`, it is an ordinary note
    (`<html:div role="note" class="note">`).

- If the paragraph begins with `☡`, it is a cautionary note
    (`<html:div role="note" class="caution">`).

- If the paragraph begins with `🛈`, it is an informative note
    (`<html:div role="note" class="info">`).

- If the paragraph begins with `⯑`, it is a questioning note
    (`<html:div role="note" class="query">`).

- If the paragraph begins with `⚠︎`, it is a warning note
    (`<html:div role="note" class="warn">`).
 
- If the paragraph begins with `⋯`, it is a continuation paragraph
    (`<html:div class="continuation">`).
  Continuation paragraphs may be used to continue a preceding list item

  Note, however, that an unquoted paragraph cannot continue a quoted

- Otherwise, it is an ordinary paragraph.
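
The sigil lookup can be sketched in Python as below.
It assumes that the leading spaces of a quoted paragraph are removed
  before the sigil is examined, and that white·space means X·M·L
  white·space.
The type labels it returns (such as `unordered-2`) are hypothetical
  stand-ins for the elements listed above.
Because `⚠︎` is two codepoints (`U+26A0` followed by the variation
  selector `U+FE0E`), the sketch matches sigils as strings rather than
  as single characters.

```python
from typing import Tuple

XML_SPACE = " \t\r\n"  # assumption: X·M·L white·space

# Sigil -> hypothetical type label.  Escapes keep invisible characters
# (such as the variation selector after the warning sign) explicit.
SIGILS = {
    "\u204C": "chapter-heading",        # ⁌
    "\u00A7": "section-heading",        # §
    "\u2766": "subsection-heading",     # ❦
    "\u2720": "subsubsection-heading",  # ✠
    "\u2022": "unordered-1",            # •
    "\U0001F522": "ordered-1",          # 🔢
    "\u25E6": "unordered-2",            # ◦
    "\U0001F520": "ordered-2",          # 🔠
    "\u25AA": "unordered-3",            # ▪
    "\U0001F521": "ordered-3",          # 🔡
    "\u2043": "unordered-4",            # ⁃
    "\U0001F523": "ordered-4",          # 🔣
    "\u203B": "note",                   # ※
    "\u2621": "caution",                # ☡
    "\U0001F6C8": "info",               # 🛈
    "\u2BD1": "query",                  # ⯑
    "\u26A0\uFE0E": "warn",             # ⚠︎
    "\u22EF": "continuation",           # ⋯
}


def sigil_type(text: str) -> Tuple[str, str]:
    """Return (type label, text after the sigil and its white·space)."""
    text = text.lstrip(XML_SPACE)  # assumption: drop quoting indentation
    for sigil, label in SIGILS.items():
        rest = text[len(sigil):]
        # A sigil is recognized only when white·space follows it.
        if text.startswith(sigil) and rest and rest[0] in XML_SPACE:
            return label, rest.lstrip(XML_SPACE)
    return "ordinary", text
```

A paragraph consisting of `⋯` alone never reaches this step, since
  `U+22EF` is also in the section break table.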
 
Following this sigil (if any, including trailing white·space) there may
  be a `¶` followed by zero or more non·white·space characters.
The characters following the `¶` give the identifier for the paragraph,
  which is expected to be unique within a document.
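
A Python sketch of the identifier step follows, assuming its input
  starts just after the sigil and its trailing white·space.
The helper names are hypothetical, and the uniqueness check only reports
  duplicates rather than rejecting them, since identifiers are described
  as expected, not required, to be unique.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

XML_SPACE = " \t\r\n"  # assumption: X·M·L white·space
PILCROW = "\u00B6"     # ¶


def split_identifier(text: str) -> Tuple[Optional[str], str]:
    """Split an optional ¶identifier off the start of the text.

    Returns (identifier or None, remaining contents).  The identifier may
    be empty, since zero characters may follow the ¶.
    """
    if not text.startswith(PILCROW):
        return None, text
    end = 1
    while end < len(text) and text[end] not in XML_SPACE:
        end += 1
    return text[1:end], text[end:]


def duplicate_identifiers(ids: Iterable[Optional[str]]) -> List[str]:
    """List the non-empty identifiers that occur more than once."""
    counts = Counter(i for i in ids if i)
    return sorted(i for i, n in counts.items() if n > 1)
```

The remaining contents are returned unchanged, including any white·space
  that separated them from the identifier.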
 
The remaining characters in a paragraph form its contents.
Markup within paragraphs is delimited with·out exception by pairs of
  characters, with the following precedence :—
 
- The characters `{🔗` and `>}` indicate a hyperlink to a U·R·L

  The hyperlink must contain at least one `<`; the content before the
    last `<` gives the text of the link, and the content after gives
    the U·R·L that the link points to.
  If no text is given, the U·R·L will be used instead.
 
- The characters `⸠` and `⸡` indicate a strikethru (`<html:s>`).

- The characters `⸤` and `⸥` indicate underlining (`<html:u>`).

- The characters `⟦` and `⟧` indicate an inline note
    (`<html:small role="note">`).

- The characters `⸨` and `⸩` indicate parenthetical content

- The characters `☞︎` and `☜︎` indicate strong importance

- The characters `⹐` and `⹑` indicate emphasis (`<html:em>`).

- The characters `⟪` and `⟫` indicate titles (`<html:cite>`).

- The characters `⟨` and `⟩` indicate offset text (`<html:i>`).
  This may be followed by a `@`, a language tag, and a `$` to provide
    the language of the text.

- The characters `⦃` and `⦄` indicate keyword highlighting

- The characters `` ` `` and `´` indicate code (`<html:code>`).
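
The hyperlink rule, with its split at the last `<`, can be sketched in
  Python like this.
The sketch finds only non·nested hyperlinks, takes the first `>}` after
  each `{🔗` as the closing delimiter, and ignores the other markup
  pairs; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

LINK_OPEN = "{\U0001F517"  # "{🔗"
LINK_CLOSE = ">}"


def parse_link(inner: str) -> Optional[Tuple[str, str]]:
    """Split the content between "{🔗" and ">}" into (text, url)."""
    cut = inner.rfind("<")  # the last "<" separates text from U·R·L
    if cut < 0:
        return None  # a hyperlink must contain at least one "<"
    text, url = inner[:cut], inner[cut + 1:]
    return (text if text else url, url)  # no text: fall back to the U·R·L


def find_links(contents: str) -> List[Tuple[str, str]]:
    """Collect (text, url) pairs for each non-nested hyperlink span."""
    links: List[Tuple[str, str]] = []
    pos = 0
    while True:
        start = contents.find(LINK_OPEN, pos)
        if start < 0:
            break
        end = contents.find(LINK_CLOSE, start + len(LINK_OPEN))
        if end < 0:
            break  # unterminated hyperlink: left as plain text here
        parsed = parse_link(contents[start + len(LINK_OPEN):end])
        if parsed is not None:
            links.append(parsed)
        pos = end + len(LINK_CLOSE)
    return links
```

Splitting at the last `<` rather than the first lets the link text
  itself contain `<`.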
 
Once the tree is built as above, it is remediated into its final form
  by the following steps :—
 
- Successive quoted paragraphs are joined into one quote.
  If the final quoted paragraph is an ordinary paragraph which begins
    with `—` and a space, the quote is wrapped in a `<html:figure>`
    and the final paragraph becomes its `<html:figcaption>`.

- Continuation paragraphs are joined with the preceding list items or

- List items of a higher level are nested in preceding list items, when

- Successive list items of the same level and class are joined into
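
The first remediation step can be sketched in Python as follows.
The `Block` and `Element` classes are hypothetical stand-ins for the
  tree.
The sketch assumes that block text has already had its sigil,
  identifier, and quoting indentation removed, and it leaves the caption
  text (including its leading `—`) unchanged.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Block:
    """A classified paragraph (hypothetical tree node)."""
    quoted: bool
    kind: str  # sigil type, e.g. "ordinary" or "note"
    text: str  # contents after sigil, identifier, and quoting indentation


@dataclass
class Element:
    """A wrapper element such as blockquote, figure, or figcaption."""
    name: str
    children: List["Node"] = field(default_factory=list)


Node = Union[Block, Element]


def join_quotes(blocks: List[Block]) -> List[Node]:
    """Join runs of quoted paragraphs, adding a figure for a caption."""
    result: List[Node] = []
    run: List[Block] = []

    def flush() -> None:
        if not run:
            return
        last = run[-1]
        if last.kind == "ordinary" and last.text.startswith("\u2014 "):
            # The final "— " paragraph becomes the caption of a figure.
            quote = Element("blockquote", run[:-1])
            caption = Element("figcaption", [last])
            result.append(Element("figure", [quote, caption]))
        else:
            result.append(Element("blockquote", list(run)))
        run.clear()

    for block in blocks:
        if block.quoted:
            run.append(block)
        else:
            flush()  # an unquoted block ends any run of quoted ones
            result.append(block)
    flush()
    return result
```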
 
Finally, any character can be escaped by instead providing its Unicode
  codepoint in the form `<U+NNNN>`, where `NNNN` is one or more

Multiple codepoints may be provided separated by periods, as in
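
A Python sketch of escape decoding follows.
It assumes that `NNNN` is hexadecimal (as is usual for `U+` notation),
  that additional codepoints are written as `.NNNN` without repeating
  `U+`, and that an out·of·range codepoint is left as written.
The function name `unescape` is hypothetical.

```python
import re

# Assumptions: NNNN is hexadecimal, and further codepoints follow as
# ".NNNN" without repeating "U+".
ESCAPE = re.compile(r"<U\+([0-9A-Fa-f]+(?:\.[0-9A-Fa-f]+)*)>")


def unescape(text: str) -> str:
    """Replace each <U+NNNN> or <U+NNNN.NNNN...> escape with its characters."""

    def replace(match):
        try:
            return "".join(chr(int(cp, 16)) for cp in match.group(1).split("."))
        except (ValueError, OverflowError):
            return match.group(0)  # out-of-range codepoint: keep as written

    return ESCAPE.sub(replace, text)
```

For example, `unescape("<U+3C>")` gives `<`.
An escaped delimiter would presumably stay literal only if decoding
  happens after markup has been recognized.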
 
💄📝 Les·M·L is designed for usage with [⛩📰 书社][Shushe].
Simply include the `parser.xslt` provided by this repository to
  ⛩📰 书社 as an additional parser, and `magic` as an additional
 
This repository conforms to [REUSE][].
 
The parser is licensed under the terms of the <cite>Mozilla Public
  License, version 2.0</cite>.
 
[REUSE]: <https://reuse.software/spec/>
[Shushe]: <https://git.ladys.computer/Shushe/>
[draft-phillips-record-jar-01]: <https://datatracker.ietf.org/doc/html/draft-phillips-record-jar-01>