SPDX-FileCopyrightText: 2024 Lady <https://www.ladys.computer/about/#lady>
SPDX-License-Identifier: CC0-1.0
<b>Ladys simple markup language.</b>
💄📝 Les·M·L is a document markup language designed with two goals in
mind :—
1. It must be trivial to parse, even with limited tooling such as that

2. It must be sophisticated enough to handle longform hypertext
   documents and associated metadata.

It is implemented as an X·S·L·T transformation from a
`<html:script type="text/lesml">` element into H·T·M·L
<i>Les·M·L</i> is an abbreviation of the phrase “Ladys Extremely Simple
Markup Language”.
The first line of any 💄📝 Les·M·L document should be the string
`#!lesml`.
A language tag may follow this, beginning with `@` and terminated with
Regardless of whether a language tag is present, the shebang line may
be terminated by a space‐separated list of properties of the form
Only one property is currently permitted: `profile`, whose value should
be a U·R·I and is translated to the `@data-lesml-profile` attribute
on the resulting `<html:article>` element.
Following the shebang line, document metadata may be provided in the
[Record Jar][draft-phillips-record-jar-01] format.
The body of the document begins after the last line which begins with
the string `%%`, or after the shebang line if none exists.
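As a rough illustration of the rule above, the following Python sketch separates one document into its shebang line, its metadata lines, and its body. The field parsing is a deliberately simplified reading of Record Jar: colon‐separated fields, `%%` record separators, and continuation lines that begin with white·space. The function names `split_document` and `parse_record_jar` are hypothetical and are not part of the X·S·L·T parser.

```python
from typing import Optional


def split_document(lines: list[str]) -> tuple[str, list[str], list[str]]:
    """Split one document into (shebang line, metadata lines, body lines)."""
    shebang, rest = lines[0], lines[1:]
    # Index of the last line beginning with "%%", or -1 if there is none.
    last = max(
        (i for i, line in enumerate(rest) if line.startswith("%%")), default=-1
    )
    if last < 0:
        # No "%%" line: the body starts right after the shebang line.
        return shebang, [], rest
    return shebang, rest[:last], rest[last + 1 :]


def parse_record_jar(lines: list[str]) -> list[dict[str, list[str]]]:
    """Simplified Record Jar reader (hypothetical helper, not the real parser).

    Records are separated by lines beginning with "%%". Fields look like
    "Name: value", and a line starting with white space continues the
    previous field. Field names may repeat, so each one maps to a list.
    """
    records: list[dict[str, list[str]]] = []
    fields: dict[str, list[str]] = {}
    current: Optional[str] = None
    for line in lines:
        if line.startswith("%%"):
            # Record separator: close the record collected so far.
            if fields:
                records.append(fields)
            fields, current = {}, None
        elif line[:1].isspace() and line.strip() and current is not None:
            # Continuation line: fold it into the previous field value.
            fields[current][-1] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            current = name.strip()
            fields.setdefault(current, []).append(value.strip())
    if fields:
        records.append(fields)
    return records
```

Every `%%` line before the last one is treated as a record separator within the metadata. This matches the statement that the body starts only after the last such line.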
Multiple documents can be catenated into a single file; a new document
is begun on any line which starts with `#!lesml` or `##`.
Documents in the latter case inherit the latest preceding `#!lesml`
line.
`##` may be followed by other text; this is treated as an interdocument
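A file containing several documents could be split along these lines. This Python sketch (the function name `split_file` is hypothetical) starts a new document at each `#!lesml` or `##` line and gives `##` documents the most recent `#!lesml` line. It simply discards any text after `##` and any lines before the first `#!lesml`. Both of those choices are assumptions of the sketch, not rules stated here.

```python
from typing import Optional


def split_file(text: str) -> list[tuple[str, list[str]]]:
    """Split a file into (effective shebang line, document lines) pairs.

    Lines before the first "#!lesml" line are ignored (an assumption).
    """
    documents: list[tuple[str, list[str]]] = []
    shebang: Optional[str] = None  # latest "#!lesml" line seen so far
    for line in text.splitlines():
        if line.startswith("#!lesml"):
            shebang = line
            documents.append((shebang, []))
        elif line.startswith("##"):
            if shebang is None:
                raise ValueError("'##' line before any '#!lesml' line")
            # Inherit the latest shebang; the rest of the "##" line is
            # dropped here (an assumption of this sketch).
            documents.append((shebang, []))
        elif documents:
            documents[-1][1].append(line)
    return documents
```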
Documents are broken into paragraphs by blank lines.
Empty paragraphs are ignored.
Non·empty paragraphs are classified as follows :—
- If the paragraph consists of only the following section‐break
  characters, plus any amount of white·space, then it is
  considered to be a section break (`<html:hr>`).

  The section break characters are :—

  | Character | Codepoint | Unicode Name |
  | --------- | --------- | ------------ |
  | `*` | `U+002A` | `ASTERISK` |
  | `-` | `U+002D` | `HYPHEN-MINUS` |
  | `.` | `U+002E` | `FULL STOP` |
  | `=` | `U+003D` | `EQUALS SIGN` |
  | `_` | `U+005F` | `LOW LINE` |
  | `~` | `U+007E` | `TILDE` |
  | `·` | `U+00B7` | `MIDDLE DOT` |
  | `․` | `U+2024` | `ONE DOT LEADER` |
  | `‥` | `U+2025` | `TWO DOT LEADER` |
  | `…` | `U+2026` | `HORIZONTAL ELLIPSIS` |
  | `⁂` | `U+2042` | `ASTERISM` |
  | `⋯` | `U+22EF` | `MIDLINE HORIZONTAL ELLIPSIS` |
  | `─` | `U+2500` | `BOX DRAWINGS LIGHT HORIZONTAL` |
  | `━` | `U+2501` | `BOX DRAWINGS HEAVY HORIZONTAL` |
  | `┄` | `U+2504` | `BOX DRAWINGS LIGHT TRIPLE DASH HORIZONTAL` |
  | `┅` | `U+2505` | `BOX DRAWINGS HEAVY TRIPLE DASH HORIZONTAL` |
  | `┈` | `U+2508` | `BOX DRAWINGS LIGHT QUADRUPLE DASH HORIZONTAL` |
  | `┉` | `U+2509` | `BOX DRAWINGS HEAVY QUADRUPLE DASH HORIZONTAL` |
  | `╌` | `U+254C` | `BOX DRAWINGS LIGHT DOUBLE DASH HORIZONTAL` |
  | `╍` | `U+254D` | `BOX DRAWINGS HEAVY DOUBLE DASH HORIZONTAL` |
  | `═` | `U+2550` | `BOX DRAWINGS DOUBLE HORIZONTAL` |
  | `╴` | `U+2574` | `BOX DRAWINGS LIGHT LEFT` |
  | `╶` | `U+2576` | `BOX DRAWINGS LIGHT RIGHT` |
  | `╸` | `U+2578` | `BOX DRAWINGS HEAVY LEFT` |
  | `╺` | `U+257A` | `BOX DRAWINGS HEAVY RIGHT` |
  | `☙` | `U+2619` | `REVERSED ROTATED FLORAL HEART BULLET` |
  | `❧` | `U+2767` | `ROTATED FLORAL HEART BULLET` |
  | `　` | `U+3000` | `IDEOGRAPHIC SPACE` |
  | `・` | `U+30FB` | `KATAKANA MIDDLE DOT` |
  | `＊` | `U+FF0A` | `FULLWIDTH ASTERISK` |
  | `－` | `U+FF0D` | `FULLWIDTH HYPHEN-MINUS` |
  | `．` | `U+FF0E` | `FULLWIDTH FULL STOP` |
  | `＝` | `U+FF1D` | `FULLWIDTH EQUALS SIGN` |
  | `＿` | `U+FF3F` | `FULLWIDTH LOW LINE` |
  | `～` | `U+FF5E` | `FULLWIDTH TILDE` |
- If every line in the paragraph begins with at least one space, then
  it is considered to be a quoted paragraph (`<html:blockquote>`).
  There is only one level of paragraph quoting; quoted paragraphs may

- Otherwise, the paragraph is unquoted.
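The paragraph splitting and the three‐way classification above can be sketched in Python as follows. Two readings here are assumptions:

- A "blank line" is taken to contain only ASCII spaces or tabs. This lets a line of `U+3000 IDEOGRAPHIC SPACE` still form a section break rather than a blank line.
- "Begins with at least one space" is taken to mean `U+0020`.

The function names are hypothetical.

```python
from collections.abc import Iterable, Iterator

# Codepoints from the section break table above.
SECTION_BREAK_CHARS = frozenset(map(chr, (
    0x002A, 0x002D, 0x002E, 0x003D, 0x005F, 0x007E, 0x00B7,
    0x2024, 0x2025, 0x2026, 0x2042, 0x22EF,
    0x2500, 0x2501, 0x2504, 0x2505, 0x2508, 0x2509, 0x254C, 0x254D,
    0x2550, 0x2574, 0x2576, 0x2578, 0x257A,
    0x2619, 0x2767, 0x3000, 0x30FB,
    0xFF0A, 0xFF0D, 0xFF0E, 0xFF1D, 0xFF3F, 0xFF5E,
)))


def paragraphs(lines: Iterable[str]) -> Iterator[list[str]]:
    """Group lines into paragraphs separated by blank lines.

    Assumption: a blank line holds only ASCII spaces or tabs, so a line of
    ideographic spaces (U+3000) is not blank. Empty paragraphs never arise.
    """
    para: list[str] = []
    for line in lines:
        if line.strip(" \t"):
            para.append(line)
        elif para:
            yield para
            para = []
    if para:
        yield para


def classify(para: list[str]) -> str:
    """Return "break", "quoted", or "unquoted" for a non-empty paragraph."""
    text = "".join(para)
    # Only break characters plus white space, with at least one break char.
    if any(c in SECTION_BREAK_CHARS for c in text) and all(
        c in SECTION_BREAK_CHARS or c.isspace() for c in text
    ):
        return "break"
    if all(line.startswith(" ") for line in para):
        return "quoted"
    return "unquoted"
```

The section break test runs first, so an indented line such as `  * * *` counts as a break and not as a quote.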
After this classification, each quoted or unquoted paragraph is further
classified by type based on its first character (which must be
followed by white·space to be recognized) :—
- If the paragraph begins with `⁌`, it is a chapter heading

- If the paragraph begins with `§`, it is a section heading

- If the paragraph begins with `❦`, it is a subsection heading

- If the paragraph begins with `✠`, it is a subsubsection heading

- If the paragraph begins with `•` or `🔢`, it is a primary unordered
  or ordered list item (`<html:li class="unordered" data-level="1">`
  or `<html:li class="ordered" data-level="1">`).

- If the paragraph begins with `◦` or `🔠`, it is a secondary unordered
  or ordered list item (`<html:li class="unordered" data-level="2">`
  or `<html:li class="ordered" data-level="2">`).
  Secondary list items are considered to be nested inside of primary
  list items which precede them.

- If the paragraph begins with `▪` or `🔡`, it is a tertiary unordered
  or ordered list item (`<html:li class="unordered" data-level="3">`
  or `<html:li class="ordered" data-level="3">`).
  Tertiary list items are considered to be nested inside of primary
  and secondary list items which precede them.

- If the paragraph begins with `⁃` or `🔣`, it is a quaternary
  unordered or ordered list item
  (`<html:li class="unordered" data-level="4">` or
  `<html:li class="ordered" data-level="4">`).
  Quaternary list items are considered to be nested inside of primary,
  secondary, and tertiary list items which precede them.

- If the paragraph begins with `※`, it is an ordinary note
  (`<html:div role="note" class="note">`).

- If the paragraph begins with `☡`, it is a cautionary note
  (`<html:div role="note" class="caution">`).

- If the paragraph begins with `🛈`, it is an informative note
  (`<html:div role="note" class="info">`).

- If the paragraph begins with `⯑`, it is a questioning note
  (`<html:div role="note" class="query">`).

- If the paragraph begins with `⚠︎`, it is a warning note
  (`<html:div role="note" class="warn">`).

- If the paragraph begins with `#`, it is a comment.
  Comments produce X·M·L comment nodes and can be used to break up list
  items into separate lists.
- If the paragraph begins with `⋯`, it is a continuation paragraph
  (`<html:div class="continuation">`).
  Continuation paragraphs may be used to continue a preceding list item
  Note, however, that an unquoted paragraph cannot continue a quoted
  paragraph.

- Otherwise, it is an ordinary paragraph.
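Recognizing the sigils comes down to a prefix match followed by a white·space check. The Python sketch below makes three assumptions:

- The leading indentation of a quoted paragraph has already been removed.
- `⚠︎` is matched only as the two‐codepoint sequence `U+26A0 U+FE0E`, exactly as written above.
- The type names and the function name `classify_type` are illustrative labels, not taken from the parser.

```python
from typing import Optional

# Sigil -> (type, class, list level); names here are illustrative only.
SIGILS = {
    "\u204C": ("chapter-heading", None, None),        # ⁌
    "\u00A7": ("section-heading", None, None),        # §
    "\u2766": ("subsection-heading", None, None),     # ❦
    "\u2720": ("subsubsection-heading", None, None),  # ✠
    "\u2022": ("li", "unordered", 1),                 # •
    "\U0001F522": ("li", "ordered", 1),               # 🔢
    "\u25E6": ("li", "unordered", 2),                 # ◦
    "\U0001F520": ("li", "ordered", 2),               # 🔠
    "\u25AA": ("li", "unordered", 3),                 # ▪
    "\U0001F521": ("li", "ordered", 3),               # 🔡
    "\u2043": ("li", "unordered", 4),                 # ⁃
    "\U0001F523": ("li", "ordered", 4),               # 🔣
    "\u203B": ("note", "note", None),                 # ※
    "\u2621": ("note", "caution", None),              # ☡
    "\U0001F6C8": ("note", "info", None),             # 🛈
    "\u2BD1": ("note", "query", None),                # ⯑
    "\u26A0\uFE0E": ("note", "warn", None),           # ⚠︎ (with U+FE0E)
    "#": ("comment", None, None),
    "\u22EF": ("continuation", None, None),           # ⋯
}


def classify_type(text: str) -> tuple[str, Optional[str], Optional[int], str]:
    """Return (type, class, level, remaining text) for one paragraph."""
    for sigil, (kind, cls, level) in SIGILS.items():
        after = text[len(sigil):]
        # The sigil only counts when white space immediately follows it.
        if text.startswith(sigil) and after[:1].isspace():
            return kind, cls, level, after.lstrip()
    return "paragraph", None, None, text
```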
Following this sigil (if any, including trailing white·space) there may
be a `¶` followed by zero or more non·white·space characters.
The characters following the `¶` give the identifier for the paragraph,
which is expected to be unique within a document.
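Extracting the optional identifier can be done with one regular expression. In this Python sketch (the function name `split_identifier` is hypothetical), the input is the paragraph text after the sigil and its white·space. Treating the white·space between the identifier and the contents as insignificant is an assumption.

```python
import re
from typing import Optional

# "¶" (U+00B6) followed by zero or more non-white-space characters.
_IDENTIFIER = re.compile(r"\u00B6(\S*)")


def split_identifier(rest: str) -> tuple[Optional[str], str]:
    """Return (identifier, contents); identifier is None without a "¶"."""
    match = _IDENTIFIER.match(rest)
    if match is None:
        return None, rest
    # Assumption: white space after the identifier is not part of the contents.
    return match.group(1), rest[match.end():].lstrip()
```

A bare `¶` yields an empty identifier rather than `None`, since zero characters are allowed after it.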
The remaining characters in a paragraph form its contents.
Markup within paragraphs is delimited with·out exception by pairs of
characters, with the following precedence :—
- The characters `{🔗` and `>}` indicate a hyperlink to a U·R·L
  (`<html:a>`).
  The hyperlink must contain at least one `<`; the content before the
  last `<` gives the text of the link, and the content after gives
  the U·R·L that the link points to.
  If no text is given, the U·R·L will be used instead.
- The characters `⸠` and `⸡` indicate a strikethru (`<html:s>`).

- The characters `⸤` and `⸥` indicate underlining (`<html:u>`).

- The characters `⟦` and `⟧` indicate an inline note
  (`<html:small role="note">`).

- The characters `⸨` and `⸩` indicate parenthetical content
- The characters `☞︎` and `☜︎` indicate strong importance
  (`<html:strong>`).
- The characters `⹐` and `⹑` indicate emphasis (`<html:em>`).

- The characters `⟪` and `⟫` indicate titles (`<html:cite>`).

- The characters `⟨` and `⟩` indicate offset text (`<html:i>`).
  This may be followed by a `@`, a language tag, and a `$` to provide
  the language of the text.

- The characters `⦃` and `⦄` indicate keyword highlighting

- The characters `` ` `` and `´` indicate code (`<html:code>`).
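The delimiter pairs above can be matched with a short recursive function. This Python sketch adopts one plausible reading of “precedence”: pairs are tried in the listed order, and the contents of a matched pair are scanned only for pairs lower in the list. The sketch also makes these simplifications:

- An opener without a matching closer is left as literal text.
- The `@`…`$` language tag on offset text is not handled.
- The output tuples, the function name `parse_inline`, and labels such as `parenthetical` and `keyword` are hypothetical.

```python
# (opener, closer, label) in the order of precedence listed above.
PAIRS = [
    ("{\U0001F517", ">}", "a"),                    # {🔗 … >}
    ("\u2E20", "\u2E21", "s"),                     # ⸠ … ⸡
    ("\u2E24", "\u2E25", "u"),                     # ⸤ … ⸥
    ("\u27E6", "\u27E7", "small"),                 # ⟦ … ⟧
    ("\u2E28", "\u2E29", "parenthetical"),         # ⸨ … ⸩
    ("\u261E\uFE0E", "\u261C\uFE0E", "strong"),    # ☞︎ … ☜︎
    ("\u2E50", "\u2E51", "em"),                    # ⹐ … ⹑
    ("\u27EA", "\u27EB", "cite"),                  # ⟪ … ⟫
    ("\u27E8", "\u27E9", "i"),                     # ⟨ … ⟩
    ("\u2983", "\u2984", "keyword"),               # ⦃ … ⦄
    ("`", "\u00B4", "code"),                       # ` … ´
]


def parse_inline(text: str, level: int = 0) -> list:
    """Return a list of strings and (label, href, children) tuples.

    href is only set for links; children is again such a list.
    """
    if level == len(PAIRS):
        return [text] if text else []
    opener, closer, label = PAIRS[level]
    out: list = []
    while True:
        start = text.find(opener)
        end = text.find(closer, start + len(opener)) if start >= 0 else -1
        if end < 0:
            # No complete pair left: scan the remainder for lower pairs.
            return out + parse_inline(text, level + 1)
        out += parse_inline(text[:start], level + 1)
        inner = text[start + len(opener):end]
        href = None
        if label == "a":
            # Text before the last "<" is the link text; the rest is the URL.
            link_text, _, href = inner.rpartition("<")
            inner = link_text or href  # no text given: show the URL itself
        out.append((label, href, parse_inline(inner, level + 1)))
        text = text[end + len(closer):]
```

Under this reading, a link may contain emphasis but emphasis may not contain a link. Literal delimiter characters would need the `<U+NNNN>` escape.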
Once the tree is built as above, it is remediated into its final form
by the following steps :—

- Successive quoted paragraphs are joined into one quote.
  If the final quoted paragraph is an ordinary paragraph which begins
  with `—` and a space, the quote is wrapped in a `<html:figure>`
  and the final paragraph becomes its `<html:figcaption>`.

- Continuation paragraphs are joined with the preceding list items or

- List items of a higher level are nested in preceding list items, when

- Successive list items of the same level and class are joined into
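The last two remediation steps, nesting and joining list items, can be sketched with a stack of open lists. In this Python sketch, the input is one uninterrupted run of list items given as `(class, level, text)` tuples. Comments and other paragraphs are assumed to have ended the run already. A higher‐level item with no preceding lower‐level item is placed at the top level. The dictionary output shape and the function name `build_lists` are hypothetical.

```python
def build_lists(items: list[tuple[str, int, str]]) -> list[dict]:
    """Nest and join (class, level, text) list items into list dictionaries.

    Each list is {"class": ..., "items": [...]}, and each item is
    {"text": ..., "children": [nested lists]}.
    """
    roots: list[dict] = []
    stack: list[tuple[int, dict]] = []  # open lists as (level, list)
    for cls, level, text in items:
        # Close lists that are deeper than this item, or at the same level
        # but of a different class (ordered vs. unordered).
        while stack and (
            stack[-1][0] > level
            or (stack[-1][0] == level and stack[-1][1]["class"] != cls)
        ):
            stack.pop()
        if stack and stack[-1][0] == level:
            current = stack[-1][1]  # same level and class: join
        else:
            current = {"class": cls, "items": []}
            if stack:
                # Nest inside the last item of the enclosing shallower list.
                stack[-1][1]["items"][-1]["children"].append(current)
            else:
                roots.append(current)
            stack.append((level, current))
        current["items"].append({"text": text, "children": []})
    return roots
```

Levels on the stack always increase strictly, so each item either joins the list on top of the stack or opens exactly one new list.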
Finally, any character can be escaped by instead providing its Unicode
codepoint in the form `<U+NNNN>`, where `NNNN` is one or more
hexadecimal digits.
Multiple codepoints may be provided separated by periods, as in
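Decoding these escapes is a single regular‐expression substitution. The Python sketch below makes two assumptions about the exact syntax, neither taken from the parser:

- The codepoints are hexadecimal.
- Codepoints after the first are written without a repeated `U+`, for example `<U+41.42>`.

The function name `unescape` is also an assumption. Applying it only after the delimiter pairs have been matched would let an escaped delimiter survive as literal text.

```python
import re

# Assumed syntax: "<U+", hex digits, optional ".hex" groups, then ">".
_ESCAPE = re.compile(r"<U\+([0-9A-Fa-f]+(?:\.[0-9A-Fa-f]+)*)>")


def unescape(text: str) -> str:
    """Replace each <U+…> escape with the character(s) it names.

    Values above 0x10FFFF make chr() raise ValueError.
    """
    def decode(match: re.Match) -> str:
        return "".join(chr(int(cp, 16)) for cp in match.group(1).split("."))

    return _ESCAPE.sub(decode, text)
```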
💄📝 Les·M·L is designed for usage with [⛩📰 书社][Shushe].
Simply add the `parser.xslt` provided by this repository to
⛩📰 书社 as an additional parser, and `magic` as an additional
This repository conforms to [REUSE][].

The parser is licensed under the terms of the <cite>Mozilla Public
License, version 2.0</cite>.
[REUSE]: <https://reuse.software/spec/>
[Shushe]: <https://git.ladys.computer/Shushe/>
[draft-phillips-record-jar-01]: <https://datatracker.ietf.org/doc/html/draft-phillips-record-jar-01>