1 <!--
2 SPDX-FileCopyrightText: 2024, 2025 Lady <https://www.ladys.computer/about/#lady>
3 SPDX-License-Identifier: CC0-1.0
4 -->
5 # 💄📝 Les·M·L
6
7 <b>Ladys simple markup language.</b>
8
9 💄📝 Les·M·L is a document markup language designed with two goals in
10 mind :⁠—
11
12 1. It must be trivial to parse, even with limited tooling such as that
13 provided by X·S·L·T.
14
15 2. It must be sophisticated enough to handle longform hypertext
16 documents and associated metadata.
17
18 It is implemented as an X·S·L·T transformation from a
19 `<html:script type="text/lesml">` element into H·T·M·L
20 (`parser.xslt`).
21
22 ## Nomenclature
23
24 <i>Les·M·L</i> is an abbreviation of the phrase “Ladys Extremely Simple
25 Markup Language”.
26
27 ## Markup Syntax
28
29 The first line of any 💄📝 Les·M·L document should be the string
30 `#!lesml`.
31 A language tag may follow this, beginning with `@` and terminated with
32 `$`, like so:
33 `#!lesml@en$`.
34 Regardless of whether a language tag is present, the shebang line may
35 be terminated by a space‐separated list of properties of the form
36 `key=value`.
37 Only one property is currently permitted: `profile`, whose value should
38 be a U·R·I and is translated to the `@data-lesml-profile` attribute
39 on the resulting `<html:article>` element.
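
As a rough illustration of the shebang syntax described above, the following Python sketch splits a shebang line into its language tag and properties. It is not part of 💄📝 Les·M·L (the actual parser is `parser.xslt`); the name `parse_shebang` is hypothetical, and the sketch assumes that the properties are separated from the rest of the line by one or more spaces.

```python
import re

# Assumed grammar: "#!lesml", an optional "@lang$", then space-separated properties.
SHEBANG = re.compile(r"^#!lesml(?:@(?P<lang>[^$]*)\$)?(?P<rest>(?: +\S+)*) *$")


def parse_shebang(line):
    """Return the language tag and properties of a shebang line, or None."""
    match = SHEBANG.match(line)
    if match is None:
        return None
    properties = {}
    for item in match.group("rest").split():
        key, separator, value = item.partition("=")
        if separator:  # skip anything that is not of the form key=value
            properties[key] = value
    return {"lang": match.group("lang"), "properties": properties}
```

For example, `parse_shebang("#!lesml@en$ profile=https://example.com/p")` yields the language tag `en` and a single `profile` property.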
40
41 Following the shebang line, document metadata may be provided in the
42 [Record Jar][draft-phillips-record-jar-01] format.
43 The body of the document begins after the last line which begins with
44 the string `%%`, or after the shebang line if none exists.
45
46 Multiple documents can be catenated into a single file; a new document
47 is begun on any line which starts with `#!lesml` or `##`.
48 Documents in the latter case inherit the latest preceding `#!lesml`
49 declaration.
50 `##` may be followed by other text; this is treated as an interdocument
51 comment.
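
The splitting rule above can be sketched in Python as follows. This is an illustration only, not how `parser.xslt` does it; `split_documents` and the dictionary layout are hypothetical names chosen for the example, and lines before the first `#!lesml` or `##` line are simply dropped here, which is an assumption.

```python
def split_documents(text):
    """Split a catenated file into documents, tracking the inherited shebang."""
    documents = []
    shebang = None  # latest "#!lesml" declaration seen so far
    current = None
    for line in text.split("\n"):
        if line.startswith("#!lesml"):
            shebang = line
            current = {"shebang": shebang, "lines": []}
            documents.append(current)
        elif line.startswith("##"):
            # Text after "##" is an interdocument comment and is not kept.
            current = {"shebang": shebang, "lines": []}
            documents.append(current)
        elif current is not None:
            current["lines"].append(line)
    return documents
```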
52
53 Documents are broken into paragraphs by blank lines.
54 Empty paragraphs are ignored.
55
56 If every line in the paragraph begins with (optional white·space
57 followed by) `»`, it is quoted (`<html:blockquote>`); if every line
58 begins with `]`, it is bracketed.
59 The lines, minus this leading sigil, are then re‐analysed.
60 Bracketed paragraphs which end quotes are treated as captions
61 (`<html:figcaption>`); otherwise, they are footers (`<html:footer>`).
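
A minimal Python sketch of this unwrapping step follows. The function name `unwrap` is hypothetical, and two things are assumed rather than taken from the text: that white·space here means X·M·L white·space (space, tab, carriage return, newline), and that optional leading white·space is allowed before `]` just as it is before `»`.

```python
XML_SPACE = " \t\r\n"  # assumption: "white·space" means X·M·L white·space


def unwrap(lines):
    """Strip one level of quoting or bracketing from a paragraph's lines."""
    stripped = [line.lstrip(XML_SPACE) for line in lines]
    for kind, sigil in (("quoted", "»"), ("bracketed", "]")):
        if stripped and all(s.startswith(sigil) for s in stripped):
            return kind, [s[len(sigil):] for s in stripped]
    return None, lines
```

Calling `unwrap` again on the returned lines corresponds to the re‐analysis step, so a line such as `» ] caption` is first seen as quoted and then as bracketed.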
62
63 Non·empty paragraphs are classified as follows :⁠—
64
65 - If the paragraph consists of only the following section‐break
66 characters, plus any amount of white·space, then it is
67 considered to be a section break (`<html:hr>`).
68
69 The section break characters are :⁠—
70
71 | Character | Codepoint | Unicode Name |
72 | --------- | --------- | ------------ |
73 | `*` | `U+002A` | `ASTERISK` |
74 | `-` | `U+002D` | `HYPHEN-MINUS` |
75 | `.` | `U+002E` | `FULL STOP` |
76 | `=` | `U+003D` | `EQUALS SIGN` |
77 | `_` | `U+005F` | `LOW LINE` |
78 | `~` | `U+007E` | `TILDE` |
79 | `·` | `U+00B7` | `MIDDLE DOT` |
80 | `․` | `U+2024` | `ONE DOT LEADER` |
81 | `‥` | `U+2025` | `TWO DOT LEADER` |
82 | `…` | `U+2026` | `HORIZONTAL ELLIPSIS` |
83 | `⁂` | `U+2042` | `ASTERISM` |
84 | `⋯` | `U+22EF` | `MIDLINE HORIZONTAL ELLIPSIS` |
85 | `─` | `U+2500` | `BOX DRAWINGS LIGHT HORIZONTAL` |
86 | `━` | `U+2501` | `BOX DRAWINGS HEAVY HORIZONTAL` |
87 | `┄` | `U+2504` | `BOX DRAWINGS LIGHT TRIPLE DASH HORIZONTAL` |
88 | `┅` | `U+2505` | `BOX DRAWINGS HEAVY TRIPLE DASH HORIZONTAL` |
89 | `┈` | `U+2508` | `BOX DRAWINGS LIGHT QUADRUPLE DASH HORIZONTAL` |
90 | `┉` | `U+2509` | `BOX DRAWINGS HEAVY QUADRUPLE DASH HORIZONTAL` |
91 | `╌` | `U+254C` | `BOX DRAWINGS LIGHT DOUBLE DASH HORIZONTAL` |
92 | `╍` | `U+254D` | `BOX DRAWINGS HEAVY DOUBLE DASH HORIZONTAL` |
93 | `═` | `U+2550` | `BOX DRAWINGS DOUBLE HORIZONTAL` |
94 | `╴` | `U+2574` | `BOX DRAWINGS LIGHT LEFT` |
95 | `╶` | `U+2576` | `BOX DRAWINGS LIGHT RIGHT` |
96 | `╸` | `U+2578` | `BOX DRAWINGS HEAVY LEFT` |
97 | `╺` | `U+257A` | `BOX DRAWINGS HEAVY RIGHT` |
98 | `☙` | `U+2619` | `REVERSED ROTATED FLORAL HEART BULLET` |
99 | `❧` | `U+2767` | `ROTATED FLORAL HEART BULLET` |
100 | `　` | `U+3000` | `IDEOGRAPHIC SPACE` |
101 | `・` | `U+30FB` | `KATAKANA MIDDLE DOT` |
102 | `＊` | `U+FF0A` | `FULLWIDTH ASTERISK` |
103 | `－` | `U+FF0D` | `FULLWIDTH HYPHEN-MINUS` |
104 | `．` | `U+FF0E` | `FULLWIDTH FULL STOP` |
105 | `＝` | `U+FF1D` | `FULLWIDTH EQUALS SIGN` |
106 | `＿` | `U+FF3F` | `FULLWIDTH LOW LINE` |
107 | `～` | `U+FF5E` | `FULLWIDTH TILDE` |
108
109 - If every line in the paragraph begins with zero or more white·space
110 characters followed by `|`, it is a “preformatted” paragraph and
111 white·space is not collapsed (`<html:pre>`).
112
113 - Otherwise, the paragraph is ordinary.
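
The paragraph splitting and first‐pass classification described above might look like this in Python. It is a sketch under stated assumptions, not the X·S·L·T implementation: `split_paragraphs` and `classify` are hypothetical names, white·space is assumed to mean X·M·L white·space, and the section‐break test is assumed to run before the preformatted test.

```python
XML_SPACE = " \t\r\n"  # assumption: "white·space" means X·M·L white·space

# Codepoints copied from the section-break table above.
SECTION_BREAK_CHARS = frozenset(map(chr, [
    0x002A, 0x002D, 0x002E, 0x003D, 0x005F, 0x007E, 0x00B7,
    0x2024, 0x2025, 0x2026, 0x2042, 0x22EF,
    0x2500, 0x2501, 0x2504, 0x2505, 0x2508, 0x2509,
    0x254C, 0x254D, 0x2550, 0x2574, 0x2576, 0x2578, 0x257A,
    0x2619, 0x2767, 0x3000, 0x30FB,
    0xFF0A, 0xFF0D, 0xFF0E, 0xFF1D, 0xFF3F, 0xFF5E,
]))


def split_paragraphs(body):
    """Break a document body into paragraphs (lists of lines) at blank lines."""
    paragraphs, current = [], []
    for line in body.split("\n"):
        if line.strip(XML_SPACE):
            current.append(line)
        elif current:  # a blank line closes the paragraph; empty ones are ignored
            paragraphs.append(current)
            current = []
    if current:
        paragraphs.append(current)
    return paragraphs


def classify(lines):
    """Classify a non-empty paragraph as a section break, preformatted, or ordinary."""
    visible = [c for c in "".join(lines) if c not in XML_SPACE]
    if visible and all(c in SECTION_BREAK_CHARS for c in visible):
        return "section-break"
    if all(line.lstrip(XML_SPACE).startswith("|") for line in lines):
        return "preformatted"
    return "ordinary"
```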
114
115 After this classification, each ordinary paragraph is further
116 classified by type based on its first character (which must be
117 followed by white·space or a pilcrow, or else be the only thing on the
118 line) :⁠—
119
120 - If the paragraph is preformatted, it is an ordinary paragraph.
121
122 - If the paragraph begins with `⁌`, it is a chapter heading
123 (`<html:h1>`).
124
125 - If the paragraph begins with `§`, it is a section heading
126 (`<html:h2>`).
127
128 - If the paragraph begins with `❦`, it is a subsection heading
129 (`<html:h3>`).
130
131 - If the paragraph begins with `✠`, it is a subsubsection heading
132 (`<html:h4>`).
133
134 - If the paragraph begins with `•` or `🔢`, it is a primary unordered
135 or ordered list item (`<html:li class="unordered" aria-level="1">`
136 or `<html:li class="ordered" aria-level="1">`).
137
138 - If the paragraph begins with `◦` or `🔠`, it is a secondary unordered
139 or ordered list item (`<html:li class="unordered" aria-level="2">`
140 or `<html:li class="ordered" aria-level="2">`).
141 Secondary list items are considered to be nested inside of primary
142 list items which precede them.
143
144 - If the paragraph begins with `▪` or `🔡`, it is a tertiary unordered
145 or ordered list item (`<html:li class="unordered" aria-level="3">`
146 or `<html:li class="ordered" aria-level="3">`).
147 Tertiary list items are considered to be nested inside of primary
148 and secondary list items which precede them.
149
150 - If the paragraph begins with `⁃` or `🔣`, it is a quaternary
151 unordered or ordered list item
152 (`<html:li class="unordered" aria-level="4">` or
153 `<html:li class="ordered" aria-level="4">`).
154 Quaternary list items are considered to be nested inside of primary,
155 secondary, and tertiary list items which precede them.
156
157 - If the paragraph begins with `※`, it is an ordinary note
158 (`<html:div role="note" class="note">`).
159
160 - If the paragraph begins with `☡`, it is a cautionary note
161 (`<html:div role="note" class="caution">`).
162
163 - If the paragraph begins with `🛈`, it is an informative note
164 (`<html:div role="note" class="info">`).
165
166 - If the paragraph begins with `⯑`, it is a questioning note
167 (`<html:div role="note" class="query">`).
168
169 - If the paragraph begins with `⚠︎`, it is a warning note
170 (`<html:div role="note" class="warn">`).
171
172 - If the paragraph begins with `^`, it is a footnote
173 (`<html:li class="ordered footnote" aria-level="1">`).
174 Footnotes are ignored unless their first paragraph has an i·d
175 (specified with `¶`) which is referenced by one or more footnote
176 references.
177 Footnotes are treated as level 1 ordered list items, so they can
178 contain nested lists.
179
180 Footnotes are removed from the normal document flow and placed in a
181 footer (`<html:section role="doc-endnotes">`) in order of first
182 reference.
183 It is recommended that the i·d¦s you choose are kept stable, so that
184 links to footnotes do not break.
185
186 - If the paragraph begins with `#`, it is a comment.
187 Comments produce X·M·L comment nodes and can be used to break up list
188 items into separate lists.
189
190 - If the paragraph begins with `⋯`, it is a continuation paragraph.
191 Continuation paragraphs may be used to continue a preceding note,
192 footnote, or list item.
193 If there is no such preceding note, footnote, or list item, they will
194 attach to adjacent heading elements to form heading groups
195 (`<html:hgroup>`).
196 Otherwise, they will be treated as ordinary paragraphs.
197
198 - Otherwise, it is an ordinary paragraph.
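
The sigil list above amounts to a lookup table. The Python sketch below shows one way to express it; the labels, the table name `SIGILS`, and the function `paragraph_kind` are all hypothetical, and the warning sigil is assumed to be `U+26A0` followed by the text presentation selector `U+FE0E`.

```python
XML_SPACE = " \t\r\n"  # assumption: "white·space" means X·M·L white·space

# Hypothetical labels for each sigil; the H·T·M·L mapping is given in the list above.
SIGILS = {
    "⁌": "chapter heading", "§": "section heading",
    "❦": "subsection heading", "✠": "subsubsection heading",
    "•": "unordered item 1", "🔢": "ordered item 1",
    "◦": "unordered item 2", "🔠": "ordered item 2",
    "▪": "unordered item 3", "🔡": "ordered item 3",
    "⁃": "unordered item 4", "🔣": "ordered item 4",
    "※": "note", "☡": "caution", "🛈": "info", "⯑": "query",
    "\u26A0\uFE0E": "warn",  # assumed: WARNING SIGN plus text presentation selector
    "^": "footnote", "#": "comment", "⋯": "continuation",
}


def paragraph_kind(text):
    """Return (kind, remainder) for an ordinary paragraph's text."""
    for sigil, kind in SIGILS.items():
        if text.startswith(sigil):
            rest = text[len(sigil):]
            # The sigil must stand alone or be followed by white·space or a pilcrow.
            if rest == "" or rest[0] in XML_SPACE or rest[0] == "¶":
                return kind, rest
    return "ordinary", text
```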
199
200 Following this sigil (if any) there may be a `¶` followed by zero or
201 more non·white·space characters.
202 The characters following the `¶` give the identifier for the paragraph,
203 which is expected to be unique within a document.
204 This may be suffixed with a language tag beginning with `@` and
205 terminated with `$`.
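
To make the identifier syntax concrete, here is a small Python sketch that reads an optional `¶` identifier and language tag from the text following the sigil. The name `parse_identifier` is hypothetical, and the sketch assumes that an identifier cannot itself contain `@`, which the description above does not settle.

```python
import re

# Assumed: the identifier runs up to the first "@" or white·space character.
IDENTIFIER = re.compile(r"^¶(?P<id>[^\s@]*)(?:@(?P<lang>[^\s$]*)\$)?")


def parse_identifier(rest):
    """Return (identifier, language tag, remaining text)."""
    match = IDENTIFIER.match(rest)
    if match is None:
        return None, None, rest
    return match.group("id"), match.group("lang"), rest[match.end():]
```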
206
207 The remaining characters in a paragraph form its contents.
208 Markup within paragraphs is delimited with·out exception by pairs of
209 characters, with the following precedence :⁠—
210
211 - The characters `⌦` and `⌫` indicate inline comments.
212 A single character `⌧` may be used to indicate an “empty” comment
213 (consisting of `U+034F COMBINING GRAPHEME JOINER` for X·M·L
214 compatibility).
215
216 - The characters `{@` and `"}` indicate attribute specifications.
217 The attribute specification must contain at least one `="` which
218 separates the key of the attribute from the value.
219 Attributes attach to the previous element or text node, with
220 white·space‐only text nodes after elements ignored; if there is no
221 such previous element or text node, an empty text node is used
222 instead.
223 Multiple attributes can be given in sequence using multiple
224 specifications.
225 Text nodes with attributes are wrapped in `<html:span>`.
226
227 - The characters `{🔗` and `>}` indicate a hyperlink to a U·R·L
228 (`<html:a>`).
229 The hyperlink must contain at least one `<`; the content before the
230 last `<` gives the text of the link, and the content after gives
231 the U·R·L that the link points to.
232 If no text is given, the U·R·L will be used instead.
233
234 - The characters `⸠` and `⸡` indicate a strikethru (`<html:s>`).
235
236 - The characters `⸤` and `⸥` indicate underlining (`<html:u>`).
237
238 - The characters `⟦` and `⟧` indicate an inline note
239 (`<html:small role="note">`).
240
241 - The characters `⸨` and `⸩` indicate parenthetical content
242 (`<html:small>`).
243
244 - The characters `` ` `` and `´` indicate code (`<html:code>`).
245
246 - The characters `⟪` and `⟫` indicate titles (`<html:cite>`).
247
248 - The characters `⸶` and `⸷` indicate names (`<html:u class="name">`).
249
250 - The characters `⟨` and `⟩` indicate offset text (`<html:i>`).
251
252 - The characters `⦃` and `⦄` indicate keyword highlighting
253 (`<html:b>`).
254
255 - The characters `☞︎` and `☜︎` indicate strong importance
256 (`<html:strong>`).
257
258 - The characters `⹐` and `⹑` indicate emphasis (`<html:em>`).
259
260 - The characters `^` and `.` indicate a footnote reference
261 (`<html:a role="doc-noteref">`).
262 The characters between these sigils must match the i·d of the first
263 paragraph of some footnote in the same document.
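
As one worked example of the inline rules above, the Python sketch below handles only the hyperlink form, splitting at the last `<` and falling back to the U·R·L when no text is given. It ignores the precedence of the other delimiters and does not escape the surrounding text, so it is an illustration rather than a parser; `render_links` is a hypothetical name.

```python
import html
import re

# Matches "{🔗" ... ">}" non-greedily; other inline markup is not handled here.
LINK = re.compile(r"\{🔗(.*?)>\}", re.DOTALL)


def render_links(text):
    """Replace hyperlink markup with <a> elements."""
    def replace(match):
        label, separator, url = match.group(1).rpartition("<")
        if not separator:
            return match.group(0)  # no "<": not a hyperlink, leave untouched
        return '<a href="{}">{}</a>'.format(
            html.escape(url, quote=True),
            html.escape(label or url),  # the U·R·L doubles as the text if none is given
        )
    return LINK.sub(replace, text)
```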
264
265 Once the tree is built as above, it is remediated into its final form
266 by the following steps :⁠—
267
268 - Continuation paragraphs are joined with the preceding list items or
269 divs.
270
271 - List items of a higher level are nested in preceding list items, when
272 present.
273
274 - Successive list items of the same level and class are joined into
275 a single list.
276
277 - Linebreaks in preformatted paragraphs are replaced with `<html:br>`.
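
The third remediation step (joining successive list items into a single list) can be sketched as below. This Python example covers a single nesting level only and uses a made‐up node representation of `(kind, text)` pairs; `join_list_items` and `LIST_KINDS` are hypothetical names. It also shows why a comment between two items results in two separate lists.

```python
LIST_KINDS = ("ordered", "unordered")


def join_list_items(nodes):
    """Join runs of same-class list items; other nodes pass through unchanged.

    Input nodes are (kind, text) pairs. Joined lists are (kind, [texts]) pairs.
    """
    joined = []
    for kind, text in nodes:
        if kind in LIST_KINDS:
            previous = joined[-1] if joined else None
            if previous and previous[0] == kind and isinstance(previous[1], list):
                previous[1].append(text)  # extend the list that is already open
            else:
                joined.append((kind, [text]))  # start a new list
        else:
            joined.append((kind, text))
    return joined
```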
278
279 Finally, any character can be escaped by instead providing its Unicode
280 codepoint in the form `{U+NNNN}`, where `NNNN` is one or more
281 hexadecimal digits.
282 Multiple codepoints may be provided separated by periods, as in
283 `{U+WWWW.ZZZZ}`.
284 Due to limitations in X·S·L·T, characters cannot be escaped in
285 attributes (including link targets).
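
The escape syntax is simple enough to decode with one regular expression. The following Python sketch is illustrative only; `unescape` is a hypothetical name, and accepting lowercase hexadecimal digits is an assumption, since the examples above show only uppercase.

```python
import re

# One or more hexadecimal codepoints separated by periods, e.g. {U+41} or {U+41.42}.
ESCAPE = re.compile(r"\{U\+([0-9A-Fa-f]+(?:\.[0-9A-Fa-f]+)*)\}")


def unescape(text):
    """Replace {U+NNNN} escapes with the characters they name."""
    def replace(match):
        # chr() raises ValueError for values above U+10FFFF.
        return "".join(chr(int(digits, 16)) for digits in match.group(1).split("."))
    return ESCAPE.sub(replace, text)
```

For instance, `unescape("{U+1F484.1F4DD}")` produces the two emoji used in the project name.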
286
287 ## Usage
288
289 💄📝 Les·M·L is designed for usage with [⛩📰 书社][Shushe].
290 Simply include the `parser.xslt` provided by this repository to
291 ⛩📰 书社 as an additional parser, and `magic` as an additional
292 magic file.
293
294 ## License
295
296 This repository conforms to [REUSE][].
297
298 The parser is licensed under the terms of the <cite>Mozilla Public
299 License, version 2.0</cite>.
300
301 [REUSE]: <https://reuse.software/spec/>
302 [Shushe]: <https://git.ladys.computer/Shushe/>
303 [draft-phillips-record-jar-01]: <https://datatracker.ietf.org/doc/html/draft-phillips-record-jar-01>