<!--
SPDX-FileCopyrightText: 2024, 2025 Lady <https://www.ladys.computer/about/#lady>
SPDX-License-Identifier: CC0-1.0
-->
# 💄📝 Les·M·L

<b>Ladys simple markup language.</b>

💄📝 Les·M·L is a document markup language designed with two goals in
mind :⁠—

1. It must be trivial to parse, even with limited tooling such as that
   provided by X·S·L·T.

2. It must be sophisticated enough to handle longform hypertext
   documents and associated metadata.

It is implemented as an X·S·L·T transformation from a
`<html:script type="text/lesml">` element into H·T·M·L
(`parser.xslt`).

## Nomenclature

<i>Les·M·L</i> is an abbreviation of the phrase “Ladys Extremely Simple
Markup Language”.

## Markup Syntax

The first line of any 💄📝 Les·M·L document should be the string
`#!lesml`.
A language tag may follow this, beginning with `@` and terminated with
`$`, like so:
`#!lesml@en$`.
Regardless of whether a language tag is present, the shebang line may
be terminated by a space‐separated list of properties of the form
`key=value`.
Only one property is currently permitted: `profile`, whose value should
be a U·R·I and is translated to the `@data-lesml-profile` attribute
on the resulting `<html:article>` element.
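
As an illustration only, the shebang rules above could be checked with a small Python sketch like the following.
`parse_shebang` is a hypothetical helper, not part of `parser.xslt`; it assumes the language tag contains no white·space and that property tokens without `=` are simply ignored.

```python
import re

# Hypothetical sketch; parser.xslt does not expose a function like this.
# "#!lesml", then an optional "@tag$" language tag, then optional properties.
SHEBANG = re.compile(r"#!lesml(?:@(?P<lang>[^$\s]*)\$)?(?P<props>.*)")


def parse_shebang(line: str) -> dict:
    """Split a shebang line into its language tag and key=value properties."""
    match = SHEBANG.fullmatch(line.rstrip("\n"))
    if match is None:
        raise ValueError("not a Les·M·L shebang line")
    properties = {}
    for token in match.group("props").split():
        key, separator, value = token.partition("=")
        if separator:  # assumption: tokens without "=" are ignored
            properties[key] = value
    # A "profile" value would end up as @data-lesml-profile on <html:article>.
    return {"lang": match.group("lang"), "properties": properties}
```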

Following the shebang line, document metadata may be provided in the
[Record Jar][draft-phillips-record-jar-01] format.
The body of the document begins after the last line which begins with
the string `%%`, or after the shebang line if none exists.
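
A hypothetical Python sketch of the metadata/body split, assuming `lines` holds a single document whose first line is its shebang line.
It follows the rule literally, so the *last* `%%` line anywhere in the document decides where the body starts; `split_metadata` is illustrative only and does not parse the Record Jar fields themselves.

```python
def split_metadata(lines: list[str]) -> tuple[list[str], list[str]]:
    """Return (metadata_lines, body_lines) for a single document.

    `lines[0]` is assumed to be the document's shebang line.
    """
    last_separator = 0  # fall back to the shebang line when no "%%" exists
    for index in range(1, len(lines)):
        if lines[index].startswith("%%"):
            last_separator = index
    # Record Jar fields are returned unparsed; only the split is shown here.
    return lines[1:last_separator], lines[last_separator + 1:]
```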

Multiple documents can be catenated into a single file; a new document
begins on any line which starts with `#!lesml` or `##`.
Documents in the latter case inherit the latest preceding `#!lesml`
declaration.
`##` may be followed by other text; this is treated as an interdocument
comment.
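
The document‐splitting rule can be sketched in Python as follows.
`split_documents` is a hypothetical helper: each returned document begins with its own or its inherited `#!lesml` line, the interdocument comment after `##` is discarded, and lines before the first `#!lesml` line are simply dropped (an assumption, not documented behavior).

```python
def split_documents(lines: list[str]) -> list[list[str]]:
    """Group the lines of a file into documents, each led by a shebang line."""
    documents: list[list[str]] = []
    shebang = None  # latest "#!lesml" declaration seen so far
    for line in lines:
        if line.startswith("#!lesml"):
            shebang = line
            documents.append([line])
        elif line.startswith("##"):
            if shebang is None:
                # Assumption: "##" with no preceding declaration is an error.
                raise ValueError("'##' before any '#!lesml' line")
            # Inherit the latest declaration; the "##" comment is dropped.
            documents.append([shebang])
        elif documents:
            documents[-1].append(line)
    return documents
```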

Documents are broken into paragraphs by blank lines.
Empty paragraphs are ignored.
Non·empty paragraphs are classified as follows :⁠—

- If the paragraph consists only of the following section‐break
  characters, plus any amount of white·space, then it is
  considered to be a section break (`<html:hr>`).

  The section break characters are :⁠—

  | Character | Codepoint | Unicode Name |
  | --------- | --------- | ------------ |
  | `*` | `U+002A` | `ASTERISK` |
  | `-` | `U+002D` | `HYPHEN-MINUS` |
  | `.` | `U+002E` | `FULL STOP` |
  | `=` | `U+003D` | `EQUALS SIGN` |
  | `_` | `U+005F` | `LOW LINE` |
  | `~` | `U+007E` | `TILDE` |
  | `·` | `U+00B7` | `MIDDLE DOT` |
  | `․` | `U+2024` | `ONE DOT LEADER` |
  | `‥` | `U+2025` | `TWO DOT LEADER` |
  | `…` | `U+2026` | `HORIZONTAL ELLIPSIS` |
  | `⁂` | `U+2042` | `ASTERISM` |
  | `⋯` | `U+22EF` | `MIDLINE HORIZONTAL ELLIPSIS` |
  | `─` | `U+2500` | `BOX DRAWINGS LIGHT HORIZONTAL` |
  | `━` | `U+2501` | `BOX DRAWINGS HEAVY HORIZONTAL` |
  | `┄` | `U+2504` | `BOX DRAWINGS LIGHT TRIPLE DASH HORIZONTAL` |
  | `┅` | `U+2505` | `BOX DRAWINGS HEAVY TRIPLE DASH HORIZONTAL` |
  | `┈` | `U+2508` | `BOX DRAWINGS LIGHT QUADRUPLE DASH HORIZONTAL` |
  | `┉` | `U+2509` | `BOX DRAWINGS HEAVY QUADRUPLE DASH HORIZONTAL` |
  | `╌` | `U+254C` | `BOX DRAWINGS LIGHT DOUBLE DASH HORIZONTAL` |
  | `╍` | `U+254D` | `BOX DRAWINGS HEAVY DOUBLE DASH HORIZONTAL` |
  | `═` | `U+2550` | `BOX DRAWINGS DOUBLE HORIZONTAL` |
  | `╴` | `U+2574` | `BOX DRAWINGS LIGHT LEFT` |
  | `╶` | `U+2576` | `BOX DRAWINGS LIGHT RIGHT` |
  | `╸` | `U+2578` | `BOX DRAWINGS HEAVY LEFT` |
  | `╺` | `U+257A` | `BOX DRAWINGS HEAVY RIGHT` |
  | `☙` | `U+2619` | `REVERSED ROTATED FLORAL HEART BULLET` |
  | `❧` | `U+2767` | `ROTATED FLORAL HEART BULLET` |
  | `　` | `U+3000` | `IDEOGRAPHIC SPACE` |
  | `・` | `U+30FB` | `KATAKANA MIDDLE DOT` |
  | `＊` | `U+FF0A` | `FULLWIDTH ASTERISK` |
  | `－` | `U+FF0D` | `FULLWIDTH HYPHEN-MINUS` |
  | `．` | `U+FF0E` | `FULLWIDTH FULL STOP` |
  | `＝` | `U+FF1D` | `FULLWIDTH EQUALS SIGN` |
  | `＿` | `U+FF3F` | `FULLWIDTH LOW LINE` |
  | `～` | `U+FF5E` | `FULLWIDTH TILDE` |

- If every line in the paragraph begins with at least one space, then
  it is considered to be a quoted paragraph (`<html:blockquote>`).
  There is only one level of paragraph quoting; quoted paragraphs may
  not be quoted again.

- If every line in the paragraph begins with zero or more white·space
  characters followed by `|`, it is a “preformatted” paragraph and
  white·space is not collapsed (`<html:pre>`).
  A paragraph may be both quoted and preformatted.

- Otherwise, the paragraph is unquoted.
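
The classification above might be sketched in Python like this, for illustration only.
`classify_block` and its string labels are hypothetical; the sketch assumes “white·space” means X·M·L white·space (space, tab, carriage return, line feed), which lets `U+3000 IDEOGRAPHIC SPACE` count as a section‐break character rather than as white·space.

```python
# Code points copied from the section break table above.
SECTION_BREAK_CHARS = frozenset(map(chr, [
    0x002A, 0x002D, 0x002E, 0x003D, 0x005F, 0x007E, 0x00B7,
    0x2024, 0x2025, 0x2026, 0x2042, 0x22EF,
    0x2500, 0x2501, 0x2504, 0x2505, 0x2508, 0x2509, 0x254C, 0x254D,
    0x2550, 0x2574, 0x2576, 0x2578, 0x257A,
    0x2619, 0x2767, 0x3000, 0x30FB,
    0xFF0A, 0xFF0D, 0xFF0E, 0xFF1D, 0xFF3F, 0xFF5E,
]))
XML_WHITESPACE = " \t\r\n"  # assumption: "white·space" means X·M·L white·space


def classify_block(paragraph: str) -> str:
    """Classify a non·empty paragraph by its block form (hypothetical labels)."""
    visible = [c for c in paragraph if c not in XML_WHITESPACE]
    if visible and all(c in SECTION_BREAK_CHARS for c in visible):
        return "section-break"
    lines = paragraph.split("\n")
    quoted = all(line.startswith(" ") for line in lines)
    preformatted = all(line.lstrip(XML_WHITESPACE).startswith("|") for line in lines)
    if quoted and preformatted:
        return "quoted-preformatted"
    if preformatted:
        return "preformatted"
    return "quoted" if quoted else "unquoted"
```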

After this classification, each quoted or unquoted paragraph is further
classified by type based on its first character (which must be
followed by white·space, or else be the only thing on the line) :⁠—

- If the paragraph is preformatted, it is an ordinary paragraph.

- If the paragraph begins with `⁌`, it is a chapter heading
  (`<html:h1>`).

- If the paragraph begins with `§`, it is a section heading
  (`<html:h2>`).

- If the paragraph begins with `❦`, it is a subsection heading
  (`<html:h3>`).

- If the paragraph begins with `✠`, it is a subsubsection heading
  (`<html:h4>`).

- If the paragraph begins with `•` or `🔢`, it is a primary unordered
  or ordered list item (`<html:li class="unordered" data-level="1">`
  or `<html:li class="ordered" data-level="1">`).

- If the paragraph begins with `◦` or `🔠`, it is a secondary unordered
  or ordered list item (`<html:li class="unordered" data-level="2">`
  or `<html:li class="ordered" data-level="2">`).
  Secondary list items are considered to be nested inside of primary
  list items which precede them.

- If the paragraph begins with `▪` or `🔡`, it is a tertiary unordered
  or ordered list item (`<html:li class="unordered" data-level="3">`
  or `<html:li class="ordered" data-level="3">`).
  Tertiary list items are considered to be nested inside of primary
  and secondary list items which precede them.

- If the paragraph begins with `⁃` or `🔣`, it is a quaternary
  unordered or ordered list item
  (`<html:li class="unordered" data-level="4">` or
  `<html:li class="ordered" data-level="4">`).
  Quaternary list items are considered to be nested inside of primary,
  secondary, and tertiary list items which precede them.

- If the paragraph begins with `※`, it is an ordinary note
  (`<html:div role="note" class="note">`).

- If the paragraph begins with `☡`, it is a cautionary note
  (`<html:div role="note" class="caution">`).

- If the paragraph begins with `🛈`, it is an informative note
  (`<html:div role="note" class="info">`).

- If the paragraph begins with `⯑`, it is a questioning note
  (`<html:div role="note" class="query">`).

- If the paragraph begins with `⚠︎`, it is a warning note
  (`<html:div role="note" class="warn">`).

- If the paragraph begins with `#`, it is a comment.
  Comments produce X·M·L comment nodes and can be used to break up list
  items into separate lists.

- If the paragraph begins with `⋯`, it is a continuation paragraph
  (`<html:div class="continuation">`).
  Continuation paragraphs may be used to continue a preceding list item
  or quote.
  Note, however, that an unquoted paragraph cannot continue a quoted
  one, or vice·versa.

- Otherwise, it is an ordinary paragraph.
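
A rough Python sketch of the sigil rules above, for illustration only.
`SIGILS` and `classify_type` are hypothetical names; element names drop the `html:` prefix, and `"ordinary"` and `"comment"` are placeholder labels because this section does not name an element for them.
The sketch also assumes that the `⚠︎` sigil includes its `U+FE0E` variation selector and that any quote indentation is skipped before looking for a sigil.

```python
def _note(kind: str) -> tuple[str, dict]:
    return ("div", {"role": "note", "class": kind})


def _item(kind: str, level: int) -> tuple[str, dict]:
    return ("li", {"class": kind, "data-level": str(level)})


# Hypothetical lookup table mirroring the list above.
SIGILS = {
    "⁌": ("h1", {}), "§": ("h2", {}), "❦": ("h3", {}), "✠": ("h4", {}),
    "※": _note("note"), "☡": _note("caution"), "🛈": _note("info"),
    "⯑": _note("query"), "\u26a0\ufe0e": _note("warn"),
    "#": ("comment", {}), "⋯": ("div", {"class": "continuation"}),
}
for level, (unordered, ordered) in enumerate(
    [("•", "🔢"), ("◦", "🔠"), ("▪", "🔡"), ("⁃", "🔣")], start=1
):
    SIGILS[unordered] = _item("unordered", level)
    SIGILS[ordered] = _item("ordered", level)


def classify_type(paragraph: str, preformatted: bool = False) -> tuple[str, dict]:
    """Return (element, attributes) based on a paragraph's leading sigil."""
    if preformatted:
        return ("ordinary", {})  # preformatted paragraphs are always ordinary
    # Assumption: quote indentation is skipped before the sigil is checked.
    first_line = paragraph.lstrip(" \t").split("\n", 1)[0]
    for sigil, result in SIGILS.items():
        rest = first_line[len(sigil):]
        # The sigil must be followed by white·space or end the line.
        if first_line.startswith(sigil) and (rest == "" or rest[0].isspace()):
            return result
    return ("ordinary", {})
```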

Following this sigil (if any, including trailing white·space) there may
be a `¶` followed by zero or more non·white·space characters.
The characters following the `¶` give the identifier for the paragraph,
which is expected to be unique within a document.
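
Splitting off the optional identifier might look like the following Python sketch.
`split_identifier` is a hypothetical helper that expects the paragraph text with its sigil and the sigil’s trailing white·space already removed; dropping the white·space between the identifier and the contents is an assumption.

```python
import re
from typing import Optional, Tuple

# "¶" followed by zero or more non·white·space characters.
IDENTIFIER = re.compile(r"¶(\S*)")


def split_identifier(text: str) -> Tuple[Optional[str], str]:
    """Return (identifier or None, remaining contents) for a paragraph."""
    match = IDENTIFIER.match(text)
    if match is None:
        return (None, text)
    # Assumption: white·space between the identifier and contents is dropped.
    return (match.group(1), text[match.end():].lstrip())
```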

The remaining characters in a paragraph form its contents.
Markup within paragraphs is delimited with·out exception by pairs of
characters, with the following precedence :⁠—

- The characters `⌦` and `⌫` indicate inline comments.
  A single character `⌧` may be used to indicate an “empty” comment
  (consisting of `U+034F COMBINING GRAPHEME JOINER` for X·M·L
  compatibility).

- The characters `{@` and `"}` indicate attribute specifications.
  The attribute specification must contain at least one `="`, which
  separates the key of the attribute from the value.
  Attributes attach to the previous element or text node, with
  white·space‐only text nodes after elements ignored; if there is no
  such previous element or text node, an empty text node is used
  instead.
  Multiple attributes can be given in sequence.
  Text nodes with attributes are wrapped in `<html:span>`.

- The characters `{🔗` and `>}` indicate a hyperlink to a U·R·L
  (`<html:a>`).
  The hyperlink must contain at least one `<`; the content before the
  last `<` gives the text of the link, and the content after gives
  the U·R·L that the link points to.
  If no text is given, the U·R·L will be used instead.

- The characters `⸠` and `⸡` indicate a strikethru (`<html:s>`).

- The characters `⸤` and `⸥` indicate underlining (`<html:u>`).

- The characters `⟦` and `⟧` indicate an inline note
  (`<html:small role="note">`).

- The characters `⸨` and `⸩` indicate parenthetical content
  (`<html:small>`).

- The characters `` ` `` and `´` indicate code (`<html:code>`).

- The characters `⟪` and `⟫` indicate titles (`<html:cite>`).

- The characters `⸶` and `⸷` indicate names (`<html:u class="name">`).

- The characters `⟨` and `⟩` indicate offset text (`<html:i>`).
  This may be followed by a `@`, a language tag, and a `$` to provide
  the language of the text.

- The characters `⦃` and `⦄` indicate keyword highlighting
  (`<html:b>`).

- The characters `☞︎` and `☜︎` indicate strong importance
  (`<html:strong>`).

- The characters `⹐` and `⹑` indicate emphasis (`<html:em>`).
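
The splitting rules for hyperlinks and attribute specifications can be sketched in Python as below.
Both functions are hypothetical and take only the text *between* the delimiters, so `{🔗the spec<https://example.com/spec>}` is passed as `the spec<https://example.com/spec`.
Splitting an attribute at its *first* `="` is an assumption; the text above only says at least one must be present.

```python
def parse_link(content: str) -> tuple[str, str]:
    """Split the text between "{🔗" and ">}" into (link text, U·R·L)."""
    text, separator, url = content.rpartition("<")  # split at the last "<"
    if not separator:
        raise ValueError("a hyperlink must contain at least one '<'")
    # The U·R·L doubles as the link text when no text is given.
    return (text or url, url)


def parse_attribute(content: str) -> tuple[str, str]:
    """Split the text between "{@" and '"}' into (key, value)."""
    key, separator, value = content.partition('="')  # assumption: first '="'
    if not separator:
        raise ValueError("an attribute specification must contain '=\"'")
    return (key, value)
```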

Once the tree is built as above, it is remediated into its final form
by the following steps :⁠—

- Successive quoted paragraphs are joined into one quote.
  If the final quoted paragraph is an ordinary paragraph which begins
  with `—` and a space, the quote is wrapped in a `<html:figure>`
  and the final paragraph becomes its `<html:figcaption>`.

- Continuation paragraphs are joined with the preceding list items or
  quotes.

- List items of a higher level are nested in the preceding list item of
  a lower level, when one is present.

- Successive list items of the same level and class are joined into
  a single list.

- Linebreaks in preformatted paragraphs are replaced with `<html:br>`.
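
The two list‐related remediation steps can be sketched in Python with a stack of open lists.
This is a hypothetical illustration: `Item`, `ItemList`, and `build_lists` are not part of `parser.xslt`, the input is the `data-level`, `class`, and text of consecutive list items, and comments or other paragraphs are assumed to have already split items into separate runs.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    text: str
    children: list["ItemList"] = field(default_factory=list)


@dataclass
class ItemList:
    kind: str  # "ordered" or "unordered"
    level: int
    items: list[Item] = field(default_factory=list)


def build_lists(entries: list[tuple[int, str, str]]) -> list[ItemList]:
    """Nest (level, kind, text) list items and join same‐level runs."""
    roots: list[ItemList] = []
    stack: list[ItemList] = []  # open lists, outermost first
    for level, kind, text in entries:
        while stack and stack[-1].level > level:
            stack.pop()  # close lists deeper than this item
        top = stack[-1] if stack else None
        if top is not None and top.level == level and top.kind == kind:
            current = top  # same level and class: join the existing list
        else:
            if top is not None and top.level == level:
                stack.pop()  # same level, other class: start a sibling list
            current = ItemList(kind, level)
            parent = stack[-1] if stack else None
            if parent is not None and parent.items:
                parent.items[-1].children.append(current)  # nest in preceding item
            else:
                roots.append(current)  # no lower‐level item precedes it
            stack.append(current)
        current.items.append(Item(text))
    return roots
```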

Finally, any character can be escaped by instead providing its Unicode
codepoint in the form `{U+NNNN}`, where `NNNN` is one or more
hexadecimal digits.
Multiple codepoints may be provided separated by periods, as in
`{U+WWWW.ZZZZ}`.
Due to limitations in X·S·L·T, characters cannot be escaped in
attributes (including link targets).
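
Decoding these escapes is a single regular‐expression substitution, sketched here in Python for illustration.
`unescape` is a hypothetical helper; it assumes lowercase hexadecimal digits are also accepted, and `chr` raises `ValueError` for values above `U+10FFFF`.

```python
import re

# "{U+" then hex digits, optionally more hex groups separated by periods, then "}".
ESCAPE = re.compile(r"\{U\+([0-9A-Fa-f]+(?:\.[0-9A-Fa-f]+)*)\}")


def unescape(text: str) -> str:
    """Replace every {U+NNNN} or {U+NNNN.MMMM} escape with its characters."""
    def decode(match: re.Match) -> str:
        return "".join(chr(int(digits, 16)) for digits in match.group(1).split("."))
    return ESCAPE.sub(decode, text)
```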

## Usage

💄📝 Les·M·L is designed for use with [⛩📰 书社][Shushe].
Simply provide the `parser.xslt` from this repository to ⛩📰 书社 as an
additional parser, and `magic` as an additional magic file.

## License

This repository conforms to [REUSE][].

The parser is licensed under the terms of the <cite>Mozilla Public
License, version 2.0</cite>.

[REUSE]: <https://reuse.software/spec/>
[Shushe]: <https://git.ladys.computer/Shushe/>
[draft-phillips-record-jar-01]: <https://datatracker.ietf.org/doc/html/draft-phillips-record-jar-01>