1 #?lesml@en$
2 ## @(#)💄📝 Les·M·L README.lesml 2026-03-31T01:31:16Z
3 ## SPDX-FileCopyrightText: 2024, 2025, 2026 Lady <https://www.ladys.computer/about/#lady>
4 ## SPDX-License-Identifier: CC0-1.0
5
6 ⁌ 💄📝 Les·M·L
7
8 💄📝 Les·M·L is a document markup language designed with two goals in
9 mind :⁠—
10
11 № It must be trivial to parse, even with limited tooling such as that
12 provided by X·S·L·T.
13
14 № It must be sophisticated enough to handle longform hypertext
15 documents and associated metadata.
16
17 It is implemented as an X·S·L·T transformation from a
18 `<html:script type="text/lesml">´ element into H·T·M·L
19 (`parser.xslt´).
20
21 § Nomenclature
22
23 ⟨Les·M·L⟩ is an abbreviation of the phrase ⟨Lady’s Extremely Simple
24 Markup Language⟩.
25
26 § Markup syntax
27
28 ❦ Document headers
29
30 The first line of any 💄📝 Les·M·L document should be the string
31 `#?lesml´.
32 A language tag may follow this, beginning with `@´ and terminated with
33 `$´, like so: `#?lesml@en$´.
34 Regardless of whether a language tag is present, this initial line may
35 be terminated by a space‐separated list of properties of the form
36 `key=value´.
37 Only one property is currently permitted, `profile´, whose value
38 should be a U·R·I identifying the set of conventions that the document
39 is using.
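
As a rough illustration of the opening line described above, the following Python sketch splits such a line into its language tag and properties.
The name `parse_opening_line´ and the regular expression are assumptions made for this example; the real parser is the X·S·L·T transformation, which may treat malformed input differently.

```python
import re

# Hypothetical helper for illustration; not part of the Les·M·L repository.
# Matches `#?lesml`, an optional `@tag$` language tag, and any trailing text.
HEADER_RE = re.compile(r"^#\?lesml(?:@(?P<lang>[^$\s]*)\$)?(?P<rest>.*)$")


def parse_opening_line(line):
    """Return (language, properties) for a `#?lesml` line, or None."""
    match = HEADER_RE.match(line)
    if match is None:
        return None
    properties = {}
    # Assumption: properties are separated by whitespace and split on
    # the first equals sign; tokens without one are ignored.
    for item in match.group("rest").split():
        key, sep, value = item.partition("=")
        if sep:
            properties[key] = value
    return match.group("lang"), properties
```

Under these assumptions, a line without a language tag yields `None´ for the language and an empty property mapping.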
40
41 Following the opening line, document metadata may be provided in the
42 {🔗Record
43 Jar<http://www.catb.org/~esr/writings/taoup/html/ch05s02.html>}
44 {@title="Data File Metaformats | The Art of Unix Programming"}
45 format.[*fn_record-jar]
46 The body of the document begins after the last line which begins with
47 the string `%%´, or after the opening line if none exists.
48
49 *¶fn_record-jar
50 The format differs in four ways from the Record Jar format specified
51 in the I·E·T·F `draft-phillips-record-jar-02´ draft:
52 There are no restrictions on field names; newlines are a simple line
53 feed; continuation lines insert a space; character escapes are not
54 supported.
55 These differences are negligible for most uses.
56
57 Multiple documents can be catenated into a single file; a new document
58 is begun on any line which starts with `#?lesml´ or `##´.
59 Documents in the latter case inherit the latest preceding `#?lesml´
60 declaration.
61 `##´ may be followed by other text; this is treated as an interdocument
62 comment.
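
The catenation rule above can be sketched in Python as follows.
The function name `split_documents´ is hypothetical, and dropping any lines which precede the first declaration is an assumption of this sketch rather than documented behaviour.

```python
def split_documents(text):
    """Split catenated text into (declaration, lines) pairs."""
    documents = []
    declaration = None
    for line in text.split("\n"):
        if line.startswith("#?lesml"):
            # A new declaration starts a new document.
            declaration = line
            documents.append((declaration, []))
        elif line.startswith("##"):
            # An interdocument comment also starts a new document,
            # which inherits the latest preceding declaration.
            documents.append((declaration, []))
        elif documents:
            # Assumption: lines before the first declaration are dropped.
            documents[-1][1].append(line)
    return documents
```

Each `##´ line opens a fresh (possibly empty) document, so a run of comment lines at the top of a file yields several empty documents in this sketch.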
63
64 ❦ Document bodies
65
66 Document bodies are broken into blocks by blank lines.
67 Empty blocks are ignored.
68
69 Non·empty blocks (which, to be clear, may still result in empty
70 elements) are classified by the sigils which begin them.
71
72 ✠ Block level
73
74 A block can begin with any number of `⋮´ characters; these
75 increase the level of the block.
76 Blocks of higher level are nested within blocks of lower level, with
77 the exception that plain blocks cannot be nested as the first
78 children of other plain blocks, and no blocks are nestable within
79 comments.
80
81 ✠ Block sigils
82
83 Following this, new blocks are opened for each successive sigil :⁠—
84
85 • A `•´ sigil indicates an unordered list item.
86 When it is the first sigil in the list, `◦´ may be used as a
87 shorthand for `⋮•´, `▪´ for `⋮⋮•´, and `⁃´ for `⋮⋮⋮•´.
88
89 • A `℣´ sigil indicates a definition term, and a `℟´ sigil indicates
90 the corresponding value.
91
92 • A `№´ sigil indicates an ordered list item.
93
94 • A `※´ sigil indicates an ordinary note.
95
96 • A `⯑´ sigil indicates a questioning note.
97
98 • A `∫´ sigil indicates an abstract or summary.
99
100 • A `☡´ sigil indicates a cautionary notice.
101
102 • A `⚠´ sigil indicates a warning notice.
103
104 • A `🛈´ sigil indicates an informative callout.
105
106 • A `💡´ sigil indicates a tip.
107
108 • A `»´ sigil indicates a block quotation.
109
110 • A `∎´ sigil indicates a footer or caption.
111
112 A conceptual “plain” block exists at the end of the list of explicit
113 blocks.
114
115 Whitespace characters can appear on either side of each sigil or `⋮´
116 character.
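
A minimal Python sketch of reading the block level and sigils from the start of a block is given here.
The names `SIGILS´, `SHORTHANDS´, and `parse_block_prefix´ are hypothetical, and the sketch only reads the prefix; it does not perform the nesting itself.

```python
# Hypothetical names for illustration; the sigils are those listed above.
SIGILS = set("•℣℟№※⯑∫☡⚠🛈💡»∎")
# Shorthands for a `•` preceded by one, two, or three `⋮` characters.
SHORTHANDS = {"◦": 1, "▪": 2, "⁃": 3}


def parse_block_prefix(block):
    """Return (level, sigils, rest) for the start of a block."""
    level = 0
    sigils = []
    index = 0
    while index < len(block):
        char = block[index]
        if char.isspace():
            pass  # whitespace may surround sigils and `⋮` characters
        elif char == "⋮" and not sigils:
            level += 1
        elif char in SHORTHANDS and not sigils:
            # Shorthands only apply to the first sigil.
            level += SHORTHANDS[char]
            sigils.append("•")
        elif char in SIGILS:
            sigils.append(char)
        else:
            break
        index += 1
    return level, sigils, block[index:]
```

A block with no `⋮´ characters or sigils comes back with level zero and an empty sigil list, which corresponds to the conceptual “plain” block.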
117
118 ✠ Paragraph types
119
120 Each block contains a single paragraph, which is classified as
121 follows :⁠—
122
123 • If the paragraph is a single line and consists of only the following
124 section‐break characters, plus any amount of white·space, then it is
125 considered to be a section break.
126
127 ⋮ The section break characters are :⁠—
128
129 ⋮ • `U+002A * ASTERISK´
130
131 ⋮ • `U+002D - HYPHEN-MINUS´
132
133 ⋮ • `U+002E . FULL STOP´
134
135 ⋮ • `U+003D = EQUALS SIGN´
136
137 ⋮ • `U+005F _ LOW LINE´
138
139 ⋮ • `U+007E ~ TILDE´
140
141 ⋮ • `U+00A0   NO-BREAK SPACE´
142
143 ⋮ • `U+00B7 · MIDDLE DOT´
144
145 ⋮ • `U+2024 ․ ONE DOT LEADER´
146
147 ⋮ • `U+2025 ‥ TWO DOT LEADER´
148
149 ⋮ • `U+2026 … HORIZONTAL ELLIPSIS´
150
151 ⋮ • `U+2042 ⁂ ASTERISM´
152
153 ⋮ • `U+2060 ⁠ WORD JOINER´
154
155 ⋮ • `U+22EF ⋯ MIDLINE HORIZONTAL ELLIPSIS´
156
157 ⋮ • `U+2500 ─ BOX DRAWINGS LIGHT HORIZONTAL´
158
159 ⋮ • `U+2501 ━ BOX DRAWINGS HEAVY HORIZONTAL´
160
161 ⋮ • `U+2504 ┄ BOX DRAWINGS LIGHT TRIPLE DASH HORIZONTAL´
162
163 ⋮ • `U+2505 ┅ BOX DRAWINGS HEAVY TRIPLE DASH HORIZONTAL´
164
165 ⋮ • `U+2508 ┈ BOX DRAWINGS LIGHT QUADRUPLE DASH HORIZONTAL´
166
167 ⋮ • `U+2509 ┉ BOX DRAWINGS HEAVY QUADRUPLE DASH HORIZONTAL´
168
169 ⋮ • `U+254C ╌ BOX DRAWINGS LIGHT DOUBLE DASH HORIZONTAL´
170
171 ⋮ • `U+254D ╍ BOX DRAWINGS HEAVY DOUBLE DASH HORIZONTAL´
172
173 ⋮ • `U+2550 ═ BOX DRAWINGS DOUBLE HORIZONTAL´
174
175 ⋮ • `U+2574 ╴ BOX DRAWINGS LIGHT LEFT´
176
177 ⋮ • `U+2576 ╶ BOX DRAWINGS LIGHT RIGHT´
178
179 ⋮ • `U+2578 ╸ BOX DRAWINGS HEAVY LEFT´
180
181 ⋮ • `U+257A ╺ BOX DRAWINGS HEAVY RIGHT´
182
183 ⋮ • `U+2619 ☙ REVERSED ROTATED FLORAL HEART BULLET´
184
185 ⋮ • `U+2767 ❧ ROTATED FLORAL HEART BULLET´
186
187 ⋮ • `U+3000   IDEOGRAPHIC SPACE´
188
189 ⋮ • `U+30FB ・ KATAKANA MIDDLE DOT´
190
191 ⋮ • `U+FF0A ＊ FULLWIDTH ASTERISK´
192
193 ⋮ • `U+FF0D － FULLWIDTH HYPHEN-MINUS´
194
195 ⋮ • `U+FF0E ． FULLWIDTH FULL STOP´
196
197 ⋮ • `U+FF1D ＝ FULLWIDTH EQUALS SIGN´
198
199 ⋮ • `U+FF3F ＿ FULLWIDTH LOW LINE´
200
201 ⋮ • `U+FF5E ～ FULLWIDTH TILDE´
202
203 • If the opening string of `⋮´ characters, sigils, and whitespace
204 characters is followed by a `|´, and this full sequence appears at
205 the beginning of each successive line, the paragraph is preformatted.
206 If each `|´ is immediately followed by a `$´, it is a code block.
207 A syntax may be specified for the code block by inserting its name
208 between the `|´ and `$´.
209
210 • If the paragraph begins with `#´, it is an editorial comment and
211 should not be rendered or processed further.
212
213 • If the paragraph begins with `⁌´, `§´, `❦´, or `✠´, it is a
214 chapter, section, subsection, or subsubsection heading, respectively.
215
216 • If the paragraph begins with `^´, it is a footnote.
217 To be reference·able, the footnote must have an identifier, described
218 below.
219 Footnotes which are not referenced are dropped from the output.
220
221 • Otherwise, the paragraph is ordinary.
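
Part of the classification above can be sketched in Python.
The sketch covers only section breaks, comments, and headings, and returns `"other"´ for everything else (preformatted, footnote, and ordinary paragraphs); the name `classify_paragraph´ and the returned labels are assumptions made for this example.

```python
# Section-break characters from the list above, written as escapes
# because several of them are invisible or easily confused.
SECTION_BREAK_CHARS = set(
    "*-.=_~\u00a0\u00b7\u2024\u2025\u2026\u2042\u2060\u22ef"
    "\u2500\u2501\u2504\u2505\u2508\u2509\u254c\u254d\u2550"
    "\u2574\u2576\u2578\u257a\u2619\u2767\u3000\u30fb"
    "\uff0a\uff0d\uff0e\uff1d\uff3f\uff5e"
)
HEADINGS = {
    "⁌": "chapter",
    "§": "section",
    "❦": "subsection",
    "✠": "subsubsection",
}


def classify_paragraph(paragraph):
    """Classify a paragraph whose block prefix was already removed."""
    is_single_line = "\n" not in paragraph
    # Assumption: a break needs at least one non-whitespace character.
    if is_single_line and paragraph.strip() and all(
        char in SECTION_BREAK_CHARS or char.isspace() for char in paragraph
    ):
        return "section break"
    if paragraph.startswith("#"):
        return "comment"
    if paragraph[:1] in HEADINGS:
        return HEADINGS[paragraph[:1]] + " heading"
    return "other"
```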
222
223 Finally, at the beginning of each (noncomment, nonrule) paragraph there
224 may be a `¶´ (optionally preceded by whitespace) followed by zero or
225 more nonwhitespace characters.
226 The characters following the `¶´, if present, give the identifier for
227 the paragraph, which is expected to be unique within a document.
228 This may be suffixed with a language tag beginning with `@´ and
229 terminated with `$´.
230
231 The remaining characters in a paragraph form its contents.
232 Markup within paragraphs is delimited with·out exception by pairs of
233 characters, with the following precedence :⁠—
234
235 • The characters `⌦´ and `⌫´ indicate inline comments.
236 A single character `⌧´ may be used to indicate an “empty” comment
237 (consisting of `U+034F COMBINING GRAPHEME JOINER´ for X·M·L
238 compatibility).
239
240 • The characters `{@´ and `"}´ indicate attribute specifications.
241 The attribute specification must contain at least one `="´ which
242 separates the key of the attribute from the value.
243 Attributes attach to the previous element or text node; if there is no
244 such previous element or text node, an empty text node is used
245 instead.
246 Multiple attributes can be given in sequence using multiple
247 specifications.
248
249 • The characters `{🔗´ and `>}´ indicate a hyperlink to a U·R·L.
250 The hyperlink must contain at least one `<´; the content before the
251 last `<´ gives the text of the link, and the content after gives the
252 U·R·L that the link points to.
253 If no text is given, the U·R·L will be used instead.
254
255 • The characters `⸠´ and `⸡´ indicate a strikethru.
256
257 • The characters `⸤´ and `⸥´ indicate underlining.
258
259 • The characters `⟦´ and `⟧´ indicate an inline note.
260
261 • The characters `⸨´ and `⸩´ indicate parenthetical content.
262
263 • The characters `{U+60}´ and `{U+B4}´ indicate code.
264
265 • The characters `⟪´ and `⟫´ indicate titles.
266
267 • The characters `⸶´ and `⸷´ indicate names.
268
269 • The characters `⟨´ and `⟩´ indicate offset text.
270
271 • The characters `⦃´ and `⦄´ indicate keyword highlighting.
272
273 • The characters `☞︎´ and `☜︎´ indicate strong importance.
274
275 • The characters `⹐´ and `⹑´ indicate emphasis.
276
277 • The characters `[^´ and `]´ indicate a footnote reference.
278 The characters between these sigils must match the i·d of some
279 footnote which is a sibling to the current paragraph or one of its
280 ancestors.
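
To illustrate the hyperlink rule from the list above, here is a small Python sketch which extracts link text and targets.
The name `find_links´ is hypothetical; the sketch ignores precedence (comments and attribute specifications are not handled first) and collapses whitespace in the link text, which is an assumption.

```python
import re

# Hypothetical pattern: everything between `{🔗` and the next `>}`.
LINK_RE = re.compile(r"\{🔗(.*?)>\}", re.DOTALL)


def find_links(paragraph):
    """Return (text, url) pairs for each hyperlink in a paragraph."""
    links = []
    for match in LINK_RE.finditer(paragraph):
        # The content before the last `<` is the text; after it, the URL.
        text, sep, url = match.group(1).rpartition("<")
        if not sep:
            continue  # no `<` present, so this is not a hyperlink
        text = " ".join(text.split())  # assumption: collapse whitespace
        links.append((text or url, url))
    return links
```

When no text is given, the pair repeats the U·R·L, matching the fallback described in the hyperlink item.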
281
282 Once the tree is built as above, it is remediated into its final form
283 by the following steps :⁠—
284
285 • Blocks of higher level are nested within preceding blocks of lower
286 level, as described above.
287
288 • Successive list items of the same type are joined into a single list.
289
290 Finally, any character can be escaped by instead providing its Unicode
291 codepoint in the form `{U+NNNN}´, where `NNNN´ is one or more
292 hexadecimal digits.
293 Multiple codepoints may be provided separated by periods, as in
294 `{U+WWWW.ZZZZ}´.
295 Due to limitations in X·S·L·T, characters cannot be escaped in
296 attributes (including link targets).
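
The escape syntax can be decoded with a few lines of Python, shown here as a sketch.
The name `decode_escapes´ is hypothetical, and accepting lowercase hexadecimal digits is an assumption; values beyond the Unicode range would raise a `ValueError´ from `chr´.

```python
import re

# One or more hexadecimal codepoints separated by periods.
ESCAPE_RE = re.compile(r"\{U\+([0-9A-Fa-f]+(?:\.[0-9A-Fa-f]+)*)\}")


def decode_escapes(text):
    """Replace each `{U+NNNN}` escape with the characters it names."""

    def replace(match):
        parts = match.group(1).split(".")
        return "".join(chr(int(part, 16)) for part in parts)

    return ESCAPE_RE.sub(replace, text)
```

For example, `decode_escapes("{U+60}code{U+B4}")´ gives the code delimiters wrapped around the word “code”.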
297
298 § Usage
299
300 💄📝 Les·M·L is designed for usage with
301 {🔗⛩📰 书社<https://git.ladys.computer/Shushe/>}.
302 Simply include the `xslt/lesml.xslt´ provided by this repository to
303 ⛩📰 书社 as an additional parser, and `magic/lesml.magic´ as an
304 additional magic file.
305
306 For simpler usecases, the `bin/lesml´ script can be used to convert a
307 single file (or standard input).
308
309 § License
310
311 This repository conforms to {🔗REUSE<https://reuse.software/spec/>}.
312
313 The parser is licensed under the terms of the Mozilla Public
314 License, version 2.0.