1 #?lesml@en$
2 ## @(#)💄📝 Les·M·L README.lesml 2026-03-31T01:28:11Z
3 ## SPDX-FileCopyrightText: 2024, 2025, 2026 Lady <https://www.ladys.computer/about/#lady>
4 ## SPDX-License-Identifier: CC0-1.0
5
6 ⁌ 💄📝 Les·M·L
7
8 💄📝 Les·M·L is a document markup language designed with two goals in
9 mind :⁠—
10
11 № It must be trivial to parse, even with limited tooling such as that
12 provided by X·S·L·T.
13
14 № It must be sophisticated enough to handle longform hypertext
15 documents and associated metadata.
16
17 It is implemented as an X·S·L·T transformation from a
18 `<html:script type="text/lesml">´ element into H·T·M·L
19 (`xslt/lesml.xslt´).
20
21 § Nomenclature
22
23 ⟨Les·M·L⟩ is an abbreviation of the phrase ⟨Lady’s Extremely Simple
24 Markup Language⟩.
25
26 § Markup syntax
27
28 ❦ Document headers
29
30 The first line of any 💄📝 Les·M·L document should be the string
31 `#?lesml´.
32 A language tag may follow this, beginning with `@´ and terminated with
33 `$´, like so: `#?lesml@en$´.
34 Regardless of whether a language tag is present, this initial line may
35 be terminated by a space‐separated list of properties of the form
36 `key=value´.
37 Only one property is currently permitted—`profile´—whose value should
38 be a U·R·I and identifies the set of conventions that the document is
39 using.
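
The opening line described above can be read with a single regular expression.
The following is a minimal sketch in Python, not the X·S·L·T used by this repository; `parse_header´ is a hypothetical helper, and the exact whitespace handling is an assumption.

```python
import re

# Hypothetical helper, not part of the repository.
# Matches "#?lesml", an optional "@lang$" tag, then the rest of the line.
HEADER = re.compile(r"^#\?lesml(?:@(?P<lang>[^$]*)\$)?(?P<rest>.*)$")


def parse_header(line):
    """Return (language, properties) for an opening line, or None."""
    match = HEADER.match(line)
    if match is None:
        return None
    properties = {}
    # Assumption: properties are split on any run of whitespace.
    for item in match.group("rest").split():
        key, sep, value = item.partition("=")
        if sep:  # skip anything that is not of the form key=value
            properties[key] = value
    return match.group("lang"), properties
```

The sketch does not enforce that `profile´ is the only permitted property; a stricter reader might reject other keys.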
40
41 Following the opening line, document metadata may be provided in the
42 {🔗Record
43 Jar<http://www.catb.org/~esr/writings/taoup/html/ch05s02.html>}
44 {@title="Data File Metaformats | The Art of Unix Programming"}
45 format.[^fn_record-jar]
46 The body of the document begins after the last line which begins with
47 the string `%%´, or after the opening line if none exists.
48
49 ^¶fn_record-jar
50 The format differs a bit from the Record Jar format specified in the
51 I·E·T·F `draft-phillips-record-jar-02´ draft:
52 There are no restrictions on field names; newlines are a simple line
53 feed; continuation lines insert a space; character escapes are not
54 supported.
55 These differences are negligible for most uses.
56
57 Multiple documents can be catenated into a single file; a new document
58 is begun on any line which starts with `#?lesml´ or `##´.
59 Documents in the latter case inherit the latest preceding `#?lesml´
60 declaration.
61 `##´ may be followed by other text; this is treated as an interdocument
62 comment.
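
As an illustration of the catenation rule above, here is a small Python sketch that splits a file into documents and carries the latest `#?lesml´ declaration forward.
`split_documents´ is a hypothetical name, and discarding any lines before the first marker is an assumption.

```python
def split_documents(text):
    """Split a catenated file into (declaration, lines) pairs."""
    documents = []
    declaration = None
    for line in text.split("\n"):
        if line.startswith("#?lesml"):
            # A full declaration starts a document and is remembered.
            declaration = line
            documents.append((declaration, []))
        elif line.startswith("##"):
            # "##" starts a document that inherits the latest declaration;
            # the rest of the line is an interdocument comment.
            documents.append((declaration, []))
        elif documents:
            documents[-1][1].append(line)
        # Assumption: lines before the first marker are discarded.
    return documents
```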
63
64 ❦ Document bodies
65
66 Document bodies are broken into blocks by blank lines.
67 Empty blocks are ignored.
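
A sketch of this splitting step in Python follows.
`split_blocks´ is a hypothetical name, and treating whitespace‐only lines as blank is an assumption.

```python
def split_blocks(body):
    """Split a document body into blocks, each a list of lines."""
    blocks = []
    current = []
    for line in body.split("\n"):
        if line.strip():
            current.append(line)
        elif current:
            # A blank line closes the current block; runs of blank lines
            # never produce empty blocks.
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks
```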
68
69 Non·empty blocks (which, to be clear, may still result in empty
70 elements) are classified by the sigils which begin them.
71
72 ✠ Block level
73
74 A block can begin with any number of `⋮´ characters; these
75 increase the level of the block.
76 Blocks of higher level are nested within blocks of lower level, with
77 the exception that plain blocks cannot be nested as the first
78 children of other plain blocks, and no blocks are nestable within
79 comments.
80
81 ✠ Block sigils
82
83 Following this, new blocks are opened for each successive sigil :⁠—
84
85 • A `•´ sigil indicates an unordered list item.
86 When it is the first sigil in the list, `◦´ may be used as a
87 shorthand for `⋮•´, `▪´ for `⋮⋮•´, and `⁃´ for `⋮⋮⋮•´.
88
89 • A `№´ sigil indicates an ordered list item.
90
91 • A `※´ sigil indicates an ordinary note.
92
93 • A `⯑´ sigil indicates a questioning note.
94
95 • A `∫´ sigil indicates an abstract or summary.
96
97 • A `☡´ sigil indicates a cautionary notice.
98
99 • A `⚠´ sigil indicates a warning notice.
100
101 • A `🛈´ sigil indicates an informative callout.
102
103 • A `💡´ sigil indicates a tip.
104
105 • A `»´ sigil indicates a block quotation.
106
107 • A `∎´ sigil indicates a footer or caption.
108
109 A conceptual “plain” block exists at the end of the list of explicit
110 blocks.
111
112 Whitespace characters can appear on either side of each sigil or `⋮´
113 character.
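
The level and sigil rules above can be sketched as a single left‐to‐right scan.
This Python example is illustrative only: `read_block_prefix´ and the label strings are hypothetical, and it assumes that `⋮´ characters and shorthands are only recognized before the first sigil.

```python
# Labels are informal descriptions taken from the list above.
SIGILS = {
    "•": "unordered list item",
    "№": "ordered list item",
    "※": "note",
    "⯑": "questioning note",
    "∫": "abstract",
    "☡": "caution",
    "⚠": "warning",
    "🛈": "informative callout",
    "💡": "tip",
    "»": "block quotation",
    "∎": "footer or caption",
}
# Shorthands stand for one to three "⋮" followed by "•".
SHORTHAND = {"◦": 1, "▪": 2, "⁃": 3}


def read_block_prefix(text):
    """Return (level, sigils, rest) for the start of a block."""
    level = 0
    sigils = []
    index = 0
    while index < len(text):
        char = text[index]
        if char.isspace():
            pass  # whitespace may surround any sigil or "⋮"
        elif char == "⋮" and not sigils:
            level += 1
        elif char in SHORTHAND and not sigils:
            level += SHORTHAND[char]
            sigils.append("•")
        elif char in SIGILS:
            sigils.append(char)
        else:
            break  # first character of the paragraph itself
        index += 1
    return level, sigils, text[index:]
```

A block with no sigils at all corresponds to the conceptual “plain” block.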
114
115 ✠ Paragraph types
116
117 Each block contains a single paragraph, which is classified as
118 follows :⁠—
119
120 • If the paragraph is a single line and consists of only the following
121 section‐break characters, plus any amount of white·space, then it is
122 considered to be a section break.
123
124 ⋮ The section break characters are :⁠—
125
126 ⋮ • `U+002A * ASTERISK´
127
128 ⋮ • `U+002D - HYPHEN-MINUS´
129
130 ⋮ • `U+002E . FULL STOP´
131
132 ⋮ • `U+003D = EQUALS SIGN´
133
134 ⋮ • `U+005F _ LOW LINE´
135
136 ⋮ • `U+007E ~ TILDE´
137
138 ⋮ • `U+00A0   NO-BREAK SPACE´
139
140 ⋮ • `U+00B7 · MIDDLE DOT´
141
142 ⋮ • `U+2024 ․ ONE DOT LEADER´
143
144 ⋮ • `U+2025 ‥ TWO DOT LEADER´
145
146 ⋮ • `U+2026 … HORIZONTAL ELLIPSIS´
147
148 ⋮ • `U+2042 ⁂ ASTERISM´
149
150 ⋮ • `U+2060 ⁠ WORD JOINER´
151
152 ⋮ • `U+22EF ⋯ MIDLINE HORIZONTAL ELLIPSIS´
153
154 ⋮ • `U+2500 ─ BOX DRAWINGS LIGHT HORIZONTAL´
155
156 ⋮ • `U+2501 ━ BOX DRAWINGS HEAVY HORIZONTAL´
157
158 ⋮ • `U+2504 ┄ BOX DRAWINGS LIGHT TRIPLE DASH HORIZONTAL´
159
160 ⋮ • `U+2505 ┅ BOX DRAWINGS HEAVY TRIPLE DASH HORIZONTAL´
161
162 ⋮ • `U+2508 ┈ BOX DRAWINGS LIGHT QUADRUPLE DASH HORIZONTAL´
163
164 ⋮ • `U+2509 ┉ BOX DRAWINGS HEAVY QUADRUPLE DASH HORIZONTAL´
165
166 ⋮ • `U+254C ╌ BOX DRAWINGS LIGHT DOUBLE DASH HORIZONTAL´
167
168 ⋮ • `U+254D ╍ BOX DRAWINGS HEAVY DOUBLE DASH HORIZONTAL´
169
170 ⋮ • `U+2550 ═ BOX DRAWINGS DOUBLE HORIZONTAL´
171
172 ⋮ • `U+2574 ╴ BOX DRAWINGS LIGHT LEFT´
173
174 ⋮ • `U+2576 ╶ BOX DRAWINGS LIGHT RIGHT´
175
176 ⋮ • `U+2578 ╸ BOX DRAWINGS HEAVY LEFT´
177
178 ⋮ • `U+257A ╺ BOX DRAWINGS HEAVY RIGHT´
179
180 ⋮ • `U+2619 ☙ REVERSED ROTATED FLORAL HEART BULLET´
181
182 ⋮ • `U+2767 ❧ ROTATED FLORAL HEART BULLET´
183
184 ⋮ • `U+3000   IDEOGRAPHIC SPACE´
185
186 ⋮ • `U+30FB ・ KATAKANA MIDDLE DOT´
187
188 ⋮ • `U+FF0A ＊ FULLWIDTH ASTERISK´
189
190 ⋮ • `U+FF0D － FULLWIDTH HYPHEN-MINUS´
191
192 ⋮ • `U+FF0E ． FULLWIDTH FULL STOP´
193
194 ⋮ • `U+FF1D ＝ FULLWIDTH EQUALS SIGN´
195
196 ⋮ • `U+FF3F ＿ FULLWIDTH LOW LINE´
197
198 ⋮ • `U+FF5E ～ FULLWIDTH TILDE´
199
200 • If the opening string of `⋮´ characters, sigils, and whitespace
201 characters is followed by a `|´, and this full sequence appears at
202 the beginning of each successive line, the paragraph is preformatted.
203 If each `|´ is immediately followed by a `$´, it is a code block.
204 A syntax may be specified for the code block by inserting its name
205 between the `|´ and `$´.
206
207 • If the paragraph begins with `#´, it is an editorial comment and
208 should not be rendered or processed further.
209
210 • If the paragraph begins with `⁌´, `§´, `❦´, or `✠´, it is a
211 chapter, section, subsection, or subsubsection heading, respectively.
212
213 • If the paragraph begins with `^´, it is a footnote.
214 To be reference·able, the footnote must have an identifier, described
215 below.
216 Footnotes which are not referenced are dropped from the output.
217
218 • Otherwise, the paragraph is ordinary.
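
The classification above might be sketched as follows in Python.
This is an assumption‐laden illustration rather than the parser’s logic: `classify´ is a hypothetical name, the block prefix is assumed to have been stripped from every line already, code blocks are not distinguished from other preformatted text, and the order of the checks simply follows the list.

```python
# The section-break characters listed above, by codepoint.
SECTION_BREAK = set(
    "*-.=_~\u00a0\u00b7\u2024\u2025\u2026\u2042\u2060\u22ef"
    "\u2500\u2501\u2504\u2505\u2508\u2509\u254c\u254d\u2550"
    "\u2574\u2576\u2578\u257a\u2619\u2767\u3000\u30fb"
    "\uff0a\uff0d\uff0e\uff1d\uff3f\uff5e"
)
HEADINGS = {
    "⁌": "chapter",
    "§": "section",
    "❦": "subsection",
    "✠": "subsubsection",
}


def classify(lines):
    """Classify a paragraph given as a non-empty list of lines."""
    first = lines[0]
    # Assumption: at least one section-break character must be present.
    if (
        len(lines) == 1
        and any(char in SECTION_BREAK for char in first)
        and all(char in SECTION_BREAK or char.isspace() for char in first)
    ):
        return "section break"
    if all(line.startswith("|") for line in lines):
        return "preformatted"
    if first.startswith("#"):
        return "comment"
    if first[:1] in HEADINGS:
        return HEADINGS[first[:1]] + " heading"
    if first.startswith("^"):
        return "footnote"
    return "ordinary"
```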
219
220 Finally, at the beginning of each (noncomment, nonrule) paragraph there
221 may be a `¶´ (optionally preceded by whitespace) followed by zero or
222 more nonwhitespace characters.
223 The characters following the `¶´, if present, give the identifier for
224 the paragraph, which is expected to be unique within a document.
225 This may be suffixed with a language tag beginning with `@´ and
226 terminated with `$´.
227
228 The remaining characters in a paragraph form its contents.
229 Markup within paragraphs is delimited with·out exception by pairs of
230 characters, with the following precedence :⁠—
231
232 • The characters `⌦´ and `⌫´ indicate inline comments.
233 A single character `⌧´ may be used to indicate an “empty” comment
234 (consisting of `U+034F COMBINING GRAPHEME JOINER´ for X·M·L
235 compatibility).
236
237 • The characters `{@´ and `"}´ indicate attribute specifications.
238 The attribute specification must contain at least one `="´ which
239 separates the key of the attribute from the value.
240 Attributes attach to the previous element or text node; if there is no
241 such previous element or text node, an empty text node is used
242 instead.
243 Multiple attributes can be given in sequence using multiple
244 specifications.
245
246 • The characters `{🔗´ and `>}´ indicate a hyperlink to a U·R·L.
247 The hyperlink must contain at least one `<´; the content before the
248 last `<´ gives the text of the link, and the content after gives the
249 U·R·L that the link points to.
250 If no text is given, the U·R·L will be used instead.
251
252 • The characters `⸠´ and `⸡´ indicate a strikethru.
253
254 • The characters `⸤´ and `⸥´ indicate underlining.
255
256 • The characters `⟦´ and `⟧´ indicate an inline note.
257
258 • The characters `⸨´ and `⸩´ indicate parenthetical content.
259
260 • The characters `{U+60}´ and `{U+B4}´ indicate code.
261
262 • The characters `⟪´ and `⟫´ indicate titles.
263
264 • The characters `⸶´ and `⸷´ indicate names.
265
266 • The characters `⟨´ and `⟩´ indicate offset text.
267
268 • The characters `⦃´ and `⦄´ indicate keyword highlighting.
269
270 • The characters `☞︎´ and `☜︎´ indicate strong importance.
271
272 • The characters `⹐´ and `⹑´ indicate emphasis.
273
274 • The characters `[^´ and `]´ indicate a footnote reference.
275 The characters between these sigils must match the i·d of some
276 footnote which is a sibling to the current paragraph or one of its
277 ancestors.
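
To illustrate one of the rules above, the hyperlink form can be sketched in Python.
`parse_links´ is a hypothetical name; the sketch ignores precedence with the other inline markup and does not handle escapes.

```python
import re

# Everything between "{🔗" and the first following ">}".
LINK = re.compile(r"\{🔗(.*?)>\}", re.DOTALL)


def parse_links(text):
    """Return a list of (link text, URL) pairs found in text."""
    links = []
    for match in LINK.finditer(text):
        # The last "<" separates the link text from the URL.
        label, separator, url = match.group(1).rpartition("<")
        if not separator:
            continue  # a hyperlink must contain at least one "<"
        # With no text given, the URL itself is used.
        links.append((label or url, url))
    return links
```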
278
279 Once the tree is built as above, it is remediated into its final form
280 by the following steps :⁠—
281
282 • Blocks of higher level are nested within preceding blocks of lower
283 level, as described above.
284
285 • Successive list items of the same type are joined into a single list.
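
The list‐joining step can be illustrated with a short Python sketch.
The `(kind, content)´ pair representation and the name `join_list_items´ are assumptions made for the example, and only a single nesting level is considered.

```python
from itertools import groupby

LIST_KINDS = {"unordered", "ordered"}


def join_list_items(blocks):
    """Join runs of same-kind list items among (kind, content) pairs."""
    result = []
    for kind, group in groupby(blocks, key=lambda block: block[0]):
        group = list(group)
        if kind in LIST_KINDS:
            # Successive items of one kind become a single list.
            result.append((kind + " list", [content for _, content in group]))
        else:
            result.extend(group)
    return result
```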
286
287 Finally, any character can be escaped by instead providing its Unicode
288 codepoint in the form `{U+NNNN}´, where `NNNN´ is one or more
289 hexadecimal digits.
290 Multiple codepoints may be provided separated by periods, as in
291 `{U+WWWW.ZZZZ}´.
292 Due to limitations in X·S·L·T, characters cannot be escaped in
293 attributes (including link targets).
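
Decoding these escapes is straightforward outside of X·S·L·T.
The following Python sketch uses a hypothetical `unescape´ function and assumes every codepoint given is a valid Unicode scalar value.

```python
import re

# "{U+", one or more hexadecimal codepoints separated by ".", then "}".
ESCAPE = re.compile(r"\{U\+([0-9A-Fa-f]+(?:\.[0-9A-Fa-f]+)*)\}")


def unescape(text):
    """Replace each {U+NNNN} escape with the characters it names."""
    def replace(match):
        codepoints = match.group(1).split(".")
        return "".join(chr(int(codepoint, 16)) for codepoint in codepoints)
    return ESCAPE.sub(replace, text)
```

For example, `unescape("{U+60}code{U+B4}")´ yields the word “code” wrapped in the two code delimiters.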
294
295 § Usage
296
297 💄📝 Les·M·L is designed for usage with
298 {🔗⛩📰 书社<https://git.ladys.computer/Shushe/>}.
299 Simply include the `xslt/lesml.xslt´ provided by this repository to
300 ⛩📰 书社 as an additional parser, and `magic/lesml.magic´ as an
301 additional magic file.
302
303 For simpler usecases, the `bin/lesml´ script can be used to convert a
304 single file (or standard input).
305
306 § License
307
308 This repository conforms to {🔗REUSE<https://reuse.software/spec/>}.
309
310 The parser is licensed under the terms of the Mozilla Public
311 License, version 2.0.