## @(#)💄📝 Les·M·L README.lesml 2026-03-31T01:31:16Z
## SPDX-FileCopyrightText: 2024, 2025, 2026 Lady <https://www.ladys.computer/about/#lady>
## SPDX-License-Identifier: CC0-1.0

💄📝 Les·M·L is a document markup language designed with two goals in
mind:

№ It must be trivial to parse, even with limited tooling such as that

№ It must be sophisticated enough to handle longform hypertext
documents and associated metadata.

It is implemented as an X·S·L·T transformation from a
`<html:script type="text/lesml">´ element into H·T·M·L

⟨Les·M·L⟩ is an abbreviation of the phrase ⟨Ladys Extremely Simple

The first line of any 💄📝 Les·M·L document should be the string
`#?lesml´.
A language tag may follow this, beginning with `@´ and terminated with
`$´, like so: `#?lesml@en$´.
Regardless of whether a language tag is present, this initial line may
be terminated by a space‐separated list of properties of the form
Only one property is currently permitted—`profile´—whose value should
be a U·R·I and identifies the set of conventions that the document is

Following the opening line, document metadata may be provided in the
{🔗Record
Jar<http://www.catb.org/~esr/writings/taoup/html/ch05s02.html>}
{@title="Data File Metaformats | The Art of Unix Programming"}
format.[*fn_record-jar]
The body of the document begins after the last line which begins with
the string `%%´, or after the opening line if none exists.
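
As a rough illustration of the layout just described, the following Python sketch reads the language tag from an opening line and finds where the body starts.
The names `language_tag´ and `split_metadata_and_body´ are hypothetical; the property list is left unparsed, and a single (uncatenated) document is assumed.

```python
import re

# Matches the magic string and an optional language tag such as `@en$`.
# Anything after that (the property list) is left unparsed in this sketch.
OPENING = re.compile(r"#\?lesml(?:@([^$]*)\$)?")


def language_tag(opening_line):
    """Return the language tag of an opening line, or None if absent."""
    match = OPENING.match(opening_line)
    if match is None:
        raise ValueError("not a Les·M·L opening line")
    return match.group(1)


def split_metadata_and_body(text):
    """Split one document into (opening line, header lines, body lines).

    The body begins after the last line which begins with `%%`, or
    after the opening line if no such line exists. The header lines
    returned here include any `%%` lines themselves.
    """
    lines = text.split("\n")
    body_start = 1
    for index, line in enumerate(lines):
        if index > 0 and line.startswith("%%"):
            body_start = index + 1
    return lines[0], lines[1:body_start], lines[body_start:]
```

Scanning for the last `%%´ line, rather than the first, follows the wording above.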

The format differs a bit from the Record Jar format specified in the
I·E·T·F `draft-phillips-record-jar-02´ draft:
There are no restrictions on field names; newlines are a simple line
feed; continuation lines insert a space; character escapes are not
These differences are negligible for most uses.

Multiple documents can be catenated into a single file; a new document
is begun on any line which starts with `#?lesml´ or `##´.
Documents in the latter case inherit the latest preceding `#?lesml´
`##´ may be followed by other text; this is treated as an interdocument
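
A minimal sketch of splitting a catenated file, assuming only the rule that a new document begins on any line starting with `#?lesml´ or `##´.
The function name `split_documents´ is hypothetical, and inheritance from a preceding `#?lesml´ line is not modelled here.

```python
def split_documents(text):
    """Split a file into documents, each returned as a list of lines.

    A new document begins on any line which starts with `#?lesml` or
    `##`. Lines before the first such line are discarded in this sketch.
    """
    documents = []
    for line in text.split("\n"):
        if line.startswith("#?lesml") or line.startswith("##"):
            documents.append([line])
        elif documents:
            documents[-1].append(line)
    return documents
```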

Document bodies are broken into blocks by blank lines.
Empty blocks are ignored.

Non·empty blocks (which, to be clear, may still result in empty
elements) are classified by the sigils which begin them.
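
Block splitting can be sketched in a few lines of Python.
This assumes that a line containing only whitespace counts as blank, which is not stated above; `split_blocks´ is a hypothetical name.

```python
def split_blocks(body):
    """Break a document body into blocks at blank lines.

    Each block is returned as a list of its lines; empty blocks
    (runs of consecutive blank lines) are dropped.
    """
    blocks = []
    current = []
    for line in body.split("\n"):
        if line.strip() == "":
            # Assumption: whitespace-only lines are treated as blank.
            if current:
                blocks.append(current)
            current = []
        else:
            current.append(line)
    if current:
        blocks.append(current)
    return blocks
```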

A block can begin with any number of `⋮´ characters; these
increase the level of the block.
Blocks of higher level are nested within blocks of lower level, with
the exception that plain blocks cannot be nested as the first
children of other plain blocks, and no blocks are nestable within

Following this, new blocks are opened for each successive sigil :—

• A `•´ sigil indicates an unordered list item.
When it is the first sigil in the list, `◦´ may be used as a
shorthand for `⋮•´, `▪´ for `⋮⋮•´, and `⁃´ for `⋮⋮⋮•´.

• A `℣´ sigil indicates a definition term, and a `℟´ sigil indicates
the corresponding value.

• A `№´ sigil indicates an ordered list item.

• A `※´ sigil indicates an ordinary note.

• A `⯑´ sigil indicates a questioning note.

• A `∫´ sigil indicates an abstract or summary.

• A `☡´ sigil indicates a cautionary notice.

• A `⚠´ sigil indicates a warning notice.

• A `🛈´ sigil indicates an informative callout.

• A `💡´ sigil indicates a tip.

• A `»´ sigil indicates a block quotation.

• A `∎´ sigil indicates a footer or caption.
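
The level markers and sigils listed above could be read off the first line of a block roughly as follows.
This is only a sketch: `read_sigils´ is a hypothetical name, whitespace is skipped around every marker, and a shorthand bullet is expanded only when it is the very first non‐whitespace character.

```python
# Shorthand bullets and the level markers they stand for.
SHORTHAND = {"◦": "⋮•", "▪": "⋮⋮•", "⁃": "⋮⋮⋮•"}

# Sigils and the kind of block each one opens.
SIGILS = {
    "•": "unordered list item",
    "℣": "definition term",
    "℟": "definition value",
    "№": "ordered list item",
    "※": "ordinary note",
    "⯑": "questioning note",
    "∫": "abstract or summary",
    "☡": "cautionary notice",
    "⚠": "warning notice",
    "🛈": "informative callout",
    "💡": "tip",
    "»": "block quotation",
    "∎": "footer or caption",
}


def read_sigils(line):
    """Return (level, list of block kinds, remaining text) for a line."""
    rest = line.lstrip()
    # Assumption: shorthand only applies to the first character.
    if rest[:1] in SHORTHAND:
        rest = SHORTHAND[rest[0]] + rest[1:]
    level = 0
    while rest.startswith("⋮"):
        level += 1
        rest = rest[1:].lstrip()
    kinds = []
    while rest[:1] in SIGILS:
        kinds.append(SIGILS[rest[0]])
        rest = rest[1:].lstrip()
    return level, kinds, rest
```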

A conceptual “plain” block exists at the end of the list of explicit

Whitespace characters can appear on either side of each sigil or `⋮´

Each block contains a single paragraph, which is classified as

• If the paragraph is a single line and consists of only the following
section‐break characters, plus any amount of white·space, then it is
considered to be a section break.

⋮ The section break characters are :—

⋮ • `U+002A * ASTERISK´

⋮ • `U+002D - HYPHEN-MINUS´

⋮ • `U+002E . FULL STOP´

⋮ • `U+003D = EQUALS SIGN´

⋮ • `U+005F _ LOW LINE´

⋮ • `U+00A0 NO-BREAK SPACE´

⋮ • `U+00B7 · MIDDLE DOT´

⋮ • `U+2024 ․ ONE DOT LEADER´

⋮ • `U+2025 ‥ TWO DOT LEADER´

⋮ • `U+2026 … HORIZONTAL ELLIPSIS´

⋮ • `U+2042 ⁂ ASTERISM´

⋮ • `U+2060 WORD JOINER´

⋮ • `U+22EF ⋯ MIDLINE HORIZONTAL ELLIPSIS´

⋮ • `U+2500 ─ BOX DRAWINGS LIGHT HORIZONTAL´

⋮ • `U+2501 ━ BOX DRAWINGS HEAVY HORIZONTAL´

⋮ • `U+2504 ┄ BOX DRAWINGS LIGHT TRIPLE DASH HORIZONTAL´

⋮ • `U+2505 ┅ BOX DRAWINGS HEAVY TRIPLE DASH HORIZONTAL´

⋮ • `U+2508 ┈ BOX DRAWINGS LIGHT QUADRUPLE DASH HORIZONTAL´

⋮ • `U+2509 ┉ BOX DRAWINGS HEAVY QUADRUPLE DASH HORIZONTAL´

⋮ • `U+254C ╌ BOX DRAWINGS LIGHT DOUBLE DASH HORIZONTAL´

⋮ • `U+254D ╍ BOX DRAWINGS HEAVY DOUBLE DASH HORIZONTAL´

⋮ • `U+2550 ═ BOX DRAWINGS DOUBLE HORIZONTAL´

⋮ • `U+2574 ╴ BOX DRAWINGS LIGHT LEFT´

⋮ • `U+2576 ╶ BOX DRAWINGS LIGHT RIGHT´

⋮ • `U+2578 ╸ BOX DRAWINGS HEAVY LEFT´

⋮ • `U+257A ╺ BOX DRAWINGS HEAVY RIGHT´

⋮ • `U+2619 ☙ REVERSED ROTATED FLORAL HEART BULLET´

⋮ • `U+2767 ❧ ROTATED FLORAL HEART BULLET´

⋮ • `U+3000 IDEOGRAPHIC SPACE´

⋮ • `U+30FB ・ KATAKANA MIDDLE DOT´

⋮ • `U+FF0A ＊ FULLWIDTH ASTERISK´

⋮ • `U+FF0D － FULLWIDTH HYPHEN-MINUS´

⋮ • `U+FF0E ． FULLWIDTH FULL STOP´

⋮ • `U+FF1D ＝ FULLWIDTH EQUALS SIGN´

⋮ • `U+FF3F ＿ FULLWIDTH LOW LINE´

⋮ • `U+FF5E ～ FULLWIDTH TILDE´

• If the opening string of `⋮´ characters, sigils, and whitespace
characters is followed by a `|´, and this full sequence appears at
the beginning of each successive line, the paragraph is preformatted.
If each `|´ is immediately followed by a `$´, it is a code block.
A syntax may be specified for the code block by inserting its name
between the `|´ and `$´.

• If the paragraph begins with `#´, it is an editorial comment and
should not be rendered or processed further.

• If the paragraph begins with `⁌´, `§´, `❦´, or `✠´, it is a
chapter, section, subsection, or subsubsection heading, respectively.

• If the paragraph begins with `^´, it is a footnote.
To be reference·able, the footnote must have an identifier, described
Footnotes which are not referenced are dropped from the output.

• Otherwise, the paragraph is ordinary.
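
The classification rules above can be sketched in Python as follows.
Several things here are assumptions rather than part of the format: the name `classify´ is hypothetical, the paragraph is taken to be a non‐empty list of lines from which the opening `⋮´ characters, sigils, and whitespace have already been removed, only the section‐break characters enumerated above are included, and syntax names between `|´ and `$´ are not handled.

```python
# The section-break characters enumerated in the list above.
SECTION_BREAK_CHARS = set(
    "*-.=_\u00a0\u00b7\u2024\u2025\u2026\u2042\u2060\u22ef"
    "\u2500\u2501\u2504\u2505\u2508\u2509\u254c\u254d\u2550"
    "\u2574\u2576\u2578\u257a\u2619\u2767\u3000\u30fb"
    "\uff0a\uff0d\uff0e\uff1d\uff3f\uff5e"
)

HEADINGS = {
    "⁌": "chapter heading",
    "§": "section heading",
    "❦": "subsection heading",
    "✠": "subsubsection heading",
}


def classify(lines):
    """Classify a paragraph given as a non-empty list of lines."""
    if len(lines) == 1:
        only = lines[0]
        allowed = all(c in SECTION_BREAK_CHARS or c.isspace() for c in only)
        if allowed and any(c in SECTION_BREAK_CHARS for c in only):
            return "section break"
    if all(line.startswith("|") for line in lines):
        # Assumption: a code block is recognized only as `|$`,
        # without a syntax name in between.
        if all(line.startswith("|$") for line in lines):
            return "code block"
        return "preformatted"
    first = lines[0][:1]
    if first == "#":
        return "comment"
    if first in HEADINGS:
        return HEADINGS[first]
    if first == "^":
        return "footnote"
    return "ordinary"
```

The checks run in the same order as the list, so a line such as `* * *´ is a section break rather than an ordinary paragraph.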

Finally, at the beginning of each (noncomment, nonrule) paragraph there
may be a `¶´ (optionally preceded by whitespace) followed by zero or
more nonwhitespace characters.
The characters following the `¶´, if present, give the identifier for
the paragraph, which is expected to be unique within a document.
This may be suffixed with a language tag beginning with `@´ and
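
A small sketch of reading the identifier, assuming the text passed in starts at the point where the `¶´ may appear.
The name `read_identifier´ is hypothetical, and any language‐tag suffix is left attached to the identifier rather than split off.

```python
import re

# Optional whitespace, the pilcrow, then zero or more nonwhitespace
# characters; trailing whitespace is consumed as a separator.
IDENTIFIER = re.compile(r"\s*¶(\S*)\s*")


def read_identifier(contents):
    """Return (identifier or None, remaining contents)."""
    match = IDENTIFIER.match(contents)
    if match is None:
        return None, contents
    # A bare `¶` with nothing after it yields no identifier.
    return match.group(1) or None, contents[match.end():]
```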

The remaining characters in a paragraph form its contents.
Markup within paragraphs is delimited with·out exception by pairs of
characters, with the following precedence :—

• The characters `⌦´ and `⌫´ indicate inline comments.
A single character `⌧´ may be used to indicate an “empty” comment
(consisting of `U+034F COMBINING GRAPHEME JOINER´ for X·M·L

• The characters `{@´ and `"}´ indicate attribute specifications.
The attribute specification must contain at least one `="´ which
separates the key of the attribute from the value.
Attributes attach to the previous element or text node; if there is no
such previous element or text node, an empty text node is used
Multiple attributes can be given in sequence using multiple

• The characters `{🔗´ and `>}´ indicate a hyperlink to a U·R·L.
The hyperlink must contain at least one `<´; the content before the
last `<´ gives the text of the link, and the content after gives the
U·R·L that the link points to.
If no text is given, the U·R·L will be used instead.

• The characters `⸠´ and `⸡´ indicate a strikethru.

• The characters `⸤´ and `⸥´ indicate underlining.

• The characters `⟦´ and `⟧´ indicate an inline note.

• The characters `⸨´ and `⸩´ indicate parenthetical content.

• The characters `{U+60}´ and `{U+B4}´ indicate code.

• The characters `⟪´ and `⟫´ indicate titles.

• The characters `⸶´ and `⸷´ indicate names.

• The characters `⟨´ and `⟩´ indicate offset text.

• The characters `⦃´ and `⦄´ indicate keyword highlighting.

• The characters `☞︎´ and `☜︎´ indicate strong importance.

• The characters `⹐´ and `⹑´ indicate emphasis.

• The characters `[^´ and `]´ indicate a footnote reference.
The characters between these sigils must match the i·d of some
footnote which is a sibling to the current paragraph or one of its
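
To make two of the delimiter pairs above concrete, here is a sketch that reads a single hyperlink or attribute specification from a string.
The names `read_link´ and `read_attribute´ are hypothetical; nesting and precedence are ignored, and where an attribute contains more than one `="´ the first is assumed to be the separator.

```python
import re

LINK = re.compile(r"\{🔗(.*?)>\}", re.DOTALL)
ATTRIBUTE = re.compile(r'\{@(.*?)"\}', re.DOTALL)


def read_link(markup):
    """Return (text, url) for the first hyperlink in markup, or None."""
    match = LINK.search(markup)
    if match is None or "<" not in match.group(1):
        return None
    # The content before the last `<` is the text; the rest is the URL.
    text, url = match.group(1).rsplit("<", 1)
    # If no text is given, the URL is used instead.
    return (text or url), url


def read_attribute(markup):
    """Return (key, value) for the first attribute in markup, or None."""
    match = ATTRIBUTE.search(markup)
    if match is None or '="' not in match.group(1):
        return None
    # Assumption: the first `="` separates the key from the value.
    key, value = match.group(1).split('="', 1)
    return key, value
```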

Once the tree is built as above, it is remediated into its final form
by the following steps :—

• Blocks of higher level are nested within preceding blocks of lower
level, as described above.

• Successive list items of the same type are joined into a single list.
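
The second step can be illustrated with a short Python sketch.
It assumes blocks are already represented as hypothetical `(kind, text)´ pairs at a single level, and it only joins ordered and unordered list items; nesting by level is not modelled.

```python
from itertools import groupby

LIST_KINDS = {"unordered list item", "ordered list item"}


def join_lists(blocks):
    """Join successive list items of the same type into a single list.

    Each block is a (kind, text) pair. Runs of list items become one
    (list kind, [texts]) entry; other blocks pass through unchanged.
    """
    result = []
    for kind, run in groupby(blocks, key=lambda block: block[0]):
        run = list(run)
        if kind in LIST_KINDS:
            list_kind = kind.replace(" item", "")
            result.append((list_kind, [text for _, text in run]))
        else:
            result.extend(run)
    return result
```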

Finally, any character can be escaped by instead providing its Unicode
codepoint in the form `{U+NNNN}´, where `NNNN´ is one or more
Multiple codepoints may be provided separated by periods, as in
Due to limitations in X·S·L·T, characters cannot be escaped in
attributes (including link targets).
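
Decoding these escapes might look like the following in Python.
This sketch assumes the codepoints are hexadecimal digits in either case (consistent with `{U+B4}´ above); the name `unescape´ is hypothetical, and values beyond the Unicode range would raise `ValueError´.

```python
import re

# One or more hexadecimal codepoints, separated by periods.
ESCAPE = re.compile(r"\{U\+([0-9A-Fa-f]+(?:\.[0-9A-Fa-f]+)*)\}")


def unescape(text):
    """Replace every {U+NNNN} escape in text with its characters."""

    def replace(match):
        codes = match.group(1).split(".")
        return "".join(chr(int(code, 16)) for code in codes)

    return ESCAPE.sub(replace, text)
```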

💄📝 Les·M·L is designed for usage with
{🔗⛩📰 书社<https://git.ladys.computer/Shushe/>}.
Simply include the `xslt/lesml.xslt´ provided by this repository to
⛩📰 书社 as an additional parser, and `magic/lesml.magic´ as an
additional magic file.

For simpler usecases, the `bin/lesml´ script can be used to convert a
single file (or standard input).

This repository conforms to {🔗REUSE<https://reuse.software/spec/>}.

The parser is licensed under the terms of the Mozilla Public
License, version 2.0.