## @(#)💄📝 Les·M·L README.lesml 2026-03-31T01:28:11Z
## SPDX-FileCopyrightText: 2024, 2025, 2026 Lady <https://www.ladys.computer/about/#lady>
## SPDX-License-Identifier: CC0-1.0

💄📝 Les·M·L is a document markup language designed with two goals in mind:

№ It must be trivial to parse, even with limited tooling such as that

№ It must be sophisticated enough to handle longform hypertext
documents and associated metadata.

It is implemented as an X·S·L·T transformation from a
`<html:script type="text/lesml">´ element into H·T·M·L

⟨Les·M·L⟩ is an abbreviation of the phrase ⟨Ladys Extremely Simple

The first line of any 💄📝 Les·M·L document should be the string
`#?lesml´.
A language tag may follow this, beginning with `@´ and terminated with
`$´, like so: `#?lesml@en$´.
Regardless of whether a language tag is present, this initial line may
be terminated by a space‐separated list of properties of the form
Only one property is currently permitted—`profile´—whose value should
be a U·R·I and identifies the set of conventions that the document is
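
The opening line described above can be recognised with one regular expression. The following Python is only a sketch under stated assumptions: `parse_opening_line´ is a hypothetical name, the language tag is taken to be everything between `@´ and `$´, and the property list is returned as raw text rather than parsed.

```python
import re

# Hypothetical pattern: "#?lesml", an optional "@tag$", then anything else.
OPENING = re.compile(r"^#\?lesml(?:@([^$]*)\$)?(.*)$")


def parse_opening_line(line):
    """Return the language tag and raw property text, or None if no match."""
    match = OPENING.match(line)
    if match is None:
        return None
    language, rest = match.groups()
    # The property syntax is not modelled here; it is passed through as is.
    return {"language": language, "properties": rest.strip()}
```

A line without a language tag yields `None´ for the language, so callers can tell "no tag" apart from an empty one.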

Following the opening line, document metadata may be provided in the
{🔗Record
Jar<http://www.catb.org/~esr/writings/taoup/html/ch05s02.html>}
{@title="Data File Metaformats | The Art of Unix Programming"}
format.[*fn_record-jar]
The body of the document begins after the last line which begins with
the string `%%´, or after the opening line if none exists.

The format differs a bit from the Record Jar format specified in the
I·E·T·F `draft-phillips-record-jar-02´ draft:
There are no restrictions on field names; newlines are a simple line
feed; continuation lines insert a space; character escapes are not
These differences are negligible for most uses.
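
The rule for where the body begins (after the last line starting with `%%´, or after the opening line if there is none) can be sketched as follows. This is an illustration only: `split_document´ is a hypothetical name, and the metadata lines are returned untouched rather than parsed into fields.

```python
def split_document(text):
    """Split one document into (opening line, metadata lines, body lines)."""
    lines = text.split("\n")
    opening, rest = lines[0], lines[1:]
    last = None
    for index, line in enumerate(rest):
        if line.startswith("%%"):
            last = index  # remember the final "%%" line seen so far
    if last is None:
        # No "%%" line: the body starts right after the opening line.
        return opening, [], rest
    return opening, rest[:last], rest[last + 1:]
```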

Multiple documents can be catenated into a single file; a new document
is begun on any line which starts with `#?lesml´ or `##´.
Documents in the latter case inherit the latest preceding `#?lesml´
`##´ may be followed by other text; this is treated as an interdocument
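
Splitting a catenated file at lines which start with `#?lesml´ or `##´ might look like the sketch below. `split_documents´ is a hypothetical name; the sketch applies the rule literally and does not model how a `##´ document inherits from an earlier opening line.

```python
def split_documents(text):
    """Group lines into documents, starting a new one at each marker line."""
    documents = []
    for line in text.split("\n"):
        starts_new = line.startswith("#?lesml") or line.startswith("##")
        if starts_new or not documents:
            documents.append([line])
        else:
            documents[-1].append(line)
    return documents
```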

Document bodies are broken into blocks by blank lines.
Empty blocks are ignored.

Non·empty blocks (which, to be clear, may still result in empty
elements) are classified by the sigils which begin them.
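
Breaking a body into blocks is a small loop. In this sketch `split_blocks´ is a hypothetical name, and a line containing only whitespace is assumed to count as blank, which the text above does not state.

```python
def split_blocks(body_lines):
    """Group body lines into blocks separated by blank lines."""
    blocks, current = [], []
    for line in body_lines:
        if line.strip() == "":
            # A blank line closes the current block; empty blocks are dropped.
            if current:
                blocks.append(current)
            current = []
        else:
            current.append(line)
    if current:
        blocks.append(current)
    return blocks
```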

A block can begin with any number of `⋮´ characters; these
increase the level of the block.
Blocks of higher level are nested within blocks of lower level, with
the exception that plain blocks cannot be nested as the first
children of other plain blocks, and no blocks are nestable within

Following this, new blocks are opened for each successive sigil :—

• A `•´ sigil indicates an unordered list item.
When it is the first sigil in the list, `◦´ may be used as a
shorthand for `⋮•´, `▪´ for `⋮⋮•´, and `⁃´ for `⋮⋮⋮•´.

• A `№´ sigil indicates an ordered list item.

• A `※´ sigil indicates an ordinary note.

• A `⯑´ sigil indicates a questioning note.

• A `∫´ sigil indicates an abstract or summary.

• A `☡´ sigil indicates a cautionary notice.

• A `⚠´ sigil indicates a warning notice.

• A `🛈´ sigil indicates an informative callout.

• A `💡´ sigil indicates a tip.

• A `»´ sigil indicates a block quotation.

• A `∎´ sigil indicates a footer or caption.
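
The level markers and sigils listed above can be read off the front of a block with a short scanner. The Python below is a sketch, not the reference parser: `parse_prefix´ and the two tables are hypothetical names, and a shorthand sigil appearing after explicit `⋮´ characters is assumed to add its levels to theirs.

```python
SIGILS = {
    "•": "unordered list item", "№": "ordered list item",
    "※": "note", "⯑": "questioning note", "∫": "abstract",
    "☡": "caution", "⚠": "warning", "🛈": "information",
    "💡": "tip", "»": "block quotation", "∎": "footer",
}
# Shorthand sigils and the number of "⋮" characters each stands for.
SHORTHANDS = {"◦": 1, "▪": 2, "⁃": 3}


def parse_prefix(line):
    """Return (level, list of sigils, remaining text) for a block's first line."""
    text = line.lstrip()
    level = 0
    while text.startswith("⋮"):
        level += 1
        text = text[1:].lstrip()
    sigils = []
    # Shorthand is only honoured in first-sigil position (assumption above).
    if text[:1] in SHORTHANDS:
        level += SHORTHANDS[text[0]]
        sigils.append("•")
        text = text[1:].lstrip()
    while text[:1] in SIGILS:
        sigils.append(text[0])
        text = text[1:].lstrip()
    return level, sigils, text
```

A block with no sigils comes back with an empty list, which corresponds to the plain block.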

A conceptual “plain” block exists at the end of the list of explicit
Whitespace characters can appear on either side of each sigil or `⋮´

Each block contains a single paragraph, which is classified as

• If the paragraph is a single line and consists of only the following
section‐break characters, plus any amount of white·space, then it is
considered to be a section break.

⋮ The section break characters are :—

⋮ • `U+002A * ASTERISK´

⋮ • `U+002D - HYPHEN-MINUS´

⋮ • `U+002E . FULL STOP´

⋮ • `U+003D = EQUALS SIGN´

⋮ • `U+005F _ LOW LINE´

⋮ • `U+00A0 NO-BREAK SPACE´

⋮ • `U+00B7 · MIDDLE DOT´

⋮ • `U+2024 ․ ONE DOT LEADER´

⋮ • `U+2025 ‥ TWO DOT LEADER´

⋮ • `U+2026 … HORIZONTAL ELLIPSIS´

⋮ • `U+2042 ⁂ ASTERISM´

⋮ • `U+2060 WORD JOINER´

⋮ • `U+22EF ⋯ MIDLINE HORIZONTAL ELLIPSIS´

⋮ • `U+2500 ─ BOX DRAWINGS LIGHT HORIZONTAL´

⋮ • `U+2501 ━ BOX DRAWINGS HEAVY HORIZONTAL´

⋮ • `U+2504 ┄ BOX DRAWINGS LIGHT TRIPLE DASH HORIZONTAL´

⋮ • `U+2505 ┅ BOX DRAWINGS HEAVY TRIPLE DASH HORIZONTAL´

⋮ • `U+2508 ┈ BOX DRAWINGS LIGHT QUADRUPLE DASH HORIZONTAL´

⋮ • `U+2509 ┉ BOX DRAWINGS HEAVY QUADRUPLE DASH HORIZONTAL´

⋮ • `U+254C ╌ BOX DRAWINGS LIGHT DOUBLE DASH HORIZONTAL´

⋮ • `U+254D ╍ BOX DRAWINGS HEAVY DOUBLE DASH HORIZONTAL´

⋮ • `U+2550 ═ BOX DRAWINGS DOUBLE HORIZONTAL´

⋮ • `U+2574 ╴ BOX DRAWINGS LIGHT LEFT´

⋮ • `U+2576 ╶ BOX DRAWINGS LIGHT RIGHT´

⋮ • `U+2578 ╸ BOX DRAWINGS HEAVY LEFT´

⋮ • `U+257A ╺ BOX DRAWINGS HEAVY RIGHT´

⋮ • `U+2619 ☙ REVERSED ROTATED FLORAL HEART BULLET´

⋮ • `U+2767 ❧ ROTATED FLORAL HEART BULLET´

⋮ • `U+3000 IDEOGRAPHIC SPACE´

⋮ • `U+30FB ・ KATAKANA MIDDLE DOT´

⋮ • `U+FF0A ＊ FULLWIDTH ASTERISK´

⋮ • `U+FF0D － FULLWIDTH HYPHEN-MINUS´

⋮ • `U+FF0E ． FULLWIDTH FULL STOP´

⋮ • `U+FF1D ＝ FULLWIDTH EQUALS SIGN´

⋮ • `U+FF3F ＿ FULLWIDTH LOW LINE´

⋮ • `U+FF5E ～ FULLWIDTH TILDE´

• If the opening string of `⋮´ characters, sigils, and whitespace
characters is followed by a `|´, and this full sequence appears at
the beginning of each successive line, the paragraph is preformatted.
If each `|´ is immediately followed by a `$´, it is a code block.
A syntax may be specified for the code block by inserting its name
between the `|´ and `$´.

• If the paragraph begins with `#´, it is an editorial comment and
should not be rendered or processed further.

• If the paragraph begins with `⁌´, `§´, `❦´, or `✠´, it is a
chapter, section, subsection, or subsubsection heading, respectively.

• If the paragraph begins with `^´, it is a footnote.
To be reference·able, the footnote must have an identifier, described
Footnotes which are not referenced are dropped from the output.

• Otherwise, the paragraph is ordinary.
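
The classification rules above can be tried in order, first match wins. The following Python is a sketch under assumptions: `classify´ is a hypothetical name, its input is the paragraph's lines with the level markers and sigils already removed, the section‐break set holds only the codepoints listed above, and a syntax name is assumed to contain no whitespace.

```python
import re

SECTION_BREAK = {chr(c) for c in (
    0x2A, 0x2D, 0x2E, 0x3D, 0x5F, 0xA0, 0xB7, 0x2024, 0x2025, 0x2026,
    0x2042, 0x2060, 0x22EF, 0x2500, 0x2501, 0x2504, 0x2505, 0x2508,
    0x2509, 0x254C, 0x254D, 0x2550, 0x2574, 0x2576, 0x2578, 0x257A,
    0x2619, 0x2767, 0x3000, 0x30FB, 0xFF0A, 0xFF0D, 0xFF0E, 0xFF1D,
    0xFF3F, 0xFF5E,
)}
HEADINGS = {"⁌": "chapter", "§": "section",
            "❦": "subsection", "✠": "subsubsection"}
# "|", an optional syntax name, then "$".
CODE_LINE = re.compile(r"^\|([^\s$]*)\$")


def classify(lines):
    """Classify a non-empty list of paragraph lines."""
    first = lines[0]
    if len(lines) == 1 and first.strip() and all(
        ch in SECTION_BREAK or ch.isspace() for ch in first
    ):
        return "section break"
    if all(line.startswith("|") for line in lines):
        if all(CODE_LINE.match(line) for line in lines):
            return "code block"
        return "preformatted"
    if first.startswith("#"):
        return "comment"
    if first[:1] in HEADINGS:
        return HEADINGS[first[0]]
    if first.startswith("^"):
        return "footnote"
    return "ordinary"
```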

Finally, at the beginning of each (noncomment, nonrule) paragraph there
may be a `¶´ (optionally preceded by whitespace) followed by zero or
more nonwhitespace characters.
The characters following the `¶´, if present, give the identifier for
the paragraph, which is expected to be unique within a document.
This may be suffixed with a language tag beginning with `@´ and

The remaining characters in a paragraph form its contents.
Markup within paragraphs is delimited with·out exception by pairs of
characters, with the following precedence :—

• The characters `⌦´ and `⌫´ indicate inline comments.
A single character `⌧´ may be used to indicate an “empty” comment
(consisting of `U+034F COMBINING GRAPHEME JOINER´ for X·M·L

• The characters `{@´ and `"}´ indicate attribute specifications.
The attribute specification must contain at least one `="´ which
separates the key of the attribute from the value.
Attributes attach to the previous element or text node; if there is no
such previous element or text node, an empty text node is used
Multiple attributes can be given in sequence using multiple

• The characters `{🔗´ and `>}´ indicate a hyperlink to a U·R·L.
The hyperlink must contain at least one `<´; the content before the
last `<´ gives the text of the link, and the content after gives the
U·R·L that the link points to.
If no text is given, the U·R·L will be used instead.

• The characters `⸠´ and `⸡´ indicate a strikethru.

• The characters `⸤´ and `⸥´ indicate underlining.

• The characters `⟦´ and `⟧´ indicate an inline note.

• The characters `⸨´ and `⸩´ indicate parenthetical content.

• The characters `{U+60}´ and `{U+B4}´ indicate code.

• The characters `⟪´ and `⟫´ indicate titles.

• The characters `⸶´ and `⸷´ indicate names.

• The characters `⟨´ and `⟩´ indicate offset text.

• The characters `⦃´ and `⦄´ indicate keyword highlighting.

• The characters `☞︎´ and `☜︎´ indicate strong importance.

• The characters `⹐´ and `⹑´ indicate emphasis.

• The characters `[^´ and `]´ indicate a footnote reference.
The characters between these sigils must match the i·d of some
footnote which is a sibling to the current paragraph or one of its
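
Of the inline forms above, the hyperlink has the most structure: the text and the U·R·L are divided at the last `<´. A minimal Python sketch of that rule follows; `parse_link´ is a hypothetical name, and the function expects exactly one link, delimiters included.

```python
def parse_link(markup):
    """Return (text, url) for a single hyperlink, or None if malformed."""
    opener, closer = "{🔗", ">}"
    if not (markup.startswith(opener) and markup.endswith(closer)):
        return None
    inner = markup[len(opener):-len(closer)]
    # Split at the LAST "<": text before it, target after it.
    text, separator, url = inner.rpartition("<")
    if not separator:
        return None  # a hyperlink must contain at least one "<"
    # With no link text, the target doubles as the text.
    return (text or url, url)
```

Splitting at the last `<´ rather than the first lets the link text itself contain that character.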

Once the tree is built as above, it is remediated into its final form
by the following steps :—

• Blocks of higher level are nested within preceding blocks of lower
level, as described above.

• Successive list items of the same type are joined into a single list.
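
The second step, joining successive list items, is a grouping of adjacent blocks by sigil. The sketch below assumes a simplified, hypothetical data model in which each block is a `(sigil, content)´ pair and plain blocks carry `None´ as their sigil; the real transformation works on a tree.

```python
from itertools import groupby

LIST_TYPES = {"•": "unordered list", "№": "ordered list"}


def join_list_items(blocks):
    """Merge runs of adjacent list items of the same type into one list."""
    result = []
    for sigil, group in groupby(blocks, key=lambda block: block[0]):
        items = [content for _, content in group]
        if sigil in LIST_TYPES:
            result.append((LIST_TYPES[sigil], items))
        else:
            # Non-list blocks are passed through one by one.
            result.extend((sigil, content) for content in items)
    return result
```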

Finally, any character can be escaped by instead providing its Unicode
codepoint in the form `{U+NNNN}´, where `NNNN´ is one or more
Multiple codepoints may be provided separated by periods, as in
Due to limitations in X·S·L·T, characters cannot be escaped in
attributes (including link targets).
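
Resolving these escapes is a single substitution. The Python below is a sketch that rests on two assumptions not confirmed by the text above: the digits are hexadecimal (as `{U+60}´ and `{U+B4}´ suggest), and several codepoints are written inside one pair of braces as `{U+NNNN.NNNN}´. `unescape´ is a hypothetical name.

```python
import re

# Assumed shape: "{U+", hex digits, optionally ".hex digits" repeated, "}".
ESCAPE = re.compile(r"\{U\+([0-9A-Fa-f]+(?:\.[0-9A-Fa-f]+)*)\}")


def unescape(text):
    """Replace each escape with the characters it names."""
    def replace(match):
        codes = match.group(1).split(".")
        return "".join(chr(int(code, 16)) for code in codes)
    return ESCAPE.sub(replace, text)
```

Text that does not fit the assumed shape is left unchanged. The restriction on attributes and link targets is not modelled here.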

💄📝 Les·M·L is designed for usage with
{🔗⛩📰 书社<https://git.ladys.computer/Shushe/>}.
Simply include the `xslt/lesml.xslt´ provided by this repository to
⛩📰 书社 as an additional parser, and `magic/lesml.magic´ as an
additional magic file.

For simpler usecases, the `bin/lesml´ script can be used to convert a
single file (or standard input).

This repository conforms to {🔗REUSE<https://reuse.software/spec/>}.

The parser is licensed under the terms of the Mozilla Public
License, version 2.0.