#!lesml@en$
## @(#)💄📝 Les·M·L README.lesml 2026-03-31T01:31:16Z
## SPDX-FileCopyrightText: 2024, 2025, 2026 Lady
## SPDX-License-Identifier: CC0-1.0

⁌ 💄📝 Les·M·L

💄📝 Les·M·L is a document markup language designed with two goals in mind :⁠—

№ It must be trivial to parse, even with limited tooling such as that provided by X·S·L·T.

№ It must be sophisticated enough to handle longform hypertext documents and associated metadata.

It is implemented as an X·S·L·T transformation from a `´ element into H·T·M·L (`parser.xslt´).

§ Nomenclature

⟨Les·M·L⟩ is an abbreviation of the phrase ⟨Ladys Extremely Simple Markup Language⟩.

§ Markup syntax

❦ Document headers

The first line of any 💄📝 Les·M·L document should be the string `#?lesml´. A language tag may follow this, beginning with `@´ and terminated with `$´, like so: `#?lesml@en$´. Regardless of whether a language tag is present, this initial line may be terminated by a space‐separated list of properties of the form `key=value´. Only one property is currently permitted—`profile´—whose value should be a U·R·I and identifies the set of conventions that the document is using.

Following the opening line, document metadata may be provided in the {🔗Record Jar} {@title="Data File Metaformats | The Art of Unix Programming"} format.[*fn_record-jar] The body of the document begins after the last line which begins with the string `%%´, or after the opening line if none exists.

*¶fn_record-jar The format differs a bit from the Record Jar format specified in the I·E·T·F `draft-phillips-record-jar-02´ draft: There are no restrictions on field names; newlines are a simple line feed; continuation lines insert a space; character escapes are not supported. These differences are negligible for most uses.

Multiple documents can be catenated into a single file; a new document is begun on any line which starts with `#?lesml´ or `##´.
Documents in the latter case inherit the latest preceding `#?lesml´ declaration. `##´ may be followed by other text; this is treated as an interdocument comment.

❦ Document bodies

Document bodies are broken into blocks by blank lines. Empty blocks are ignored. Non·empty blocks (which, to be clear, may still result in empty elements) are classified by the sigils which begin them.

✠ Block level

A block can begin with any number of `⋮´ characters; these increase the level of the block. Blocks of higher level are nested within blocks of lower level, with the exception that plain blocks cannot be nested as the first children of other plain blocks, and no blocks are nestable within comments.

✠ Block sigils

Following this, new blocks are opened for each successive sigil :⁠—

• A `•´ sigil indicates an unordered list item. When it is the first sigil in the list, `◦´ may be used as a shorthand for `⋮•´, `▪´ for `⋮⋮•´, and `⁃´ for `⋮⋮⋮•´.

• A `℣´ sigil indicates a definition term, and a `℟´ sigil indicates the corresponding value.

• A `№´ sigil indicates an ordered list item.

• A `※´ sigil indicates an ordinary note.

• A `⯑´ sigil indicates a questioning note.

• A `∫´ sigil indicates an abstract or summary.

• A `☡´ sigil indicates a cautionary notice.

• A `⚠´ sigil indicates a warning notice.

• A `🛈´ sigil indicates an informative callout.

• A `💡´ sigil indicates a tip.

• A `»´ sigil indicates a block quotation.

• A `∎´ sigil indicates a footer or caption.

A conceptual “plain” block exists at the end of the list of explicit blocks. Whitespace characters can appear on either side of each sigil or `⋮´ character.

✠ Paragraph types

Each block contains a single paragraph, which is classified as follows :⁠—

• If the paragraph is a single line and consists of only the following section‐break characters, plus any amount of white·space, then it is considered to be a section break.

⋮ The section break characters are :⁠—

⋮ • `U+002A * ASTERISK´

⋮ • `U+002D - HYPHEN-MINUS´

⋮ • `U+002E . FULL STOP´

⋮ • `U+003D = EQUALS SIGN´

⋮ • `U+005F _ LOW LINE´

⋮ • `U+007E ~ TILDE´

⋮ • `U+00A0   NO-BREAK SPACE´

⋮ • `U+00B7 · MIDDLE DOT´

⋮ • `U+2024 ․ ONE DOT LEADER´

⋮ • `U+2025 ‥ TWO DOT LEADER´

⋮ • `U+2026 … HORIZONTAL ELLIPSIS´

⋮ • `U+2042 ⁂ ASTERISM´

⋮ • `U+2060 ⁠ WORD JOINER´

⋮ • `U+22EF ⋯ MIDLINE HORIZONTAL ELLIPSIS´

⋮ • `U+2500 ─ BOX DRAWINGS LIGHT HORIZONTAL´

⋮ • `U+2501 ━ BOX DRAWINGS HEAVY HORIZONTAL´

⋮ • `U+2504 ┄ BOX DRAWINGS LIGHT TRIPLE DASH HORIZONTAL´

⋮ • `U+2505 ┅ BOX DRAWINGS HEAVY TRIPLE DASH HORIZONTAL´

⋮ • `U+2508 ┈ BOX DRAWINGS LIGHT QUADRUPLE DASH HORIZONTAL´

⋮ • `U+2509 ┉ BOX DRAWINGS HEAVY QUADRUPLE DASH HORIZONTAL´

⋮ • `U+254C ╌ BOX DRAWINGS LIGHT DOUBLE DASH HORIZONTAL´

⋮ • `U+254D ╍ BOX DRAWINGS HEAVY DOUBLE DASH HORIZONTAL´

⋮ • `U+2550 ═ BOX DRAWINGS DOUBLE HORIZONTAL´

⋮ • `U+2574 ╴ BOX DRAWINGS LIGHT LEFT´

⋮ • `U+2576 ╶ BOX DRAWINGS LIGHT RIGHT´

⋮ • `U+2578 ╸ BOX DRAWINGS HEAVY LEFT´

⋮ • `U+257A ╺ BOX DRAWINGS HEAVY RIGHT´

⋮ • `U+2619 ☙ REVERSED ROTATED FLORAL HEART BULLET´

⋮ • `U+2767 ❧ ROTATED FLORAL HEART BULLET´

⋮ • `U+3000 　 IDEOGRAPHIC SPACE´

⋮ • `U+30FB ・ KATAKANA MIDDLE DOT´

⋮ • `U+FF0A ＊ FULLWIDTH ASTERISK´

⋮ • `U+FF0D － FULLWIDTH HYPHEN-MINUS´

⋮ • `U+FF0E ． FULLWIDTH FULL STOP´

⋮ • `U+FF1D ＝ FULLWIDTH EQUALS SIGN´

⋮ • `U+FF3F ＿ FULLWIDTH LOW LINE´

⋮ • `U+FF5E ～ FULLWIDTH TILDE´

• If the opening string of `⋮´ characters, sigils, and whitespace characters is followed by a `|´, and this full sequence appears at the beginning of each successive line, the paragraph is preformatted. If each `|´ is immediately followed by a `$´, it is a code block. A syntax may be specified for the code block by inserting its name between the `|´ and `$´.

• If the paragraph begins with `#´, it is an editorial comment and should not be rendered or processed further.

• If the paragraph begins with `⁌´, `§´, `❦´, or `✠´, it is a chapter, section, subsection, or subsubsection heading, respectively.

• If the paragraph begins with `^´, it is a footnote. To be reference·able, the footnote must have an identifier, described below. Footnotes which are not referenced are dropped from the output.

• Otherwise, the paragraph is ordinary.

Finally, at the beginning of each (noncomment, nonrule) paragraph there may be a `¶´ (optionally preceded by whitespace) followed by zero or more nonwhitespace characters. The characters following the `¶´, if present, give the identifier for the paragraph, which is expected to be unique within a document. This may be suffixed with a language tag beginning with `@´ and terminated with `$´.

The remaining characters in a paragraph form its contents. Markup within paragraphs is delimited with·out exception by pairs of characters, with the following precedence :⁠—

• The characters `⌦´ and `⌫´ indicate inline comments. A single character `⌧´ may be used to indicate an “empty” comment (consisting of `U+034F COMBINING GRAPHEME JOINER´ for X·M·L compatibility).

• The characters `{@´ and `"}´ indicate attribute specifications. The attribute specification must contain at least one `="´ which separates the key of the attribute from the value. Attributes attach to the previous element or text node; if there is no such previous element or text node, an empty text node is used instead. Multiple attributes can be given in sequence using multiple specifications.

• The characters `{🔗´ and `>}´ indicate a hyperlink to a U·R·L. The hyperlink must contain at least one `<´; the content before the last `<´ gives the text of the link, and the content after gives the U·R·L that the link points to. If no text is given, the U·R·L will be used instead.

• The characters `⸠´ and `⸡´ indicate a strikethru.

• The characters `⸤´ and `⸥´ indicate underlining.

• The characters `⟦´ and `⟧´ indicate an inline note.

• The characters `⸨´ and `⸩´ indicate parenthetical content.

• The characters `{U+60}´ and `{U+B4}´ indicate code.

• The characters `⟪´ and `⟫´ indicate titles.

• The characters `⸶´ and `⸷´ indicate names.

• The characters `⟨´ and `⟩´ indicate offset text.

• The characters `⦃´ and `⦄´ indicate keyword highlighting.

• The characters `☞︎´ and `☜︎´ indicate strong importance.

• The characters `⹐´ and `⹑´ indicate emphasis.

• The characters `[^´ and `]´ indicate a footnote reference. The characters between these sigils must match the i·d of some footnote which is a sibling to the current paragraph or one of its ancestors.

Once the tree is built as above, it is remediated into its final form by the following steps :⁠—

• Blocks of higher level are nested within preceding blocks of lower level, as described above.

• Successive list items of the same type are joined into a single list.

Finally, any character can be escaped by instead providing its Unicode codepoint in the form `{U+NNNN}´, where `NNNN´ is one or more hexadecimal digits. Multiple codepoints may be provided separated by periods, as in `{U+WWWW.ZZZZ}´. Due to limitations in X·S·L·T, characters cannot be escaped in attributes (including link targets).

§ Usage

💄📝 Les·M·L is designed for usage with {🔗⛩📰 书社}. Simply include the `xslt/lesml.xslt´ provided by this repository to ⛩📰 书社 as an additional parser, and `magic/lesml.magic´ as an additional magic file. For simpler usecases, the `bin/lesml´ script can be used to convert a single file (or standard input).

§ License

This repository conforms to {🔗REUSE}. The parser is licensed under the terms of the Mozilla Public License, version 2.0.
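
§ Example

The rules given under Markup syntax can be illustrated with a small document. What follows is only a sketch under stated assumptions, not a normative sample: the `title´ metadata field, the `intro´ identifier, and all of the prose are hypothetical, and the sketch assumes that a space may separate the `|$´ prefix from the content of each line. It shows an opening line with a language tag, one Record Jar field closed by `%%´, a section heading, an identified paragraph with inline markup, list items at two levels, and a note containing a codepoint escape.

|$ #?lesml@en$
|$ title: A small example
|$ %%
|$
|$ § A section heading
|$
|$ ¶intro An ordinary paragraph with ⹐emphasis⹑ and `inline code´.
|$
|$ • An unordered list item.
|$
|$ ◦ A nested item, using the shorthand for one level of nesting.
|$
|$ № An ordered list item.
|$
|$ ※ A note containing an escaped asterism: {U+2042}.

Each block in the sketch is separated from the next by a blank line, since that is how document bodies are broken into blocks; the two unordered items differ only in level, so the second should nest within the first.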