#?lesml@en$
## @(#)💄📝 Les·M·L	README.lesml	2026-03-31T01:31:16Z
## SPDX-FileCopyrightText: 2024, 2025, 2026 Lady <https://www.ladys.computer/about/#lady>
## SPDX-License-Identifier: CC0-1.0

⁌ 💄📝 Les·M·L

💄📝 Les·M·L is a document markup language designed with two goals in
  mind :⁠—

№ It must be trivial to parse, even with limited tooling such as that
  provided by X·S·L·T.

№ It must be sophisticated enough to handle longform hypertext
  documents and associated metadata.

It is implemented as an X·S·L·T transformation from a
  `<html:script type="text/lesml">´ element into H·T·M·L
  (`parser.xslt´).

§ Nomenclature

⟨Les·M·L⟩ is an abbreviation of the phrase ⟨Lady’s Extremely Simple
  Markup Language⟩.

§ Markup syntax

❦ Document headers

The first line of any 💄📝 Les·M·L document should be the string
  `#?lesml´.
A language tag may follow this, beginning with `@´ and terminated with
  `$´, like so: `#?lesml@en$´.
Regardless of whether a language tag is present, this initial line may
  be terminated by a space‐separated list of properties of the form
  `key=value´.
Only one property is currently permitted—`profile´—whose value should
  be a U·R·I and identifies the set of conventions that the document is
  using.
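
For illustration, each of the following lines would be a valid opening
  line under the rules above.
The profile U·R·I is hypothetical and does not identify any real set of
  conventions.

|$ #?lesml
|$ #?lesml@en$
|$ #?lesml profile=https://example.org/conventions
|$ #?lesml@en$ profile=https://example.org/conventions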

Following the opening line, document metadata may be provided in the
  {🔗Record
  Jar<http://www.catb.org/~esr/writings/taoup/html/ch05s02.html>}
  {@title="Data File Metaformats | The Art of Unix Programming"}
  format.[^fn_record-jar]
The body of the document begins after the last line which begins with
  the string `%%´, or after the opening line if none exists.

^¶fn_record-jar
The format differs a bit from the Record Jar format specified in the
  I·E·T·F `draft-phillips-record-jar-02´ draft:
There are no restrictions on field names; newlines are a simple line
  feed; continuation lines insert a space; character escapes are not
  supported.
These differences are negligible for most uses.
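
As a sketch of how the opening line, metadata, and body fit together,
  a small document might look like the following.
The field names and values are invented for illustration; the format
  places no restrictions on field names.

|$ #?lesml@en$
|$ Title: An example document
|$ Author: A hypothetical author
|$ %%
|$ This paragraph is the first block of the document body.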

Multiple documents can be catenated into a single file; a new document
  is begun on any line which starts with `#?lesml´ or `##´.
Documents in the latter case inherit the latest preceding `#?lesml´
  declaration.
`##´ may be followed by other text; this is treated as an interdocument
  comment.
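
A catenated file might therefore look like the following sketch, which
  assumes that a body may begin directly after the line which opens its
  document.
The second document has no declaration of its own and so inherits
  `#?lesml@en$´ from the first.

|$ #?lesml@en$
|$ The body of the first document.
|$ ## This text is an interdocument comment.
|$ The body of the second document.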

❦ Document bodies

Document bodies are broken into blocks by blank lines.
Empty blocks are ignored.

Non·empty blocks (which, to be clear, may still result in empty
  elements) are classified by the sigils which begin them.

✠ Block level

A block can begin with any number of `⋮´ characters; these
  increase the level of the block.
Blocks of higher level are nested within blocks of lower level, with
  the exception that plain blocks cannot be nested as the first
  children of other plain blocks, and no blocks are nestable within
  comments.

✠ Block sigils

Following this, new blocks are opened for each successive sigil :⁠—

• A `•´ sigil indicates an unordered list item.
When it is the first sigil in the list, `◦´ may be used as a
  shorthand for `⋮•´, `▪´ for `⋮⋮•´, and `⁃´ for `⋮⋮⋮•´.

• A `℣´ sigil indicates a definition term, and a `℟´ sigil indicates
  the corresponding value.

• A `№´ sigil indicates an ordered list item.

• A `※´ sigil indicates an ordinary note.

• A `⯑´ sigil indicates a questioning note.

• A `∫´ sigil indicates an abstract or summary.

• A `☡´ sigil indicates a cautionary notice.

• A `⚠´ sigil indicates a warning notice.

• A `🛈´ sigil indicates an informative callout.

• A `💡´ sigil indicates a tip.

• A `»´ sigil indicates a block quotation.

• A `∎´ sigil indicates a footer or caption.

A conceptual “plain” block exists at the end of the list of explicit
  blocks.

Whitespace characters can appear on either side of each sigil or `⋮´
  character.
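
The following sketch shows how levels and sigils might combine, on the
  reading that each `⋮´ places a block one level deeper and that two
  sigils in sequence open one block inside the other.
The text of each block is invented for illustration.

|$ • A top‐level unordered list item.
|$
|$ ⋮ • An item nested within the item above.
|$
|$ ◦ A second nested item, using the shorthand sigil.
|$
|$ ⋮ ⋮ A plain block nested within that second item.
|$
|$ ℣ A definition term
|$
|$ ℟ The corresponding value.
|$
|$ » ※ An ordinary note opened within a block quotation.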

✠ Paragraph types

Each block contains a single paragraph, which is classified as
  follows :⁠—

• If the paragraph is a single line and consists of only the following
  section‐break characters, plus any amount of white·space, then it is
  considered to be a section break.

⋮ The section break characters are :⁠—

⋮ • `U+002A * ASTERISK´

⋮ • `U+002D - HYPHEN-MINUS´

⋮ • `U+002E . FULL STOP´

⋮ • `U+003D = EQUALS SIGN´

⋮ • `U+005F _ LOW LINE´

⋮ • `U+007E ~ TILDE´

⋮ • `U+00A0   NO-BREAK SPACE´

⋮ • `U+00B7 · MIDDLE DOT´

⋮ • `U+2024 ․ ONE DOT LEADER´

⋮ • `U+2025 ‥ TWO DOT LEADER´

⋮ • `U+2026 … HORIZONTAL ELLIPSIS´

⋮ • `U+2042 ⁂ ASTERISM´

⋮ • `U+2060 ⁠ WORD JOINER´

⋮ • `U+22EF ⋯ MIDLINE HORIZONTAL ELLIPSIS´

⋮ • `U+2500 ─ BOX DRAWINGS LIGHT HORIZONTAL´

⋮ • `U+2501 ━ BOX DRAWINGS HEAVY HORIZONTAL´

⋮ • `U+2504 ┄ BOX DRAWINGS LIGHT TRIPLE DASH HORIZONTAL´

⋮ • `U+2505 ┅ BOX DRAWINGS HEAVY TRIPLE DASH HORIZONTAL´

⋮ • `U+2508 ┈ BOX DRAWINGS LIGHT QUADRUPLE DASH HORIZONTAL´

⋮ • `U+2509 ┉ BOX DRAWINGS HEAVY QUADRUPLE DASH HORIZONTAL´

⋮ • `U+254C ╌ BOX DRAWINGS LIGHT DOUBLE DASH HORIZONTAL´

⋮ • `U+254D ╍ BOX DRAWINGS HEAVY DOUBLE DASH HORIZONTAL´

⋮ • `U+2550 ═ BOX DRAWINGS DOUBLE HORIZONTAL´

⋮ • `U+2574 ╴ BOX DRAWINGS LIGHT LEFT´

⋮ • `U+2576 ╶ BOX DRAWINGS LIGHT RIGHT´

⋮ • `U+2578 ╸ BOX DRAWINGS HEAVY LEFT´

⋮ • `U+257A ╺ BOX DRAWINGS HEAVY RIGHT´

⋮ • `U+2619 ☙ REVERSED ROTATED FLORAL HEART BULLET´

⋮ • `U+2767 ❧ ROTATED FLORAL HEART BULLET´

⋮ • `U+3000 　 IDEOGRAPHIC SPACE´

⋮ • `U+30FB ・ KATAKANA MIDDLE DOT´

⋮ • `U+FF0A ＊ FULLWIDTH ASTERISK´

⋮ • `U+FF0D － FULLWIDTH HYPHEN-MINUS´

⋮ • `U+FF0E ． FULLWIDTH FULL STOP´

⋮ • `U+FF1D ＝ FULLWIDTH EQUALS SIGN´

⋮ • `U+FF3F ＿ FULLWIDTH LOW LINE´

⋮ • `U+FF5E ～ FULLWIDTH TILDE´

• If the opening string of `⋮´ characters, sigils, and whitespace
  characters is followed by a `|´, and this full sequence appears at
  the beginning of each successive line, the paragraph is preformatted.
If each `|´ is immediately followed by a `$´, it is a code block.
A syntax may be specified for the code block by inserting its name
  between the `|´ and `$´.

• If the paragraph begins with `#´, it is an editorial comment and
  should not be rendered or processed further.

• If the paragraph begins with `⁌´, `§´, `❦´, or `✠´, it is a
  chapter, section, subsection, or subsubsection heading, respectively.

• If the paragraph begins with `^´, it is a footnote.
To be reference·able, the footnote must have an identifier, described
  below.
Footnotes which are not referenced are dropped from the output.

• Otherwise, the paragraph is ordinary.
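
As a sketch of several of these paragraph types together, consider the
  following.
The content and the syntax name `python´ are invented for
  illustration.

|$ § A section heading
|$
|$ An ordinary paragraph.
|$
|$ * * *
|$
|$ # An editorial comment, which is not rendered.
|$
|$ | A preformatted paragraph,
|$ | spanning two lines.
|$
|$ |python$ print("a code block with a named syntax")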

Finally, at the beginning of each paragraph which is neither a comment
  nor a section break, there may be a `¶´ (optionally preceded by
  whitespace) followed by zero or more nonwhitespace characters.
The characters following the `¶´, if present, give the identifier for
  the paragraph, which is expected to be unique within a document.
This may be suffixed with a language tag beginning with `@´ and
  terminated with `$´.
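
For example, identifiers might be written as in the following sketch.
It assumes that the `¶´ follows the character which marks the
  paragraph type, as it does for the footnote in this document; the
  identifiers themselves are invented for illustration.

|$ ¶intro An ordinary paragraph whose identifier is “intro”.
|$
|$ § ¶usage@en$ A section heading with an identifier and language tag.
|$
|$ ^¶fn_example A footnote which can be referenced as “fn_example”.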

The remaining characters in a paragraph form its contents.
Markup within paragraphs is delimited with·out exception by pairs of
  characters, with the following precedence :⁠—

• The characters `⌦´ and `⌫´ indicate inline comments.
A single character `⌧´ may be used to indicate an “empty” comment
  (consisting of `U+034F COMBINING GRAPHEME JOINER´ for X·M·L
  compatibility).

• The characters `{@´ and `"}´ indicate attribute specifications.
The attribute specification must contain at least one `="´ which
  separates the key of the attribute from the value.
Attributes attach to the previous element or text node; if there is no
  such previous element or text node, an empty text node is used
  instead.
Multiple attributes can be given in sequence using multiple
  specifications.

• The characters `{🔗´ and `>}´ indicate a hyperlink to a U·R·L.
The hyperlink must contain at least one `<´; the content before the
  last `<´ gives the text of the link, and the content after gives the
  U·R·L that the link points to.
If no text is given, the U·R·L will be used instead.

• The characters `⸠´ and `⸡´ indicate a strikethru.

• The characters `⸤´ and `⸥´ indicate underlining.

• The characters `⟦´ and `⟧´ indicate an inline note.

• The characters `⸨´ and `⸩´ indicate parenthetical content.

• The characters `{U+60}´ and `{U+B4}´ indicate code.

• The characters `⟪´ and `⟫´ indicate titles.

• The characters `⸶´ and `⸷´ indicate names.

• The characters `⟨´ and `⟩´ indicate offset text.

• The characters `⦃´ and `⦄´ indicate keyword highlighting.

• The characters `☞︎´ and `☜︎´ indicate strong importance.

• The characters `⹐´ and `⹑´ indicate emphasis.

• The characters `[^´ and `]´ indicate a footnote reference.
The characters between these sigils must match the i·d of some
  footnote which is a sibling of the current paragraph or of one of its
  ancestors.
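
The following sketch combines several of these pairs in one paragraph.
The link target is taken from earlier in this document, while the
  `title´ value and the footnote identifier are invented for
  illustration.

|$ This is ⹐emphasized⹑ text, this is `code´, and this is ⸠struck⸡.
|$ See ⟪The Art of Unix Programming⟫ at
|$   {🔗this page<http://www.catb.org/~esr/writings/taoup/html/ch05s02.html>}
|$   {@title="A hypothetical title"}.[^fn_example]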

Once the tree is built as above, it is remediated into its final form
  by the following steps :⁠—

• Blocks of higher level are nested within preceding blocks of lower
  level, as described above.

• Successive list items of the same type are joined into a single list.

Finally, any character can be escaped by instead providing its Unicode
  codepoint in the form `{U+NNNN}´, where `NNNN´ is one or more
  hexadecimal digits.
Multiple codepoints may be provided separated by periods, as in
  `{U+WWWW.ZZZZ}´.
Due to limitations in X·S·L·T, characters cannot be escaped in
  attributes (including link targets).

§ Usage

💄📝 Les·M·L is designed for usage with
  {🔗⛩📰 书社<https://git.ladys.computer/Shushe/>}.
Simply include the `xslt/lesml.xslt´ provided by this repository to
  ⛩📰 书社 as an additional parser, and `magic/lesml.magic´ as an
  additional magic file.

For simpler usecases, the `bin/lesml´ script can be used to convert a
  single file (or standard input).

§ License

This repository conforms to {🔗REUSE<https://reuse.software/spec/>}.

The parser is licensed under the terms of the Mozilla Public
  License, version 2.0.