From: Lady
Date: Tue, 31 Mar 2026 01:28:11 +0000 (-0400)
Subject: New block behaviour and repository layout
X-Git-Tag: 0.7.0~1
X-Git-Url: https://git.ladys.computer/LesML/commitdiff_plain/7ddcc7cae1ad36324aa1a3b3c2176bec9b9a8def3bfd67233188ecb50999a5e7?hp=e4d0e426f252fa4b19f4b76dc55494c416082a8022b943dba380531977f96117

New block behaviour and repository layout

This commit makes extremely breaking changes to the block‐level semantics of the language, effectively rewriting those aspects of parsing from scratch. At the same time, it restructures the repository and adds test cases. Commits (and tags) prior to this commit have been redone with Les·M·L‐format commit messages, and the repository now uses Sha·256 hashes rather than Sha·1.

One could consider this a “version 2” of the Les·M·L format, but it is the softest of reboots: Most files will continue to produce more·or·less exactly the same result after only a replacement of `#!lesml´ with `#?lesml´. (There are some negligible differences in whitespace handling.)

The major cases where this is ⹐not⹑ true are those of blockquotes, footers, and continuations. The concepts of bracketed and quoted paragraphs have been removed and footers and blockquotes are now ordinary block types alongside sections and list items.

Blocks now have a “level”, indicated by the number of leading `⋮´ characters, and blocks of higher level are nested inside preceding ones of lower level. This makes the existing continuation behaviour more consistent and makes the language more flexible. The character `⋮´ is used instead of `⋯´ because it is semantically more accurate and the latter can also make up horizontal rules.

Multiple block sigils can now be used together to create deeply nested block structures that an equal number of continuation symbols are used to continue. For example, a leading `» • •´ introduces a list inside a list inside a blockquote, and `⋮ ⋮ •´ introduces a second list item in the nested list.
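
As a rough illustration of the level and sigil rules described in this message, here is a minimal Python sketch. It is not the X·S·L·T parser from this repository: the function name `split_prefix` and the constants are hypothetical, the sigil set is taken from the README text in this commit, and the sketch only splits the opening prefix of a block without building the nested tree.

```python
# Hypothetical sketch only; the actual parser in this repository is X·S·L·T.
CONTINUATION = "⋮"
# Block sigils as listed in the README added by this commit.
SIGILS = "•№※⯑∫☡⚠🛈💡»∎"
# Shorthand bullets: each stands for that many ⋮ characters followed by •.
SHORTHAND = {"◦": 1, "▪": 2, "⁃": 3}


def split_prefix(line):
    """Return (level, sigils, rest) for the first line of a block."""
    level = 0
    sigils = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch.isspace():
            # Whitespace may appear on either side of a sigil or ⋮.
            i += 1
        elif ch == CONTINUATION and not sigils:
            # Leading ⋮ characters raise the level of the block.
            level += 1
            i += 1
        elif ch in SHORTHAND and not sigils:
            # Shorthand is only recognised as the first block sigil.
            level += SHORTHAND[ch]
            sigils.append("•")
            i += 1
        elif ch in SIGILS:
            sigils.append(ch)
            i += 1
        else:
            break
    return level, sigils, line[i:]


print(split_prefix("» • • item"))    # (0, ['»', '•', '•'], 'item')
print(split_prefix("⋮ ⋮ • second"))  # (2, ['•'], 'second')
print(split_prefix("◦ x"))           # (1, ['•'], 'x')
```

Under these assumptions, `» • •` yields three nested blocks at level 0, while `⋮ ⋮ •` yields a single list item at level 2, which is how it attaches to the innermost list opened by the earlier line.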
The higher‐level unordered list characters `◦▪⁃´ are supported only as the first block sigils after any continuation symbols. They are simply a shorthand for `⋮ •´, `⋮ ⋮ •´, and `⋮ ⋮ ⋮ •´, respectively. The ordered list characters have been replaced by a single `№´.

Preformatted text requires the same prefix on each line. The prefix, which begins with any continuation characters or block sigils, can end in one of three ways :⁠—

№ With a pipe, a syntax name, and a dollar sign, such as `|sh$´. The syntax name can consist only of Ascii lowercase letters, numbers, slashes, dots, and hyphen dashes and identifies the preformatted text as code in the corresponding language. The semantics of syntax names are left undefined and up to profiles or applications to determine.

№ With a pipe and a dollar sign with nothing in between. In this case, the preformatted text is code of an unspecified syntax.

№ With a pipe only. In this case, the text is preformatted, but not code.

Footnote references now have the format `[^fn_id]´ and the corresponding footnotes `^¶fn_id´. “Footnotes” which are never referenced in a document are now dropped, meaning that they can be used to fully remove text from the output (unlike most forms of commenting, which produce X·M·L comments). The character `*´ was considered over `^´, but it has too many additional meanings in English text.

The character introducing footers has been changed from `]´ to `∎´. The character introducing abstracts has been changed from `@´ to `∫´. A new “tip” section has been introduced, with sigil `💡´.

Whitespace is no longer ignored before attribute specifications. In general, more whitespace is preserved literally.

The hope at this point is that the Les·M·L format is fairly stable, and no major backwards‐incompatible changes will be made (excepting the minor backwards incompatibility which is inherent in adding new features to a syntax which already accepts all inputs with·out complaint).
A couple more features are anticipated, but “Version 1.0.0” will probably come soon. --- diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..05296c1 --- /dev/null +++ b/.gitignore @@ -0,0 +1,7 @@ +# SPDX-FileCopyrightText: 2026 Hojarasca +# SPDX-License-Identifier: CC0-1.0 + +/@* +!/@modules.tmp +*%* +.~* diff --git a/README.lesml b/README.lesml new file mode 100644 index 0000000..dfc813b --- /dev/null +++ b/README.lesml @@ -0,0 +1,311 @@ +#!lesml@en$ +## @(#)💄📝 Les·M·L README.lesml 2026-03-31T01:28:11Z +## SPDX-FileCopyrightText: 2024, 2025, 2026 Lady +## SPDX-License-Identifier: CC0-1.0 + +⁌ 💄📝 Les·M·L + +💄📝 Les·M·L is a document markup language designed with two goals in + mind :⁠— + +№ It must be trivial to parse, even with limited tooling such as that + provided by X·S·L·T. + +№ It must be sophisticated enough to handle longform hypertext + documents and associated metadata. + +It is implemented as an X·S·L·T transformation from a + `´ element into H·T·M·L + (`parser.xslt´). + +§ Nomenclature + +⟨Les·M·L⟩ is an abbreviation of the phrase ⟨Ladys Extremely Simple + Markup Language⟩. + +§ Markup syntax + +❦ Document headers + +The first line of any 💄📝 Les·M·L document should be the string + `#?lesml´. +A language tag may follow this, beginning with `@´ and terminated with + `$´, like so: `#?lesml@en$´. +Regardless of whether a language tag is present, this initial line may + be terminated by a space‐separated list of properties of the form + `key=value´. +Only one property is currently permitted—`profile´—whose value should + be a U·R·I and identifies the set of conventions that the document is + using. + +Following the opening line, document metadata may be provided in the + {🔗Record + Jar} + {@title="Data File Metaformats | The Art of Unix Programming"} + format.[*fn_record-jar] +The body of the document begins after the last line which begins with + the string `%%´, or after the opening line if none exists. 
+ +*¶fn_record-jar +The format differs a bit from the Record Jar format specified in the + I·E·T·F `draft-phillips-record-jar-02´ draft: +There are no restrictions on field names; newlines are a simple line + feed; continuation lines insert a space; character escapes are not + supported. +These differences are negligible for most uses. + +Multiple documents can be catenated into a single file; a new document + is begun on any line which starts with `#?lesml´ or `##´. +Documents in the later case inherit the latest preceding `#?lesml´ + declaration. +`##´ may be followed by other text; this is treated as an interdocument + comment. + +❦ Document bodies + +Document bodies are broken into blocks by blank lines. +Empty blocks are ignored. + +Non·empty blocks (which, to be clear, may still result in empty + elements) are classified by the sigils which begin them. + +✠ Block level + +A block can begin with any number of `⋮´ characters; these + increase the level of the block. +Blocks of higher level are nested within blocks of lower level, with + the exception that plain blocks cannot be nested as the first + children of other plain blocks, and no blocks are nestable within + comments. + +✠ Block sigils + +Following this, new blocks are opened for each successive sigil :⁠— + +• A `•´ sigil indicates an unordered list item. +When it is the first sigil in the list, `◦´ may be used as a + shorthand for `⋮•´, `▪´ for `⋮⋮•´, and `⁃´ for `⋮⋮⋮•´. + +• A `№´ sigil indicates an ordered list item. + +• A `※´ sigil indicates an ordinary note. + +• A `⯑´ sigil indicates a questioning note. + +• A `∫´ sigil indicates an abstract or summary. + +• A `☡´ sigil indicates a cautionary notice. + +• A `⚠´ sigil indicates a warning notice. + +• A `🛈´ sigil indicates an informative callout. + +• A `💡´ sigil indicates a tip. + +• A `»´ sigil indicates a block quotation. + +• A `∎´ sigil indicates a footer or caption. 
+ +A conceptual “plain” block exists at the end of the list of explicit + blocks. + +Whitespace characters can appear on either side of each sigil or `⋮´ + character. + +✠ Paragraph types + +Each block contains a single paragraph, which is classified as + follows :⁠— + +• If the paragraph is a single line and consists of only the following + section‐break characters, plus any amount of white·space, then it is + considered to be a section break. + +⋮ The section break characters are :⁠— + +⋮ • `U+002A * ASTERISK´ + +⋮ • `U+002D - HYPHEN-MINUS´ + +⋮ • `U+002E . FULL STOP´ + +⋮ • `U+003D = EQUALS SIGN´ + +⋮ • `U+005F _ LOW LINE´ + +⋮ • `U+007E ~ TILDE´ + +⋮ • `U+00A0 NO-BREAK SPACE´ + +⋮ • `U+00B7 · MIDDLE DOT´ + +⋮ • `U+2024 ․ ONE DOT LEADER´ + +⋮ • `U+2025 ‥ TWO DOT LEADER´ + +⋮ • `U+2026 … HORIZONTAL ELLIPSIS´ + +⋮ • `U+2042 ⁂ ASTERISM´ + +⋮ • `U+2060 ⁠ WORD JOINER´ + +⋮ • `U+22EF ⋯ MIDLINE HORIZONTAL ELLIPSIS´ + +⋮ • `U+2500 ─ BOX DRAWINGS LIGHT HORIZONTAL´ + +⋮ • `U+2501 ━ BOX DRAWINGS HEAVY HORIZONTAL´ + +⋮ • `U+2504 ┄ BOX DRAWINGS LIGHT TRIPLE DASH HORIZONTAL´ + +⋮ • `U+2505 ┅ BOX DRAWINGS HEAVY TRIPLE DASH HORIZONTAL´ + +⋮ • `U+2508 ┈ BOX DRAWINGS LIGHT QUADRUPLE DASH HORIZONTAL´ + +⋮ • `U+2509 ┉ BOX DRAWINGS HEAVY QUADRUPLE DASH HORIZONTAL´ + +⋮ • `U+254C ╌ BOX DRAWINGS LIGHT DOUBLE DASH HORIZONTAL´ + +⋮ • `U+254D ╍ BOX DRAWINGS HEAVY DOUBLE DASH HORIZONTAL´ + +⋮ • `U+2550 ═ BOX DRAWINGS DOUBLE HORIZONTAL´ + +⋮ • `U+2574 ╴ BOX DRAWINGS LIGHT LEFT´ + +⋮ • `U+2576 ╶ BOX DRAWINGS LIGHT RIGHT´ + +⋮ • `U+2578 ╸ BOX DRAWINGS HEAVY LEFT´ + +⋮ • `U+257A ╺ BOX DRAWINGS HEAVY RIGHT´ + +⋮ • `U+2619 ☙ REVERSED ROTATED FLORAL HEART BULLET´ + +⋮ • `U+2767 ❧ ROTATED FLORAL HEART BULLET´ + +⋮ • `U+3000 　 IDEOGRAPHIC SPACE´ + +⋮ • `U+30FB ・ KATAKANA MIDDLE DOT´ + +⋮ • `U+FF0A ＊ FULLWIDTH ASTERISK´ + +⋮ • `U+FF0D － FULLWIDTH HYPHEN-MINUS´ + +⋮ • `U+FF0E ． FULLWIDTH FULL STOP´ + +⋮ • `U+FF1D ＝ FULLWIDTH EQUALS SIGN´ + +⋮ • `U+FF3F ＿ FULLWIDTH LOW LINE´ + +⋮ • `U+FF5E ～ 
FULLWIDTH TILDE´ + +• If the opening string of `⋮´ characters, sigils, and whitespace + characters is followed by a `|´, and this full sequence appears at + the beginning of each successive line, the paragraph is preformatted. +If each `|´ is immediately followed by a `$´, it is a code block. +A syntax may be specified for the code block by inserting its name + between the `|´ and `$´. + +• If the paragraph begins with `#´, it is an editorial comment and + should not be rendered or processed further. + +• If the paragraph begins with `⁌´, `§´, `❦´, or `✠´, it is a + chapter, section, subsection, or subsubsection heading, respectively. + +• If the paragraph begins with `^´, it is a footnote. +To be reference·able, the footnote must have an identifier, described + below. +Footnotes which are not referenced are dropped from the output. + +• Otherwise, the paragraph is ordinary. + +Finally, at the beginning of each (noncomment, nonrule) paragraph there + may be a `¶´ (optionally preceded by whitespace) followed by zero or + more nonwhitespace characters. +The characters following the `¶´, if present, give the identifier for + the paragraph, which is expected to be unique within a document. +This may be suffixed with a language tag beginning with `@´ and + terminated with `$´. + +The remaining characters in a paragraph form its contents. +Markup within paragraphs is delimited with·out exception by pairs of + characters, with the following precedence :⁠— + +• The characters `⌦´ and `⌫´ indicate inline comments. +A single character `⌧´ may be used to indicate an “empty” comment + (consisting of `U+034F COMBINING GRAPHEME JOINER´ for X·M·L + compatibility). + +• The characters `{@´ and `"}´ indicate attribute specifications. +The attribute specification must contain at least one `="´ which + separates the key of the attribute from the value. 
+Attributes attach to the previous element or text node; if there is no + such previous element or text node, an empty text node is used + instead. +Multiple attributes can be given in sequence using multiple + specifications. + +• The characters `{🔗´ and `>}´ indicate a hyperlink to a U·R·L. +The hyperlink must contain at least one `<´; the content before the + last `<` gives the text of the link, and the content after gives the + U·R·L that the link points to. +If no text is given, the U·R·L will be used instead. + +• The characters `⸠´ and `⸡´ indicate a strikethru. + +• The characters `⸤´ and `⸥´ indicate underlining. + +• The characters `⟦´ and `⟧´ indicate an inline note. + +• The characters `⸨´ and `⸩´ indicate parenthetical content. + +• The characters `{U+60}´ and `{U+B4}´ indicate code. + +• The characters `⟪´ and `⟫´ indicate titles. + +• The characters `⸶´ and `⸷´ indicate names. + +• The characters `⟨´ and `⟩´ indicate offset text. + +• The characters `⦃´ and `⦄´ indicate keyword highlighting. + +• The characters `☞︎´ and `☜︎´ indicate strong importance. + +• The characters `⹐´ and `⹑´ indicate emphasis. + +• The characters `[^´ and `]´ indicate a footnote reference. + The characters between these sigils must match the i·d of some + footnote which is a sibling to the current paragraph or one of its + ancestors. + +Once the tree is built as above, it is remediated into its final form + by the following steps :⁠— + +• Blocks of higher level are nested within preceding blocks of lower + level, as described above. + +• Successive list items of the same type are joined into a single list. + +Finally, any character can be escaped by instead providing its Unicode + codepoint in the form `{U+NNNN}´, where `NNNN´ is one or more + hexadecimal digits. +Multiple codepoints may be provided separated by periods, as in + `{U+WWWW.ZZZZ}´. +Due to limitations in X·S·L·T, characters cannot be escaped in + attributes (including link targets). 
+ +§ Usage + +💄📝 Les·M·L is designed for usage with + {🔗⛩📰 书社}. +Simply include the `xslt/lesml.xslt´ provided by this repository to + ⛩📰 书社 as an additional parser, and `magic/lesml.magic´ as an + additional magic file. + +For simpler usecases, the `bin/lesml´ script can be used to convert a + single file (or standard input). + +§ License + +This repository conforms to {🔗REUSE}. + +The parser is licensed under the terms of the Mozilla Public + License, version 2.0. diff --git a/README.markdown b/README.markdown deleted file mode 100644 index 835285c..0000000 --- a/README.markdown +++ /dev/null @@ -1,320 +0,0 @@ - -# 💄📝 Les·M·L - -Ladys simple markup language. - -💄📝 Les·M·L is a document markup language designed with two goals in - mind :⁠— - -1. It must be trivial to parse, even with limited tooling such as that - provided by X·S·L·T. - -2. It must be sophisticated enough to handle longform hypertext - documents and associated metadata. - -It is implemented as an X·S·L·T transformation from a - `` element into H·T·M·L - (`parser.xslt`). - -## Nomenclature - -Les·M·L is an abbreviation of the phrase “Ladys Extremely Simple - Markup Language”. - -## Markup Syntax - -The first line of any 💄📝 Les·M·L document should be the string - `#!lesml`. -A language tag may follow this, beginning with `@` and terminated with - `$`, like so: -`#!lesml@en$`. -Regardless of whether a language tag is present, the shebang line may - be terminated by a space‐separated list of properties of the form - `key=value`. -Only one property is currently permitted: `profile`, whose value should - be a U·R·I and is translated to the `@data-lesml-profile` attribute - on the resulting `` element. - -Following the shebang line, document metadata may be provided in the - [Record Jar][draft-phillips-record-jar-01] format. -The body of the document begins after the last line which begins with - the string `%%`, or after the shebang line if none exists. 
- -Multiple documents can be catenated into a single file; a new document - is begun on any line which starts with `#!lesml` or `##`. -Documents in the later case inherit the latest preceding `#!lesml` - declaration. -`##` may be followed by other text; this is treated as an interdocument - comment. - -Documents are broken into paragraphs by blank lines. -Empty paragraphs are ignored. - -If every line in the paragraph begins with (optional white·space - followed by) `»` it is quoted (``); if every line - begins with `]` it is bracketed. -The lines, minus this leading, are then re‐analysed. -Bracketed paragraphs which end quotes are treated as captions - (``); otherwise, they are footers (``). - -Non·empty paragraphs (which, to be clear, may still result in empty - `` elements) are classified as follows :⁠— - -- If the paragraph consists of only the following section‐break - characters, plus any amount of white·space, then it is - considered to be a section break (``). - - The section break characters are :⁠— - - | Character | Codepoint | Unicode Name | - | --------- | --------- | ------------ | - | `*` | `U+002A` | `ASTERISK` | - | `-` | `U+002D` | `HYPHEN-MINUS` | - | `.` | `U+002E` | `FULL STOP` | - | `=` | `U+003D` | `EQUALS SIGN` | - | `_` | `U+005F` | `LOW LINE` | - | `~` | `U+007E` | `TILDE` | - | `·` | `U+00B7` | `MIDDLE DOT` | - | `․` | `U+2024` | `ONE DOT LEADER` | - | `‥` | `U+2025` | `TWO DOT LEADER` | - | `…` | `U+2026` | `HORIZONTAL ELLIPSIS` | - | `⁂` | `U+2042` | `ASTERISM` | - | `⋯` | `U+22EF` | `MIDLINE HORIZONTAL ELLIPSIS` | - | `─` | `U+2500` | `BOX DRAWINGS LIGHT HORIZONTAL` | - | `━` | `U+2501` | `BOX DRAWINGS HEAVY HORIZONTAL` | - | `┄` | `U+2504` | `BOX DRAWINGS LIGHT TRIPLE DASH HORIZONTAL` | - | `┅` | `U+2505` | `BOX DRAWINGS HEAVY TRIPLE DASH HORIZONTAL` | - | `┈` | `U+2508` | `BOX DRAWINGS LIGHT QUADRUPLE DASH HORIZONTAL` | - | `┉` | `U+2509` | `BOX DRAWINGS HEAVY QUADRUPLE DASH HORIZONTAL` | - | `╌` | `U+254C` | `BOX DRAWINGS LIGHT 
DOUBLE DASH HORIZONTAL` | - | `╍` | `U+254D` | `BOX DRAWINGS HEAVY DOUBLE DASH HORIZONTAL` | - | `═` | `U+2550` | `BOX DRAWINGS DOUBLE HORIZONTAL` | - | `╴` | `U+2574` | `BOX DRAWINGS LIGHT LEFT` | - | `╶` | `U+2576` | `BOX DRAWINGS LIGHT RIGHT` | - | `╸` | `U+2578` | `BOX DRAWINGS HEAVY LEFT` | - | `╺` | `U+257A` | `BOX DRAWINGS HEAVY RIGHT` | - | `☙` | `U+2619` | `REVERSED ROTATED FLORAL HEART BULLET` | - | `❧` | `U+2767` | `ROTATED FLORAL HEART BULLET` | - | `　` | `U+3000` | `IDEOGRAPHIC SPACE` | - | `・` | `U+30FB` | `KATAKANA MIDDLE DOT` | - | `＊` | `U+FF0A` | `FULLWIDTH ASTERISK` | - | `－` | `U+FF0D` | `FULLWIDTH HYPHEN-MINUS` | - | `．` | `U+FF0E` | `FULLWIDTH FULL STOP` | - | `＝` | `U+FF1D` | `FULLWIDTH EQUALS SIGN` | - | `＿` | `U+FF3F` | `FULLWIDTH LOW LINE` | - | `～` | `U+FF5E` | `FULLWIDTH TILDE` | - -- If every line in the paragraph begins with zero or more white·space - characters followed by `|`, it is a “preformatted” paragraph and - white·space is not collapsed (``). - -- Otherwise, the paragraph is ordinary. - -After this classification, each ordinary paragraph is further - classified by type based on its first character (which must be - followed by white·space or a pilcrow, or else be the only thing on - the line) :⁠— - -- If the paragraph is preformatted, it is an ordinary paragraph. - -- If the paragraph begins with `⁌`, it is a chapter heading - (``). - -- If the paragraph begins with `§`, it is a section heading - (``). - -- If the paragraph begins with `❦`, it is a subsection heading - (``). - -- If the paragraph begins with `✠`, it is a subsubsection heading - (``). - -- If the paragraph begins with `•` or `🔢`, it is a primary unordered - or ordered list item (`` - or ``). - -- If the paragraph begins with `◦` or `🔠`, it is a secondary unordered - or ordered list item (`` - or ``). - Secondary list items are considered to be nested inside of primary - list items which precede them. 
- -- If the paragraph begins with `▪` or `🔡`, it is a tertiary unordered - or ordered list item (`` - or ``). - Tertiary list items are considered to be nested inside of primary - and secondary list items which precede them. - -- If the paragraph begins with `⁃` or `🔣`, it is a quaternary - unordered or ordered list item - (`` or - ``). - Quaternary list items are considered to be nested inside of primary, - secondary, and tertiary list items which precede them. - -- If the paragraph begins with `※`, it is an ordinary note - (``). - -- If the paragraph begins with `☡`, it is a cautionary note - (``). - -- If the paragraph begins with `⯑`, it is a questioning note - (``). - -- If the paragraph begins with `@`, it is an abstract - (``). - -- If the paragraph begins with `🛈`, it is a (informative) tip - (``). - -- If the paragraph begins with `⚠︎`, it is a (warning) notice - (``). - -- If the paragraph begins with `^`, it is a footnote - (``). - Footnotes are ignored unless their first paragraph has an i·d - (specified with `¶`) which is referenced by one or more footnote - references. - Footnotes are treated as level 1 ordered list items, so they can - contain nested lists. - - Footnotes are removed from the normal document flow and placed in a - footer (``) in order of first - reference. - It is recommended that the i·d¦s you choose are kept stable, so that - links to footnotes do not break. - -- If the paragraph begins with `#`, it is a comment. - Comments produce X·M·L comment nodes and can be used to break up list - items into separate lists. - -- If the paragraph begins with `⋯`, it is a continuation paragraph. - Continuation paragraphs may be used to continue a preceding note, - footnote, or list item. - If there is no such preceding note, footnote, or list item, they will - attach to adjacent heading elements to form heading groups - (``). - Otherwise, they will be treated as ordinary paragraphs. - -- Otherwise, it is an ordinary paragraph. 
- -Following this sigil (if any) there may be a `¶` followed by zero or - more non·white·space characters. -The characters following the `¶` give the identifier for the paragraph, - which is expected to be unique within a document. -This may be suffixed with a language tag beginning with `@` and - terminated with `$`. - -When a paragraph produces an `` element “wrapped in” another - kind of element (e·g, a blockquote, section, or list item), the - identifier and language of the first paragraph are applied to the - wrapping element. -If the first paragraph has no other contents, it is deleted. -To apply the identifier or language to the `` element itself, - and not its wrapper, one can simply make the first paragraph empty - (using a literal `¶` with no other contents). -This paragraph will be dropped, but the following paragraphs will still - be processed as non·initial. - -The remaining characters in a paragraph form its contents. -Markup within paragraphs is delimited with·out exception by pairs of - characters, with the following precedence :⁠— - -- The characters `⌦` and `⌫` indicate inline comments. - A single character `⌧` may be used to indicate an “empty” comment - (consisting of `U+034F COMBINING GRAPHEME JOINER` for X·M·L - compatibility). - -- The characters `{@` and `"}` indicate attribute specifications. - The attribute specification must contain at least one `="` which - separates the key of the attribute from the value. - Attributes attach to the previous element or text node, with - white·space‐only text nodes after elements ignored; if there is no - such previous element or text node, an empty text node is used - instead. - Multiple attributes can be given in sequence using multiple - specifications. - Text nodes with attributes are wrapped in ``. - -- The characters `{🔗` and `>}` indicate a hyperlink to a U·R·L - (``). 
- The hyperlink must contain at least one `<`; the content before the - last `<` gives the text of the link, and the content after gives - the U·R·L that the link points to. - If no text is given, the U·R·L will be used instead. - -- The characters `⸠` and `⸡` indicate a strikethru (``). - -- The characters `⸤` and `⸥` indicate underlining (``). - -- The characters `⟦` and `⟧` indicate an inline note - (``). - -- The characters `⸨` and `⸩` indicate parenthetical content - (``). - -- The characters `` ` `` and `´` indicate code (``). - -- The characters `⟪` and `⟫` indicate titles (``). - -- The characters `⸶` and `⸷` indicate names (``). - -- The characters `⟨` and `⟩` indicate offset text (``). - -- The characters `⦃` and `⦄` indicate keyword highlighting - (``). - -- The characters `☞︎` and `☜︎` indicate strong importance - (``). - -- The characters `⹐` and `⹑` indicate emphasis (``). - -- The characters `^` and `.` indicate a footnote reference - (``). - The characters between these sigils must match the i·d of the first - paragraph of some footnote in the same document. - -Once the tree is built as above, it is remediated into its final form - by the following steps :⁠— - -- Continuation paragraphs are joined with the preceding list items or - sections. - -- List items of a higher level are nested in preceding list items, when - present. - List items of a level greater than 1 can also be nested in preceding - sections (notes, abstracts, ⁊·c…). - -- Successive list items of the same level and class are joined into - a single list. - -- Linebreaks in preformatted paragraphs are replaced with ``. - -Finally, any character can be escaped by instead providing its Unicode - codepoint in the form `{U+NNNN}`, where `NNNN` is one or more - hexadecimal digits. -Multiple codepoints may be provided separated by periods, as in - `{U+WWWW.ZZZZ}`. -Due to limitations in X·S·L·T, characters cannot be escaped in - attributes (including link targets). 
- -## Usage - -💄📝 Les·M·L is designed for usage with [⛩📰 书社][Shushe]. -Simply include the `parser.xslt` provided by this repository to - ⛩📰 书社 as an additional parser, and `magic` as an additional - magic file. - -## License - -This repository conforms to [REUSE][]. - -The parser is licensed under the terms of the Mozilla Public - License, version 2.0. - -[REUSE]: -[Shushe]: -[draft-phillips-record-jar-01]: diff --git a/TESTING/REUSE.toml b/TESTING/REUSE.toml new file mode 100644 index 0000000..57994d6 --- /dev/null +++ b/TESTING/REUSE.toml @@ -0,0 +1,24 @@ +#?toml + +# This file provides licensing information for the files in this +# (`TESTING´) directory, as it would complicate the tests to include +# license information in the files themselves, and `.license´ files +# would really clutter up the directory. +# In short: +# The input and output files located in the `📥´ and `📤´ +# subdirectories are all dedicated to the public domain. +# +# See {🔗} for more on how +# this file should be understood. + +version = 1 + +[[annotations]] +path = "📥/**" +SPDX-FileCopyrightText = "2026 Lady " +SPDX-License-Identifier = "CC0-1.0" + +[[annotations]] +path = "📤/**" +SPDX-FileCopyrightText = "2026 Lady " +SPDX-License-Identifier = "CC0-1.0" diff --git "a/TESTING/\360\237\223\244/at-id" "b/TESTING/\360\237\223\244/at-id" new file mode 100644 index 0000000..1f69024 --- /dev/null +++ "b/TESTING/\360\237\223\244/at-id" @@ -0,0 +1,2 @@ + +

este tambien

diff --git "a/TESTING/\360\237\223\244/attributes" "b/TESTING/\360\237\223\244/attributes" new file mode 100644 index 0000000..9588778 --- /dev/null +++ "b/TESTING/\360\237\223\244/attributes" @@ -0,0 +1,4 @@ + +

Testing attributes

, after, broken + , after, and separated by +.

Each of these lines begins with a space; how does it affect the result?

 , after, broken
 , after, and
 separated by
 .

{@invalid ="attribute"}; {@invalid data-characters="in an attribute"}; {@data-bad-escape-{U+40}="attribute"}; ;

{@inline="(this shouldn’t work)"}

diff --git "a/TESTING/\360\237\223\244/comments" "b/TESTING/\360\237\223\244/comments" new file mode 100644 index 0000000..36032fc --- /dev/null +++ "b/TESTING/\360\237\223\244/comments" @@ -0,0 +1,2 @@ + +

-----⌫ yep

diff --git "a/TESTING/\360\237\223\244/comments-continuations" "b/TESTING/\360\237\223\244/comments-continuations" new file mode 100644 index 0000000..1cf9bb5 --- /dev/null +++ "b/TESTING/\360\237\223\244/comments-continuations" @@ -0,0 +1,3 @@ + +

This continues nothing.

Here is a continuation of the div containing the comment.

It keeps going on.

diff --git "a/TESTING/\360\237\223\244/empty-para" "b/TESTING/\360\237\223\244/empty-para" new file mode 100644 index 0000000..24ff415 --- /dev/null +++ "b/TESTING/\360\237\223\244/empty-para" @@ -0,0 +1,2 @@ + +

Empty paragraphs are kept around :⁠—

Except when starting containers :⁠—

(but other ones stay)

but conversely…

diff --git "a/TESTING/\360\237\223\244/empty-para-id" "b/TESTING/\360\237\223\244/empty-para-id" new file mode 100644 index 0000000..fb800e4 --- /dev/null +++ "b/TESTING/\360\237\223\244/empty-para-id" @@ -0,0 +1,2 @@ + +

diff --git "a/TESTING/\360\237\223\244/footnotes" "b/TESTING/\360\237\223\244/footnotes" new file mode 100644 index 0000000..db80aa8 --- /dev/null +++ "b/TESTING/\360\237\223\244/footnotes" @@ -0,0 +1,3 @@ + +

real footnote.1 real footnote 2.2 not a real footnote.[^fn3] +also not a real footnote[^bad]footnote] the first footnote again.1

In the source, this footnote comes second, but it should be listed first.
- This footnote contains a nested list.
1 2

In the source, this footnote comes first, but it should be listed second.
This footnote has multiple paragraphs.
1

diff --git "a/TESTING/\360\237\223\244/funky-tags" "b/TESTING/\360\237\223\244/funky-tags" new file mode 100644 index 0000000..ac50a28 --- /dev/null +++ "b/TESTING/\360\237\223\244/funky-tags" @@ -0,0 +1,2 @@ + +

Here are ☞some `funky nested ☜tags.

diff --git "a/TESTING/\360\237\223\244/hgroup" "b/TESTING/\360\237\223\244/hgroup" new file mode 100644 index 0000000..740cff7 --- /dev/null +++ "b/TESTING/\360\237\223\244/hgroup" @@ -0,0 +1,3 @@ + +

Start HGROUP

My Test Document

End HGROUP

+An amazing document

End another HGROUP

diff --git "a/TESTING/\360\237\223\244/hr" "b/TESTING/\360\237\223\244/hr" new file mode 100644 index 0000000..6153276 --- /dev/null +++ "b/TESTING/\360\237\223\244/hr" @@ -0,0 +1,2 @@ + +

diff --git "a/TESTING/\360\237\223\244/keys-and-values" "b/TESTING/\360\237\223\244/keys-and-values" new file mode 100644 index 0000000..7c34284 --- /dev/null +++ "b/TESTING/\360\237\223\244/keys-and-values" @@ -0,0 +1,2 @@ + +

diff --git "a/TESTING/\360\237\223\244/langtags" "b/TESTING/\360\237\223\244/langtags" new file mode 100644 index 0000000..0529982 --- /dev/null +++ "b/TESTING/\360\237\223\244/langtags" @@ -0,0 +1,2 @@ + +

este es en español

diff --git "a/TESTING/\360\237\223\244/links" "b/TESTING/\360\237\223\244/links" new file mode 100644 index 0000000..b413d28 --- /dev/null +++ "b/TESTING/\360\237\223\244/links" @@ -0,0 +1,7 @@ + +

Here is <<< a very good@en$ link@en$ +{🔗 A {🔗 messy link + Even + + This works now +{🔗 Not a link >}

diff --git "a/TESTING/\360\237\223\244/lists" "b/TESTING/\360\237\223\244/lists" new file mode 100644 index 0000000..76deade --- /dev/null +++ "b/TESTING/\360\237\223\244/lists" @@ -0,0 +1,2 @@ + +

List 3
- List 3.4

List 1

Not List 1 Continued

A New List 1
- List 1.2
  List 1.2 Continued
  - List 1.2.4-1
    List 1.2.4-1 Continued
  - List 1.2.4-2
  - List 1.2.3
    List 1.2.3 Continued
List 1 again

Quoted List 1-1
Quoted List 1-2
Quoted List 1-2.4
Quoted List 1-2.4 Continued

Another List 1

List 1 of a different type

This note contains two lists.

The first list.

The second list.
- Nested in the second list.

diff --git "a/TESTING/\360\237\223\244/pre" "b/TESTING/\360\237\223\244/pre" new file mode 100644 index 0000000..fc96236 --- /dev/null +++ "b/TESTING/\360\237\223\244/pre" @@ -0,0 +1,2 @@ + +

Here is some
	poetry
Here is some
code

This is a code sample
Inside of a blockquote

Caption for the code sample.

printf '%s\n' "A shell script" \
  "Stretching across multiple lines"

diff --git "a/TESTING/\360\237\223\244/profile-with-comment" "b/TESTING/\360\237\223\244/profile-with-comment" new file mode 100644 index 0000000..ad20a8c --- /dev/null +++ "b/TESTING/\360\237\223\244/profile-with-comment" @@ -0,0 +1,2 @@ + +

content

diff --git "a/TESTING/\360\237\223\244/quotes" "b/TESTING/\360\237\223\244/quotes" new file mode 100644 index 0000000..37b1c8e --- /dev/null +++ "b/TESTING/\360\237\223\244/quotes" @@ -0,0 +1,2 @@ + +

Nested quote.
— Its caption.

Its real caption

 It can be multiple pars and contain markup.

diff --git "a/TESTING/\360\237\223\244/sections" "b/TESTING/\360\237\223\244/sections"
new file mode 100644
index 0000000..aefb24a
--- /dev/null
+++ "b/TESTING/\360\237\223\244/sections"
@@ -0,0 +1,2 @@
+
+

Testing the different note kinds.

Hmm…

Watch out!

This is just an ordinary note.

This is an abstract of this document.

It is continued here.

This is some cautionary text.

This is a tip

diff --git "a/TESTING/\360\237\223\244/unicode-escapes" "b/TESTING/\360\237\223\244/unicode-escapes"
new file mode 100644
index 0000000..2acdeb5
--- /dev/null
+++ "b/TESTING/\360\237\223\244/unicode-escapes"
@@ -0,0 +1,2 @@
+
+

Testing ❤ some ❤️ things {U+}! ❤❤︎❤️{U+2764.bad}{U+2764..2766}{U+}

diff --git "a/TESTING/\360\237\223\245/at-id" "b/TESTING/\360\237\223\245/at-id"
new file mode 100644
index 0000000..541b49c
--- /dev/null
+++ "b/TESTING/\360\237\223\245/at-id"
@@ -0,0 +1,3 @@
+#?lesml@en$
+
+• ¶item@example.com@es$ este tambien
diff --git "a/TESTING/\360\237\223\245/attributes" "b/TESTING/\360\237\223\245/attributes"
new file mode 100644
index 0000000..b7c0e66
--- /dev/null
+++ "b/TESTING/\360\237\223\245/attributes"
@@ -0,0 +1,20 @@
+#?lesml@en$
+
+§ Testing attributes
+
+{@data-test="at the beginning"}, after{@data-test="some text"}, broken
+ {@data-test=" by
+white space "}, `after´{@data-test="an element"}, and `separated by´
+{@data-test="some whitespace"}.
+
+Each of these lines begins with a space; how does it affect the result?
+
+| {@data-test="at the beginning"}, after{@data-test="some text"}, broken
+| {@data-test=" by
+| white space "}, `after´{@data-test="an element"}, and
+| `separated by´
+| {@data-test="some whitespace"}.
+
+{@invalid {@data-nesting="in an"}="attribute"}; {@invalid data-characters="in an attribute"}; {@data-bad-escape-{U+40}="attribute"}; {@data-ok-escape="{U+22}attribute{U+22}"}; {@data-eqquot-ok="=""} {@data-and-a="second"}
+
+{@inline="⌦comments"}⌫(this shouldn’t work)"}
diff --git "a/TESTING/\360\237\223\245/comments" "b/TESTING/\360\237\223\245/comments"
new file mode 100644
index 0000000..adcd1d8
--- /dev/null
+++ "b/TESTING/\360\237\223\245/comments"
@@ -0,0 +1,3 @@
+#?lesml@en$
+
+⌦------⌦--⌫-----⌫ ⌦a second ⌧ comment⌫ ye⌧p
diff --git "a/TESTING/\360\237\223\245/comments-continuations" "b/TESTING/\360\237\223\245/comments-continuations"
new file mode 100644
index 0000000..f33775c
--- /dev/null
+++ "b/TESTING/\360\237\223\245/comments-continuations"
@@ -0,0 +1,10 @@
+#?lesml@en$
+
+⋮ This continues nothing.
+
+# This is a comment.
+It can extend over multiple lines.
+
+⋮ Here is a continuation of the div containing the comment.
+
+⋮ It keeps going on.
diff --git "a/TESTING/\360\237\223\245/empty-para" "b/TESTING/\360\237\223\245/empty-para"
new file mode 100644
index 0000000..30ec4b7
--- /dev/null
+++ "b/TESTING/\360\237\223\245/empty-para"
@@ -0,0 +1,21 @@
+#?lesml@en$
+
+Empty paragraphs are kept around :⁠—
+
+¶
+
+Except when starting containers :⁠—
+
+» ¶blockquote-id@en$
+
+⋮ ¶
+
+⋮ (but other ones stay)
+
+⋮ ¶
+
+but conversely…
+
+» ¶
+
+⋮ ¶do-not-assign-to-containing-blockquote@en$
diff --git "a/TESTING/\360\237\223\245/empty-para-id" "b/TESTING/\360\237\223\245/empty-para-id"
new file mode 100644
index 0000000..8e6e6d6
--- /dev/null
+++ "b/TESTING/\360\237\223\245/empty-para-id"
@@ -0,0 +1,3 @@
+#?lesml@en$
+
+¶this-id-should-be-preserved
diff --git "a/TESTING/\360\237\223\245/footnotes" "b/TESTING/\360\237\223\245/footnotes"
new file mode 100644
index 0000000..e3f2423
--- /dev/null
+++ "b/TESTING/\360\237\223\245/footnotes"
@@ -0,0 +1,14 @@
+#?lesml@en$
+
+real footnote.[^fn1] real footnote 2.[^fn2] not a real footnote.[^fn3]
+also not a real footnote[^bad]footnote] the first footnote again.[^fn1]
+
+^¶fn2 In the source, this footnote comes ⹐first⹑, but it should be listed ⹐second⹑.
+
+⋮ This footnote has multiple paragraphs.
+
+^¶fn1 In the source, this footnote comes ⹐second⹑, but it should be listed ⹐first⹑.
+
+◦ This footnote contains a nested list.
+
+^¶bad]footnote This footnote can¦t be referenced.
diff --git "a/TESTING/\360\237\223\245/funky-tags" "b/TESTING/\360\237\223\245/funky-tags"
new file mode 100644
index 0000000..860ddbc
--- /dev/null
+++ "b/TESTING/\360\237\223\245/funky-tags"
@@ -0,0 +1,3 @@
+#?lesml@en$
+
+Here are ☞some `☞funky☜ nested `☜tags´.
diff --git "a/TESTING/\360\237\223\245/hgroup" "b/TESTING/\360\237\223\245/hgroup"
new file mode 100644
index 0000000..ab70729
--- /dev/null
+++ "b/TESTING/\360\237\223\245/hgroup"
@@ -0,0 +1,12 @@
+#?lesml@en$
+
+Start HGROUP
+
+⋮ ⁌¶@en$ My Test Document
+
+⋮ End HGROUP
+
+§
+An amazing document
+
+⋮ End another HGROUP
diff --git "a/TESTING/\360\237\223\245/hr" "b/TESTING/\360\237\223\245/hr"
new file mode 100644
index 0000000..a8a1863
--- /dev/null
+++ "b/TESTING/\360\237\223\245/hr"
@@ -0,0 +1,5 @@
+#?lesml@en$
+
+»»» ***
+
+☙ ⁂ ❧
diff --git "a/TESTING/\360\237\223\245/keys-and-values" "b/TESTING/\360\237\223\245/keys-and-values"
new file mode 100644
index 0000000..62187d2
--- /dev/null
+++ "b/TESTING/\360\237\223\245/keys-and-values"
@@ -0,0 +1,15 @@
+#?lesml@en$ profile=about:test
+## document comment
+## this is allowed following the opening shebang
+%%
+Key : Value
+%%comm
+%%ent
+Key value
+%%
+multiline: key
+ and \
+ value
+%%
+ bad spaced : key
+%%
diff --git "a/TESTING/\360\237\223\245/langtags" "b/TESTING/\360\237\223\245/langtags"
new file mode 100644
index 0000000..1910581
--- /dev/null
+++ "b/TESTING/\360\237\223\245/langtags"
@@ -0,0 +1,5 @@
+#?lesml@en$
+
+¶@es$
+
+¶@es$ este es en español
diff --git "a/TESTING/\360\237\223\245/links" "b/TESTING/\360\237\223\245/links"
new file mode 100644
index 0000000..069dc5a
--- /dev/null
+++ "b/TESTING/\360\237\223\245/links"
@@ -0,0 +1,10 @@
+#?lesml@en$
+
+# Keep these links in one paragraph as it is important for testing how they interrelate:
+
+{🔗Here is <<< a ⹐very good⹑@en$ ⟨link⟩@en$ }
+{🔗 A {🔗 messy {🔗link }
+{🔗 Even <{🔗messier>}
+{🔗 }
+{🔗 This{@class="important"} works now }
+{🔗 Not a link >}
diff --git "a/TESTING/\360\237\223\245/lists" "b/TESTING/\360\237\223\245/lists"
new file mode 100644
index 0000000..f96e1ac
--- /dev/null
+++ "b/TESTING/\360\237\223\245/lists"
@@ -0,0 +1,49 @@
+#?lesml@en$
+
+▪ List 3
+
+⁃ List 3.4
+
+• List 1
+
+# Comments prevent continuations.
+
+⋮ Not List 1 Continued
+
+• A New List 1
+
+◦ List 1.2
+
+⋮⋮ List 1.2 Continued
+
+⁃ List 1.2.4-1
+
+⋮⋮⋮⋮ List 1.2.4-1 Continued
+
+⁃ List 1.2.4-2
+
+▪ List 1.2.3
+
+⋮⋮⋮ ¶1.2.3.c List 1.2.3 Continued
+
+• List 1 again
+
+» № Quoted List 1-1
+
+⋮ № Quoted List 1-2
+
+⋮ № Quoted List 1-2.4
+
+⋮ ⋮ Quoted List 1-2.4 Continued
+
+• Another List 1
+
+№ List 1 of a different type
+
+※ This note contains two lists.
+
+▪ The first list.
+
+◦ The second list.
+
+⁃ Nested in the second list.
diff --git "a/TESTING/\360\237\223\245/pre" "b/TESTING/\360\237\223\245/pre"
new file mode 100644
index 0000000..5faa40e
--- /dev/null
+++ "b/TESTING/\360\237\223\245/pre"
@@ -0,0 +1,14 @@
+#?lesml@en$
+
+|Here is some
+| poetry
+|Here is some
+|`code´
+
+» |$This is a ☞code☜ sample
+» |$Inside of a blockquote
+
+⋮ ∎ Caption for the code sample.
+
+|sh$printf '%s\n' "A shell script" \
+|sh$ "Stretching across multiple lines"
diff --git "a/TESTING/\360\237\223\245/profile-with-comment" "b/TESTING/\360\237\223\245/profile-with-comment"
new file mode 100644
index 0000000..522e699
--- /dev/null
+++ "b/TESTING/\360\237\223\245/profile-with-comment"
@@ -0,0 +1,4 @@
+#?lesml@en$ profile=about:test
+## document comment
+## this is allowed following the opening shebang
+content
diff --git "a/TESTING/\360\237\223\245/quotes" "b/TESTING/\360\237\223\245/quotes"
new file mode 100644
index 0000000..823ca9a
--- /dev/null
+++ "b/TESTING/\360\237\223\245/quotes"
@@ -0,0 +1,9 @@
+#?lesml@en$
+
+» » Nested quote.
+
+⋮ ¶foo — Its caption.
+
+⋮ ∎¶foo2 Its real caption
+
+⋮ ⋮ | It can be multiple pars and contain markup.
diff --git "a/TESTING/\360\237\223\245/sections" "b/TESTING/\360\237\223\245/sections"
new file mode 100644
index 0000000..edf3d12
--- /dev/null
+++ "b/TESTING/\360\237\223\245/sections"
@@ -0,0 +1,17 @@
+#?lesml@en$
+
+🛈 Testing the different note kinds.
+
+⯑ Hmm…
+
+⚠ Watch out!
+
+※ This is just an ordinary note.
+
+∫ This is an abstract of this document.
+
+⋮ It is continued here.
+
+☡ This is some cautionary text.
+
+💡 This is a tip
diff --git "a/TESTING/\360\237\223\245/unicode-escapes" "b/TESTING/\360\237\223\245/unicode-escapes"
new file mode 100644
index 0000000..25f95b9
--- /dev/null
+++ "b/TESTING/\360\237\223\245/unicode-escapes"
@@ -0,0 +1,3 @@
+#?lesml@en$
+
+Testing {U+2764} some {U+2764.FE0F} things {U+}! {U+2764}{U+2764.FE0E}{U+2764.FE0F}{U+2764.bad}{U+2764..2766}{U+}
diff --git a/bin/lesml b/bin/lesml
new file mode 120000
index 0000000..40c2f0f
--- /dev/null
+++ b/bin/lesml
@@ -0,0 +1 @@
+../sh/lesml.sh
\ No newline at end of file
diff --git a/magic b/magic
deleted file mode 100644
index b4bb1bc..0000000
--- a/magic
+++ /dev/null
@@ -1,6 +0,0 @@
-# SPDX-FileCopyrightText: 2024 Lady
-# SPDX-License-Identifier: CC0-1.0
-
-0 string #!lesml LesML text
-!:mime text/lesml
-!:strength + 100
diff --git a/magic/lesml.magic b/magic/lesml.magic
new file mode 100644
index 0000000..4d7d52e
--- /dev/null
+++ b/magic/lesml.magic
@@ -0,0 +1,7 @@
+# @(#)💄📝 Les·M·L magic/lesml.magic 2026-03-31T01:28:11Z
+# SPDX-FileCopyrightText: 2024, 2026 Lady
+# SPDX-License-Identifier: CC0-1.0
+
+0 string #?lesml LesML text
+!:mime text/lesml
+!:strength + 100
diff --git a/sh/lesml.sh b/sh/lesml.sh
new file mode 100755
index 0000000..d0bedeb
--- /dev/null
+++ b/sh/lesml.sh
@@ -0,0 +1,138 @@
+#!/usr/bin/env sh
+# @(#)💄📝 Les·M·L sh/lesml.sh 2026-03-31T01:28:11Z
+# SPDX-FileCopyrightText: 2023, 2024, 2026 Lady
+# SPDX-License-Identifier: MPL-2.0
+
+## ⁌ The 💄📝 Les·M·L shell script
+##
+## ∎ Copyright © 2023–2024, 2026 Lady [@ Ladys Computer].
+##
+## ⋮ This Source Code Form is subject to the terms of the Mozilla
+## Public License, version 2.0.
+## If a copy of the M·P·L was not distributed with this file, You can
+## obtain one at {🔗}.
+##
+## Usage :⁠—
+##
+## »|sh$sh ./sh/lesml.sh [--] [filename]
+##
+## This script provides a simple solution for converting a Les·M·L file
+## into X·H·T·M·L.
+
+set -o 'errexit'
+
+## `LANG´ and `LC_ALL´ are set to `C´ because this script assumes
+## working with U·T·F‐8 strings as opaque series of bytes.
+
+LANG=C
+LC_ALL=C
+
+## § Configuration
+##
+## All of the commands used by this script are overridable with your
+## own implementations by setting the corresponding `cmd_COMMAND´
+## variable.
+
+: "${cmd_DIRNAME:=dirname}"
+: "${cmd_PRINTF:=printf}"
+: "${cmd_REALPATH:=realpath}"
+: "${cmd_SED:=sed}"
+: "${cmd_TEST:=test}"
+: "${cmd_TR:=tr}"
+: "${cmd_XSLTPROC:=xsltproc}"
+
+## In order to run, this program needs to know the location of the
+## 💄📝 Les·M·L transform, which is calculated below.
+## This location can be manually configured by setting the `path_LESML´
+## or `path_TRANSFORM´ environment variables.
+
+thisfile="$(
+ "${cmd_REALPATH}" -- "${0}"
+)"
+thisdir="$(
+ "${cmd_DIRNAME}" -- "${thisfile}"
+)"
+
+defaultlesml="$(
+ "${cmd_REALPATH}" -- "${thisdir}"'/..'
+)"
+: "${path_LESML:=${defaultlesml}}"
+
+defaulttransform="${path_LESML}"'/xslt/lesml.xslt'
+: "${path_TRANSFORM:=${defaulttransform}}"
+
+## The (unregistered) media type for Les·M·L files is `text/lesml´.
+## By overriding both the media type and the transform, one can re·use
+## this script for processing other kinds of files with X·S·L·T.
+
+: "${name_MEDIATYPE:=text/lesml}"
+
+## § Parameter processing
+##
+## A leading `--´ is allowed (and ignored).
+## Otherwise, the first parameter is the source filename.
+
+if "${cmd_TEST}" "${1}" = '--'
+then :
+ shift
+fi
+
+## If no input file is provided, it defaults to `-´ (standard input).
+
+if "${cmd_TEST}" -n "${1}" && "${cmd_TEST}" "${1}" != '-'
+then :
+ inputfile="$(
+ "${cmd_REALPATH}" -- "${1}"
+ )"
+else :
+ inputfile='-'
+fi
+
+## § Implementation
+##
+## ※ This implementation is derived from
+## {🔗⛩📰 书社} code written
+## 2023–2024.
+##
+## The `sanitizeprog´ variable contains a Sed program for sanitizing
+## an arbitrary string into characters which are valid in X·M·L.
+## Ideally this would be written using dollar‐single‐quote literals,
+## but Mac·O·S Dash is outdated and does not support them.
+
+sanitizeprog="$(
+ "${cmd_PRINTF}" '%b' \
+ 's/]]>/]]]]>/g\n' \
+ 's/\0357\0277\0276/�/g\n' \
+ 's/\0357\0277\0277/�/g\n' \
+ '$!s/\r$//g\n' \
+ 's/\r/\\n/g\n' \
+ '$!s/\0302\0205$//g\n' \
+ 's/\0302\0205/\\n/g\n' \
+ 's/\0342\0200\0250/\\n/g\n' \
+ 's/[\0001\0002\0003\0004\0005\0006\0007\0010]/�/g\n' \
+ 's/[\0016\0017\0020\0021\0022\0023\0024\0025\0026\0027]/�/g\n' \
+ 's/[\0031\0032\0033\0034\0035\0036\0037]/�/g'
+)"
+
+## A compound statement is used to group together the wrapping X·M·L
+## and the sanitized input into one output which can be piped to
+## `xsltproc´.
+
+{
+ "${cmd_PRINTF}" '%s\n%s%s%s' \
+ '' \
+ ''
+} |
+"${cmd_XSLTPROC}" --nonet --nowrite --novalid \
+ "${path_TRANSFORM}" \
+ -
diff --git a/parser.xslt b/xslt/lesml.xslt
similarity index 59%
rename from parser.xslt
rename to xslt/lesml.xslt
index e5b3bf0..9a60383 100644
--- a/parser.xslt
+++ b/xslt/lesml.xslt
@@ -1,39 +1,51 @@
-
+⁌ The 💄📝 Les·M·L transform
+
+∎ Copyright © 2024–2026 Lady [@ Ladys Computer]
+
+⋮ This Source Code Form is subject to the terms of the Mozilla Public
+ License, v 2.0.
+If a copy of the M·P·L was not distributed with this file, You can
+ obtain one at {🔗}.
+
+This file implements a transformation, via X·S·L·T, from an H·T·M·L
+ `