X-Git-Url: https://git.ladys.computer/LesML/blobdiff_plain/608a1404e5b5f27df451d8a4c50c1c356a39a8d5..e50e25b85b0c333d21c518043eca0b50f21e1b39:/README.markdown?ds=sidebyside

diff --git a/README.markdown b/README.markdown
index adb7e26..1979cd3 100644
--- a/README.markdown
+++ b/README.markdown
@@ -1,5 +1,5 @@
 # 💄📝 Les·M·L
@@ -28,12 +28,28 @@ It is implemented as an X·S·L·T transformation from a
 The first line of any 💄📝 Les·M·L document should be the string
   `#!lesml`.
-
-Following the shebang, document metadata may be provided in the [Record
-  Jar][draft-phillips-record-jar-01] format.
+A language tag may follow this, beginning with `@` and terminated with
+  `$`, like so:
+`#!lesml@en$`.
+Regardless of whether a language tag is present, the shebang line may
+  be terminated by a space‐separated list of properties of the form
+  `key=value`.
+Only one property is currently permitted: `profile`, whose value should
+  be a U·R·I and is translated to the `@data-lesml-profile` attribute
+  on the resulting `` element.
+
+Following the shebang line, document metadata may be provided in the
+  [Record Jar][draft-phillips-record-jar-01] format.

 The body of the document begins after the last line which begins with
   the string `%%`, or after the shebang line if none exists.
+Multiple documents can be catenated into a single file; a new document
+  is begun on any line which starts with `#!lesml` or `##`.
+Documents in the latter case inherit the latest preceding `#!lesml`
+  declaration.
+`##` may be followed by other text; this is treated as an interdocument
+  comment.
+
 Documents are broken into paragraphs by blank lines.
 Empty paragraphs are ignored.
 Non·empty paragraphs are classified as follows :⁠—
@@ -46,7 +62,6 @@ Non·empty paragraphs are classified as follows :⁠—
   | Character | Codepoint | Unicode Name |
   | --------- | --------- | ------------ |
-  | `#` | `U+0023` | `NUMBER SIGN` |
   | `*` | `U+002A` | `ASTERISK` |
   | `-` | `U+002D` | `HYPHEN-MINUS` |
   | `.` | `U+002E` | `FULL STOP` |
@@ -88,11 +103,18 @@ Non·empty paragraphs are classified as follows :⁠—
   There is only one level of paragraph quoting; quoted paragraphs may
   not be quoted again.

+- If every line in the paragraph begins with zero or more white·space
+  characters followed by `|`, it is a “preformatted” paragraph and
+  white·space is not collapsed (``).
+  A paragraph may be both quoted and preformatted.
+
 - Otherwise, the paragraph is unquoted.

 After this classification, each quoted or unquoted paragraph is further
   classified by type based on its first character (which must be
-  followed by white·space to be recognized) :⁠—
+  followed by white·space, or else the only thing on the line) :⁠—
+
+- If the paragraph is preformatted, it is an ordinary paragraph.

 - If the paragraph begins with `⁌`, it is a chapter heading (``).
@@ -144,6 +166,10 @@ After this classification, each quoted or unquoted paragraph is further
 - If the paragraph begins with `⚠︎`, it is a warning note (``).

+- If the paragraph begins with `#`, it is a comment.
+  Comments produce X·M·L comment nodes and can be used to break up list
+  items into separate lists.
+
 - If the paragraph begins with `⋯`, it is a continuation paragraph
   (``).
   Continuation paragraphs may be used to continue a preceding list item
@@ -153,15 +179,32 @@ After this classification, each quoted or unquoted paragraph is further
 - Otherwise, it is an ordinary paragraph.

-Following this sigil (if any, including trailing white·space) there may
-  be a `¶` followed by zero or more non·white·space characters.
+Following this sigil (if any) there may be a `¶` followed by zero or
+  more non·white·space characters.
 The characters following the `¶` give the identifier for the paragraph,
   which is expected to be unique within a document.
+This may be suffixed with a language tag beginning with `@` and
+  terminated with `$`.
 The remaining characters in a paragraph form its contents.

 Markup within paragraphs is delimited with·out exception by pairs of
   characters, with the following precedence :⁠—

+- The characters `⌦` and `⌫` indicate inline comments.
+  A single character `⌧` may be used to indicate an “empty” comment
+  (consisting of `U+034F COMBINING GRAPHEME JOINER` for X·M·L
+  compatibility).
+
+- The characters `{@` and `"}` indicate attribute specifications.
+  The attribute specification must contain at least one `="` which
+  separates the key of the attribute from the value.
+  Attributes attach to the previous element or text node, with
+  white·space‐only text nodes after elements ignored; if there is no
+  such previous element or text node, an empty text node is used
+  instead.
+  Multiple attributes can be given in sequence.
+  Text nodes with attributes are wrapped in ``.
+
 - The characters `{🔗` and `>}` indicate a hyperlink to a U·R·L (``).
   The hyperlink must contain at least one `<`; the content before the
@@ -179,21 +222,21 @@ Markup within paragraphs is delimited with·out exception by pairs of
 - The characters `⸨` and `⸩` indicate parenthetical content (``).

-- The characters `☞︎` and `☜︎` indicate strong importance
-  (``).
-
-- The characters `⹐` and `⹑` indicate emphasis (``).
+- The characters `` ` `` and `´` indicate code (``).

 - The characters `⟪` and `⟫` indicate titles (``).

+- The characters `⸶` and `⸷` indicate names (``).
+
 - The characters `⟨` and `⟩` indicate offset text (``).
-  This may be followed by a `@`, a language tag, and a `$` to provide
-  the language of the text.

 - The characters `⦃` and `⦄` indicate keyword highlighting (``).

-- The characters `` ` `` and `´` indicate code (``).
+- The characters `☞︎` and `☜︎` indicate strong importance
+  (``).
+
+- The characters `⹐` and `⹑` indicate emphasis (``).

 Once the tree is built as above, it is remediated into its final form
   by the following steps :⁠—
@@ -212,11 +255,15 @@ Once the tree is built as above, it is remediated into its final form
 - Successive list items of the same level and class are joined into a
   single list.

+- Linebreaks in preformatted paragraphs are replaced with ``.
+
 Finally, any character can be escaped by instead providing its Unicode
-  codepoint in the form ``, where `NNNN` is one or more
+  codepoint in the form `{U+NNNN}`, where `NNNN` is one or more
   hexadecimal digits.
 Multiple codepoints may be provided separated by periods, as in
-  ``
+  `{U+WWWW.ZZZZ}`.
+Due to limitations in X·S·L·T, characters cannot be escaped in
+  attributes (including link targets).

 ## Usage
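
Reviewer's illustration of the escape syntax introduced in the last hunk above: a minimal Python sketch that expands `{U+NNNN}` and period-separated `{U+WWWW.ZZZZ}` escapes in a string. It is not taken from the project, which performs this step in X·S·L·T. The name `decode_escapes`, the regular expression, and the acceptance of lowercase hexadecimal digits are assumptions made for the example, and the real implementation may behave differently in edge cases.

```python
import re

# Hypothetical pattern (an assumption, not the project's own): "{U+", one or
# more hexadecimal codepoints separated by periods, then "}".
ESCAPE = re.compile(r"\{U\+([0-9A-Fa-f]+(?:\.[0-9A-Fa-f]+)*)\}")


def decode_escapes(text: str) -> str:
    """Replace each {U+NNNN} or {U+WWWW.ZZZZ} escape with the characters it names."""

    def replace(match: "re.Match[str]") -> str:
        # Each period-separated field is one codepoint written in hexadecimal.
        # chr() raises ValueError for values above 0x10FFFF; this sketch does
        # not try to recover from that.
        return "".join(chr(int(cp, 16)) for cp in match.group(1).split("."))

    return ESCAPE.sub(replace, text)


if __name__ == "__main__":
    print(decode_escapes("{U+1F484.1F4DD} Les{U+B7}M{U+B7}L"))
```

Text that does not match the pattern, such as `{U+}` or `{U+ZZ}`, is left untouched by this sketch. The sketch also does not model the restriction noted in the diff that escapes are unavailable inside attributes and link targets.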