Drop langtags on offset text but support on paras

diff --git a/README.markdown b/README.markdown

index 2901a5d2a727f574dae63254416971dfb138597c..1979cd30517bf5bec7b8b027ed9a493c543038c8 100644
--- a/README.markdown
+++ b/README.markdown
@@ -1,5 +1,5 @@
  <!--
-SPDX-FileCopyrightText: 2024 Lady <https://www.ladys.computer/about/#lady>
+SPDX-FileCopyrightText: 2024, 2025 Lady <https://www.ladys.computer/about/#lady>
  SPDX-License-Identifier: CC0-1.0
  -->
  # 💄📝 Les·M·L
@@ -103,11 +103,18 @@ Non·empty paragraphs are classified as follows :⁠—
    There is only one level of paragraph quoting; quoted paragraphs may
      not be quoted again.
  
+- If every line in the paragraph begins with zero or more white·space
+    characters followed by `|`, it is a “preformatted” paragraph and
+    white·space is not collapsed (`<html:pre>`).
+  A paragraph may be both quoted and preformatted.
+
  - Otherwise, the paragraph is unquoted.
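
The preformatted rule added in this hunk can be illustrated with a small sketch. It is not taken from the Les·M·L sources; the function name `is_preformatted` is hypothetical, and it assumes that "white·space" matches what Python's `str.lstrip()` strips and that a paragraph is passed in as one string with `\n` between lines. It does not model how the quoting marker combines with `|`, since this diff does not show that.

```python
def is_preformatted(paragraph: str) -> bool:
    """Return True if every line starts with optional white-space, then `|`.

    Hypothetical helper: the definition of white-space here is Python's,
    which is an assumption rather than something the README specifies.
    """
    lines = paragraph.split("\n")
    # Every line, after dropping leading white-space, must begin with `|`.
    return all(line.lstrip().startswith("|") for line in lines)


if __name__ == "__main__":
    print(is_preformatted("| first line\n   | second line"))
    print(is_preformatted("| first line\nsecond line"))
```

Under these assumptions, a single line without the leading `|` is enough to make the whole paragraph fall through to the next classification.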
  
  After this classification, each quoted or unquoted paragraph is further
    classified by type based on its first character (which must be
-   followed by white·space to be recognized) :⁠—
+   followed by white·space, or else the only thing on the line) :⁠—
+
+- If the paragraph is preformatted, it is an ordinary paragraph.
  
  - If the paragraph begins with `⁌`, it is a chapter heading
      (`<html:h1>`).
@@ -172,15 +179,32 @@ After this classification, each quoted or unquoted paragraph is further
  
  - Otherwise, it is an ordinary paragraph.
  
-Following this sigil (if any, including trailing white·space) there may
-  be a `¶` followed by zero or more non·white·space characters.
+Following this sigil (if any) there may be a `¶` followed by zero or
+  more non·white·space characters.
  The characters following the `¶` give the identifier for the paragraph,
    which is expected to be unique within a document.
+This may be suffixed with a language tag beginning with `@` and
+  terminated with `$`.
  
  The remaining characters in a paragraph form its contents.
  Markup within paragraphs is delimited with·out exception by pairs of
    characters, with the following precedence :⁠—
  
+- The characters `⌦` and `⌫` indicate inline comments.
+  A single character `⌧` may be used to indicate an “empty” comment
+    (consisting of `U+034F COMBINING GRAPHEME JOINER` for X·M·L
+    compatibility).
+
+- The characters `{@` and `"}` indicate attribute specifications.
+  The attribute specification must contain at least one `="` which
+    separates the key of the attribute from the value.
+  Attributes attach to the previous element or text node, with
+    white·space‐only text nodes after elements ignored; if there is no
+    such previous element or text node, an empty text node is used
+    instead.
+  Multiple attributes can be given in sequence.
+  Text nodes with attributes are wrapped in `<html:span>`.
+
  - The characters `{🔗` and `>}` indicate a hyperlink to a U·R·L
      (`<html:a>`).
    The hyperlink must contain at least one `<`; the content before the
@@ -198,21 +222,21 @@ Markup within paragraphs is delimited with·out exception by pairs of
  - The characters `⸨` and `⸩` indicate parenthetical content
      (`<html:small>`).
  
-- The characters `☞︎` and `☜︎` indicate strong importance
-    (`<html:strong>`).
-
-- The characters `⹐` and `⹑` indicate emphasis (`<html:em>`).
+- The characters `` ` `` and `´` indicate code (`<html:code>`).
  
  - The characters `⟪` and `⟫` indicate titles (`<html:cite>`).
  
+- The characters `⸶` and `⸷` indicate names (`<html:u class="name">`).
+
  - The characters `⟨` and `⟩` indicate offset text (`<html:i>`).
-  This may be followed by a `@`, a language tag, and a `$` to provide
-    the language of the text.
  
  - The characters `⦃` and `⦄` indicate keyword highlighting
      (`<html:b>`).
  
-- The characters `` ` `` and `´` indicate code (`<html:code>`).
+- The characters `☞︎` and `☜︎` indicate strong importance
+    (`<html:strong>`).
+
+- The characters `⹐` and `⹑` indicate emphasis (`<html:em>`).
  
  Once the tree is built as above, it is remediated into its final form
    by the following steps :⁠—
@@ -231,11 +255,15 @@ Once the tree is built as above, it is remediated into its final form
  - Successive list items of the same level and class are joined into
      a single list.
  
+- Linebreaks in preformatted paragraphs are replaced with `<html:br>`.
+
  Finally, any character can be escaped by instead providing its Unicode
-  codepoint in the form `<U+NNNN>`, where `NNNN` is one or more
+  codepoint in the form `{U+NNNN}`, where `NNNN` is one or more
    hexadecimal digits.
  Multiple codepoints may be provided separated by periods, as in
-  `<U+WWWW.ZZZZ>`
+  `{U+WWWW.ZZZZ}`.
+Due to limitations in X·S·L·T, characters cannot be escaped in
+  attributes (including link targets).
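
As a rough illustration of the new `{U+NNNN}` escape form, here is a minimal sketch of how such escapes could be expanded. It is not the project's implementation (which, per the text above, is constrained by X·S·L·T); the names `ESCAPE` and `expand_escapes` are hypothetical, and accepting lowercase hexadecimal digits is an assumption.

```python
import re

# One or more groups of hexadecimal digits separated by periods,
# wrapped in `{U+` and `}`. Lowercase digits are accepted by assumption.
ESCAPE = re.compile(r"\{U\+([0-9A-Fa-f]+(?:\.[0-9A-Fa-f]+)*)\}")


def expand_escapes(text: str) -> str:
    """Replace each {U+NNNN} or {U+WWWW.ZZZZ} escape with its characters."""

    def replace(match: re.Match) -> str:
        # Each period-separated group is one codepoint.
        # chr() raises ValueError for values above 0x10FFFF.
        return "".join(chr(int(group, 16)) for group in match.group(1).split("."))

    return ESCAPE.sub(replace, text)


if __name__ == "__main__":
    print(expand_escapes("{U+0048.0069} there"))
```

The old angle-bracket form (`<U+NNNN>`) is deliberately left untouched by this pattern, matching the change in this commit. The sketch also expands escapes everywhere in the input, whereas the README notes that escapes are not available inside attributes or link targets.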
  
  ## Usage