Documents are broken into paragraphs by blank lines.
Empty paragraphs are ignored.
+
+If every line in the paragraph begins with (optional white·space
+ followed by) `»` it is quoted (`<html:blockquote>`); if every line
+ begins with `]` it is bracketed.
+The lines, minus this leading marker, are then re‐analysed.
+Bracketed paragraphs which end quotes are treated as captions
+ (`<html:figcaption>`); otherwise, they are footers (`<html:footer>`).
+
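The quoting and bracketing rule added above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the names `strip_marker` and `classify_wrapper` and the `"plain"` label are hypothetical, and the sketch assumes that the optional leading white·space applies to `]` as well as to `»`, which the text does not state outright.

```python
def strip_marker(lines, marker):
    """Return the lines with leading white-space and `marker` removed,
    or None if any line does not begin with the marker."""
    stripped = []
    for line in lines:
        body = line.lstrip()
        if not body.startswith(marker):
            return None
        stripped.append(body[len(marker):])
    return stripped


def classify_wrapper(paragraph):
    """Classify a paragraph as quoted, bracketed, or plain (hypothetical
    label), returning the kind and the lines left to re-analyse."""
    lines = paragraph.split("\n")
    for marker, kind in (("»", "quoted"), ("]", "bracketed")):
        inner = strip_marker(lines, marker)
        if inner is not None:
            return kind, inner
    # Assumption: anything else is passed through unchanged.
    return "plain", lines
```

The returned lines would then be fed back into the same analysis, which is what allows a quoted paragraph to contain a list item or other paragraph type.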
Non·empty paragraphs are classified as follows :—
- If the paragraph consists of only the following section‐break
| `_` | `U+FF3F` | `FULLWIDTH LOW LINE` |
| `~` | `U+FF5E` | `FULLWIDTH TILDE` |
-- If every line in the paragraph begins with at least one space, then
- it is considered to be a quoted paragraph (`<html:blockquote>`).
- There is only one level of paragraph quoting; quoted paragraphs may
- not be quoted again.
-
- If every line in the paragraph begins with zero or more white·space
characters followed by `|`, it is a “preformatted” paragraph and
white·space is not collapsed (`<html:pre>`).
- A paragraph may be both quoted and preformatted.
-- Otherwise, the paragraph is unquoted.
+- Otherwise, the paragraph is ordinary.
-After this classification, each quoted or unquoted paragraph is further
+After this classification, each ordinary paragraph is further
classified by type based on its first character (which must be
- followed by white·space, or else the only thing on the line) :—
+ followed by white·space, a pilcrow, or else the only thing on the
+ line) :—
- If the paragraph is preformatted, it is an ordinary paragraph.
(`<html:h4>`).
- If the paragraph begins with `•` or `🔢`, it is a primary unordered
- or ordered list item (`<html:li class="unordered" data-level="1">`
- or `<html:li class="ordered" data-level="1">`).
+ or ordered list item (`<html:li class="unordered" aria-level="1">`
+ or `<html:li class="ordered" aria-level="1">`).
- If the paragraph begins with `◦` or `🔠`, it is a secondary unordered
- or ordered list item (`<html:li class="unordered" data-level="2">`
- or `<html:li class="ordered" data-level="2">`).
+ or ordered list item (`<html:li class="unordered" aria-level="2">`
+ or `<html:li class="ordered" aria-level="2">`).
Secondary list items are considered to be nested inside of primary
list items which precede them.
- If the paragraph begins with `▪` or `🔡`, it is a tertiary unordered
- or ordered list item (`<html:li class="unordered" data-level="3">`
- or `<html:li class="ordered" data-level="3">`).
+ or ordered list item (`<html:li class="unordered" aria-level="3">`
+ or `<html:li class="ordered" aria-level="3">`).
Tertiary list items are considered to be nested inside of primary
and secondary list items which precede them.
- If the paragraph begins with `⁃` or `🔣`, it is a quaternary
unordered or ordered list item
- (`<html:li class="unordered" data-level="4">` or
- `<html:li class="ordered" data-level="4">`).
+ (`<html:li class="unordered" aria-level="4">` or
+ `<html:li class="ordered" aria-level="4">`).
Quaternary list items are considered to be nested inside of primary,
secondary, and tertiary list items which precede them.
Comments produce X·M·L comment nodes and can be used to break up list
items into separate lists.
-- If the paragraph begins with `⋯`, it is a continuation paragraph
- (`<html:div class="continuation">`).
- Continuation paragraphs may be used to continue a preceding list item
- or quote.
- Note, however, that an unquoted paragraph cannot continue a quoted
- one, or vice·versa.
+- If the paragraph begins with `⋯`, it is a continuation paragraph.
+ Continuation paragraphs may be used to continue a preceding div or
+ list item.
+ If there is no such preceding div or list item, they will attach to
+ adjacent heading elements to form heading groups (`<html:hgroup>`).
+ Otherwise, they will be treated as ordinary paragraphs.
- Otherwise, it is an ordinary paragraph.
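As a rough sketch of the list‐item part of this classification, the marker characters can be looked up in a table once the "followed by white·space, a pilcrow, or nothing" condition has been checked. The names `LIST_MARKERS` and `classify_list_item` are hypothetical, and the sketch covers only the list markers shown in this section, not headings, comments, or continuations.

```python
# Marker character -> (list kind, nesting level), as listed above.
LIST_MARKERS = {
    "•": ("unordered", 1), "🔢": ("ordered", 1),
    "◦": ("unordered", 2), "🔠": ("ordered", 2),
    "▪": ("unordered", 3), "🔡": ("ordered", 3),
    "⁃": ("unordered", 4), "🔣": ("ordered", 4),
}


def classify_list_item(first_line):
    """Return (kind, level) if the line opens a list item, else None."""
    if not first_line:
        return None
    marker, rest = first_line[0], first_line[1:]
    if marker not in LIST_MARKERS:
        return None
    # The marker must be alone on the line, or be followed by
    # white-space or a pilcrow.
    if rest and not (rest[0].isspace() or rest[0] == "¶"):
        return None
    return LIST_MARKERS[marker]
```

For example, `classify_list_item("• item")` gives `("unordered", 1)`, while `classify_list_item("•item")` gives `None` because the marker is not followed by white·space.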
white·space‐only text nodes after elements ignored; if there is no
such previous element or text node, an empty text node is used
instead.
- Multiple attributes can be given in sequence.
+ Multiple attributes can be given in sequence using multiple
+ specifications.
Text nodes with attributes are wrapped in `<html:span>`.
- The characters `{🔗` and `>}` indicate a hyperlink to a U·R·L
Once the tree is built as above, it is remediated into its final form
by the following steps :—
-- Successive quoted paragraphs are joined into one quote.
- If the final quoted paragraph is an ordinary paragraph which begins
- with `—` and a space, the quote is wrapped in a `<html:figure>`
- and the final paragraph becomes its `<html:figcaption>`.
-
- Continuation paragraphs are joined with the preceding list items or
- quotes.
+ divs.
- List items of a higher level are nested in preceding list items, when
present.
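
The nesting step can be illustrated with a small stack‐based sketch. It is only an assumption about how such nesting might be done: `nest_items` and the dictionary node shape are hypothetical, "higher level" is read as a larger level number, and an item whose level skips ahead is attached to the nearest preceding item of a lower level.

```python
def nest_items(items):
    """Nest (level, text) pairs under preceding items of a lower level.

    Returns a list of root nodes; each node is a dict with "level",
    "text", and "children" keys (a hypothetical representation).
    """
    roots = []
    stack = []  # chain of currently open items, shallowest first
    for level, text in items:
        node = {"level": level, "text": text, "children": []}
        # Close any open items at the same or a deeper level.
        while stack and stack[-1]["level"] >= level:
            stack.pop()
        if stack:
            stack[-1]["children"].append(node)
        else:
            # No preceding lower-level item is present: keep at the top.
            roots.append(node)
        stack.append(node)
    return roots
```

Given the levels 1, 2, 3, 2, 1 in that order, the second and fourth items become children of the first, the third becomes a child of the second, and the fifth starts a new top‐level item.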