<!--
SPDX-FileCopyrightText: 2024 Lady <https://www.ladys.computer/about/#lady>
SPDX-License-Identifier: CC0-1.0
-->
# 💄📝 Les·M·L

<b>Lady’s simple markup language.</b>

💄📝 Les·M·L is a document markup language designed with two goals in
mind :⁠—

1. It must be trivial to parse, even with limited tooling such as that
   provided by X·S·L·T.

2. It must be sophisticated enough to handle longform hypertext
   documents and associated metadata.

It is implemented as an X·S·L·T transformation from a
`<html:script type="text/lesml">` element into H·T·M·L
(`parser.xslt`).

## Nomenclature

<i>Les·M·L</i> is an abbreviation of the phrase “Lady’s Extremely Simple
Markup Language”.

## Markup Syntax

The first line of any 💄📝 Les·M·L document should be the string
`#!lesml`.

Following the shebang, document metadata may be provided in the [Record
Jar][draft-phillips-record-jar-01] format.
The body of the document begins after the last line which begins with
the string `%%`, or after the shebang line if none exists.
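
As a rough illustration of the layout just described, the Python sketch below separates the metadata lines from the body lines. It is not derived from `parser.xslt`: the name `split_document` is hypothetical, it assumes the first line is exactly `#!lesml` and rejects anything else (stricter than the “should” above), and it leaves the Record Jar fields unparsed.

```python
def split_document(text: str) -> tuple[list[str], list[str]]:
    """Split a Les·M·L document into (metadata_lines, body_lines).

    Hypothetical sketch: assumes the first line is exactly "#!lesml".
    """
    lines = text.splitlines()
    if not lines or lines[0] != "#!lesml":
        # Stricter than the README's "should": reject documents without a shebang.
        raise ValueError("expected '#!lesml' on the first line")
    rest = lines[1:]
    # Position of the last line that begins with "%%", or -1 if there is none.
    last = max((i for i, line in enumerate(rest) if line.startswith("%%")), default=-1)
    if last < 0:
        # No "%%" line: the body starts right after the shebang.
        return [], rest
    # Record Jar metadata precedes the last "%%" line; the body follows it.
    return rest[:last], rest[last + 1:]


meta, body = split_document("#!lesml\ntitle: Example\n%%\nHello, world.")
# meta == ["title: Example"], body == ["Hello, world."]
```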

Documents are broken into paragraphs by blank lines.
Empty paragraphs are ignored.
Non·empty paragraphs are classified as follows :⁠—

- If the paragraph consists of only the following section‐break
  characters, plus any amount of white·space, then it is
  considered to be a section break (`<html:hr>`).

  The section break characters are :⁠—

  | Character | Codepoint | Unicode Name |
  | --------- | --------- | ------------ |
  | `#` | `U+0023` | `NUMBER SIGN` |
  | `*` | `U+002A` | `ASTERISK` |
  | `-` | `U+002D` | `HYPHEN-MINUS` |
  | `.` | `U+002E` | `FULL STOP` |
  | `=` | `U+003D` | `EQUALS SIGN` |
  | `_` | `U+005F` | `LOW LINE` |
  | `~` | `U+007E` | `TILDE` |
  | `·` | `U+00B7` | `MIDDLE DOT` |
  | `․` | `U+2024` | `ONE DOT LEADER` |
  | `‥` | `U+2025` | `TWO DOT LEADER` |
  | `…` | `U+2026` | `HORIZONTAL ELLIPSIS` |
  | `⁂` | `U+2042` | `ASTERISM` |
  | `⋯` | `U+22EF` | `MIDLINE HORIZONTAL ELLIPSIS` |
  | `─` | `U+2500` | `BOX DRAWINGS LIGHT HORIZONTAL` |
  | `━` | `U+2501` | `BOX DRAWINGS HEAVY HORIZONTAL` |
  | `┄` | `U+2504` | `BOX DRAWINGS LIGHT TRIPLE DASH HORIZONTAL` |
  | `┅` | `U+2505` | `BOX DRAWINGS HEAVY TRIPLE DASH HORIZONTAL` |
  | `┈` | `U+2508` | `BOX DRAWINGS LIGHT QUADRUPLE DASH HORIZONTAL` |
  | `┉` | `U+2509` | `BOX DRAWINGS HEAVY QUADRUPLE DASH HORIZONTAL` |
  | `╌` | `U+254C` | `BOX DRAWINGS LIGHT DOUBLE DASH HORIZONTAL` |
  | `╍` | `U+254D` | `BOX DRAWINGS HEAVY DOUBLE DASH HORIZONTAL` |
  | `═` | `U+2550` | `BOX DRAWINGS DOUBLE HORIZONTAL` |
  | `╴` | `U+2574` | `BOX DRAWINGS LIGHT LEFT` |
  | `╶` | `U+2576` | `BOX DRAWINGS LIGHT RIGHT` |
  | `╸` | `U+2578` | `BOX DRAWINGS HEAVY LEFT` |
  | `╺` | `U+257A` | `BOX DRAWINGS HEAVY RIGHT` |
  | `☙` | `U+2619` | `REVERSED ROTATED FLORAL HEART BULLET` |
  | `❧` | `U+2767` | `ROTATED FLORAL HEART BULLET` |
  | `　` | `U+3000` | `IDEOGRAPHIC SPACE` |
  | `・` | `U+30FB` | `KATAKANA MIDDLE DOT` |
  | `＊` | `U+FF0A` | `FULLWIDTH ASTERISK` |
  | `－` | `U+FF0D` | `FULLWIDTH HYPHEN-MINUS` |
  | `．` | `U+FF0E` | `FULLWIDTH FULL STOP` |
  | `＝` | `U+FF1D` | `FULLWIDTH EQUALS SIGN` |
  | `＿` | `U+FF3F` | `FULLWIDTH LOW LINE` |
  | `～` | `U+FF5E` | `FULLWIDTH TILDE` |

- If every line in the paragraph begins with at least one space, then
  it is considered to be a quoted paragraph (`<html:blockquote>`).
  There is only one level of paragraph quoting; quoted paragraphs may
  not be quoted again.

- Otherwise, the paragraph is unquoted.
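
The paragraph splitting and the three‐way classification above might be sketched in Python as follows. The names `split_paragraphs` and `classify` are hypothetical, and the sketch assumes that “white·space” means the X·M·L white·space characters (space, tab, carriage return, line feed), so a line of ideographic spaces counts as a section break rather than as a blank line.

```python
# Section-break characters, transcribed from the table above.
SECTION_BREAK_CHARS = frozenset(
    "#*-.=_~\u00b7\u2024\u2025\u2026\u2042\u22ef"
    "\u2500\u2501\u2504\u2505\u2508\u2509\u254c\u254d\u2550"
    "\u2574\u2576\u2578\u257a\u2619\u2767\u3000\u30fb"
    "\uff0a\uff0d\uff0e\uff1d\uff3f\uff5e"
)
# Assumption: "white·space" means the XML white-space characters only.
XML_WHITESPACE = " \t\r\n"


def split_paragraphs(body_lines: list[str]) -> list[list[str]]:
    """Group body lines into paragraphs separated by blank lines."""
    paragraphs: list[list[str]] = []
    current: list[str] = []
    for line in body_lines:
        if line.strip(XML_WHITESPACE):
            current.append(line)
        elif current:
            # A blank line ends the current paragraph; runs of blank lines
            # never produce empty paragraphs.
            paragraphs.append(current)
            current = []
    if current:
        paragraphs.append(current)
    return paragraphs


def classify(paragraph: list[str]) -> str:
    """Return "section-break", "quoted", or "unquoted" for one paragraph."""
    text = "".join(paragraph)
    if all(ch in SECTION_BREAK_CHARS or ch in XML_WHITESPACE for ch in text):
        return "section-break"
    if all(line.startswith(" ") for line in paragraph):
        return "quoted"
    return "unquoted"


print(classify(["* * *"]))  # section-break
```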

After this classification, each quoted or unquoted paragraph is further
classified by type based on its first character (which must be
followed by white·space to be recognized) :⁠—

- If the paragraph begins with `⁌`, it is a chapter heading
  (`<html:h1>`).

- If the paragraph begins with `§`, it is a section heading
  (`<html:h2>`).

- If the paragraph begins with `❦`, it is a subsection heading
  (`<html:h3>`).

- If the paragraph begins with `✠`, it is a subsubsection heading
  (`<html:h4>`).

- If the paragraph begins with `•` or `🔢`, it is a primary unordered
  or ordered list item (`<html:li class="unordered" data-level="1">`
  or `<html:li class="ordered" data-level="1">`).

- If the paragraph begins with `◦` or `🔠`, it is a secondary unordered
  or ordered list item (`<html:li class="unordered" data-level="2">`
  or `<html:li class="ordered" data-level="2">`).
  Secondary list items are considered to be nested inside of primary
  list items which precede them.

- If the paragraph begins with `▪` or `🔡`, it is a tertiary unordered
  or ordered list item (`<html:li class="unordered" data-level="3">`
  or `<html:li class="ordered" data-level="3">`).
  Tertiary list items are considered to be nested inside of primary
  and secondary list items which precede them.

- If the paragraph begins with `⁃` or `🔣`, it is a quaternary
  unordered or ordered list item
  (`<html:li class="unordered" data-level="4">` or
  `<html:li class="ordered" data-level="4">`).
  Quaternary list items are considered to be nested inside of primary,
  secondary, and tertiary list items which precede them.

- If the paragraph begins with `※`, it is an ordinary note
  (`<html:div role="note" class="note">`).

- If the paragraph begins with `☡`, it is a cautionary note
  (`<html:div role="note" class="caution">`).

- If the paragraph begins with `🛈`, it is an informative note
  (`<html:div role="note" class="info">`).

- If the paragraph begins with `⯑`, it is a questioning note
  (`<html:div role="note" class="query">`).

- If the paragraph begins with `⚠︎`, it is a warning note
  (`<html:div role="note" class="warn">`).

- If the paragraph begins with `⋯`, it is a continuation paragraph
  (`<html:div class="continuation">`).
  Continuation paragraphs may be used to continue a preceding list item
  or quote.
  Note, however, that an unquoted paragraph cannot continue a quoted
  one, or vice·versa.

- Otherwise, it is an ordinary paragraph.
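
The sigil list above can be read as a lookup table. In this Python sketch, `BLOCK_SIGILS` and `classify_block` are hypothetical names; it assumes the leading spaces of a quoted paragraph have already been removed, matches `⚠︎` as U+26A0 followed by U+FE0E exactly as written above, and maps ordinary paragraphs to `p`, which the list does not specify.

```python
# Sigil -> (element, attributes), transcribed from the list above.
BLOCK_SIGILS = {
    "⁌": ("h1", {}),
    "§": ("h2", {}),
    "❦": ("h3", {}),
    "✠": ("h4", {}),
    "•": ("li", {"class": "unordered", "data-level": "1"}),
    "🔢": ("li", {"class": "ordered", "data-level": "1"}),
    "◦": ("li", {"class": "unordered", "data-level": "2"}),
    "🔠": ("li", {"class": "ordered", "data-level": "2"}),
    "▪": ("li", {"class": "unordered", "data-level": "3"}),
    "🔡": ("li", {"class": "ordered", "data-level": "3"}),
    "⁃": ("li", {"class": "unordered", "data-level": "4"}),
    "🔣": ("li", {"class": "ordered", "data-level": "4"}),
    "※": ("div", {"role": "note", "class": "note"}),
    "☡": ("div", {"role": "note", "class": "caution"}),
    "🛈": ("div", {"role": "note", "class": "info"}),
    "⯑": ("div", {"role": "note", "class": "query"}),
    "\u26a0\ufe0e": ("div", {"role": "note", "class": "warn"}),  # ⚠ plus U+FE0E
    "⋯": ("div", {"class": "continuation"}),
}


def classify_block(text: str) -> tuple[str, dict[str, str], str]:
    """Return (element, attributes, remaining text) for unindented paragraph text."""
    # Try longer sigils first so multi-code-point sigils are not shadowed.
    for sigil in sorted(BLOCK_SIGILS, key=len, reverse=True):
        following = text[len(sigil):len(sigil) + 1]
        # A sigil only counts when white space immediately follows it.
        if text.startswith(sigil) and following and following in " \t\r\n":
            element, attributes = BLOCK_SIGILS[sigil]
            return element, dict(attributes), text[len(sigil):].lstrip(" \t\r\n")
    # Assumption: ordinary paragraphs become <p>; the README does not say.
    return "p", {}, text
```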

Following this sigil (if any, including trailing white·space) there may
be a `¶` followed by zero or more non·white·space characters.
The characters following the `¶` give the identifier for the paragraph,
which is expected to be unique within a document.
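
A sketch of identifier extraction in Python, applied to the text that remains after the sigil and its white·space. The name `split_identifier` is hypothetical, treating only space, tab, carriage return, and line feed as white·space is an assumption, and so is dropping the white·space between the identifier and the contents.

```python
import re
from typing import Optional

# "¶" followed by zero or more non-white-space characters (XML white space assumed).
IDENTIFIER = re.compile(r"¶([^ \t\r\n]*)")


def split_identifier(text: str) -> tuple[Optional[str], str]:
    """Return (identifier, contents); identifier is None when no "¶" leads the text."""
    match = IDENTIFIER.match(text)  # only matches at the very start of the text
    if match is None:
        return None, text
    # Assumption: white space between the identifier and the contents is dropped.
    return match.group(1), text[match.end():].lstrip(" \t\r\n")
```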

The remaining characters in a paragraph form its contents.
Markup within paragraphs is delimited with·out exception by pairs of
characters, with the following precedence :⁠—

- The characters `{🔗` and `>}` indicate a hyperlink to a U·R·L
  (`<html:a>`).
  The hyperlink must contain at least one `<`; the content before the
  last `<` gives the text of the link, and the content after gives
  the U·R·L that the link points to.
  If no text is given, the U·R·L will be used instead.

- The characters `⸠` and `⸡` indicate a strikethru (`<html:s>`).

- The characters `⸤` and `⸥` indicate underlining (`<html:u>`).

- The characters `⟦` and `⟧` indicate an inline note
  (`<html:small role="note">`).

- The characters `⸨` and `⸩` indicate parenthetical content
  (`<html:small>`).

- The characters `☞︎` and `☜︎` indicate strong importance
  (`<html:strong>`).

- The characters `⹐` and `⹑` indicate emphasis (`<html:em>`).

- The characters `⟪` and `⟫` indicate titles (`<html:cite>`).

- The characters `⟨` and `⟩` indicate offset text (`<html:i>`).
  This may be followed by a `@`, a language tag, and a `$` to provide
  the language of the text.

- The characters `⦃` and `⦄` indicate keyword highlighting
  (`<html:b>`).

- The characters `` ` `` and `´` indicate code (`<html:code>`).
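
The hyperlink rule (split on the last `<`, and fall back to the U·R·L when there is no text) can be sketched in Python as below. `parse_link` and `render_links` are hypothetical helpers; `render_links` handles only hyperlinks, escapes the link text rather than parsing nested markup in it, and leaves the surrounding text untouched.

```python
import html
import re

# Non-greedy: each "{🔗" pairs with the nearest following ">}".
LINK = re.compile(r"\{🔗(.*?)>\}", re.DOTALL)


def parse_link(inner: str) -> tuple[str, str]:
    """Split the content between "{🔗" and ">}" into (text, url)."""
    if "<" not in inner:
        raise ValueError("a hyperlink must contain at least one '<'")
    # Everything before the last "<" is the link text; everything after is the URL.
    text, _, url = inner.rpartition("<")
    # With no text, the URL doubles as the link text.
    return (text or url), url


def render_links(source: str) -> str:
    """Replace hyperlink markup with <a> elements; no other markup is handled."""
    def to_anchor(match: re.Match) -> str:
        text, url = parse_link(match.group(1))
        return f'<a href="{html.escape(url)}">{html.escape(text)}</a>'
    return LINK.sub(to_anchor, source)


print(render_links("See {🔗Example<https://example.com/>}."))
# See <a href="https://example.com/">Example</a>.
```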

Once the tree is built as above, it is remediated into its final form
by the following steps :⁠—

- Successive quoted paragraphs are joined into one quote.
  If the final quoted paragraph is an ordinary paragraph which begins
  with `—` and a space, the quote is wrapped in a `<html:figure>`
  and the final paragraph becomes its `<html:figcaption>`.

- Continuation paragraphs are joined with the preceding list items or
  quotes.

- List items of a higher level are nested in preceding list items, when
  present.

- Successive list items of the same level and class are joined into
  a single list.

Finally, any character can be escaped by instead providing its Unicode
codepoint in the form `<U+NNNN>`, where `NNNN` is one or more
hexadecimal digits.
Multiple codepoints may be provided separated by periods, as in
`<U+WWWW.ZZZZ>`.

## Usage

💄📝 Les·M·L is designed for usage with [⛩📰 书社][Shushe].
Simply pass the `parser.xslt` from this repository to ⛩📰 书社 as an
additional parser, and `magic` as an additional magic file.

## License

This repository conforms to [REUSE][].

The parser is licensed under the terms of the <cite>Mozilla Public
License, version 2.0</cite>.

[REUSE]: <https://reuse.software/spec/>
[Shushe]: <https://git.ladys.computer/Shushe/>
[draft-phillips-record-jar-01]: <https://datatracker.ietf.org/doc/html/draft-phillips-record-jar-01>