Lady [Tue, 31 Mar 2026 01:28:11 +0000 (21:28 -0400)]
New block behaviour and repository layout
This commit makes extremely breaking changes to the block‐level
semantics of the language, effectively rewriting those aspects of
parsing from scratch.
At the same time, it restructures the repository and adds test cases.
Commits (and tags) prior to this commit have been redone with
Les·M·L‐format commit messages, and the repository now uses Sha·256
hashes rather than Sha·1.
One could consider this a “version 2” of the Les·M·L format, but it is
the softest of reboots:
Most files will continue to produce more·or·less exactly the same
result after only a replacement of `#!lesml´ with `#?lesml´.
(There are some negligible differences in whitespace handling.)
The major cases where this is ⹐not⹑ true are those of blockquotes,
footers, and continuations.
The concepts of bracketed and quoted paragraphs have been removed and
footers and blockquotes are now ordinary block types alongside
sections and list items.
Blocks now have a “level”, indicated by the number of leading `⋮´
characters, and blocks of higher level are nested inside preceding
ones of lower level.
This makes the existing continuation behaviour more consistent and
makes the language more flexible.
The character `⋮´ is used instead of `⋯´ because it is semantically
more accurate and the latter can also make up horizontal rules.
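The level rule above can be sketched as follows. This is only a rough illustration in Python, not the actual implementation: the names `block_level´ and `nest´ are hypothetical, block sigils are left uninterpreted, and continuation symbols are assumed to be separated by plain spaces.

```python
def block_level(line: str) -> int:
    """Count the leading ⋮ characters, skipping the spaces between them."""
    level = 0
    for ch in line:
        if ch == "⋮":
            level += 1
        elif ch != " ":
            break
    return level


def nest(lines):
    """Nest each block inside the nearest preceding block of lower level."""
    # A sentinel root with level -1 sits below every real block.
    root = {"level": -1, "text": None, "children": []}
    stack = [root]
    for line in lines:
        level = block_level(line)
        node = {"level": level, "text": line.lstrip("⋮ "), "children": []}
        # Close every open block which is not of strictly lower level.
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]
```

Under these assumptions, a line beginning `⋮ ⋮ •´ has level 2 and lands inside the nearest preceding block of level 1.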
Multiple block sigils can now be used together to create deeply nested
block structures, which are continued using an equal number of
continuation symbols.
For example, a leading `» • •´ introduces a list inside a list inside a
blockquote, and `⋮ ⋮ •´ introduces a second list item in the nested
list.
The higher‐level unordered list characters `◦▪⁃´ are supported only as
the first block sigils after any continuation symbols.
They are simply a shorthand for `⋮ •´, `⋮ ⋮ •´, and `⋮ ⋮ ⋮ •´,
respectively.
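As a rough sketch of that shorthand, assuming sigils are separated by single spaces: the helper below is hypothetical (its name and the `SHORTHAND´ table are illustrative only) and merely rewrites a sigil prefix, it does not parse a document.

```python
# Hypothetical table: shorthand character -> number of continuation symbols.
SHORTHAND = {"◦": 1, "▪": 2, "⁃": 3}


def expand_list_shorthand(sigils: str) -> str:
    """Expand a leading ◦, ▪, or ⁃ into continuation symbols plus •."""
    tokens = sigils.split()
    # Skip any continuation symbols; the shorthand is only recognized
    # as the first block sigil after them.
    i = 0
    while i < len(tokens) and tokens[i] == "⋮":
        i += 1
    if i < len(tokens) and tokens[i] in SHORTHAND:
        tokens[i:i + 1] = ["⋮"] * SHORTHAND[tokens[i]] + ["•"]
    return " ".join(tokens)
```

A prefix such as `» ◦´ is left alone by this sketch, since the shorthand character is not the first block sigil there.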
The ordered list characters have been replaced by a single `№´.
Preformatted text requires the same prefix on each line.
The prefix, which begins with any continuation characters or block
sigils, can end in one of three ways :—
№ With a pipe, a syntax name, and a dollar sign, such as `|sh$´.
The syntax name can consist only of Ascii lowercase letters, numbers,
slashes, dots, and hyphen dashes and identifies the preformatted text
as code in the corresponding language.
The semantics of syntax names are left undefined and up to profiles or
applications to determine.
№ With a pipe and a dollar sign with nothing in between.
In this case, the preformatted text is code of an unspecified syntax.
№ With a pipe only.
In this case, the text is preformatted, but not code.
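The three endings can be told apart with a single pattern. The following is a minimal sketch in Python, not the real parser: `PRE_END´ and `classify_pre_prefix´ are hypothetical names, and the pattern only inspects how the prefix ends, ignoring whatever sigils come before the pipe.

```python
import re

# Assumed shape: a pipe, optionally followed by a syntax name and "$".
# The syntax name allows Ascii lowercase letters, digits, "/", ".", "-".
PRE_END = re.compile(r"\|(?:(?P<syntax>[a-z0-9/.\-]*)\$)?$")


def classify_pre_prefix(prefix: str):
    """Return (kind, syntax) for a preformatted prefix, or None."""
    match = PRE_END.search(prefix)
    if match is None:
        return None
    syntax = match.group("syntax")
    if syntax is None:
        # Pipe only: preformatted, but not code.
        return ("preformatted", None)
    if syntax == "":
        # Pipe and dollar sign: code of an unspecified syntax.
        return ("code", None)
    return ("code", syntax)
```

What a given syntax name means is deliberately not modelled here, since that is left to profiles or applications.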
Footnote references now have the format `[^fn_id]´ and the
corresponding footnotes `^¶fn_id´.
“Footnotes” which are never referenced in a document are now dropped,
meaning that they can be used to fully remove text from the output
(unlike most forms of commenting, which produce X·M·L comments).
The character `*´ was considered as an alternative to `^´, but it has
too many additional meanings in English text.
The character introducing footers has been changed from `]´ to `∎´.
The character introducing abstracts has been changed from `@´ to `∫´.
A new “tip” section has been introduced, with sigil `💡´.
Whitespace is no longer ignored before attribute specifications.
In general, more whitespace is preserved literally.
The hope at this point is that the Les·M·L format is fairly stable, and
no major backwards‐incompatible changes will be made (excepting the
minor backwards incompatibility which is inherent in adding new
features to a syntax which already accepts all inputs with·out
complaint).
A couple more features are anticipated, but “Version 1.0.0” will
probably come soon.
Lady [Sat, 20 Sep 2025 04:51:30 +0000 (00:51 -0400)]
Improve/adjust note/sectioning elements
• Notes are now `<section>´s, not `<div>´s.
• A new “abstract” section type has been added.
• Sections can now contain list items (with a level > 1).
• The `LesML:finalize-footnotes´ stage has been repurposed into a more
general `LesML:finalize´ stage, in part to facilitate the above.
• Paragraph i·d¦s are now passed upwards to the containing element in
all cases, when applied to the first paragraph in a container.
To literally assign an i·d to the first paragraph, add an empty initial
paragraph (just a `¶´), which will be dropped.
The same is true for language tags.
It would be nice to be able to specify titles/labels for sections, but
there isn¦t a good mechanism for doing this as of now.
The heading sigils are probably inappropriate for this, as headings in
sections are conceptually singular and “unleveled”.
There is no need for four heading types when a section can only
contain one heading and it must always be the first paragraph.
But on the H·T·M·L side, section headings ⹐should⹑ have appropriate
levels corresponding to their position in the document.
So this would need to be remediated somehow.
Lady [Wed, 17 Sep 2025 02:57:28 +0000 (22:57 -0400)]
Support attributes inside of links
This commit changes the partitioning behaviour to allow for element
nodes within the first part, assuming that the first part is not
being restricted to an n·c·name.
This is required to enable attribute specifiers inside of the text of
links.
Element nodes are still not allowed in the second part of the
partition (the link target or attribute value).
Lady [Wed, 17 Sep 2025 02:55:22 +0000 (22:55 -0400)]
Fix note divs
The parser used to erroneously discard all divs, including semantic
ones (i·e those with a `@role´).
Also, it used to require a variation selector after `⚠´, which didn¦t
actually work with the implementation and should not have been
required.
Lady [Mon, 28 Apr 2025 01:55:06 +0000 (21:55 -0400)]
Support hgroups with heading continuations
These are a bit interesting because continuation paragraphs can be
placed ⹐before⹑ the heading they “continue” (not just after).
A block comment can be used to separate them from a preceding list or
similar.
Lady [Mon, 28 Apr 2025 00:42:16 +0000 (20:42 -0400)]
Quotes and brackets as multiparagraph divisions
This commit introduces a new concept of multiparagraph divisions (where
the lines within are re·analysed after the prefix is removed) and
transitions block quotations to it.
I resisted this for a long while because it¦s less “simple”, but the
precedent of preformatted text and the old quotation mechanism
(which was kind of messy) was already conceptually close.
And footer support really requires this.
Lady [Sun, 27 Apr 2025 05:21:13 +0000 (01:21 -0400)]
Drop langtags on offset text but support on paras
Now that language attributes can be arbitrarily added to any inline,
a special syntax for them doesn¦t make much sense.
However, that syntax would be useful on block‐level elements.
There isn¦t a real technical reason for the explicit `$´ end of the
language tag, since the following whitespace could have been used to
end it also, but it makes the syntax a bit more obvious, matches the
existing format with the shebangs, and is less likely to conflict
with i·d syntaxes (which might include `@´ but probably won¦t also
end in a `$´).
You can still make i·d¦s like `@es$´ if you need to, by explicitly
language‐tagging: `¶@es$@en$´.
This commit also allows block sigils to not be followed by a space if
they are immediately followed by a pilcrow.
Lady [Sun, 27 Apr 2025 04:31:53 +0000 (00:31 -0400)]
Support attributes
This required a reworking of the link parsing code to enable it to also
be used for attribute parsing (more‐or‐less), and maybe also fixed a
bug where, if there was an end token with no start token, no further
end tokens would be processed (even later ones which did have start
tokens).
Lady [Sun, 27 Apr 2025 04:26:14 +0000 (00:26 -0400)]
Support inline comments
Comments work differently from other inlines and have their own
processing rules.
“Empty” comments are useful primarily to break up spans of text into
discrete text nodes.
Lady [Sat, 26 Apr 2025 03:24:11 +0000 (23:24 -0400)]
Wrap results in a div
Even in the case where there is only one document, there may be
additional document comments which should be grouped alongside it and
not intermixed with any comments in the surrounding area.
For the case when there is only one document and nothing else, not
even a comment, it is probably better to still wrap the output for
consistency.
Lady [Sat, 26 Apr 2025 03:16:03 +0000 (23:16 -0400)]
Drop empty shebang documents
Previously, a shebang forced a document to be output, even if it was
empty.
However, this conflicts with the ability to add document comments at
the beginning of a document where the shebang is serving as a magic
number (for example, copyright comments):
This would create two documents, one for the shebang and a second one
for the final comment.
Now, empty documents will be dropped regardless of whether they use the
shebang or comment syntax.
Lady [Sat, 22 Mar 2025 01:45:20 +0000 (21:45 -0400)]
Enable nested tags of the same kind
Previously, the processing rules did not allow nesting an element inside
of itself: ‹ ☞︎nested ☞︎tags☜︎ like ☞︎this☜︎☜︎ › did not work as
expected.
This design choice was to (at least) appropriately handle
‹ ☞︎⟨this☜︎ weird ⟨case☜︎⟩ ›, where each inline should stop at the
first delimiter.
Correct processing starts by looking for end sigils first, not start
sigils, backtracking to the last start sigil which precedes it,
wrapping that text, and then reprocessing the entire set of nodes
until no more end sigils with matching start sigils are found.
This commit implements that behaviour, which is of course a fair bit
more complicated but should improve the results.
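The end‐sigil‐first approach can be sketched roughly as below. This is an illustrative Python simplification, not the committed code: it handles only one kind of inline (the `START´ and `END´ constants are assumptions), treats each character as a node, and represents a wrapped span as a hypothetical `("span", children)´ tuple.

```python
START, END = "☞", "☜"


def wrap_inlines(text: str):
    """Wrap matched START…END runs, innermost first."""
    nodes = list(text)  # one node per character to begin with
    while True:
        # Look for end sigils first, in document order.
        for end in (i for i, n in enumerate(nodes) if n == END):
            starts = [i for i in range(end) if nodes[i] == START]
            if starts:
                # Backtrack to the last start sigil preceding this end,
                # wrap the text between them, then reprocess everything.
                start = starts[-1]
                nodes[start:end + 1] = [("span", nodes[start + 1:end])]
                break
        else:
            # No end sigil has a matching start sigil any more.
            return nodes


def render(nodes) -> str:
    """Show wrapped spans in square brackets, for inspection only."""
    return "".join(
        n if isinstance(n, str) else "[" + render(n[1]) + "]" for n in nodes
    )
```

In this sketch an end sigil with no preceding start sigil is simply left in the text, and later end sigils are still considered.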
It also changes specifying characters by Unicode codepoint to use
curly braces rather than angle brackets, as the latter conflicted
with the angle brackets used in links.
Specifying by Unicode codepoint still isn¦t supported in links, but
the behaviour should be less surprising.
The old Unicode codepoint behaviour was probably broken also, in the
case where the codepoint was not the first character in a paragraph,
but it is fixed now.
Lady [Sat, 19 Oct 2024 17:11:12 +0000 (13:11 -0400)]
Refactor initial chunking to be line‐based
The old parsing mechanism operated primarily on large string chunks,
which were re‐parsed into lines potentially multiple times.
This refactor changes the parsing to use lines and ranges, which is a
little more verbose/complicated from an X·Path perspective (it
requires a lot of `generate-id()´ comparisons) but hopefully, on the
whole, better.
Lady [Sat, 19 Oct 2024 16:43:33 +0000 (12:43 -0400)]
Support language tag and profile
This commit provides initial support for language‐tagged Les·M·L
documents and additional document properties.
Only one property is supported: `profile´.
Language tags are themselves internally treated as properties whose
key contains spaces; property keys cannot ordinarily contain spaces
so there is no concern for confusion.
Lady [Sun, 12 May 2024 07:03:30 +0000 (03:03 -0400)]
Switch symbols for subsection and subsubsection
`❦´ is a stronger symbol than `✠´; its corresponding directional
fences `❧´ and `☙´ indicate section boundaries, while `⹐´ and `⹑´
simply enclose emphasis.