Lady’s Gitweb - LesML/log

Add some documentation to the transform

New block behaviour and repository layout

This commit makes extremely breaking changes to the block‐level
  semantics of the language, effectively rewriting those aspects of
  parsing from scratch.
At the same time, it restructures the repository and adds test cases.
Commits (and tags) prior to this commit have been redone with
  Les·M·L‐format commit messages, and the repository now uses Sha·256
  hashes rather than Sha·1.

One could consider this a “version 2” of the Les·M·L format, but it is
  the softest of reboots:
Most files will continue to produce more·or·less exactly the same
  result after only a replacement of `#!lesml´ with `#?lesml´.
(There are some negligible differences in whitespace handling.)
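
The shebang replacement described above can be sketched in Python.
This is only an illustration, not part of the transform: the function
  name `migrate_shebangs´ is hypothetical, and it assumes that the
  shebang always sits at the very start of a line.

```python
import re

# Assumption: the old shebang only ever appears at the start of a line,
# where it serves as the magic number introducing a document.
OLD_SHEBANG = re.compile(r"^#!lesml", flags=re.MULTILINE)


def migrate_shebangs(text: str) -> str:
    """Replace every line-initial `#!lesml` with `#?lesml`."""
    return OLD_SHEBANG.sub("#?lesml", text)
```

Files which use blockquotes, footers, or continuations would still need
  manual review after such a replacement.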

The major cases where this is ⹐not⹑ true are those of blockquotes,
  footers, and continuations.
The concepts of bracketed and quoted paragraphs have been removed and
  footers and blockquotes are now ordinary block types alongside
  sections and list items.
Blocks now have a “level”, indicated by the number of leading `⋮´
  characters, and blocks of higher level are nested inside preceding
  ones of lower level.
This makes the existing continuation behaviour more consistent and
  makes the language more flexible.
The character `⋮´ is used instead of `⋯´ because it is semantically
  more accurate and the latter can also make up horizontal rules.
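
As a rough sketch of how a block level might be read off a line, the
  following Python counts the leading `⋮´ characters.
It assumes that continuation symbols are separated by single spaces;
  the name `block_level´ is hypothetical and not taken from the
  transform.

```python
CONTINUATION = "⋮"


def block_level(line: str) -> int:
    """Count the leading continuation symbols of a line.

    Assumption: symbols are separated from each other, and from what
    follows, by single spaces.
    """
    level = 0
    for token in line.split(" "):
        if token != CONTINUATION:
            break
        level += 1
    return level
```

A block whose level is higher than that of a preceding block would then
  be nested inside of it.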

Multiple block sigils can now be used together to create deeply nested
  block structures, which are then continued with an equal number of
  continuation symbols.
For example, a leading `» • •´ introduces a list inside a list inside a
  blockquote, and `⋮ ⋮ •´ introduces a second list item in the nested
  list.

The higher‐level unordered list characters `◦▪⁃´ are supported only as
  the first block sigils after any continuation symbols.
They are simply a shorthand for `⋮ •´, `⋮ ⋮ •´, and `⋮ ⋮ ⋮ •´,
  respectively.
The ordered list characters have been replaced by a single `№´.
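
The shorthand can be illustrated with a small Python sketch which
  expands it into the long form.
The name `expand_shorthand´ is hypothetical, and the sketch assumes
  that sigils are separated by single spaces and that the expansion is
  purely textual.

```python
CONTINUATION = "⋮"

# The three higher-level bullets and the long forms they stand for.
SHORTHAND = {
    "◦": "⋮ •",
    "▪": "⋮ ⋮ •",
    "⁃": "⋮ ⋮ ⋮ •",
}


def expand_shorthand(line: str) -> str:
    """Expand a shorthand bullet into continuation symbols plus `•`.

    Only the first sigil after any continuation symbols is considered.
    """
    tokens = line.split(" ")
    index = 0
    # Skip over any leading continuation symbols.
    while index < len(tokens) and tokens[index] == CONTINUATION:
        index += 1
    if index < len(tokens) and tokens[index] in SHORTHAND:
        tokens[index] = SHORTHAND[tokens[index]]
    return " ".join(tokens)
```

A shorthand bullet which follows another block sigil is left untouched,
  as it is not supported in that position.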

Preformatted text requires the same prefix on each line.
The prefix, which begins with any continuation characters or block
  sigils, can end in one of three ways :⁠—

№ With a pipe, a syntax name, and a dollar sign, such as `|sh$´.
The syntax name can consist only of Ascii lowercase letters, numbers,
  slashes, dots, and hyphen dashes and identifies the preformatted text
  as code in the corresponding language.
The semantics of syntax names are left undefined and up to profiles or
  applications to determine.

№ With a pipe and a dollar sign with nothing in between.
In this case, the preformatted text is code of an unspecified syntax.

№ With a pipe only.
In this case, the text is preformatted, but not code.
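
The three endings can be told apart with a single regular expression.
The following Python is a sketch only: `classify_prefix´ is a
  hypothetical name, the prefix is assumed to have already been
  separated from the line content, and “numbers” is taken to mean the
  Ascii digits.

```python
import re

# A pipe, optionally followed by a (possibly empty) syntax name and a
# dollar sign, at the very end of the prefix.
PREFIX_END = re.compile(r"\|(?:([a-z0-9/.-]*)\$)?\Z")


def classify_prefix(prefix: str):
    """Return (kind, syntax) for a preformatted prefix, or None.

    kind is "code" or "preformatted"; syntax is the syntax name, or
    None when no name was given.
    """
    match = PREFIX_END.search(prefix)
    if match is None:
        return None
    syntax = match.group(1)
    if syntax is None:
        # A pipe only: preformatted, but not code.
        return ("preformatted", None)
    if syntax == "":
        # A pipe and a dollar sign: code of an unspecified syntax.
        return ("code", None)
    return ("code", syntax)
```

What a given syntax name means is not decided here; that remains up to
  profiles or applications.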

Footnote references now have the format `[^fn_id]´ and the
  corresponding footnotes `^¶fn_id´.
“Footnotes” which are never referenced in a document are now dropped,
  meaning that they can be used to fully remove text from the output
  (unlike most forms of commenting, which produce X·M·L comments).
The character `*´ was considered over `^´, but it has too many
  additional meanings in English text.
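
The dropping of unreferenced footnotes might be sketched as follows.
This Python is illustrative and makes several assumptions which the
  format itself may not share: paragraphs are given as a list of
  strings, a footnote paragraph begins with `^¶fn_id´, and i·d¦s
  contain neither whitespace nor `]´.
The name `drop_unreferenced´ is hypothetical.

```python
import re

REFERENCE = re.compile(r"\[\^([^\]\s]+)\]")  # e.g. [^fn_id]
FOOTNOTE = re.compile(r"\^¶(\S+)")           # e.g. ^¶fn_id at paragraph start


def drop_unreferenced(paragraphs):
    """Return the paragraphs, minus footnotes that are never referenced."""
    referenced = set()
    for paragraph in paragraphs:
        referenced.update(REFERENCE.findall(paragraph))
    kept = []
    for paragraph in paragraphs:
        match = FOOTNOTE.match(paragraph)
        if match and match.group(1) not in referenced:
            continue  # never referenced, so it leaves no trace in the output
        kept.append(paragraph)
    return kept
```

As a simplification, a reference which occurs only inside a dropped
  footnote still counts as a reference here.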

The character introducing footers has been changed from `]´ to `∎´.
The character introducing abstracts has been changed from `@´ to `∫´.
A new “tip” section has been introduced, with sigil `💡´.

Whitespace is no longer ignored before attribute specifications.
In general, more whitespace is preserved literally.

The hope at this point is that the Les·M·L format is fairly stable, and
  no major backwards‐incompatible changes will be made (excepting the
  minor backwards incompatibility which is inherent in adding new
  features to a syntax which already accepts all inputs with·out
  complaint).
A couple more features are anticipated, but “Version 1.0.0” will
  probably come soon.

Switch back to @data-level

Per {🔗<https://github.com/w3c/aria/issues/1369>}, `@aria-level´ has
been deprecated on list items.

Make LesML:split a function

This simplifies the implementation a bit.

Improve/adjust note/sectioning elements

• Notes are now `<section>´s, not `<div>´s.

• A new “abstract” section type has been added.

• Sections can now contain list items (with a level > 1).

• The `LesML:finalize-footnotes´ stage has been repurposed into a more
  general `LesML:finalize´ stage, in part to facilitate the above.

• Paragraph i·d¦s are now passed upwards to the containing element in
  all cases, when applied to the first paragraph in a container.
To literally assign an i·d to the first paragraph, add an empty initial
  paragraph (just a `¶´), which will be dropped.
The same is true for language tags.

It would be nice to be able to specify titles/labels for sections, but
  there isn¦t a good mechanism for doing this as of now.
The heading sigils are probably inappropriate for this, as headings in
  sections are conceptually singular and “unleveled”.
There is no need for four heading types when a section can only
  contain one heading and it must always be the first paragraph.

But on the H·T·M·L side, section headings ⹐should⹑ have appropriate
  levels corresponding to their position in the document.
So this would need to be remediated somehow.

Add support for footnotes

Use @aria-level instead of @data-level

Support attributes inside of links

This commit changes the partitioning behaviour to allow for element
  nodes within the first part, assuming that the first part is not
  being restricted to an n·c·name.
This is required to enable attribute specifiers inside of the text of
  links.
Element nodes are still not allowed in the second part of the
  partition (the link target or attribute value).

Fix note divs

The parser used to erroneously discard all divs, including semantic
  ones (i·e those with a `@role´).
Also, it used to require a variation selector after `⚠´, which didn¦t
  actually work with the implementation and should not have been
  required.

Support hgroups with heading continuations

These are a bit interesting because continuation paragraphs can be
placed ⹐before⹑ the heading they “continue” (not just after).
A block comment can be used to separate them from a preceding list or
similar.

Quotes and brackets as multiparagraph divisions

This commit introduces a new concept of multiparagraph divisions (where
  the lines within are re·analysed after the prefix is removed) and
  transitions block quotations to it.
I resisted this for a long while because it¦s less “simple”, but the
  precedent of preformatted text and the old quotation mechanism
  (which was kind of messy) was already conceptually close.
And footer support really requires this.

Properly escape all comments

Drop langtags on offset text but support on paras

Now that language attributes can be arbitrarily added to any inline,
  a special syntax for them doesn’t make much sense.
However, that syntax would be useful on block‐level elements.

There isn¦t a real technical reason for the explicit `$´ end of the
  language tag, since the following whitespace could have been used to
  end it also, but it makes the syntax a bit more obvious, matches the
  existing format with the shebangs, and is less likely to conflict
  with i·d syntaxes (which might include `@´ but probably won¦t also
  end in a `$´).

You can still make i·d¦s like `@es$´ if you need by explicitly
  language‐tagging: `¶@es$@en$´.

This commit also allows block sigils to not be followed by a space if
  they are immediately followed by a pilcrow.

Support attributes

This required a reworking of the link parsing code to enable it to also
  be used for attribute parsing (more‐or‐less), and maybe also fixed a
  bug where, if there was an end token with no start token, no further
  end tokens would be processed (even later ones which did have start
  tokens).

Support inline comments

Comments work differently from other inlines and have their own
processing rules.
“Empty” comments are useful primarily to break up spans of text into
discrete text nodes.

Add support for proper name marks

Wrap results in a div

Even in the case where there is only one document, there may be
  additional document comments which should be grouped alongside it and
  not intermixed with any comments in the surrounding area.
For the case when there is only one document and nothing else, not
  even a comment, it is probably better to still wrap the output for
  consistency.

Drop empty shebang documents

Previously, a shebang forced a document to be output, even if it was
  empty.
However, this conflicts with the ability to add document comments at
  the beginning of a document where the shebang is serving as a magic
  number (for example, copyright comments):
This would create two documents, one for the shebang and a second one
  for the final comment.
Now, empty documents will be dropped regardless of whether they use the
  shebang or comment syntax.

Use set functions rather than generate-id()

These should be faster, and are generally cleaner and more readable
besides.

Don¦t use html: prefix

This is unnecessary and makes the resulting files a lot larger.

Adjust order of precedence for inlines

I think this ordering makes a little bit more sense.

Enable nested tags of the same kind

Previously, the processing rules did not allow nesting an element inside
  of itself: ‹ ☞︎nested ☞︎tags☜︎ like ☞︎this☜︎☜︎ › did not work as
  expected.
This design choice was to (at least) appropriately handle
  ‹ ☞︎⟨this☜︎ weird ⟨case☜︎⟩ ›, where each inline should stop at the
  first delimiter.
Correct processing starts by looking for end sigils first, not start
  sigils, backtracking to the last start sigil which precedes it,
  wrapping that text, and then reprocessing the entire set of nodes
  until no more end sigils with matching start sigils are found.

This commit implements that behaviour, which is of course a fair bit
  more complicated but should improve the results.
It also changes specifying characters by Unicode codepoint to use
  curly braces rather than angle brackets, as the latter conflicted
  with the angle brackets used in links.
Specifying by Unicode codepoint still isn¦t supported in links, but
  the behaviour should be less surprising.

The old Unicode codepoint behaviour was probably broken also, in the
  case where the codepoint was not the first character in a paragraph,
  but it is fixed now.
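
The end‐sigil‐first matching described above can be sketched in Python.
This is not the X·S·L·T implementation, only an illustration of the
  idea under simplifying assumptions: the input is already a list of
  tokens, there is only one kind of inline, and the sigils are given
  without variation selectors.
The name `nest´ is hypothetical.

```python
def nest(tokens, start="☞", end="☜"):
    """Wrap matched start/end pairs into nested lists.

    Looks for the first end sigil, backtracks to the last start sigil
    before it, wraps what lies between, and repeats. An end sigil with
    no start sigil before it is left in place as literal text.
    """
    nodes = list(tokens)
    search_from = 0
    while True:
        try:
            end_index = nodes.index(end, search_from)
        except ValueError:
            return nodes  # no more end sigils to process
        # Backtrack to the last start sigil preceding this end sigil.
        start_index = None
        for i in range(end_index - 1, -1, -1):
            if nodes[i] == start:
                start_index = i
                break
        if start_index is None:
            # Unmatched end sigil: skip it, but keep processing later ones.
            search_from = end_index + 1
            continue
        wrapped = nodes[start_index + 1:end_index]
        nodes[start_index:end_index + 1] = [wrapped]
        search_from = start_index + 1
```

With the tokens of ‹ ☞nested ☞tags☜ like ☞this☜☜ ›, the two inner
  pairs are wrapped first and the outer pair last, so that the result
  is one list containing two nested lists.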

Don¦t use literal result elements

It¦s simpler and cleaner to always just create elements with
`<xslt:element>´, as these never inherit namespaces from their
surrounding context.

Support preformatted text

Better handle i·d‐less pilcrows

A pilcrow is useful to “force” a paragraph when it would otherwise
start with a sigil.
These pilcrows may not have i·d¦s; this should be supported.

Allow block sigils followed by nothing

Support “comment” paragraphs

Preserve document and record‐jar comments

Support multiple documents per file

Documents may begin with either `#!lesml´ or `##´.

Refactor initial chunking to be line‐based

The old parsing mechanism operated primarily on large string chunks,
  which were re‐parsed into lines potentially multiple times.
This refactor changes the parsing to use lines and ranges, which is a
  little more verbose/complicated from an X·Path perspective (it
  requires a lot of `generate-id()´ comparisons) but hopefully, on the
  whole, better.

Support language tag and profile

This commit provides initial support for language‐tagged Les·M·L
  documents and additional document properties.
Only one property is supported: `profile´.
Language tags are themselves internally treated as properties whose
  key contains spaces; property keys cannot ordinarily contain spaces
  so there is no concern for confusion.

Increase the number of section‐break characters

Switch symbols for subsection and subsubsection

`❦´ is a stronger symbol than `✠´; its corresponding directional
fences `❧´ and `☙´ indicate section boundaries, while `⹐´ and `⹑´
simply enclose emphasis.

Use square instead of triangular bullet for level 3

This matches default H·T·M·L bulleting and avoids potential semantic
connotations that the triangle bullet may seem to have.

Initial implementation