Lady [Sat, 22 Jun 2024 21:54:29 +0000 (17:54 -0400)]
Provide the location of parsed files in metadata
Rather than hardcoding parsed file lookup into `expandmetadata.xslt`
and `catalog2transform.xslt`, provide a file U·R·I for the parsed file
as a part of its metadata and use that instead. It _is_ reasonable for
transforms to want to access the original parsed documents of
dependencies. Note, however, that there is no guarantee that the parsed
document actually exists if it _isn’t_ a dependency for the file.
Lady [Sat, 22 Jun 2024 21:43:40 +0000 (17:43 -0400)]
Disable make·file prerequisites when not needed
Specifically, the `help` and `clean` targets don’t require any
compilation of types, parsers, dependencies, or destinations. G·N·U
Make provides the `$(MAKECMDGOALS)` variable to check which targets were set as
goals on the commandline; when the only goals are `help` and `clean`,
excessive computation can be disabled.
Lady [Sat, 22 Jun 2024 20:06:53 +0000 (16:06 -0400)]
“Simplify” the restart mechanism using $?
Instead of having two GNUmakefile rules which are present under
different conditions (and which could hypothetically both apply), just
have one and check to see which prerequisites are out·of·date in order
to adjust the behaviour.
In order to prevent unnecessary builds of metadata and parsed files
prior to a type update, wrap the recipes in a check to see if
`$(BUILDDIR)/.update-types` was created over the course of the build.
This needs to happen in the shell, not in Make, because older versions
of Make cache the `$(wildcard)` function. The implementation of
`$(unlesstypeswillupdate)` uses an `if` function instead of an `and`
because the latter seems to trim white·space.
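
A minimal sketch of the shell‐side check, for illustration only. The helper name `unless_types_will_update` and its argument order are assumptions; the actual `$(unlesstypeswillupdate)` is a Make function whose expansion is not reproduced here.

```shell
# Hypothetical helper: run a recipe only if no type update is pending.
# $1 is the build directory; the remaining arguments are the command to run.
unless_types_will_update() {
  builddir=$1
  shift
  # Test for the marker at run time, in the shell, so that the answer
  # never comes from a cached `$(wildcard)` expansion.
  if [ -e "$builddir/.update-types" ]; then
    return 0  # a restart is coming; skip the work
  fi
  "$@"
}
```

The point of the sketch is only that the existence test happens when the recipe runs, not when Make expands it.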
It is no longer an error if the type of a file cannot be determined;
this is required to enable recipe expansion when types are not yet
generated now that the test is happening in the shell. Instead, files
are given a default type of `application/octet-stream`.
Lady [Sat, 22 Jun 2024 17:02:24 +0000 (13:02 -0400)]
Generate dependencies & destinations with metadata
This commit obviates the need for separate `metadata2dependencies` and
`metadata2destinations` transforms and simply bundles their
functionality into `expandmetadata`.
Making this work right is a bit tricky because we are outputting the
main document to `stdout` but want the other result documents to be
output to `BUILDDIR`. The best solution would be to just read in the
build directory inside of the transform and use it to determine the
output location, but unfortunately `exsl:document` does not support
dynamic computation of the destination directory. The current solution
is instead to `cd` into the build directory in a subshell before
calling `xsltproc`.
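
The subshell pattern can be sketched roughly as follows. The helper name `run_in_dir` and the placeholder variables in the usage comment are assumptions for illustration, not the names used in the make·file.

```shell
# Hypothetical helper: run a command with its working directory set to $1,
# leaving the caller's working directory untouched.
run_in_dir() {
  dir=$1
  shift
  # The parentheses start a subshell, so the `cd` is undone when it exits.
  ( cd "$dir" && "$@" )
}

# Illustrative use (all variables are placeholders; the transform and source
# paths would need to be absolute or relative to the build directory):
#   run_in_dir "$BUILDDIR" xsltproc --nonet "$transform" "$source" > "$metadata"
```

Relative paths written by the transform then resolve against the build directory, while the main document still arrives on standard output.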
Lady [Sun, 2 Jun 2024 02:11:10 +0000 (22:11 -0400)]
Simplify metadata using new 书社vocab terms
This commit also changes the behaviour of `@书社:destination`:
Formerly, the value of this attribute needed to be percent·encoded;
now, it must _not_ be percent·encoded.
Lady [Sun, 2 Jun 2024 01:31:55 +0000 (21:31 -0400)]
Generate destinations and dependencies as metadata
The format of this is subject to change. It might actually be desirable
to have the dependency and destination files be side·effects of the
metadata file generated alongside it (in the same `xsltproc` call)
rather than needing to generate them separately.
Lady [Mon, 27 May 2024 21:38:46 +0000 (17:38 -0400)]
Don’t allow xsltproc to output files
…with the exception of the archive extractor, which is
specially constructed to do so. In all other cases, just output to
standard output and pipe to a file as with other commands, for
consistency and security.
Lady [Mon, 27 May 2024 21:16:23 +0000 (17:16 -0400)]
Provide template identifiers as <书社:id>s
This is definitely going to break existing websites, but it’s much more
sensible and straightforward to deal with, in my opinion. Alternatives
included instead providing them as comments or processing instructions,
which would be much harder to process.
Lady [Mon, 27 May 2024 20:16:44 +0000 (16:16 -0400)]
Do not wrap results which contain no H·T·M·L
It’s much more likely that a result which is not H·T·M·L is intended to
remain that way; wrapping it in `<html:body>` can be used as a
work·around when H·T·M·L wrapping is desired.
Lady [Mon, 27 May 2024 19:09:44 +0000 (15:09 -0400)]
Remove variation selectors from sigil
The choice of whether to render a character as emoji style or text
style is properly a font and typesetting decision, not a character
encoding one. Altho Unicode provides variation selectors for indicating
a preference in plain text, these selectors should not be given formal
significance. To make clear that the canonical encoding for the sigil is
`<U+26E9,U+1F4F0>`, remove them across the board.
Lady [Wed, 22 May 2024 06:37:02 +0000 (02:37 -0400)]
Provide metadata in transform; add attributes there
The metadata actually depends on the parser, so it’s a recursive
dependency to make the parser use it. Instead, just make it available
in transforms and add the various ⛩️📰 书社 attributes during the
expansion phase.
Literally including the metadata R·D·F in the transform greatly
increases its file size, but it “should” be fine.
This commit also brings with it a few other improvements and changes to
transforms :—
- All of the `$书社:*` variables which used to return result tree
fragments now return node sets.
- The `书社:application` mode has been renamed to `书社:apply`, to
match `书社:expand`.
Lady [Wed, 22 May 2024 05:21:45 +0000 (01:21 -0400)]
Drop CKSUM and SRCTIME params; add as attributes
`@书社:cksum`, `@书社:mtime`, and `@书社:identifier` are now all added
during the parsing phase. (`@书社:identifier` used to be added during
the transformation phase, but badly. `@书社:mtime` is new.)
This hardcodes the location of the metadata file for now; ideally the
metadata would be embedded.
Lady [Wed, 22 May 2024 04:44:49 +0000 (00:44 -0400)]
Replace source catalog with metadata file
There are two things which this approach should eventually bring :—
1. Availability of various aspects of file metadata for every file
during the parsing and transformation phases. Right now, only
metadata for the file currently being processed is available, and
while loading the catalog is possible in transforms, it’s probably
not really advisable and hasn’t been extensively tested.
2. Cacheing of file metadata for files which have not changed since the
last time Make was run. This comes with the usual Make drawback that
if a file is changed to be older, rather than newer, it won’t be
recognized as having been changed.
Neither of these things is really implemented at this point, but the
metadata file is created and being used and the old catalog has been
removed. Future commits should refine the behaviour.
Lady [Sat, 22 Jun 2024 16:11:04 +0000 (12:11 -0400)]
Use --nonet on xmllint, too
This option was already being supplied for `xsltproc`. Networking
should always be disabled by default; mechanisms for selectively
enabling it may be added later.
Lady [Mon, 27 May 2024 18:41:47 +0000 (14:41 -0400)]
Use --noent when calling xmllint
The X·Path expressions do not correctly match text provided within
entities if the entities are not expanded. Adding `--noent` ensures
that all entities are expanded and X·Path works correctly.
Lady [Sat, 25 May 2024 04:08:01 +0000 (00:08 -0400)]
Improve SRCDIR/INCLUDEDIR handling; allow dot
- Explicitly allow `find` to match `.`, which otherwise would be
excluded as a dotfile.
- Add special handling to drop the leading `./` that results from the
above, and generate the appropriate local paths without needing it.
- Error when trying to perform certain transformations on file·names
and failing, to better aid diagnosis.
- When a file has no destination (because `destinations` has not been
generated yet), use the fake destination `NOTDEF`. (Make will restart
before this destination is used.)
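
A rough sketch of the first two points, assuming the find rules prune dotfiles by name. The function name `list_sources` and the exact expression are illustrative, not the rules actually used.

```shell
# Hypothetical helper: list regular files under $1, skipping dotfiles and
# dot-directories, but not skipping the starting point when it is `.`.
list_sources() {
  # `! -name .` keeps `.` itself from being pruned as a dotfile.
  find "$1" \( -name '.*' ! -name . \) -prune -o -type f -print |
    # Drop the leading `./` which results when the starting point is `.`.
    sed 's|^\./||' |
    sort
}
```

Without the `! -name .` exception, starting from `.` can prune the entire tree, since `.` itself matches the `.*` pattern.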
Lady [Fri, 24 May 2024 05:10:58 +0000 (01:10 -0400)]
Mark GNUmakefile as precious
One would hope that Make wouldn’t ever delete the make·file it is
running, but the documentation doesn’t seem to explicitly give that
guarantee, so it’s good to be explicit about it.
Lady [Fri, 24 May 2024 02:39:49 +0000 (22:39 -0400)]
Allow overriding of cd
Altho `cd` is builtin in most shells, it is not a special builtin
utility as defined by Posix, and the use of an alternative
implementation is conceivable.
Lady [Wed, 22 May 2024 04:29:51 +0000 (00:29 -0400)]
FORCE on type updates
When this was originally implemented, I’m not sure `FORCE` was defined,
but it’s needed now for `diffprereqs` so there’s no reason not to
use it here as well.
Lady [Tue, 21 May 2024 06:06:07 +0000 (02:06 -0400)]
Use diff to get better dates
See the comments for more information, including Macintosh quirks. This
is a great deal better than using `ls` and allows for the dropping of
the latter as a dependency.
Lady [Tue, 21 May 2024 03:19:48 +0000 (23:19 -0400)]
Support +xml suffix for determining X·M·L files
This gets around operating system extensions to `file` which might
identify S·V·G files (for example) as `image/svg+xml` before
attempting magic detection. If not every X·M·L‐based syntax should be
handled as such by ⛩️📰 书社, redefine `XMLTYPES` to exclude the
`+xml`.
Lady [Sun, 19 May 2024 21:16:42 +0000 (17:16 -0400)]
Remove stat dependency
This is, strictly speaking, a downgrade in functionality, with the
upside of reducing reliance on non·Posix programs (namely `stat`). A
better, Posix‐compliant solution would be to archive an empty file with the
correct modification time and then list out the time from that archive;
however, as far as I’m aware, it’s not possible to obtain an
implementation of the `pax` utility which actually supports this. macOS
only supports a very limited subset of the `listopt` option, and only
for pax archives (not tarballs); other implementations don’t seem to
support it at all.
Lady [Sun, 19 May 2024 22:09:06 +0000 (18:09 -0400)]
Force removal of existing directories
…prior to processing results or installing. Mostly, this prevents a
failure when an expanded archive is changed to no longer be expanded.
This also simplifies the installation code a bit.
Lady [Sun, 19 May 2024 19:54:20 +0000 (15:54 -0400)]
Remove sed range expressions
These are technically only Posix in the Posix locale and have undefined
meaning otherwise. It’s not the policy of ⛩️📰 书社 to require the
Posix locale, so the safest thing is to just not use range expressions
here.
(Actually, this policy might be worth revisiting for things which
definitely need to be operating on Unicode text.)
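
As an illustration of the general idea (not the expressions actually changed in this commit), a bracket expression can list its members explicitly instead of using a range. The function name `strip_digits` is hypothetical.

```shell
# Strip Ascii digits from each line of standard input.
# The bracket expression lists every digit rather than writing `[0-9]`,
# so its meaning does not depend on the collating order of the locale.
strip_digits() {
  sed 's/[0123456789]//g'
}
```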
Lady [Fri, 3 May 2024 03:48:12 +0000 (23:48 -0400)]
Use compound commands to join strings, not sed
Previously, the script would sometimes use sed to insert text at the
beginning or end of a string, but it is better to just printf as a
separate step instead.
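
A small sketch of the pattern, with a hypothetical helper name `wrap_output`: the braces group three commands so that their combined output forms one stream.

```shell
# Hypothetical helper: copy standard input to standard output with the
# string $1 placed before it and the string $2 placed after it.
wrap_output() {
  {
    printf '%s' "$1"  # prefix
    cat               # the original text, untouched
    printf '%s' "$2"  # suffix
  }
}
```

Because the text passes through `cat` unmodified, there is no need to escape anything for a `sed` replacement.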
Lady [Tue, 30 Apr 2024 07:53:58 +0000 (03:53 -0400)]
Support pagination
It’s not clear to me that this is actually a good idea, and this
functionality may be reverted later. It adds a lot of complexity
despite still having significant drawbacks, and the alternative pattern
of just generating archives and later expanding them is a much safer
and more versatile solution.
However, the pattern of e·g needing a paginated feed of all the posts
in one’s blog suggests that something along these lines (backed by this
method or archiving~expanding) *should* be supported by default in
⛩️📰 书社. One alternative might be to add (e·g) a `@书社:expand`
attribute to <书社:archive> elements, tho how exactly this would be
tracked on the resulting archives is unclear, and it would be
restricted to producing folders (with exclusively archive contents)
that could not easily be mixed with other files.
Lady [Sun, 28 Apr 2024 04:02:37 +0000 (00:02 -0400)]
Touch important files before making with xsltproc
When there is nothing to output, `xsltproc` does not create files.
However, Make will find itself in an endless restart loop if these
files are never created.
Lady [Sun, 14 Apr 2024 23:38:07 +0000 (19:38 -0400)]
Allow injecting raw output in X·M·L serialization
It’s not possible to serialize things like entities using the normal
X·S·L·T processes. With this commit, one can instead use something like
`<书社:raw-output>&my-entity;</书社:raw-output>` inside of
`<书社:serialize-xml>` elements to get this result.
Lady [Sun, 14 Apr 2024 19:44:22 +0000 (15:44 -0400)]
Add support for manually serializing X·M·L
This commit adds a transform for a new `<书社:serialize-xml>` element,
which is useful in conjunction with `<书社:raw-text>` to produce a more
finely‐controlled X·M·L output, or in other X·M·L‐y situations where an
escaped X·M·L value is required. The algorithm used for serialization
attempts to closely match the DOM Parsing and Serialization spec,
including such behaviours as mandating an undeclared `xml:` prefix for
the X·M·L name·space and dropping the prefix from elements whose
name·space matches the default, but it probably isn’t exactly the same
(due in part to the fact that the underlying data structure is an X·M·L
infoset, not a potentially dynamically‐modified Dom). No special
allowances are made for elements in the H·T·M·L name·space; this is not
(yet) a suitable polyglot serializer (or intended to be one).
Lady [Wed, 10 Apr 2024 19:53:11 +0000 (15:53 -0400)]
Allow creation of tarballs
This is useful when using ⛩️📰 书社 directly as a static site
generator to provide archive downloads (archives are not compressed; it
is assumed that they will be gzipped over the wire). This requires
a recursive call to Make for each archive file, which performs the
following steps :—
- Extracts all of the elements slated for archiving into separate
files.
- Restarts.
- Processes the resulting extracted files and then archives them.
The extraction step in particular is somewhat convoluted; it requires
dynamically generating a transform which has the appropriate
`<exsl:document>` elements for a given source file, and then applying
that transform in a second call to `xsltproc`.
X·M·L outputs are now passed through an extra call to `xmllint` to
remove any unnecessary namespace attributes instead of just symlinked;
the symlinks weren’t compatible with archiving anyway.
Lady [Mon, 1 Apr 2024 23:38:48 +0000 (19:38 -0400)]
Don’t “compile” assets, just “build”
This reserves the `build/results/` directory for _just_ the results of
transformations, and delays the copying of asset files into the build
directory until the actual “build” step. (Likewise for recursive files,
altho these still just error.)
`make all` now builds all installable files, including assets, which
were formerly excluded. A downstream script might expect assets to
appear in `build/public` after a `make all` and shouldn’t require a
`make install` to get them.
Lady [Mon, 1 Apr 2024 21:05:41 +0000 (17:05 -0400)]
Allow ⛩️📰 书社 to produce plain text
This requires adding _another_ build stage; the result of the
transformation step is output to `build/results`, which is then
processed again to create the `build/public` final result. In most
cases, this additional processing just produces a symlink. However,
when the root element is a special value, a derived file will be
produced.
The only special elements supported right now are `<书社:raw-text>`,
which outputs the raw text contents of the text nodes in the result
tree, and `<书社:base64-binary>`, which produces a binary file from the
base64 text contents determined using the same method.
Lady [Mon, 1 Apr 2024 20:45:42 +0000 (16:45 -0400)]
Improve (fix) the T·S·V parser
Although `exslstr:tokenize()` is fast, it should not be used when
splitting the columns of a T·S·V file, as it will collapse empty
columns. Introduce a new transform in `lib/` for splitting, and import
it into the T·S·V parser.
This transform was largely copied from Caudex
<https://git.ladys.computer/Caudex/blob/0.1.1:/lib/split.xslt> and is
likely to be useful downstream as well.
Continue using `exslstr:tokenize()` for splitting the _rows_ of the
T·S·V, as empty rows _should_ be collapsed.
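
The difference between collapsing and non‐collapsing splits can be illustrated by analogy in Awk; this is not the X·S·L·T transform itself, and both function names are hypothetical.

```shell
# Count columns, treating every single tab as a separator.
# Empty columns are preserved, which is what a T·S·V parser needs.
count_columns() {
  awk 'BEGIN { FS = "\t" } { print NF }'
}

# Count fields with default splitting, which collapses runs of blanks.
# Empty columns disappear, much like the tokenizing behaviour described.
count_collapsed() {
  awk '{ print NF }'
}
```

For a row such as `a<TAB><TAB>c`, the first function reports three columns and the second only two.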
Lady [Wed, 27 Mar 2024 04:09:27 +0000 (00:09 -0400)]
“Support” X·M·L 1.1
The X·M·L 1.1 “support” amounts to deleting the declaration and
replacing any character escapes for C·0 controls with
`U+0091 PRIVATE USE ONE`, which is a valid character in X·M·L 1.0.
This is done entirely in `sed`, so it’s not perfect, but it should be
“good enough”.
Lady [Thu, 8 Feb 2024 03:41:33 +0000 (22:41 -0500)]
Refactor transforms & add 书社:application stage
The main goal of this commit was to add a
`<书社:apply-attributes-to-root>` element, to allow transforms to pass
attributes up to the root element of the result, for example `@lang`
information. This required an extensive refactor of a lot of the
transform infrastructure and the creation of a new transform stage,
`书社:application`, which follows the ordinary transform and solely
handles `<书社:apply-attributes-to-root>` and
`<书社:apply-attributes>`. Other `@书社:*` attributes are removed at
this stage, but it isn’t generally recommended that transforms try to
hook in here.
This commit also makes a number of smaller changes :—
- Use `node()` in place of element wildcards wherever matching only
elements wasn’t specifically intended.
In particular, even in places where text is not expected, there may
be comments to preserve.
- Only add `@itemscope` and `@itemtype` attributes to H·T·M·L elements,
since they are only defined for elements in that namespace.
- Provide `@书社:identifier` on documents and embeds to get the
`about:shushe` u·r·i of the resource.
- In transforms which generate transforms, `<xsla:text>` elements which
provide only white·space need `<xslt:text>` children to ensure the
whitespace isn’t stripped. (Note: In the actual source text, `xsla:`
is given the `xslt:` prefix and `xslt:` is the default prefix.)
Similarly, it’s necessary to provide attribute value templates using
a `<xslt:attribute>` element rather than with the literal result
element syntax, to prevent them from being prematurely applied.
Lady [Tue, 6 Feb 2024 03:56:03 +0000 (22:56 -0500)]
Replace GENERATOR and VERSION with THISREV
Instead of replacing existing `<html:meta name="generator">`s, format
them into a comma‐separated list with ⛩️📰 书社 as the final entry.
Don’t allow overriding of ⛩️📰 书社 generator metadata.
Manually specifying `THISREV` is still possible to allow it to be
filled by users running Make without Git, but it should not be
overridden with the version of the calling generator, as it is used to
fill `@书社:version`.
Lady [Sun, 4 Feb 2024 05:11:06 +0000 (00:11 -0500)]
Track catalog & magic prereqs and diff for changes
If a new prerequisite for a catalog (or the compiled magic file) is
added, and it is newer than the last build, then the catalog (or
compiled magic file) will be rebuilt. However, formerly, the file would
not be rebuilt if the added file was older, or if a prerequisite was
removed instead of added, due to limitations in Make.
This commit tracks the list of prerequisites separately, and if it
changes, forces a rebuild of the file regardless of whether the
prerequisites are newer or older than the target.
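
One way this kind of tracking could work is sketched below. It assumes the list is kept in a file which is rewritten only when its contents change; the function name `update_prereq_list` and that storage choice are assumptions, not a description of `diffprereqs`.

```shell
# Hypothetical helper: record the prerequisite list in the file $1.
# Returns 0 if the recorded list changed (a rebuild should be forced),
# or 1 if it is identical to what was recorded last time.
update_prereq_list() {
  listfile=$1
  shift
  newlist=$(printf '%s\n' "$@" | sort)
  if [ -f "$listfile" ] && [ "$(cat "$listfile")" = "$newlist" ]; then
    return 1  # unchanged: leave the file and its timestamp alone
  fi
  printf '%s\n' "$newlist" > "$listfile"
  return 0
}
```

Because the list file is rewritten whenever an entry is added or removed, a target which depends on it becomes out·of·date even when the affected prerequisite is older than the target.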
Lady [Sun, 4 Feb 2024 01:04:49 +0000 (20:04 -0500)]
Add parser.xslt as a prerequisite for parsing
Although this file will generally be generated as a part of the
make·file restart loop, it is possible to wind up with an early error
during dependency generation if files cannot be parsed prior to
resolving make·file dependencies. Depend on it in this case as well,
with the understanding that this will update the types yet again.
If `parser.xslt` _is_ generated as a part of dependency generation
(and, presumably, `.update-types` does not exist) then the dependency
update message should be suppressed, since the other make·file build
script will also be present and active.
Lady [Sun, 4 Feb 2024 01:01:49 +0000 (20:01 -0500)]
Apply @书社:cksum to result when parsing
Because this indiscriminately adds the attribute to the result of
parsing the root node, the checksum should be added for both X·M·L and
plaintext sources.
Lady [Sat, 3 Feb 2024 23:25:56 +0000 (18:25 -0500)]
Just manually parse hexadecimal in Awk
The biggest performance bottleneck in this code was the fact that, for
compatibility reasons, Awk was piping hexadecimal numbers to the shell
in order to parse them. (Macintosh Awk can parse hexadecimal numbers in
`printf`, but G·N·U Awk cannot.)
Because the hexadecimal number is known to be two digits, it’s easy to
just parse it in Awk directly, avoiding the shell pipe and considerably
speeding up the program.
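
A minimal sketch of parsing a two‐digit hexadecimal number directly in Awk, using only `index`, `substr`, and `toupper`. The wrapper name `hex2dec` is hypothetical, and the real script presumably does this inline while decoding.

```shell
# Hypothetical helper: print the decimal value of a two-digit hexadecimal
# number given as $1, without calling out to the shell from Awk.
hex2dec() {
  printf '%s\n' "$1" | awk '
    BEGIN { digits = "0123456789ABCDEF" }
    {
      h = toupper($0)
      # index() is 1-based, so subtract one to get the value of each digit.
      hi = index(digits, substr(h, 1, 1)) - 1
      lo = index(digits, substr(h, 2, 1)) - 1
      print hi * 16 + lo
    }
  '
}
```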
Lady [Sat, 3 Feb 2024 20:25:41 +0000 (15:25 -0500)]
Reduce subshells created by percent·encoding
It’s possible to use `%0A` as a component separator assuming that
file·names will never contain newlines; this allows all filenames to
be processed at once rather than needing a separate subshell for each.
It’s not necessary to encode each path component separately; just
encode the whole path and replace `%2F` with `/` at the end. It’s not
possible for file·names to contain literal `/` characters.
The above two changes should increase the speed of operations such as
building the parser catalog in ⛩️📰 书社 considerably.
Lady [Sat, 3 Feb 2024 01:17:34 +0000 (20:17 -0500)]
Make all the default rule
This better conforms to Make conventions, and the help rule is of
pretty limited utility considering the make·file still has to restart
at least twice to use it.
Lady [Fri, 2 Feb 2024 04:12:35 +0000 (23:12 -0500)]
Pad colons with spaces on both sides
G·N·U Make recognizes `&:` as indicating grouped targets. Ampersands
are allowed in filenames, so it’s best not to place them directly next
to the colons.
Lady [Fri, 2 Feb 2024 04:09:47 +0000 (23:09 -0500)]
Make percent‐decoding awk script portable
This script depended on `printf` having the same behaviour within `awk`
and on the commandline. This doesn’t appear to be true in G·N·U Awk.
Instead, pipe into the shell version from within the Awk script.
Lady [Fri, 2 Feb 2024 04:06:22 +0000 (23:06 -0500)]
Disallow filenames which end in a cloparen
There is a bug in G·N·U Make which causes the `wildcard` function to
ignore files and directories which end in a cloparen (`)`). To be safe,
disallow these files as sources, even though parens are generally 🆗.
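
A check of this kind might look like the following in the shell; the function name `is_allowed_source` is hypothetical and the real test may well live in Make instead.

```shell
# Hypothetical helper: succeed unless the file name $1 ends in `)`.
is_allowed_source() {
  case $1 in
    *\)) return 1 ;;  # ends in a cloparen: disallowed
    *) return 0 ;;    # parens elsewhere in the name are fine
  esac
}
```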
Lady [Wed, 31 Jan 2024 05:48:08 +0000 (00:48 -0500)]
Specify magic files, not a directory
This is more flexible and matches how parsers and transforms work.
Compiling magic requires these to all be placed in the same directory
at some point, but symbolic linking works for this purpose.
Lady [Wed, 31 Jan 2024 05:40:55 +0000 (00:40 -0500)]
Add EXTRA* variables
It shouldn’t be necessary to know where existing parsers and transforms
are kept or what the default find rules are in order to supply
additional ones.