<!--
SPDX-FileCopyrightText: 2024 Lady <https://www.ladys.computer/about/#lady>
SPDX-License-Identifier: CC0-1.0
-->

<b>A make·file for X·M·L.</b>

<dfn>⛩️📰 书社</dfn> aims to make it easy to generate websites with
X·S·L·T and G·N·U Make.
It is consequently only a good choice for people who like X·S·L·T and
G·N·U Make and wish it were easier to make websites with them.

It makes things easier by :—

- Automatically identifying source files and characterizing them by
  type (X·M·L, text, or asset).

- Parsing supported text types into X·M·L trees.

- Enabling easy inclusion of source files within each other.

It aims to do this with zero dependencies beyond the programs already
installed on your computer†.

† Assuming an operating system with a fairly featureful, and
Posix‐compliant, development setup (e·g, macOS).
In fact, on Linux you will probably need to install a few programs:
`libxml2-utils`, `xsltproc`, `sharutils`, and `pax`.
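
Before installing anything, it can help to see which programs are actually missing.
The following is a minimal sketch, not part of ⛩️📰 书社: `missing_programs` is a hypothetical helper, and the list of program names is assembled from the programs this document mentions.

```shell
# Print the name of every given program which cannot be found on the PATH.
# (`missing_programs` is a hypothetical helper, not part of ⛩️📰 书社.)
missing_programs() {
  for program in "$@"; do
    command -v "$program" >/dev/null 2>&1 || printf '%s\n' "$program"
  done
}

# Check for the programs which this document names as possibly missing.
missing_programs make file pax uudecode uuencode xmlcatalog xmllint xsltproc
```

If nothing is printed, every program in the list was found.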
<i lang="cmn-Hans">书社</i> is a Chinese word meaning “publishing

The first character, <i lang="cmn-Hans">书</i>, is the simplified form

The second character, <i lang="cmn-Hans">社</i>, contemporarily means
“association”, but historically referred to the god of the soil and
related altars or festivities.
In Japanese, it is an alternate spelling for <i lang="ja">やしろ</i>,
the word for “Shinto shrine”.

The name <i lang="cmn-Hans">书社</i> was chosen to play on this pun, as
it is intended as a publishing program for webshrines.

In Ascii environments, ⛩️📰 书社 should be written `Shushe`, following
the pinyin transliteration.

In most cases, ⛩️📰 书社 aims to require only functionality which is
present in all Posix‐compliant operating systems.
There are a few exceptions.
Details on particular programs are given below; if a program is not
listed, it is assumed that any Posix‐compliant implementation will
work.

### `file`

This is a Posix utility, but ⛩️📰 书社 currently depends on
unspecified behaviour.
It requires support for the following additional options :—

- **`-C`**, when supplied with `-m`, must be useable to compile a
  `.mgc` magicfile for use with future invocations of `file`.

- **`--files-from`** must be useable to provide a file that `file`
  should read file·names from, and `-` must be useable in this
  context to specify the standard input.

- **`--mime-type`** must cause `file` to print the internet media type
  of the file with no charset parameter.

- **`--separator`** must be useable to set the separator that `file`
  uses to separate file names from types.

These options are implemented by the
[Fine Free File Command](https://darwinsys.com/file/), which is used
by most operating systems.
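
As a rough illustration of how three of these options combine, the sketch below reads file names from the standard input and prints a media type for each.
It is an assumption about usage, not a copy of what ⛩️📰 书社 does internally, and `classify_by_media_type` is a hypothetical helper name.

```shell
# Classify the file names read from standard input (one per line), using a
# tab as the separator between each name and its media type.
# (`classify_by_media_type` is a hypothetical helper, not part of ⛩️📰 书社.)
classify_by_media_type() {
  file --mime-type --separator "$(printf '\t')" --files-from -
}

# Example usage, guarded in case `file` is missing or lacks these options.
if command -v file >/dev/null 2>&1; then
  printf '%s\n' . | classify_by_media_type ||
    printf '%s\n' 'this file(1) lacks the required options' >&2
fi
```

If the final message appears, the installed `file` is probably not the Fine Free File Command.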

### `git`

This is not a Posix utility.
Usage of `git` is optional, but recommended (and activated by default).
To disable it, set `GIT=`.

### `make`

This is a Posix utility, but ⛩️📰 书社 currently depends on
unspecified behaviour.
⛩️📰 书社 requires specifically the G·N·U version of `make`, and
depends on functionality present in version 3.81 or later.
It is not expected to work in previous versions, or with other
implementations of Make.

### `pax`

This is a Posix utility, but not included in the Linux Standard Base or
installed by default in many distributions.
Only `ustar` format support is required.

### `uudecode` and `uuencode`

These are Posix utilities, but not included in the Linux Standard Base
or installed by default in many distributions.
The G·N·U [Sharutils](https://www.gnu.org/software/sharutils/) package
can be installed to access them.

### `xmlcatalog` and `xmllint`

These are not Posix utilities.
They are a part of `libxml2`, but may need to be installed separately
(e·g by the name `libxml2-utils`).

### `xsltproc`

This is not a Posix utility.
It is a part of `libxslt`, but may need to be installed separately.

Place source files in `sources/` and run `make install` to compile
the result to `public/`.
Compilation involves the following steps :—

1. ⛩️📰 书社 compiles all of the magic files in `magic/` into a single
   file, `build/magic.mgc`.

2. ⛩️📰 书社 processes all of the parsers in `parsers/` and determines
   the list of supported plaintext types.

3. ⛩️📰 书社 identifies all of the source files and includes and uses
   `build/magic.mgc` to classify them by media type.

4. ⛩️📰 书社 parses all plaintext and X·M·L source files and includes
   and then builds a dependency tree between them.

5. ⛩️📰 书社 uses the dependency tree to establish prerequisites for

6. ⛩️📰 书社 compiles each output file to `build/result`.

7. ⛩️📰 书社 copies most output files from `build/result` to
   `build/public`, but it does some additional processing instead on
   those which indicate a non‐X·M·L desired final output form.

8. ⛩️📰 书社 copies the final resulting files to `public`.
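
To try the steps above on something small, a throwaway project with a single source file is enough.
This is only a sketch: the root element chosen for the page and the path to the make·file are assumptions, so adjust both to match your own setup.

```shell
# Create a throwaway project with one X·M·L source file.
project=$(mktemp -d)
mkdir -p "$project/sources"
cat > "$project/sources/index.xml" <<'EOF'
<?xml version="1.0"?>
<article xmlns="http://www.w3.org/1999/xhtml">
  <h1>Hello, world</h1>
</article>
EOF
printf 'Created %s\n' "$project/sources/index.xml"

# Then, from inside "$project" (the make·file path is an assumption):
#   make -f /path/to/shushe/GNUmakefile install
# The compiled page should then appear under "$project/public".
```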

You can use `make list` to list each identified source file or include
alongside its computed type and dependencies.
As this is a Make‐based program, steps will only be run if the
corresponding buildfile or output file is older than its
prerequisites.

The ⛩️📰 书社 name·space is `urn:fdc:ladys.computer:20231231:Shu1She4`.

This document uses a few name·space prefixes, with the following

| Prefix | Expansion |
| ---------: | :-------------------------------------------- |
| `catalog:` | `urn:oasis:names:tc:entity:xmlns:xml:catalog` |
| `exsl:` | `http://exslt.org/common` |
| `exslstr:` | `http://exslt.org/strings` |
| `html:` | `http://www.w3.org/1999/xhtml` |
| `svg:` | `http://www.w3.org/2000/svg` |
| `xlink:` | `http://www.w3.org/1999/xlink` |
| `xslt:` | `http://www.w3.org/1999/XSL/Transform` |
| `书社:` | `urn:fdc:ladys.computer:20231231:Shu1She4` |

## Setup and Configuration

⛩️📰 书社 depends on the following programs to run.
In every case, you may supply your own implementation by overriding the
corresponding (allcaps) variable (e·g, set `MKDIR` to supply your own
`mkdir` implementation).

- `git` (optional; set `GIT=` to disable)
- `pax` (only when generating archives)
- `xmlcatalog` (provided by `libxml2`)
- `xmllint` (provided by `libxml2`)
- `xsltproc` (provided by `libxslt`)

The following additional variables can be used to control the behaviour

- The location of the source files (default: `sources`).
  Multiple source directories can be provided, so long as the same
  file subpath doesn’t exist in more than one of them.

- The location of source includes (default: `sources/includes`).
  This can be inside of `SRCDIR`, but needn’t be.
  Multiple include directories can be provided, so long as the same
  file subpath doesn’t exist in more than one of them.

- The location of the (temporary) build directory (default: `build`).
  `make clean` will delete this, and it is recommended that it not be
  used for programs aside from ⛩️📰 书社.

- The location of the directory to output files to (default: `public`).
  `make install` will overwrite files in this directory which
  correspond to those in `SRCDIR`.
  It *will not* touch other files, including those generated from files
  in `SRCDIR` which have since been deleted.

  Files are first compiled to `$(BUILDDIR)/public` before they are
  copied to `DESTDIR`, so this folder is relatively quick and
  inexpensive to re·create.
  It’s reasonable to simply delete it before every `make install` to
  ensure stale content is removed.

- The location of the ⛩️📰 书社 `GNUmakefile`.
  This should be set automatically when calling Make and shouldn’t ever
  need to be set manually.
  This variable is used to find the ⛩️📰 书社 `lib/` folder, which is
  expected to be in the same location.

- A white·space‐separated list of magic files to use (default:
  `$(THISDIR)/magic/*`).

- The value of this variable is appended to `MAGIC` by default, to
  enable additional magic files without overriding the existing ones.

- Rules to use with `find` when searching for source files.
  The default ignores files that start with a period or hyphen‐minus,
  those which end with a cloparen, and those which contain a hash,
  buck, percent, asterisk, colon, semi, eroteme, bracket, backslash,

- **`EXTRAFINDRULES`:**
  The value of this variable is appended to `FINDRULES` by default, to
  enable additional rules without overriding the existing ones.

- **`FINDINCLUDERULES`:**
  Rules to use with `find` when searching for includes (default:

- **`EXTRAFINDINCLUDERULES`:**
  The value of this variable is appended to `FINDINCLUDERULES` by
  default, to enable additional rules without overriding the existing
  ones.

- A white·space‐separated list of parsers to use (default:
  `$(THISDIR)/parsers/*.xslt`).

- **`EXTRAPARSERS`:**
  The value of this variable is appended to `PARSERS` by default, to
  enable additional parsers without overriding the existing ones.

- A white·space‐separated list of transforms to use (default:
  `$(THISDIR)/transforms/*.xslt`).

- **`EXTRATRANSFORMS`:**
  The value of this variable is appended to `TRANSFORMS` by default, to
  enable additional transforms without overriding the existing ones.

- A white·space‐separated list of media types to consider X·M·L
  (default: `application/xml text/xml`).

- The current version of ⛩️📰 书社 (default: derived from the current
  git tag/branch/commit).

- The current version of the source files (default: derived from the
  current git tag/branch/commit).

- If this variable has a value, every recipe instruction will be
  printed when it runs (default: empty).
  This is helpful for debugging, but typically too noisy for general
  use.
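
The advice above about deleting `DESTDIR` before each `make install` can be wrapped in a small helper.
This is a sketch under assumptions: `fresh_install` is a hypothetical name, and it presumes that a plain `make` in the current directory already finds the ⛩️📰 书社 `GNUmakefile`.

```shell
# Delete the output directory, then reinstall everything into it, so that
# output for deleted sources does not linger.
# (`fresh_install` is a hypothetical helper, not part of ⛩️📰 书社.)
fresh_install() {
  destdir=${1:-public}
  rm -rf -- "$destdir"
  make install DESTDIR="$destdir"
}

# Example usage, from the directory which contains the sources:
#   fresh_install          # rebuilds public/
#   fresh_install out      # rebuilds out/ instead
```

Passing `DESTDIR` on the command line overrides the default for that one invocation only.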

Source files may be placed in `SRCDIR` in any manner; the file
structure used there will match the output.
The type of source files is *not* determined by file extension, but
rather by magic number; this means that files **must** begin with
something recognizable.
Supported magic numbers include :—

- `<?xml` for `application/xml` files
- `#!js` for `text/javascript` files
- `@charset "` for `text/css` files
- `#!tsv` for `text/tab-separated-values` files
- `%%` for `text/record-jar` files (unregistered; see
  [[draft-phillips-record-jar-01][]])
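
To check how a file is likely to be classified before building, the list above can be mirrored in a few lines of shell.
This is only an approximation for illustration: ⛩️📰 书社 itself classifies files with `file` and its magic files, and `guess_media_type` is a hypothetical helper which looks at nothing but the first line.

```shell
# Guess a media type from the first line of a file, following the list of
# magic numbers above. (`guess_media_type` is a hypothetical helper.)
guess_media_type() {
  first_line=
  IFS= read -r first_line < "$1" || true
  case $first_line in
    '<?xml'*) printf '%s\n' application/xml ;;
    '#!js'*) printf '%s\n' text/javascript ;;
    '@charset "'*) printf '%s\n' text/css ;;
    '#!tsv'*) printf '%s\n' text/tab-separated-values ;;
    '%%'*) printf '%s\n' text/record-jar ;;
    *) printf '%s\n' unknown ;;
  esac
}

# Example usage:
#   guess_media_type sources/index.xml
```

A result of `unknown` here suggests the file needs one of the prefixes above, or an additional magic file.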

Text formats with associated X·S·L·T parsers are wrapped in a H·T·M·L
`<script>` element whose `@type` gives its media type, and then
passed to the parser to process.
Source files whose media type does not have an associated X·S·L·T
parser are considered “assets” and will not be transformed.

**☡ For compatibility with this program, source file·names must not
contain Ascii white·space, colons (`:`), semis (`;`), pipes (`|`),
bucks (`$`), percents (`%`), hashes (`#`), asterisks (`*`), brackets
(`[` or `]`), erotemes (`?`), backslashes (`\`), or control
characters, must not begin with a hyphen‐minus (`-`), and must not end
with a cloparen (`)`).**
The former characters have the potential to conflict with make syntax,
a leading hyphen‐minus is confusable for a commandline argument, and a
trailing cloparen [activates a bug in G·N·U Make
3.81](https://stackoverflow.com/questions/17148468/capturing-filenames-including-parentheses-with-gnu-makes-wildcard-function#comment24825307_17148894).
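
These rules are easy to check mechanically before adding a file.
The sketch below is one way to do so; `is_safe_source_name` is a hypothetical helper, it tests the whole path as a single string, and its white·space test follows the current locale rather than strictly Ascii.

```shell
# Succeed only if the given source file·name avoids every problem listed
# above. (`is_safe_source_name` is a hypothetical helper.)
is_safe_source_name() {
  case $1 in
    '') return 1 ;;                             # nothing to check
    -*) return 1 ;;                             # leading hyphen‐minus
    *')') return 1 ;;                           # trailing cloparen
    *[[:space:]]* | *[[:cntrl:]]*) return 1 ;;  # white·space or controls
    *:* | *';'* | *'|'* | *'$'* | *%* | *'#'* | *'*'* | *'['* | *']'* | *'?'* | *'\'*)
      return 1 ;;                               # may conflict with make syntax
  esac
  return 0
}

# Example usage: report which names would need to be changed.
for name in index.xml 'my notes.txt' 'draft(1)'; do
  if is_safe_source_name "$name"; then
    printf 'ok: %s\n' "$name"
  else
    printf 'rename: %s\n' "$name"
  fi
done
```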

Parsers are used to convert plaintext files into X·M·L trees, as well
as to convert plaintext formats which are already included inline in
existing source X·M·L documents.
⛩️📰 书社 comes with some parsers; namely :—

- **`parsers/plain.xslt`:**
  Wraps `text/plain` contents in a `<html:pre>` element.

- **`parsers/record-jar.xslt`:**
  Converts `text/record-jar` contents into a `<html:div>` of
  `<html:dl>` elements (one for each record).

- **`parsers/tsv.xslt`:**
  Converts `text/tab-separated-values` contents into an `<html:table>`
  element.

New ⛩️📰 书社 parsers which target plaintext formats should have an
`<xslt:template>` element with no `@name` or `@mode` and whose
`@match` attribute :—

- Starts with an appropriately‐name·spaced qualified name for a
  `<html:script>` element.

- Follows this with the string `[@type=`.

- Follows this with a quoted string giving a media type supported by
  the parser.
  Media type parameters are *not* supported.

- Follows this with the string `]`.

For example, the trivial `text/plain` parser is defined as follows :—

```xml
<?xml version="1.0"?>
<transform
  xmlns="http://www.w3.org/1999/XSL/Transform"
  xmlns:html="http://www.w3.org/1999/xhtml"
  xmlns:书社="urn:fdc:ladys.computer:20231231:Shu1She4"
  version="1.0"
>
  <书社:id>example:text/plain</书社:id>
  <template match="html:script[@type='text/plain']">
    <html:pre><value-of select="."/></html:pre>
  </template>
</transform>
```

⛩️📰 书社 will scan the provided parsers for this pattern to determine
the set of allowed plaintext file types.
Multiple such `<xslt:template>` elements may be provided in a single
parser, for example if the parser supports multiple media types.
Alternatively, you can set the `@书社:supported-media-types` attribute
on the root element of the parser to override media type support

Even when `@书社:supported-media-types` is set, it is a requirement
that each parser transform any `<html:script>` elements with a
`@type` which matches their registered types into something else.
Otherwise the parser will be stuck in an endless loop.
The result tree of applying the transform to the `<html:script>`
element will be reparsed (in case any new `<html:script>` elements
were added in its subtree), and a `@书社:parsed-by` attribute will be
added to each toplevel element in the result.
The value of this attribute will be the value of the `<书社:id>`
toplevel element in the parser.
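
The pattern scan described above can be approximated from the command line, which is handy for checking what a new parser will be detected as supporting.
This is a rough sketch and not how ⛩️📰 书社 itself performs the scan: `list_parser_media_types` is a hypothetical helper which assumes the `html:` prefix, single‐quoted media types, and at most one template per line.

```shell
# Print the media type named by each `match="html:script[@type='…']"`
# attribute found in the given parser files.
# (`list_parser_media_types` is a hypothetical helper.)
list_parser_media_types() {
  sed -n "s/.*match=\"html:script\\[@type='\\([^']*\\)']\".*/\\1/p" "$@"
}

# Example usage:
#   list_parser_media_types parsers/*.xslt
```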

It is possible for parsers to support zero plaintext types.
This is useful when targeting specific dialects of X·M·L; parsers in
this sense operate on the same basic principles as transforms

The major distinction between X·M·L parsers and transforms is where in
the process the transformation happens:
Parsers are applied *prior* to embedding (and can be used to generate
embeds); transforms are applied *after*.

It is **strongly recommended** that auxiliary templates in parsers be
name·spaced (by `@name` or `@mode`) whenever possible, to avoid
conflicts between parsers.

### Attributes added during parsing

⛩️📰 书社 will add a few attributes to the output of the parsing step,

- A `@书社:cksum` attribute on toplevel result elements, giving the
  `cksum` checksum of the corresponding source file.

- For the elements which result from parsing plaintext `<html:script>`

  - A `@书社:parsed-by` attribute, giving a space‐separated list of
    parsers which parsed the node.
    (Generally, this will be a list of one, but it is possible for the
    result of a parse to be another plaintext node, which may be
    parsed by a different parser.)

  - A `@书社:media-type` attribute, giving the identified media type of

Documents can be embedded in other documents using a `<书社:link>`
element with `@xlink:show="embed"`.
The `@xlink:href`s of these elements should have the format
`about:shushe?source=<path>`, where `<path>` provides the path to the
file within `SRCDIR`.
Includes, which do not generate outputs of their own but may still be
freely embedded, instead use the format
`about:shushe?include=<path>`, where `<path>` provides the path

Embeds are replaced with the parsed contents of a file, unless the file
is an asset, in which case an `<html:object>` element is produced
instead (with the contents of the asset file provided as a base64

Embed replacements will be given a `@书社:identifier` attribute whose
value will match the `@xlink:href` of the embed.

Embedding takes place after parsing but before transformation, so
parsers are able to generate their own embeds.
⛩️📰 书社 is able to detect the transitive embed dependencies of files
and update them accordingly; it will signal an error if the
dependencies are recursive.
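
As an illustration of the two link formats described above, the following writes a source document which embeds one source file and one include.
All of the file names are hypothetical, and whether further `xlink:` attributes are needed on `<书社:link>` is not covered here, so treat this as a sketch.

```shell
# Write a source document which embeds another source file and an include.
# (The file names used here are hypothetical.)
mkdir -p sources
cat > sources/notes.xml <<'EOF'
<?xml version="1.0"?>
<article
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:书社="urn:fdc:ladys.computer:20231231:Shu1She4"
>
  <h1>Notes</h1>
  <书社:link xlink:show="embed" xlink:href="about:shushe?source=notes/first.txt"/>
  <书社:link xlink:show="embed" xlink:href="about:shushe?include=footer.xml"/>
</article>
EOF
printf 'Wrote %s\n' sources/notes.xml
```

The first link refers to a file which also produces its own output; the second refers to an include, which does not.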

## Output Redirection

By default, ⛩️📰 书社 installs files to the same location in `DESTDIR`
as they were placed in their `SRCDIR`.
This behaviour can be customized by setting the `@书社:destination`
attribute on the root element, whose value can give a different path.
This attribute is read after parsing, but before transformation (where
it is silently dropped).
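
For example, a page kept under a drafts folder could ask to be installed somewhere else.
This sketch uses hypothetical file names, and it assumes that the attribute value is a path relative to `DESTDIR`, which the text above does not state outright.

```shell
# Write a source document which asks to be installed at a different path.
# (The file names and the destination are hypothetical.)
mkdir -p sources/drafts
cat > sources/drafts/about.xml <<'EOF'
<?xml version="1.0"?>
<article
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:书社="urn:fdc:ladys.computer:20231231:Shu1She4"
  书社:destination="about/index.html"
>
  <h1>About</h1>
</article>
EOF
printf 'Wrote %s\n' sources/drafts/about.xml
```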

Transforms are used to convert X·M·L files into their final output,
after all necessary parsing and embedding has taken place.
⛩️📰 书社 comes with some transforms; namely :—

- **`transforms/asset.xslt`:**
  Converts `<html:object>` elements which correspond to recognized
  media types into the appropriate H·T·M·L elements, and deletes
  `<html:style>` elements from the body of the document and moves

- **`transforms/metadata.xslt`:**
  Provides basic `<html:head>` metadata.
  This metadata is generated from `<html:meta>` elements with one of
  the following `@itemprop` attributes :—

  - **`urn:fdc:ladys.computer:20231231:Shu1She4:title`:**
    Provides the title of the page.

  ⛩️📰 书社 automatically encapsulates H·T·M·L embeds so that their
  metadata does not propagate up to the embedding document.
  To undo this behaviour, remove the `@itemscope` and `@itemtype`
  attributes from the embed during the transformation phase.

- **`transforms/serialization.xslt`:**
  Replaces `<书社:serialize-xml>` elements with the (escaped)
  serialized X·M·L of their contents.
  This replacement happens during the application phase, after most
  other transformations have taken place.

  If a `@with-namespaces` attribute is provided, any name·space nodes
  on the toplevel serialized elements whose U·R·I’s correspond to the
  definitions of the provided prefixes, as defined for the
  `<书社:serialize-xml>` element, will be declared using name·space
  attributes on the serialized elements.
  Otherwise, only name·space nodes which _differ_ from the definitions
  on the `<书社:serialize-xml>` element will be declared.
  The string `#default` may be used to represent the default
  name·space.
  Multiple prefixes may be provided, separated by white·space.

  When it comes to name·spaces used internally by ⛩️📰 书社, the
  prefix used by ⛩️📰 书社 may be declared _in addition to_ the
  prefix(es) used in the source document(s).
  It is not possible to selectively only declare one prefix for a
  name·space to the exclusion of others.

  `<书社:raw-output>` elements may be used inside of
  `<书社:serialize-xml>` elements to inject raw output into the

The following are recommendations on effective creation of
transforms :—

- Make template matchers as specific as possible.
  It is likely an error if two transforms have templates which match
  the same element (unless the templates have different priority).

- Name·space templates (with `@name` or `@mode`) whenever possible.

- Set `@exclude-result-prefixes` on the root `xslt:transform` element
  to reduce the number of declared name·spaces in the final result.

The following params are made available globally in parsers and
transforms :—

- The checksum of the source file (⅌ `cksum`).

- The ⛩️📰 书社 identifier of the source file (a u·r·i beginning with

- The value of the `SRCREV` variable (if present).

- The time at which the source file was last modified.
  Due to limitations in Posix, this time will only have minute
  precision if the file was modified in the last six months, and will
  only have day precision if the file is older.
  Users should not expect this value to be particularly stable.

- The value of the `THISREV` variable (if present).

The following params are only available in transforms :—

- The path of the catalog file (within `BUILDDIR`).

- The path of the output file (within `DESTDIR`).

⛩️📰 书社 will wrap the final output of the transforms in appropriate
`<html:html>` and `<html:body>` elements, so it is not necessary for
transforms to do this explicitly.
After performing the initial transform, ⛩️📰 书社 will match the root
node of the result in the following modes to fill in areas of the

- The result of matching in this mode is prepended into the
  `<html:body>` of the output (before the transformation result).

- The result of matching in this mode is appended into the
  `<html:body>` of the output (after the transformation result).

- The result of matching in this mode is inserted into the
  `<html:head>` of the output.

In addition to being called with the transform result, each of these
modes will additionally be called with a `<xslt:include>` element
corresponding to each transform.
If a transform has a `<书社:id>` top‐level element whose value is an
i·r·i, its `<xslt:include>` element will have a corresponding
`@书社:id` attribute.
This mechanism can be used to allow transforms to insert content
without matching any elements in the result; for example, the
following transform adds a link to a stylesheet to the `<html:head>`

```xml
<?xml version="1.0"?>
<transform
  xmlns="http://www.w3.org/1999/XSL/Transform"
  xmlns:html="http://www.w3.org/1999/xhtml"
  xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
  xmlns:书社="urn:fdc:ladys.computer:20231231:Shu1She4"
  exclude-result-prefixes="书社"
  version="1.0"
>
  <书社:id>example:add-stylesheet-links.xslt</书社:id>
  <template match="xslt:include[@书社:id='example:add-stylesheet-links.xslt']" mode="书社:metadata">
    <html:link rel="stylesheet" type="text/css" href="/style.css"/>
  </template>
</transform>
```

Output wrapping can be entirely disabled by adding a
`@书社:disable-output-wrapping` attribute to the top‐level element in

It will not be performed on outputs whose root elements are
`<书社:archive>`, `<书社:base64-binary>`, or `<书社:raw-text>`

## Applying Attributes

The `<书社:apply-attributes>` element will apply any attributes on the
element to the element(s) it wraps.
It is especially useful in combination with embeds.

The `<书社:apply-attributes-to-root>` element will apply any attributes
on the element to the root node of the final transformation result.
It is especially useful in combination with output wrapping.
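
As a sketch of the first case, the following source document wraps an embed so that the embedded content receives a class.
The file names and the class are hypothetical, and the example assumes that ordinary unprefixed attributes such as `@class` are applied as written.

```shell
# Write a source document which applies a class to whatever gets embedded.
# (The file names and the class used here are hypothetical.)
mkdir -p sources
cat > sources/home.xml <<'EOF'
<?xml version="1.0"?>
<article
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:书社="urn:fdc:ladys.computer:20231231:Shu1She4"
>
  <h1>Home</h1>
  <书社:apply-attributes class="sidebar">
    <书社:link xlink:show="embed" xlink:href="about:shushe?include=links.xml"/>
  </书社:apply-attributes>
</article>
EOF
printf 'Wrote %s\n' sources/home.xml
```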

In both cases, attributes from various sources are combined with
white·space between them.
Attribute application takes place after all ordinary transforms have

Both elements ignore attributes in the `xml:` name·space, except for
`@xml:lang`, which ignores all but the first definition (including
any already present on the root element).
On H·T·M·L and S·V·G elements, `@lang` has the same behaviour as
`@xml:lang`.

## Other Kinds of Output

There are a few special elements in the `书社:` name·space which, if
they appear as the toplevel element in a transformation result, cause
⛩️📰 书社 to produce something other than an X·M·L file.

- **`<书社:archive>`:**
  Each child element with a `@书社:archived-as` attribute will be
  archived as a separate file in a resulting tarball (this attribute
  gives the file name).
  These elements will be processed the same as the root elements of any
  other file (e·g, they will be wrapped; they can themselves specify
  non X·M·L output types, ⁊·c).
  Other child elements will be ignored.

  If the `<书社:archive>` element is given an `@书社:expanded`
  attribute, rather than producing a tarball ⛩️📰 书社 will output
  the directory which expanding the tarball would produce.
  This mechanism can be used to generate multiple files from a single
  source, provided all of the files are contained with·in the same

- **`<书社:base64-binary>`:**
  The text nodes in the transformation result will, after removing all
  Ascii whitespace, be treated as a Base·64 string, which is then

- **`<书社:raw-text>`:**
  A plaintext (U·T·F‐8) file will be produced from the text nodes in
  the transformation result.

This repository conforms to [REUSE][].

Most source files are licensed under the terms of the <cite>Mozilla
Public License, version 2.0</cite>.

[REUSE]: <https://reuse.software/spec/>
[draft-phillips-record-jar-01]: <https://datatracker.ietf.org/doc/html/draft-phillips-record-jar-01>