2 SPDX-FileCopyrightText: 2024 Lady <https://www.ladys.computer/about/#lady>
3 SPDX-License-Identifier: CC0-1.0
7 <b>A make·file for X·M·L.</b>
9 <dfn>⛩️📰 书社</dfn> aims to make it easy to generate websites with
10 X·S·L·T and G·N·U Make.
11 It is consequently only a good choice for people who like X·S·L·T and
12 G·N·U Make and wish it were easier to make websites with them.
14 It makes things easier by :—
16 - Automatically identifying source files and characterizing them by
17 type (X·M·L, text, or asset).
19 - Parsing supported text types into X·M·L trees.
21 - Enabling easy inclusion of source files within each other.
23 It aims to do this with zero dependencies beyond the programs already
24 installed on your computer†.
26 † Assuming an operating system with a fairly featureful, and
27 Posix‐compliant, development setup (e·g, macOS).
28 In fact, on Linux you will probably need to install a few programs:
29 `libxml2-utils`, `xsltproc`, `sharutils`, and `pax`.
33 <i lang="cmn-Hans">书社</i> is a Chinese word meaning “publishing
36 The first character, <i lang="cmn-Hans">书</i>, is the simplified form
39 The second character, <i lang="cmn-Hans">社</i>, contemporarily means
40 “association”, but historically referred to the god of the soil and
41 related altars or festivities.
42 In Japanese, it is an alternate spelling for <i lang="ja">やしろ</i>,
43 the word for “Shinto shrine”.
45 The name <i lang="cmn-Hans">书社</i> was chosen to play on this pun, as
46 it is intended as a publishing program for webshrines.
48 In Ascii environments, ⛩️📰 书社 should be written `Shushe`, following
49 the pinyin transliteration.
53 In most cases, ⛩️📰 书社 aims to require only functionality which is
54 present in all Posix‐compliant operating systems.
55 There are a few exceptions.
56 Details on particular programs are given below; if a program is not
57 listed, it is assumed that any Posix‐compliant implementation will
### `file`
62 This is a Posix utility, but ⛩️📰 书社 currently depends on
63 unspecified behaviour.
64 It requires support for the following additional options :—
66 - **`-C`**, when supplied with `-m`, must be useable to compile a
67 `.mgc` magicfile for use with future invocations of `file`.
69 - **`--files-from`** must be useable to provide a file that `file`
70 should read file·names from, and `-` must be useable in this
71 context to specify the standard input.
73 - **`--mime-type`** must cause `file` to print the internet media type
74 of the file with no charset parameter.
76 - **`--separator`** must be useable to set the separator that `file`
77 uses to separate file names from types.
79 These options are implemented by the
80 [Fine Free File Command](https://darwinsys.com/file/), which is used
81 by most operating systems.
### `git`
85 This is not a Posix utility.
86 Usage of `git` is optional, but recommended (and activated by default).
87 To disable it, set `GIT=`.
### `make`
91 This is a Posix utility, but ⛩️📰 书社 currently depends on
92 unspecified behaviour.
93 ⛩️📰 书社 requires specifically the G·N·U version of `make`, and
94 depends on functionality present in version 3.81 or later.
95 It is not expected to work in previous versions, or with other
96 implementations of Make.
### `pax`
100 This is a Posix utility, but not included in the Linux Standard Base or
101 installed by default in many distributions.
102 Only `ustar` format support is required.
104 ### `uudecode` and `uuencode`
106 These are Posix utilities, but not included in the Linux Standard Base
107 or installed by default in many distributions.
108 The G·N·U [Sharutils](https://www.gnu.org/software/sharutils/) package
109 can be installed to access them.
111 ### `xmlcatalog` and `xmllint`
These are not Posix utilities.
They are a part of `libxml2`, but may need to be installed separately
(e·g by the name `libxml2-utils`).
### `xsltproc`
119 This is not a Posix utility.
120 It is a part of `libxslt`, but may need to be installed separately.
124 Place source files in `sources/` and run `make install` to compile
125 the result to `public/`.
126 Compilation involves the following steps :—
128 1. ⛩️📰 书社 compiles all of the magic files in `magic/` into a single
129 file, `build/magic.mgc`.
131 2. ⛩️📰 书社 processes all of the parsers in `parsers/` and determines
132 the list of supported plaintext types.
134 3. ⛩️📰 书社 identifies all of the source files and includes and uses
135 `build/magic.mgc` to classify them by media type.
137 4. ⛩️📰 书社 parses all plaintext and X·M·L source files and includes
138 and then builds a dependency tree between them.
140 5. ⛩️📰 书社 uses the dependency tree to establish prerequisites for
143 6. ⛩️📰 书社 compiles each output file to `build/result`.
145 7. ⛩️📰 书社 copies most output files from `build/result` to
146 `build/public`, but it does some additional processing instead on
147 those which indicate a non‐X·M·L desired final output form.
149 8. ⛩️📰 书社 copies the final resulting files to `public`.
151 You can use `make list` to list each identified source file or include
152 alongside its computed type and dependencies.
153 As this is a Make‐based program, steps will only be run if the
154 corresponding buildfile or output file is older than its
159 The ⛩️📰 书社 name·space is `urn:fdc:ladys.computer:20231231:Shu1She4`.
161 This document uses a few name·space prefixes, with the following
164 | Prefix | Expansion |
165 | ---------: | :-------------------------------------------- |
166 | `catalog:` | `urn:oasis:names:tc:entity:xmlns:xml:catalog` |
167 | `exsl:` | `http://exslt.org/common` |
168 | `exslstr:` | `http://exslt.org/strings` |
169 | `html:` | `http://www.w3.org/1999/xhtml` |
170 | `svg:` | `http://www.w3.org/2000/svg` |
171 | `xlink:` | `http://www.w3.org/1999/xlink` |
172 | `xslt:` | `http://www.w3.org/1999/XSL/Transform` |
173 | `书社:` | `urn:fdc:ladys.computer:20231231:Shu1She4` |
175 ## Setup and Configuration
177 ⛩️📰 书社 depends on the following programs to run.
178 In every case, you may supply your own implementation by overriding the
179 corresponding (allcaps) variable (e·g, set `MKDIR` to supply your own
180 `mkdir` implementation).
190 - `git` (optional; set `GIT=` to disable)
196 - `pax` (only when generating archives)
207 - `xmlcatalog` (provided by `libxml2`)
208 - `xmllint` (provided by `libxml2`)
209 - `xsltproc` (provided by `libxslt`)
211 The following additional variables can be used to control the behaviour
215 The location of the source files (default: `sources`).
216 Multiple source directories can be provided, so long as the same
217 file subpath doesn’t exist in more than one of them.
220 The location of source includes (default: `sources/includes`).
221 This can be inside of `SRCDIR`, but needn’t be.
222 Multiple include directories can be provided, so long as the same
223 file subpath doesn’t exist in more than one of them.
226 The location of the (temporary) build directory (default: `build`).
227 `make clean` will delete this, and it is recommended that it not be
228 used for programs aside from ⛩️📰 书社.
  The location of the directory to output files to (default: `public`).
232 `make install` will overwrite files in this directory which
233 correspond to those in `SRCDIR`.
234 It *will not* touch other files, including those generated from files
235 in `SRCDIR` which have since been deleted.
237 Files are first compiled to `$(BUILDDIR)/public` before they are
238 copied to `DESTDIR`, so this folder is relatively quick and
239 inexpensive to re·create.
240 It’s reasonable to simply delete it before every `make install` to
241 ensure stale content is removed.
244 The location of the ⛩️📰 书社 `GNUmakefile`.
245 This should be set automatically when calling Make and shouldn’t ever
246 need to be set manually.
247 This variable is used to find the ⛩️📰 书社 `lib/` folder, which is
248 expected to be in the same location.
251 A white·space‐separated list of magic files to use (default:
252 `$(THISDIR)/magic/*`).
255 The value of this variable is appended to `MAGIC` by default, to
256 enable additional magic files without overriding the existing ones.
259 Rules to use with `find` when searching for source files.
260 The default ignores files that start with a period or hyphen‐minus,
261 those which end with a cloparen, and those which contain a hash,
262 buck, percent, asterisk, colon, semi, eroteme, bracket, backslash,
265 - **`EXTRAFINDRULES`:**
266 The value of this variable is appended to `FINDRULES` by default, to
267 enable additional rules without overriding the existing ones.
269 - **`FINDINCLUDERULES`:**
270 Rules to use with `find` when searching for includes (default:
273 - **`EXTRAFINDINCLUDERULES`:**
274 The value of this variable is appended to `FINDINCLUDERULES` by
275 default, to enable additional rules without overriding the existing
279 A white·space‐separated list of parsers to use (default:
280 `$(THISDIR)/parsers/*.xslt`).
282 - **`EXTRAPARSERS`:**
283 The value of this variable is appended to `PARSERS` by default, to
284 enable additional parsers without overriding the existing ones.
287 A white·space‐separated list of transforms to use (default:
288 `$(THISDIR)/transforms/*.xslt`).
290 - **`EXTRATRANSFORMS`:**
291 The value of this variable is appended to `TRANSFORMS` by default, to
292 enable additional transforms without overriding the existing ones.
295 A white·space‐separated list of media types or media type suffixes to
296 consider X·M·L (default: `application/xml text/xml +xml`).
299 The current version of ⛩️📰 书社 (default: derived from the current
300 git tag/branch/commit).
303 The current version of the source files (default: derived from the
304 current git tag/branch/commit).
307 If this variable has a value, informative messages will not be
308 printed (default: empty).
309 Informative messages print to stderr, not stdout, so disabling them
310 usually shouldn’t be necessary.
311 This does not (cannot) disable messages from Make itself, for which
312 the `-s`, `--silent` ∕ `--quiet` Make option is more likely to be
316 If this variable has a value, every recipe instruction will be
317 printed when it runs (default: empty).
318 This is helpful for debugging, but typically too noisy for general
323 Source files may be placed in `SRCDIR` in any manner; the file
324 structure used there will match the output.
325 The type of source files is *not* determined by file extension, but
326 rather by magic number; this means that files **must** begin with
327 something recognizable.
328 Supported magic numbers include :—
330 - `<?xml` for `application/xml` files
331 - `#!js` for `text/javascript` files
332 - `@charset "` for `text/css` files
333 - `#!tsv` for `text/tab-separated-values` files
334 - `%%` for `text/record-jar` files (unregistered; see
335 [[draft-phillips-record-jar-01][]])
337 Text formats with associated X·S·L·T parsers are wrapped in a H·T·M·L
338 `<script>` element whose `@type` gives its media type, and then
339 passed to the parser to process.
340 Source files whose media type does not have an associated X·S·L·T
341 parser are considered “assets” and will not be transformed.
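
As a minimal sketch, an X·M·L source file only needs to open with the `<?xml` declaration to be recognized.
The file name and contents below are hypothetical.

```xml
<?xml version="1.0"?>
<!-- Hypothetical file: sources/index.xml -->
<!-- The declaration above must be the very first thing in the file,
     since it doubles as the magic number. -->
<article xmlns="http://www.w3.org/1999/xhtml">
  <h1>Welcome</h1>
  <p>This page is written directly as X·M·L.</p>
</article>
```

A file with the same contents but no declaration would presumably not be identified as `application/xml`, regardless of its extension.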
343 **☡ For compatibility with this program, source file·names must not
344 contain Ascii white·space, colons (`:`), semis (`;`), pipes (`|`),
345 bucks (`$`), percents (`%`), hashes (`#`), asterisks (`*`), brackets
346 (`[` or `]`), erotemes (`?`), backslashes (`\`), or control
347 characters, must not begin with a hyphen‐minus (`-`), and must not end
348 with a cloparen (`)`).**
349 The former characters have the potential to conflict with make syntax,
350 a leading hyphen‐minus is confusable for a commandline argument, and a
351 trailing cloparen [activates a bug in G·N·U Make
352 3.81](https://stackoverflow.com/questions/17148468/capturing-filenames-including-parentheses-with-gnu-makes-wildcard-function#comment24825307_17148894).
356 Parsers are used to convert plaintext files into X·M·L trees, as well
357 as convert plaintext formats which are already included inline in
358 existing source X·M·L documents.
359 ⛩️📰 书社 comes with some parsers; namely :—
361 - **`parsers/plain.xslt`:**
362 Wraps `text/plain` contents in a `<html:pre>` element.
364 - **`parsers/record-jar.xslt`:**
365 Converts `text/record-jar` contents into a `<html:div>` of
366 `<html:dl>` elements (one for each record).
368 - **`parsers/tsv.xslt`:**
369 Converts `text/tab-separated-values` contents into an `<html:table>`
372 New ⛩️📰 书社 parsers which target plaintext formats should have an
373 `<xslt:template>` element with no `@name` or `@mode` and whose
376 - Starts with an appropriately‐name·spaced qualified name for a
377 `<html:script>` element.
379 - Follows this with the string `[@type=`.
381 - Follows this with a quoted string giving a media type supported by
383 Media type parameters are *not* supported.
385 - Follows this with the string `]`.
387 For example, the trivial `text/plain` parser is defined as follows :—

    <?xml version="1.0"?>
    <transform
      xmlns="http://www.w3.org/1999/XSL/Transform"
      xmlns:html="http://www.w3.org/1999/xhtml"
      xmlns:书社="urn:fdc:ladys.computer:20231231:Shu1She4"
      version="1.0"
    >
      <书社:id>example:text/plain</书社:id>
      <template match="html:script[@type='text/plain']">
        <html:pre><value-of select="."/></html:pre>
      </template>
    </transform>

404 ⛩️📰 书社 will scan the provided parsers for this pattern to determine
405 the set of allowed plaintext file types.
406 Multiple such `<xslt:template>` elements may be provided in a single
407 parser, for example if the parser supports multiple media types.
408 Alternatively, you can set the `@书社:supported-media-types` attribute
409 on the root element of the parser to override media type support
412 Even when `@书社:supported-media-types` is set, it is a requirement
413 that each parser transform any `<html:script>` elements with a
414 `@type` which matches their registered types into something else.
415 Otherwise the parser will be stuck in an endless loop.
416 The result tree of applying the transform to the `<html:script>`
417 element will be reparsed (in case any new `<html:script>` elements
418 were added in its subtree), and a `@书社:parsed-by` attribute will be
419 added to each toplevel element in the result.
420 The value of this attribute will be the value of the `<书社:id>`
421 toplevel element in the parser.
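
The following is a sketch of a single parser which registers two media types by providing two templates of the required form.
The identifier `example:listings` and the choice to render both formats as preformatted listings are assumptions made for illustration; they are not parsers shipped with ⛩️📰 书社.

```xml
<?xml version="1.0"?>
<transform
  xmlns="http://www.w3.org/1999/XSL/Transform"
  xmlns:html="http://www.w3.org/1999/xhtml"
  xmlns:书社="urn:fdc:ladys.computer:20231231:Shu1She4"
  version="1.0"
>
  <!-- Hypothetical identifier; it is used as the @书社:parsed-by value. -->
  <书社:id>example:listings</书社:id>
  <!-- Each template below registers one plaintext media type. -->
  <template match="html:script[@type='text/css']">
    <!-- The result is not an <html:script>, so reparsing terminates. -->
    <html:pre class="css"><value-of select="."/></html:pre>
  </template>
  <template match="html:script[@type='text/javascript']">
    <html:pre class="javascript"><value-of select="."/></html:pre>
  </template>
</transform>
```

With a parser like this enabled, `text/css` and `text/javascript` sources would be parsed into X·M·L rather than treated as assets, which may or may not be what is wanted.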
423 It is possible for parsers to support zero plaintext types.
424 This is useful when targeting specific dialects of X·M·L; parsers in
425 this sense operate on the same basic principles as transforms
427 The major distinction between X·M·L parsers and transforms is where in
428 the process the transformation happens:
429 Parsers are applied *prior* to embedding (and can be used to generate
430 embeds); transforms are applied *after*.
It is **strongly recommended** that auxiliary templates in parsers be
433 name·spaced (by `@name` or `@mode`) whenever possible, to avoid
434 conflicts between parsers.
436 ### Attributes added during parsing
438 ⛩️📰 书社 will add a few attributes to the output of the parsing step,
441 - A `@书社:cksum` attribute on toplevel result elements, giving the
442 `cksum` checksum of the corresponding source file.
444 - For the elements which result from parsing plaintext `<html:script>`
447 - A `@书社:parsed-by` attribute, giving a space‐separated list of
448 parsers which parsed the node.
449 (Generally, this will be a list of one, but it is possible for the
450 result of a parse to be another plaintext node, which may be
451 parsed by a different parser.)
453 - A `@书社:media-type` attribute, giving the identified media type of
458 Documents can be embedded in other documents using a `<书社:link>`
459 element with `@xlink:show="embed"`.
460 The `@xlink:href`s of these elements should have the format
461 `about:shushe?source=<path>`, where `<path>` provides the path to the
462 file within `SRCDIR`.
463 Includes, which do not generate outputs of their own but may still be
464 freely embedded, instead use the format
465 `about:shushe?include=<path>`, where `<path>` provides the path
468 Embeds are replaced with the parsed contents of a file, unless the file
469 is an asset, in which case an `<html:object>` element is produced
470 instead (with the contents of the asset file provided as a base64
472 Embed replacements will be given a `@书社:identifier` attribute whose
473 value will match the `@xlink:href` of the embed.
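
As a sketch, a source document which embeds one other source file and one include might look as follows.
The paths are hypothetical, and the exact form a path should take (for example, whether it begins with a slash) is an assumption here.

```xml
<?xml version="1.0"?>
<article
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:书社="urn:fdc:ladys.computer:20231231:Shu1She4"
>
  <h1>Notes</h1>
  <!-- Embeds another source file (hypothetical path). -->
  <书社:link
    xlink:show="embed"
    xlink:href="about:shushe?source=notes/first.xml"
  />
  <!-- Embeds an include (hypothetical path). -->
  <书社:link
    xlink:show="embed"
    xlink:href="about:shushe?include=footer.xml"
  />
</article>
```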
475 Embedding takes place after parsing but before transformation, so
476 parsers are able to generate their own embeds.
477 ⛩️📰 书社 is able to detect the transitive embed dependencies of files
478 and update them accordingly; it will signal an error if the
479 dependencies are recursive.
481 ## Output Redirection
483 By default, ⛩️📰 书社 installs files to the same location in `DESTDIR`
484 as they were placed in their `SRCDIR`.
485 This behaviour can be customized by setting the `@书社:destination`
486 attribute on the root element, whose value can give a different path.
487 This attribute is read after parsing, but before transformation (where
488 it is silently dropped).
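
For illustration, a source file might redirect its output like so.
Both paths are hypothetical, and this sketch assumes the attribute value is interpreted as a path within `DESTDIR`.

```xml
<?xml version="1.0"?>
<!-- Hypothetical file: sources/drafts/2024-essay.xml -->
<article
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:书社="urn:fdc:ladys.computer:20231231:Shu1She4"
  书社:destination="essays/on-shrines.html"
>
  <h1>On Shrines</h1>
</article>
```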
492 Transforms are used to convert X·M·L files into their final output,
493 after all necessary parsing and embedding has taken place.
494 ⛩️📰 书社 comes with some transforms; namely :—
496 - **`transforms/asset.xslt`:**
497 Converts `<html:object>` elements which correspond to recognized
498 media types into the appropriate H·T·M·L elements, and deletes
499 `<html:style>` elements from the body of the document and moves
502 - **`transforms/metadata.xslt`:**
503 Provides basic `<html:head>` metadata.
504 This metadata is generated from `<html:meta>` elements with one of
505 the following `@itemprop` attributes :—
507 - **`urn:fdc:ladys.computer:20231231:Shu1She4:title`:**
508 Provides the title of the page.
  ⛩️📰 书社 automatically encapsulates H·T·M·L embeds so that their
  metadata does not propagate up to the embedding document.
512 To undo this behaviour, remove the `@itemscope` and `@itemtype`
513 attributes from the embed during the transformation phase.
515 - **`transforms/serialization.xslt`:**
516 Replaces `<书社:serialize-xml>` elements with the (escaped)
517 serialized X·M·L of their contents.
518 This replacement happens during the application phase, after most
519 other transformations have taken place.
521 If a `@with-namespaces` attribute is provided, any name·space nodes
522 on the toplevel serialized elements whose U·R·I’s correspond to the
523 definitions of the provided prefixes, as defined for the
524 `<书社:serialize-xml>` element, will be declared using name·space
525 attributes on the serialized elements.
526 Otherwise, only name·space nodes which _differ_ from the definitions
527 on the `<书社:serialize-xml>` element will be declared.
528 The string `#default` may be used to represent the default
530 Multiple prefixes may be provided, separated by white·space.
532 When it comes to name·spaces used internally by ⛩️📰 书社, the
533 prefix used by ⛩️📰 书社 may be declared _in addition to_ the
534 prefix(es) used in the source document(s).
535 It is not possible to selectively only declare one prefix for a
536 name·space to the exclusion of others.
538 `<书社:raw-output>` elements may be used inside of
539 `<书社:serialize-xml>` elements to inject raw output into the
542 The following are recommendations on effective creation of
545 - Make template matchers as specific as possible.
546 It is likely an error if two transforms have templates which match
547 the same element (unless the templates have different priority).
549 - Name·space templates (with `@name` or `@mode`) whenever possible.
551 - Set `@exclude-result-prefixes` on the root `xslt:transform` element
552 to reduce the number of declared name·spaces in the final result.
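
The recommendations above can be sketched in a small transform.
The `example:` name·space, the `<example:note>` element, and the identifier are all hypothetical, and the sketch assumes that child nodes are handled by whatever other templates are in effect.

```xml
<?xml version="1.0"?>
<transform
  xmlns="http://www.w3.org/1999/XSL/Transform"
  xmlns:html="http://www.w3.org/1999/xhtml"
  xmlns:example="urn:example:notes"
  xmlns:书社="urn:fdc:ladys.computer:20231231:Shu1She4"
  exclude-result-prefixes="example 书社"
  version="1.0"
>
  <书社:id>example:notes.xslt</书社:id>
  <!-- A specific matcher: only <example:note> elements with a @kind. -->
  <template match="example:note[@kind]">
    <html:aside class="note {@kind}">
      <call-template name="example:note-label"/>
      <apply-templates/>
    </html:aside>
  </template>
  <!-- An auxiliary template, name·spaced by @name to avoid conflicts. -->
  <template name="example:note-label">
    <html:strong><value-of select="@kind"/></html:strong>
  </template>
</transform>
```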
556 The following params are made available globally in parsers and
563 The checksum of the source file (⅌ `cksum`).
566 The ⛩️📰 书社 identifier of the source file (a u·r·i beginning with
570 The value of the `SRCREV` variable (if present).
573 The time at which the source file was last modified.
576 The value of the `THISREV` variable (if present).
578 The following params are only available in transforms :—
581 The path of the catalog file (within `BUILDDIR`).
584 The path of the output file (within `DESTDIR`).
588 ⛩️📰 书社 will wrap the final output of the transforms in appropriate
589 `<html:html>` and `<html:body>` elements, so it is not necessary for
590 transforms to do this explicitly.
591 After performing the initial transform, ⛩️📰 书社 will match the root
592 node of the result in the following modes to fill in areas of the
596 The result of matching in this mode is prepended into the
597 `<html:body>` of the output (before the transformation result).
600 The result of matching in this mode is appended into the
601 `<html:body>` of the output (after the transformation result).
604 The result of matching in this mode is inserted into the
605 `<html:head>` of the output.
607 In addition to being called with the transform result, each of these
608 modes will additionally be called with a `<xslt:include>` element
609 corresponding to each transform.
610 If a transform has a `<书社:id>` top‐level element whose value is an
611 i·r·i, its `<xslt:include>` element will have a corresponding
613 This mechanism can be used to allow transforms to insert content
614 without matching any elements in the result; for example, the
615 following transform adds a link to a stylesheet to the `<html:head>`

    <?xml version="1.0"?>
    <transform
      xmlns="http://www.w3.org/1999/XSL/Transform"
      xmlns:html="http://www.w3.org/1999/xhtml"
      xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
      xmlns:书社="urn:fdc:ladys.computer:20231231:Shu1She4"
      exclude-result-prefixes="书社"
      version="1.0"
    >
      <书社:id>example:add-stylesheet-links.xslt</书社:id>
      <template match="xslt:include[@书社:id='example:add-stylesheet-links.xslt']" mode="书社:metadata">
        <html:link rel="stylesheet" type="text/css" href="/style.css"/>
      </template>
    </transform>

635 Output wrapping can be entirely disabled by adding a
636 `@书社:disable-output-wrapping` attribute to the top‐level element in
638 It will not be performed on outputs whose root elements are
639 `<书社:archive>`, `<书社:base64-binary>`, or `<书社:raw-text>`
642 ## Applying Attributes
644 The `<书社:apply-attributes>` element will apply any attributes on the
645 element to the element(s) it wraps.
646 It is especially useful in combination with embeds.
648 The `<书社:apply-attributes-to-root>` element will apply any attributes
649 on the element to the root node of the final transformation result.
650 It is especially useful in combination with output wrapping.
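
A sketch of both elements in a source document follows.
The include path and attribute values are hypothetical, and writing `<书社:apply-attributes-to-root>` as an empty element is an assumption.

```xml
<?xml version="1.0"?>
<article
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:书社="urn:fdc:ladys.computer:20231231:Shu1She4"
>
  <!-- Applies @class to the root of the final result. -->
  <书社:apply-attributes-to-root class="essay"/>
  <!-- Applies @class and @id to whatever replaces the embed. -->
  <书社:apply-attributes class="sidebar" id="colophon">
    <书社:link
      xlink:show="embed"
      xlink:href="about:shushe?include=colophon.xml"
    />
  </书社:apply-attributes>
</article>
```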
652 In both cases, attributes from various sources are combined with
653 white·space between them.
654 Attribute application takes place after all ordinary transforms have
657 Both elements ignore attributes in the `xml:` name·space, except for
658 `@xml:lang`, which ignores all but the first definition (including
659 any already present on the root element).
660 On H·T·M·L and S·V·G elements, `@lang` has the same behaviour as
663 ## Other Kinds of Output
665 There are a few special elements in the `书社:` name·space which, if
666 they appear as the toplevel element in a transformation result, cause
667 ⛩️📰 书社 to produce something other than an X·M·L file.
670 - **`<书社:archive>`:**
671 Each child element with a `@书社:archived-as` attribute will be
672 archived as a separate file in a resulting tarball (this attribute
673 gives the file name).
674 These elements will be processed the same as the root elements of any
675 other file (e·g, they will be wrapped; they can themselves specify
676 non X·M·L output types, ⁊·c).
677 Other child elements will be ignored.
679 If the `<书社:archive>` element is given an `@书社:expanded`
680 attribute, rather than producing a tarball ⛩️📰 书社 will output
681 the directory which expanding the tarball would produce.
682 This mechanism can be used to generate multiple files from a single
683 source, provided all of the files are contained with·in the same
686 - **`<书社:base64-binary>`:**
687 The text nodes in the transformation result will, after removing all
688 Ascii whitespace, be treated as a Base·64 string, which is then
691 - **`<书社:raw-text>`:**
692 A plaintext (U·T·F‐8) file will be produced from the text nodes in
693 the transformation result.
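
As a sketch, a transformation result of the following shape would produce two files from a single source.
The file names are hypothetical, and the value given to `@书社:expanded` is an assumption (the text above only requires that the attribute be present).

```xml
<书社:archive
  xmlns:html="http://www.w3.org/1999/xhtml"
  xmlns:书社="urn:fdc:ladys.computer:20231231:Shu1She4"
  书社:expanded="true"
>
  <!-- Processed like the root element of any other X·M·L output. -->
  <html:article 书社:archived-as="index.html">
    <html:h1>Gallery</html:h1>
  </html:article>
  <!-- Written out as a plaintext file. -->
  <书社:raw-text 书社:archived-as="notes.txt">Hello, world.</书社:raw-text>
  <!-- No @书社:archived-as, so this child is ignored. -->
  <html:p>Not part of the output.</html:p>
</书社:archive>
```

Dropping the `@书社:expanded` attribute would instead yield a single tarball containing the same two files.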
697 This repository conforms to [REUSE][].
699 Most source files are licensed under the terms of the <cite>Mozilla
700 Public License, version 2.0</cite>.
702 [REUSE]: <https://reuse.software/spec/>
703 [draft-phillips-record-jar-01]: <https://datatracker.ietf.org/doc/html/draft-phillips-record-jar-01>