<!--
SPDX-FileCopyrightText: 2024 Lady <https://www.ladys.computer/about/#lady>
SPDX-License-Identifier: CC0-1.0
-->
# ⛩️📰 书社

<b>A make·file for X·M·L.</b>

<dfn>⛩️📰 书社</dfn> aims to make it easy to generate websites with
X·S·L·T and G·N·U Make.
It is consequently only a good choice for people who like X·S·L·T and
G·N·U Make and wish it were easier to make websites with them.

It makes things easier by :⁠—

- Automatically identifying source files and characterizing them by
  type (X·M·L, text, or asset).

- Parsing supported text types into X·M·L trees.

- Enabling easy inclusion of source files within each other.

It aims to do this with zero dependencies beyond the programs already
installed on your computer†.

† The only non‐Posix programs‡ required are those provided by `libxml2`
and `libxslt` (which most operating systems provide), but on Linux
machines the commandline utilities may need to be installed
separately as **`libxml2-utils`** and **`xsltproc`**.
Additionally, not all Linux distributions bundle all necessary Posix
programs; on Debian (for example) you may need to separately install
**`sharutils`** for `uudecode` and `uuencode` and **`pax`** for
archiving.

‡ This make·file also currently depends on non‐Posix `stat` but
attempts to handle both the G·N·U and B·S·D variants.
It expects `xargs` to accept a `-0` option, which, while widely
supported, is not a part of the Posix standard.

**Note:**
⛩️📰 书社 requires functionality present in G·N·U Make 3.81 (or later)
and will not work in previous versions, or other implementations of
Make.
Compatibility with later versions of G·N·U Make is assumed, but not
tested.

## Nomenclature

<i lang="cmn-Hans">书社</i> is a Chinese word meaning “publishing
house”.

The first character, <i lang="cmn-Hans">书</i>, is the simplified form
of “document”.

The second character, <i lang="cmn-Hans">社</i>, contemporarily means
“association”, but historically referred to the god of the soil and
related altars or festivities.
In Japanese, it is an alternate spelling for <i lang="ja">やしろ</i>,
the word for “Shinto shrine”.

The name <i lang="cmn-Hans">书社</i> was chosen to play on this pun, as
it is intended as a publishing program for webshrines.

In Ascii environments, ⛩️📰 书社 should be written `Shushe`, following
the pinyin transliteration.

## Basic Usage

Place source files in `sources/` and run `make install` to compile
the result to `public/`.
Compilation involves the following steps :⁠—

1. ⛩️📰 书社 compiles all of the magic files in `magic/` into a single
   file, `build/magic.mgc`.

2. ⛩️📰 书社 processes all of the parsers in `parsers/` and determines
   the list of supported plaintext types.

3. ⛩️📰 书社 identifies all of the source files and includes and uses
   `build/magic.mgc` to classify them by media type.

4. ⛩️📰 书社 parses all plaintext and X·M·L source files and includes
   and then builds a dependency tree between them.

5. ⛩️📰 书社 uses the dependency tree to establish prerequisites for
   each output file.

6. ⛩️📰 书社 compiles each output file to `build/result`.

7. ⛩️📰 书社 copies most output files from `build/result` to
   `build/public`, but it does some additional processing instead on
   those which indicate a non‐X·M·L desired final output form.

8. ⛩️📰 书社 copies the final resulting files to `public`.

You can use `make list` to list each identified source file or include
alongside its computed type and dependencies.
As this is a Make‐based program, steps will only be run if the
corresponding buildfile or output file is older than its
prerequisites.

## Name·spaces

The ⛩️📰 书社 name·space is `urn:fdc:ladys.computer:20231231:Shu1She4`.

This document uses a few name·space prefixes, with the following
meanings :⁠—

|     Prefix | Expansion                                     |
| ---------: | :-------------------------------------------- |
| `catalog:` | `urn:oasis:names:tc:entity:xmlns:xml:catalog` |
|    `exsl:` | `http://exslt.org/common`                     |
| `exslstr:` | `http://exslt.org/strings`                    |
|    `html:` | `http://www.w3.org/1999/xhtml`                |
|     `svg:` | `http://www.w3.org/2000/svg`                  |
|   `xlink:` | `http://www.w3.org/1999/xlink`                |
|    `xslt:` | `http://www.w3.org/1999/XSL/Transform`        |
|    `书社:` | `urn:fdc:ladys.computer:20231231:Shu1She4`    |

## Setup and Configuration

⛩️📰 书社 depends on the following programs to run.
In every case, you may supply your own implementation by overriding the
corresponding (allcaps) variable (e·g, set `MKDIR` to supply your own
`mkdir` implementation).

- `awk`
- `cat`
- `cksum`
- `cp`
- `date`
- `echo`
- `file`
- `find`
- `git` (optional; set `GIT=` to disable)
- `grep`
- `ln`
- `mkdir`
- `mv`
- `od`
- `pax` (only when generating archives)
- `printf`
- `rm`
- `sed`
- `sleep`
- `stat` (BSD *or* GNU)
- `test`
- `touch`
- `tr`
- `uuencode`
- `uudecode`
- `xargs` (requires support for `-0`)
- `xmlcatalog` (provided by `libxml2`)
- `xmllint` (provided by `libxml2`)
- `xsltproc` (provided by `libxslt`)

The following additional variables can be used to control the behaviour
of ⛩️📰 书社 :⁠—

- **`SRCDIR`:**
  The location of the source files (default: `sources`).
  Multiple source directories can be provided, so long as the same
  file subpath doesn’t exist in more than one of them.

- **`INCLUDEDIR`:**
  The location of source includes (default: `sources/includes`).
  This can be inside of `SRCDIR`, but needn’t be.
  Multiple include directories can be provided, so long as the same
  file subpath doesn’t exist in more than one of them.

- **`BUILDDIR`:**
  The location of the (temporary) build directory (default: `build`).
  `make clean` will delete this, and it is recommended that it not be
  used for programs aside from ⛩️📰 书社.

- **`DESTDIR`:**
  The location of the directory to output files to (default: `public`).
  `make install` will overwrite files in this directory which
  correspond to those in `SRCDIR`.
  It *will not* touch other files, including those generated from files
  in `SRCDIR` which have since been deleted.

  Files are first compiled to `$(BUILDDIR)/public` before they are
  copied to `DESTDIR`, so this folder is relatively quick and
  inexpensive to re·create.
  It’s reasonable to simply delete it before every `make install` to
  ensure stale content is removed.

- **`THISDIR`:**
  The location of the ⛩️📰 书社 `GNUmakefile`.
  This should be set automatically when calling Make and shouldn’t ever
  need to be set manually.
  This variable is used to find the ⛩️📰 书社 `lib/` folder, which is
  expected to be in the same location.

- **`MAGIC`:**
  A white·space‐separated list of magic files to use (default:
  `$(THISDIR)/magic/*`).

- **`EXTRAMAGIC`:**
  The value of this variable is appended to `MAGIC` by default, to
  enable additional magic files without overriding the existing ones.

- **`FINDRULES`:**
  Rules to use with `find` when searching for source files.
  The default ignores files that start with a period or hyphen‐minus,
  those which end with a cloparen, and those which contain a hash,
  buck, percent, asterisk, colon, semi, eroteme, bracket, backslash,
  or pipe.

- **`EXTRAFINDRULES`:**
  The value of this variable is appended to `FINDRULES` by default, to
  enable additional rules without overriding the existing ones.

- **`FINDINCLUDERULES`:**
  Rules to use with `find` when searching for includes (default:
  `$(FINDRULES)`).

- **`EXTRAFINDINCLUDERULES`:**
  The value of this variable is appended to `FINDINCLUDERULES` by
  default, to enable additional rules without overriding the existing
  ones.

- **`PARSERS`:**
  A white·space‐separated list of parsers to use (default:
  `$(THISDIR)/parsers/*.xslt`).

- **`EXTRAPARSERS`:**
  The value of this variable is appended to `PARSERS` by default, to
  enable additional parsers without overriding the existing ones.

- **`TRANSFORMS`:**
  A white·space‐separated list of transforms to use (default:
  `$(THISDIR)/transforms/*.xslt`).

- **`EXTRATRANSFORMS`:**
  The value of this variable is appended to `TRANSFORMS` by default, to
  enable additional transforms without overriding the existing ones.

- **`XMLTYPES`:**
  A white·space‐separated list of media types to consider X·M·L
  (default: `application/xml text/xml`).

- **`THISREV`:**
  The current version of ⛩️📰 书社 (default: derived from the current
  git tag/branch/commit).

- **`SRCREV`:**
  The current version of the source files (default: derived from the
  current git tag/branch/commit).

- **`VERBOSE`:**
  If this variable has a value, every recipe instruction will be
  printed when it runs (default: empty).
  This is helpful for debugging, but typically too noisy for general
  usage.

## Source Files

Source files may be placed in `SRCDIR` in any manner; the file
structure used there will match the output.
The type of source files is *not* determined by file extension, but
rather by magic number; this means that files **must** begin with
something recognizable.
Supported magic numbers include :⁠—

- `<?xml` for `application/xml` files
- `#!js` for `text/javascript` files
- `@charset "` for `text/css` files
- `#!tsv` for `text/tab-separated-values` files
- `%%` for `text/record-jar` files (unregistered; see
  [[draft-phillips-record-jar-01][]])

Text formats with associated X·S·L·T parsers are wrapped in a H·T·M·L
`<script>` element whose `@type` gives its media type, and then
passed to the parser to process.
Source files whose media type does not have an associated X·S·L·T
parser are considered “assets” and will not be transformed.

**☡ For compatibility with this program, source file·names must not
contain Ascii white·space, colons (`:`), semis (`;`), pipes (`|`),
bucks (`$`), percents (`%`), hashes (`#`), asterisks (`*`), brackets
(`[` or `]`), erotemes (`?`), backslashes (`\`), or control
characters, must not begin with a hyphen‐minus (`-`), and must not
end with a cloparen (`)`).**
The former characters have the potential to conflict with make syntax,
a leading hyphen‐minus is confusable for a command‐line argument, and
a trailing cloparen [activates a bug in G·N·U Make
3.81](https://stackoverflow.com/questions/17148468/capturing-filenames-including-parentheses-with-gnu-makes-wildcard-function#comment24825307_17148894).

## Parsers

Parsers are used to convert plaintext files into X·M·L trees, as well
as convert plaintext formats which are already included inline in
existing source X·M·L documents.
⛩️📰 书社 comes with some parsers; namely :⁠—

- **`parsers/plain.xslt`:**
  Wraps `text/plain` contents in a `<html:pre>` element.

- **`parsers/record-jar.xslt`:**
  Converts `text/record-jar` contents into a `<html:div>` of
  `<html:dl>` elements (one for each record).

- **`parsers/tsv.xslt`:**
  Converts `text/tab-separated-values` contents into an `<html:table>`
  element.

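As a rough sketch of the inline case, a source X·M·L document might carry plaintext in an `<html:script>` element like the one below.
The file name, the surrounding `<html:article>` element, and the text itself are made up for illustration; with the bundled `parsers/plain.xslt`, the `<html:script>` would be expected to come out as an `<html:pre>`.

```xml
<?xml version="1.0"?>
<!-- Hypothetical source file, e·g `sources/poem.xml`. -->
<html:article xmlns:html="http://www.w3.org/1999/xhtml">
  <html:h1>A short poem</html:h1>
  <!-- Inline plaintext; its @type names the media type to parse. -->
  <html:script type="text/plain">Roses are red,
Violets are blue.</html:script>
</html:article>
```
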
New ⛩️📰 书社 parsers which target plaintext formats should have an
`<xslt:template>` element with no `@name` or `@mode` and whose
`@match` attribute…

- Starts with an appropriately‐name·spaced qualified name for a
  `<html:script>` element.

- Follows this with the string `[@type=`.

- Follows this with a quoted string giving a media type supported by
  the parser.
  Media type parameters are *not* supported.

- Follows this with the string `]`.

For example, the trivial `text/plain` parser is defined as follows :⁠—

```xml
<?xml version="1.0"?>
<transform
  xmlns="http://www.w3.org/1999/XSL/Transform"
  xmlns:html="http://www.w3.org/1999/xhtml"
  xmlns:书社="urn:fdc:ladys.computer:20231231:Shu1She4"
  version="1.0"
>
  <书社:id>example:text/plain</书社:id>
  <template match="html:script[@type='text/plain']">
    <html:pre><value-of select="."/></html:pre>
  </template>
</transform>
```

⛩️📰 书社 will scan the provided parsers for this pattern to determine
the set of allowed plaintext file types.
Multiple such `<xslt:template>` elements may be provided in a single
parser, for example if the parser supports multiple media types.
Alternatively, you can set the `@书社:supported-media-types` attribute
on the root element of the parser to override media type support
detection.

Even when `@书社:supported-media-types` is set, it is a requirement
that each parser transform any `<html:script>` elements with a
`@type` which matches their registered types into something else.
Otherwise the parser will be stuck in an endless loop.
The result tree of applying the transform to the `<html:script>`
element will be reparsed (in case any new `<html:script>` elements
were added in its subtree), and a `@书社:parsed-by` attribute will be
added to each toplevel element in the result.
The value of this attribute will be the value of the `<书社:id>`
toplevel element in the parser.

It is possible for parsers to support zero plaintext types.
This is useful when targeting specific dialects of X·M·L; parsers in
this sense operate on the same basic principles as transforms
(described below).
The major distinction between X·M·L parsers and transforms is where in
the process the transformation happens:
Parsers are applied *prior* to embedding (and can be used to generate
embeds); transforms are applied *after*.

It is **strongly recommended** that auxiliary templates in parsers be
name·spaced (by `@name` or `@mode`) whenever possible, to avoid
conflicts between parsers.

### Attributes added during parsing

⛩️📰 书社 will add a few attributes to the output of the parsing step,
namely :⁠—

- A `@书社:cksum` attribute on toplevel result elements, giving the
  `cksum` checksum of the corresponding source file.

- For the elements which result from parsing plaintext `<html:script>`
  elements :⁠—

  - A `@书社:parsed-by` attribute, giving a space‐separated list of
    parsers which parsed the node.
    (Generally, this will be a list of one, but it is possible for the
    result of a parse to be another plaintext node, which may be
    parsed by a different parser.)

  - A `@书社:media-type` attribute, giving the identified media type of
    the plaintext node.

## Embedding

Documents can be embedded in other documents using a `<书社:link>`
element with `@xlink:show="embed"`.
The `@xlink:href`s of these elements should have the format
`about:shushe?source=<path>`, where `<path>` provides the path to the
file within `SRCDIR`.
Includes, which do not generate outputs of their own but may still be
freely embedded, instead use the format
`about:shushe?include=<path>`, where `<path>` provides the path
within `INCLUDEDIR`.

Embeds are replaced with the parsed contents of a file, unless the file
is an asset, in which case an `<html:object>` element is produced
instead (with the contents of the asset file provided as a base64
`data:` u·r·i).
Embed replacements will be given a `@书社:identifier` attribute whose
value will match the `@xlink:href` of the embed.

Embedding takes place after parsing but before transformation, so
parsers are able to generate their own embeds.
⛩️📰 书社 is able to detect the transitive embed dependencies of files
and update them accordingly; it will signal an error if the
dependencies are recursive.

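A minimal sketch of a source document which uses both kinds of embed might look like the following.
The file names (`notes.txt`, `footer.xml`) and the `<html:article>` wrapper are hypothetical, and it is an assumption here that `<path>` is written without a leading slash and that no further `xlink:` attributes are needed.

```xml
<?xml version="1.0"?>
<html:article
  xmlns:html="http://www.w3.org/1999/xhtml"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:书社="urn:fdc:ladys.computer:20231231:Shu1She4"
>
  <html:h1>Example page</html:h1>
  <!-- Embeds a (hypothetical) source file from `SRCDIR`. -->
  <书社:link xlink:show="embed" xlink:href="about:shushe?source=notes.txt"/>
  <!-- Embeds a (hypothetical) include from `INCLUDEDIR`. -->
  <书社:link xlink:show="embed" xlink:href="about:shushe?include=footer.xml"/>
</html:article>
```
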
## Output Redirection

By default, ⛩️📰 书社 installs files to the same location in `DESTDIR`
as they were placed in their `SRCDIR`.
This behaviour can be customized by setting the `@书社:destination`
attribute on the root element, whose value can give a different path.
This attribute is read after parsing, but before transformation (where
it is silently dropped).

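For instance, a source file might ask to be installed under a different name roughly as follows.
Both file names are hypothetical, and treating the attribute value as a path relative to `DESTDIR` is an assumption rather than something stated above.

```xml
<?xml version="1.0"?>
<!-- Hypothetical source file: `sources/home.xml`. -->
<html:article
  xmlns:html="http://www.w3.org/1999/xhtml"
  xmlns:书社="urn:fdc:ladys.computer:20231231:Shu1She4"
  书社:destination="index.xhtml"
>
  <!-- Assumed to be installed as `index.xhtml` instead of `home.xml`. -->
  <html:h1>Welcome</html:h1>
</html:article>
```
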
## Transforms

Transforms are used to convert X·M·L files into their final output,
after all necessary parsing and embedding has taken place.
⛩️📰 书社 comes with some transforms; namely :⁠—

- **`transforms/asset.xslt`:**
  Converts `<html:object>` elements which correspond to recognized
  media types into the appropriate H·T·M·L elements, and deletes
  `<html:style>` elements from the body of the document and moves
  them to the head.

- **`transforms/metadata.xslt`:**
  Provides basic `<html:head>` metadata.
  This metadata is generated from `<html:meta>` elements with one of
  the following `@itemprop` attributes :⁠—

  - **`urn:fdc:ladys.computer:20231231:Shu1She4:title`:**
    Provides the title of the page.

  ⛩️📰 书社 automatically encapsulates H·T·M·L embeds so that their
  metadata does not propagate up to the embedding document.
  To undo this behaviour, remove the `@itemscope` and `@itemtype`
  attributes from the embed during the transformation phase.

- **`transforms/serialization.xslt`:**
  Replaces `<书社:serialize-xml>` elements with the (escaped)
  serialized X·M·L of their contents.
  This replacement happens during the application phase, after most
  other transformations have taken place.

  If a `@with-namespaces` attribute is provided, any name·space nodes
  on the toplevel serialized elements whose U·R·I’s correspond to the
  definitions of the provided prefixes, as defined for the
  `<书社:serialize-xml>` element, will be declared using name·space
  attributes on the serialized elements.
  Otherwise, only name·space nodes which _differ_ from the definitions
  on the `<书社:serialize-xml>` element will be declared.
  The string `#default` may be used to represent the default
  name·space.
  Multiple prefixes may be provided, separated by white·space.

  When it comes to name·spaces used internally by ⛩️📰 书社, the
  prefix used by ⛩️📰 书社 may be declared _in addition to_ the
  prefix(es) used in the source document(s).
  It is not possible to selectively only declare one prefix for a
  name·space to the exclusion of others.

  `<书社:raw-output>` elements may be used inside of
  `<书社:serialize-xml>` elements to inject raw output into the
  serialized X·M·L.

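To illustrate the serialization transform, the following fragment sketches how a page might display a snippet of markup as text.
The content is made up; going by the description above, the serialized `<html:p>` would be expected to carry its own `xmlns:html` declaration because `html` is listed in `@with-namespaces`, though the exact output has not been checked here.

```xml
<?xml version="1.0"?>
<html:pre
  xmlns:html="http://www.w3.org/1999/xhtml"
  xmlns:书社="urn:fdc:ladys.computer:20231231:Shu1She4"
><!-- The contents below are serialized and escaped, not rendered. --><书社:serialize-xml with-namespaces="html"><html:p>Hello, world!</html:p></书社:serialize-xml></html:pre>
```
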
The following are recommendations on effective creation of
transforms :⁠—

- Make template matchers as specific as possible.
  It is likely an error if two transforms have templates which match
  the same element (unless the templates have different priority).

- Name·space templates (with `@name` or `@mode`) whenever possible.

- Set `@exclude-result-prefixes` on the root `xslt:transform` element
  to reduce the number of declared name·spaces in the final result.

## Global Params

The following params are made available globally in parsers and
transforms :⁠—

- **`BUILDTIME`:**
  The current time.

- **`CKSUM`:**
  The checksum of the source file (⅌ `cksum`).

- **`IDENTIFIER`:**
  The ⛩️📰 书社 identifier of the source file (a u·r·i beginning with
  `about:shushe`).

- **`SRCREV`:**
  The value of the `SRCREV` variable (if present).

- **`SRCTIME`:**
  The time at which the source file was last modified.

- **`THISREV`:**
  The value of the `THISREV` variable (if present).

The following params are only available in transforms :⁠—

- **`CATALOG`:**
  The path of the catalog file (within `BUILDDIR`).

- **`PATH`:**
  The path of the output file (within `DESTDIR`).

## Output Wrapping

⛩️📰 书社 will wrap the final output of the transforms in appropriate
`<html:html>` and `<html:body>` elements, so it is not necessary for
transforms to do this explicitly.
After performing the initial transform, ⛩️📰 书社 will match the root
node of the result in the following modes to fill in areas of the
wrapper :⁠—

- **`书社:header`:**
  The result of matching in this mode is prepended into the
  `<html:body>` of the output (before the transformation result).

- **`书社:footer`:**
  The result of matching in this mode is appended into the
  `<html:body>` of the output (after the transformation result).

- **`书社:metadata`:**
  The result of matching in this mode is inserted into the
  `<html:head>` of the output.

In addition to being called with the transform result, each of these
modes will additionally be called with a `<xslt:include>` element
corresponding to each transform.
If a transform has a `<书社:id>` top‐level element whose value is an
i·r·i, its `<xslt:include>` element will have a corresponding
`@书社:id` attribute.
This mechanism can be used to allow transforms to insert content
without matching any elements in the result; for example, the
following transform adds a link to a stylesheet to the `<html:head>`
of every page :⁠—

```xml
<?xml version="1.0"?>
<transform
  xmlns="http://www.w3.org/1999/XSL/Transform"
  xmlns:html="http://www.w3.org/1999/xhtml"
  xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
  xmlns:书社="urn:fdc:ladys.computer:20231231:Shu1She4"
  exclude-result-prefixes="书社"
  version="1.0"
>
  <书社:id>example:add-stylesheet-links.xslt</书社:id>
  <template match="xslt:include[@书社:id='example:add-stylesheet-links.xslt']" mode="书社:metadata">
    <html:link rel="stylesheet" type="text/css" href="/style.css"/>
  </template>
</transform>
```

Output wrapping can be entirely disabled by adding a
`@书社:disable-output-wrapping` attribute to the top‐level element in
the result tree.
It will not be performed on outputs whose root elements are
`<书社:archive>`, `<书社:base64-binary>`, or `<书社:raw-text>`
(described below).

## Applying Attributes

The `<书社:apply-attributes>` element will apply any attributes on the
element to the element(s) it wraps.
It is especially useful in combination with embeds.

The `<书社:apply-attributes-to-root>` element will apply any attributes
on the element to the root node of the final transformation result.
It is especially useful in combination with output wrapping.

In both cases, attributes from various sources are combined with
white·space between them.
Attribute application takes place after all ordinary transforms have
completed.

Both elements ignore attributes in the `xml:` name·space, except for
`@xml:lang`, which ignores all but the first definition (including
any already present on the root element).
On H·T·M·L and S·V·G elements, `@lang` has the same behaviour as
`@xml:lang`.

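The two elements might be used together along these lines.
The include path and class names are hypothetical, and writing `<书社:apply-attributes-to-root>` as an empty element is an assumption about how it is meant to be used.

```xml
<?xml version="1.0"?>
<html:section
  xmlns:html="http://www.w3.org/1999/xhtml"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:书社="urn:fdc:ladys.computer:20231231:Shu1She4"
>
  <!-- Applies @class to whatever the wrapped embed is replaced with. -->
  <书社:apply-attributes class="sidebar note">
    <书社:link xlink:show="embed" xlink:href="about:shushe?include=note.xml"/>
  </书社:apply-attributes>
  <!-- Applies @class to the root of the final transformation result. -->
  <书社:apply-attributes-to-root class="dark-theme"/>
</html:section>
```
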
## Other Kinds of Output

There are a few special elements in the `书社:` name·space which, if
they appear as the toplevel element in a transformation result, cause
⛩️📰 书社 to produce something other than an X·M·L file.
They are :⁠—

- **`<书社:archive>`:**
  Each child element with a `@书社:archived-as` attribute will be
  archived as a separate file in a resulting tarball (this attribute
  gives the file name).
  These elements will be processed the same as the root elements of any
  other file (e·g, they will be wrapped; they can themselves specify
  non‐X·M·L output types, ⁊·c).
  Other child elements will be ignored.

- **`<书社:base64-binary>`:**
  The text nodes in the transformation result will, after removing all
  Ascii white·space, be treated as a Base·64 string, which is then
  decoded.

- **`<书社:raw-text>`:**
  A plaintext (U·T·F‐8) file will be produced from the text nodes in
  the transformation result.

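As a sketch of the `<书社:raw-text>` case, a transform along the following lines could turn a custom element into a plaintext file.
The `ex:` name·space, the `<ex:robots>` element, and the transform identifier are all invented for this example; in practice the source document would presumably also set `@书社:destination` to give the output a suitable file name.

```xml
<?xml version="1.0"?>
<transform
  xmlns="http://www.w3.org/1999/XSL/Transform"
  xmlns:ex="https://ns.example/"
  xmlns:书社="urn:fdc:ladys.computer:20231231:Shu1She4"
  exclude-result-prefixes="ex"
  version="1.0"
>
  <书社:id>example:robots.xslt</书社:id>
  <!-- `ex:robots` is a made‐up element; it is not part of ⛩️📰 书社. -->
  <template match="ex:robots">
    <书社:raw-text>
      <!-- Only text nodes end up in the plaintext output. -->
      <text>User-agent: *&#xA;</text>
      <text>Disallow:&#xA;</text>
    </书社:raw-text>
  </template>
</transform>
```
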
## License

This repository conforms to [REUSE][].

Most source files are licensed under the terms of the <cite>Mozilla
Public License, version 2.0</cite>.

[REUSE]: <https://reuse.software/spec/>
[draft-phillips-record-jar-01]: <https://datatracker.ietf.org/doc/html/draft-phillips-record-jar-01>