Remove classes from parse results

diff --git a/README.markdown b/README.markdown

index 893ac2c47230e55c4e2b59e93f31553dc25d5145..400679f090097a8bbb2368fc438a55c6a5c57084 100644 (file)
--- a/README.markdown
+++ b/README.markdown
@@ -91,21 +91,27 @@ In every case, you may supply your own implementation by overriding the
    corresponding (allcaps) variable (e·g, set `MKDIR` to supply your own
    `mkdir` implementation).
  
+- `awk`
  - `cat`
  - `cp`
+- `date`
  - `echo`
  - `file`
  - `find`
+- `git` (optional; set `GIT=` to disable)
  - `mkdir` (requires support for `-p`)
  - `mv`
+- `od` (requires support for `-t x1`)
  - `printf`
  - `rm`
  - `sed`
  - `sleep`
+- `stat`
  - `test`
  - `touch`
  - `tr` (requires support for `-d`)
  - `uuencode` (requires support for `-m` and `-r`)
+- `xargs` (requires support for `-0`)
  - `xmlcatalog` (provided by `libxml2`)
  - `xmllint` (provided by `libxml2`)
  - `xsltproc` (provided by `libxslt`)
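A quick way to see which of the utilities listed above are absent from a machine is to probe for each one with `command -v`.
The following is a minimal sketch, not part of ⛩️📰 书社; the function name `missing_tools` is made up for illustration.
It checks only that a command of that name can be found, not that it supports the required options, and it does not take allcaps overrides such as `MKDIR` into account.

```shell
#!/bin/sh
# Hypothetical helper (not part of the project): print each named tool
# that cannot be found, one per line. Prints nothing if all are present.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Probe for the utilities named in the list above.
missing_tools awk cat cp date echo file find git mkdir mv od printf rm \
  sed sleep stat test touch tr uuencode xargs xmlcatalog xmllint xsltproc
```

Any name printed by the final command would need to be installed, or replaced by way of the corresponding allcaps variable.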
@@ -115,16 +121,32 @@ The following additional variables can be used to control the behaviour
  
  - **`SRCDIR`:**
    The location of the source files (default: `sources`).
+  Multiple source directories can be provided, so long as the same
+    file subpath doesn’t exist in more than one of them.
  
  - **`INCLUDEDIR`:**
-  The location of the source files (default: `sources/includes`).
+  The location of source includes (default: `sources/includes`).
    This can be inside of `SRCDIR`, but needn’t be.
+  Multiple include directories can be provided, so long as the same
+    file subpath doesn’t exist in more than one of them.
  
  - **`BUILDDIR`:**
    The location of the (temporary) build directory (default: `build`).
+  `make clean` will delete this, and it is recommended that it not be
+    used for programs aside from ⛩️📰 书社.
  
  - **`DESTDIR`:**
    The location of directory to output files to (default: `public`).
+  `make install` will overwrite files in this directory which
+    correspond to those in `SRCDIR`.
+  It *will not* touch other files, including those generated from files
+    in `SRCDIR` which have since been deleted.
+
+  Files are first compiled to `$(BUILDDIR)/public` before they are
+    copied to `DESTDIR`, so this folder is relatively quick and
+    inexpensive to re·create.
+  It’s reasonable to simply delete it before every `make install` to
+    ensure stale content is removed.
  
  - **`THISDIR`:**
    The location of the ⛩️📰 书社 `GNUmakefile`.
@@ -136,13 +158,15 @@ The following additional variables can be used to control the behaviour
  - **`MAGICDIR`:**
    The location of the magic files to use (default: `$(THISDIR)/magic`).
  
-- **`FINDOPTS`:**
-  Options to pass to `find` when searching for source files (default:
-    `-LE`).
-
  - **`FINDRULES`:**
-  Rules to use with `find` when searching for source files (default:
-    `-flags -nohidden -and -not -name '.*'`).
+  Rules to use with `find` when searching for source files.
+  The default ignores hidden files, those that start with a period or
+    hyphen‐minus, and those which contain a pipe, buck, percent, or
+    colon.
+
+- **`FINDINCLUDERULES`:**
+  Rules to use with `find` when searching for includes (default:
+    `$(FINDRULES)`).
  
  - **`PARSERS`:**
    A white·space‐separated list of parsers to use (default:
@@ -175,6 +199,8 @@ Supported magic numbers include :⁠—
  - `#!js` for `text/javascript` files
  - `@charset "` for `text/css` files
  - `#!tsv` for `text/tab-separated-values` files
+- `%%` for `text/record-jar` files (unregistered; see
+    [[draft-phillips-record-jar-01][]])
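When a source file is not recognized as the expected media type, it can help to look at its leading bytes directly and compare them with the magic numbers listed above.
The sketch below is only a debugging aid built on `od -t x1`; it is not a description of how ⛩️📰 书社 itself detects types, and the function name `first_bytes_hex` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of the project): print the first N bytes
# of a file as lowercase hexadecimal with no separators.
#   $1 = path to the file, $2 = number of bytes to read
first_bytes_hex() {
  # -A n drops the offset column, -t x1 prints one byte per hex pair,
  # -N limits how many bytes are read; tr strips spaces and newlines.
  od -A n -t x1 -N "$2" "$1" | tr -d ' \n'
}
```

For a file which begins with `%%`, `first_bytes_hex file 2` prints `2525`, since a percent sign is byte `25` in hexadecimal.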
  
  Text formats with associated X·S·L·T parsers are wrapped in a H·T·M·L
    `<script>` element whose `@type` gives its media type, and then
@@ -182,18 +208,12 @@ Text formats with associated X·S·L·T parsers are wrapped in a H·T·M·L
  Source files whose media type does not have an associated X·S·L·T
    parser are considered “assets” and will not be transformed.
  
-For compatibility with this program, source filenames should conform to
-  the following rules :⁠—
-
-- They should not start with a hyphen‐minus.
-  This is to prevent confusion between filenames and options on the
-    commandline.
-
-- They should not contain spaces, colons, percent signs, backticks,
-    question marks, hashes, or backslashes.
-
-In general, filenames should be such that they do not require
-  percent‐encoding in the path component of an i·r·i.
+**☡ For compatibility with this program, source filenames must not
+  contain Ascii whitespace, colons (`:`), pipes (`|`), bucks (`$`),
+  percents (`%`) or control characters, and must not begin with a
+  hyphen‐minus (`-`).**
+The former characters have the potential to conflict with make syntax,
+  and a leading hyphen‐minus is confusable for a command‐line argument.
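The filename rules above can be checked mechanically before a build.
The following is a minimal sketch, not part of ⛩️📰 书社; the function name `is_compatible_name` is made up, and it assumes that the leading hyphen‐minus rule applies to the last path component while the character rules apply to the whole path.

```shell
#!/bin/sh
# Hypothetical helper (not part of the project).
# Succeeds (status 0) if the given path satisfies the filename rules,
# and fails (status 1) otherwise.
is_compatible_name() {
  # The last path component must not be empty or begin with "-".
  case ${1##*/} in
    ''|-*) return 1 ;;
  esac
  # No space or control character (this covers tab, newline, and the
  # other Ascii whitespace), and no colon, pipe, buck, or percent.
  case $1 in
    *' '*|*[[:cntrl:]]*) return 1 ;;
    *:*|*'|'*|*'$'*|*%*) return 1 ;;
  esac
  return 0
}
```

For example, `is_compatible_name 'sources/index.xml'` succeeds, whereas `is_compatible_name 'my file.txt'` and `is_compatible_name '-draft.txt'` both fail.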
  
  ## Parsers
  
@@ -203,14 +223,20 @@ Parsers are used to convert plaintext files into X·M·L trees, as well
  ⛩️📰 书社 comes with some parsers; namely :⁠—
  
  - **`parsers/plain.xslt`:**
-  Wraps `text/plain` contents in a `<html:pre>` element.
+  Wraps `text/plain` contents in a `<html:pre class="plain">` element.
+
+- **`parsers/record-jar.xslt`:**
+  Converts `text/record-jar` contents into a
+    `<html:div class="record-jar">` of `<html:dl>` elements (one for
+    each record).
  
  - **`parsers/tsv.xslt`:**
-  Converts `text/tab-separated-values` contents into an `<html:table>`
-    element.
+  Converts `text/tab-separated-values` contents into an
+    `<html:table class="tsv">` element.
  
  
-New ⛩️📰 书社 parsers should have a `<xslt:template>` element with no
-  `@name` or `@mode` and whose `@match` attribute…
+New ⛩️📰 书社 parsers which target plaintext formats should have an
+  `<xslt:template>` element with no `@name` or `@mode` and whose
+  `@match` attribute…
  
  - Starts with an appropriately‐namespaced qualified name for a
      `<html:script>` element.
@@ -230,8 +256,10 @@ For example, the trivial `text/plain` parser is defined as follows :⁠—
  <transform
    xmlns="http://www.w3.org/1999/XSL/Transform"
    xmlns:html="http://www.w3.org/1999/xhtml"
+  xmlns:书社="urn:fdc:ladys.computer:20231231:Shu1She4"
    version="1.0"
  >
+  <书社:id>example:text/plain</书社:id>
    <template match="html:script[@type='text/plain']">
      <html:pre><value-of select="."/></html:pre>
    </template>
@@ -242,10 +270,33 @@ For example, the trivial `text/plain` parser is defined as follows :⁠—
    the set of allowed plaintext file types.
  Multiple such `<xslt:template>` elements may be provided in a single
    parser, for example if the parser supports multiple media types.
-
-It is **strongly recommended** that all templates in parsers other than
-  those described above be namespaced (by `@name` or `@mode`), to avoid
-  conflicts between templates in multiple parsers.
+Alternatively, you can set the `@书社:supported-media-types` attribute
+  on the root element of the parser to override media type support
+  detection.
+
+Even when `@书社:supported-media-types` is set, it is a requirement
+  that each parser transform any `<html:script>` elements with a
+  `@type` which matches their registered types into something else.
+Otherwise the parser will be stuck in an endless loop.
+The result tree of applying the transform to the `<html:script>`
+  element will be reparsed (in case any new `<html:script>` elements
+  were added in its subtree), and a `@书社:parsed-by` attribute will be
+  added to each toplevel element in the result.
+The value of this attribute will be the value of the `<书社:id>`
+  toplevel element in the parser.
+
+It is possible for parsers to support zero plaintext types.
+This is useful when targeting specific dialects of X·M·L; parsers in
+  this sense operate on the same basic principles as transforms
+  (described below).
+The major distinction between X·M·L parsers and transforms is where in
+  the process the transformation happens:
+Parsers are applied *prior* to embedding (and can be used to generate
+  embeds); transforms are applied *after*.
+
+It is **strongly recommended** that auxiliary templates in parsers be
+  namespaced (by `@name` or `@mode`) whenever possible, to avoid
+  conflicts between parsers.
  
  ## Embedding
  
@@ -270,24 +321,38 @@ Embedding takes place after parsing but before transformation, so
    and update them accordingly; it will signal an error if the
    dependencies are recursive.
  
+## Output Redirection
+
+By default, ⛩️📰 书社 installs files to the same location in `DESTDIR`
+  as they were placed in their `SRCDIR`.
+This behaviour can be customized by setting the `@书社:destination`
+  attribute on the root element, whose value can give a different path.
+This attribute is read after parsing, but before transformation (where
+  it is silently dropped).
+
  ## Transforms
  
  Transforms are used to convert X·M·L files into their final output,
    after all necessary parsing and embedding has taken place.
  ⛩️📰 书社 comes with some transforms; namely :⁠—
  
+- **`transforms/attributes.xslt`:**
+  Applies transforms to the children of any `<书社:apply-attributes>`
+    elements, and then applies the attributes of the
+    `<书社:apply-attributes>` to each result child, replacing the
+    element with the result.
+  This is useful in combination with image embeds to apply alt‐text to
+    the resulting `<html:img>`.
+
  - **`transforms/asset.xslt`:**
-  Converts `<html:object type="text/css">` elements into corresponding
-    `<html:link rel="stylesheet">` elements and
-    `<html:object type="text/javascript">` elements into corresponding
-    `<html:script>` elements.
-  This transform enables embedding of `text/css` and `text/javascript`
-    files, which ordinarily are considered assets (as they lack
-    associated parsers).
+  Converts `<html:object>` elements which correspond to recognized
+    media types into the appropriate H·T·M·L elements, and deletes
+    `<html:style>` elements from the body of the document and moves
+    them to the head.
  
  - **`transforms/metadata.xslt`:**
    Provides basic `<html:head>` metadata.
-  This metadata is generated from `<html:meta>` elements with one o.
+  This metadata is generated from `<html:meta>` elements with one of
      the following `@itemprop` attributes :⁠—
  
    - **`urn:fdc:ladys.computer:20231231:Shu1She4:title`:**
@@ -310,6 +375,33 @@ The following are recommendations on effective creation of
  - Set `@exclude-result-prefixes` on the root `xslt:transform` element
      to reduce the number of declared namespaces in the final result.
  
+## Global Params
+
+The following params are made available globally in parsers and
+  transforms :⁠—
+
+- **`BUILDTIME`:**
+  The current time.
+
+- **`SRCREV`:**
+  The tag or hash of the current commit in the working directory (if
+    `GIT` is defined and `./.git` exists).
+
+- **`SRCTIME`:**
+  The time at which the source file was last modified.
+
+- **`VERSION`:**
+  The tag or hash of the current commit in `THISDIR` (if `GIT` is
+    defined and `$(THISDIR)/.git` exists).
+
+The following params are only available in transforms :⁠—
+
+- **`CATALOG`:**
+  The path of the catalog file (within `BUILDDIR`).
+
+- **`PATH`:**
+  The path of the output file (within `DESTDIR`).
+
  ## Output Wrapping
  
  ⛩️📰 书社 will wrap the final output of the transforms in appropriate
@@ -335,7 +427,7 @@ In addition to being called with the transform result, each of these
    modes will additionally be called with a `<xslt:include>` element
    corresponding to each transform.
  If a transform has a `<书社:id>` top‐level element whose value is an
-  i·r·i, its `<xslt:import>` element will have a corresponding
+  i·r·i, its `<xslt:include>` element will have a corresponding
    `@书社:id` attribute.
  This mechanism can be used to allow transforms to insert content
    without matching any elements in the result; for example, the
@@ -368,3 +460,5 @@ Output wrapping can be entirely disabled by adding a
  Source files are licensed under the terms of the <cite>Mozilla Public
    License, version 2.0</cite>.
  For more information, see [LICENSE](./LICENSE).
+
+[draft-phillips-record-jar-01]: <https://datatracker.ietf.org/doc/html/draft-phillips-record-jar-01>
\ No newline at end of file