Support parsed metadata
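
The `FINDFILTER*` variables documented in this change combine in a way that is easier to see in code. What follows is a minimal sketch of those rules in Python, not the project's implementation. The names `split_patterns`, `matches_any` and `classify` are hypothetical. Whole-path matching (as with `find -regex`) is an assumption, and Python's `re` syntax stands in for the basic or extended regular expressions which the README describes.

```python
import re


def split_patterns(value):
    """Split a semicolon-separated variable value into its non-empty patterns."""
    return [pattern for pattern in value.split(";") if pattern]


def matches_any(patterns, path):
    """True if at least one pattern matches the whole path (assumed semantics)."""
    return any(re.fullmatch(pattern, path) for pattern in patterns)


def classify(path, only="", out="", include_only="", include_out=""):
    """Return (eligible_as_source, eligible_as_include) for a single path.

    The arguments mirror FINDFILTERONLY, FINDFILTEROUT,
    FINDINCLUDEFILTERONLY and FINDINCLUDEFILTEROUT respectively.
    """
    only_patterns = split_patterns(only)
    # A non-empty "only" list requires at least one match.
    if only_patterns and not matches_any(only_patterns, path):
        return (False, False)
    # Any "out" match excludes the path from both roles.
    if matches_any(split_patterns(out), path):
        return (False, False)

    # The include filters are only consulted for paths which passed the above.
    eligible_as_include = True
    include_only_patterns = split_patterns(include_only)
    if include_only_patterns and not matches_any(include_only_patterns, path):
        eligible_as_include = False
    if matches_any(split_patterns(include_out), path):
        eligible_as_include = False
    return (True, eligible_as_include)
```

For instance, `classify("posts/a.xml", include_out=r"posts/.*")` reports the path as eligible to be a source but not an include, which is the distinction `FINDINCLUDEFILTEROUT` is described as making.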

diff --git a/README.markdown b/README.markdown
index 503a16338dda83163e59c655e8c9a02cf00d66ec..0b4c4b4ef135c7375368a9f78e859964998e4cf2 100644
--- a/README.markdown
+++ b/README.markdown
@@ -1,5 +1,5 @@
  <!--
-SPDX-FileCopyrightText: 2024 Lady <https://www.ladys.computer/about/#lady>
+SPDX-FileCopyrightText: 2024, 2025 Lady <https://www.ladys.computer/about/#lady>
  SPDX-License-Identifier: CC0-1.0
  -->
  # ⛩📰 书社
@@ -188,6 +188,7 @@ This document uses a few name·space prefixes, with the following
  |    `exsl:` | `http://exslt.org/common`                     |
  | `exslstr:` | `http://exslt.org/strings`                    |
  |    `html:` | `http://www.w3.org/1999/xhtml`                |
+|     `rdf:` | `http://www.w3.org/1999/02/22-rdf-syntax-ns#` |
  |     `svg:` | `http://www.w3.org/2000/svg`                  |
  |   `xlink:` | `http://www.w3.org/1999/xlink`                |
  |    `xslt:` | `http://www.w3.org/1999/XSL/Transform`        |
@@ -307,6 +308,12 @@ The following additional variables can be used to control the behaviour
      default, to enable additional rules without overriding the existing
      ones.
  
+- **`DATAOPTS`:**
+  Additional options to use when calling Make during the first stage of a two‐stage build using `DATADIR`.
+
+  This can be used to override variables which are only applicable during the second stage.
+  Note that when supplying this variable on the shell, it will need to be double‐quoted.
+
  - **`DATAEXT`:**
    A list of file extensions which signify “data” files during a two‐stage build using `DATADIR`.
  
@@ -319,6 +326,32 @@ The following additional variables can be used to control the behaviour
      default, to enable additional rules without overriding the existing
      ones.
  
+- **`FINDFILTERONLY`:**
+  A semicolon‐separated list of regular expressions; if it is non·empty, the paths for sources and includes are required to match at least one of them (default: empty).
+
+- **`FINDFILTEROUT`:**
+  A semicolon‐separated list of regular expressions, each of which matches paths that should _not_ be considered sources or includes (default: empty).
+
+- **`FINDINCLUDEFILTERONLY`:**
+  A semicolon‐separated list of regular expressions; if it is non·empty, the paths for includes are required to match at least one of them (default: empty).
+
+  Note that only paths which already match `FINDFILTERONLY` are considered.
+
+- **`FINDINCLUDEFILTEROUT`:**
+  A semicolon‐separated list of regular expressions, each of which matches paths that should _not_ be considered includes, but may still be considered sources (default: empty).
+
+- **`FINDFILTERONLYEXTENDED`:**
+  If non·empty, `FINDFILTERONLY` is an extended regular expression; otherwise, it is basic (default: empty).
+
+- **`FINDFILTEROUTEXTENDED`:**
+  If non·empty, `FINDFILTEROUT` is an extended regular expression; otherwise, it is basic (default: matches `FINDFILTERONLYEXTENDED`).
+
+- **`FINDINCLUDEFILTERONLYEXTENDED`:**
+  If non·empty, `FINDINCLUDEFILTERONLY` is an extended regular expression; otherwise, it is basic (default: matches `FINDFILTERONLYEXTENDED`).
+
+- **`FINDINCLUDEFILTEROUTEXTENDED`:**
+  If non·empty, `FINDINCLUDEFILTEROUT` is an extended regular expression; otherwise, it is basic (default: `1` if either `FINDFILTEROUTEXTENDED` or `FINDINCLUDEFILTERONLYEXTENDED` is non·empty).
+
  - **`PARSERS`:**
    A white·space‐separated list of parsers to use (default:
      `$(THISDIR)/parsers/*.xslt`).
@@ -515,6 +548,18 @@ These include :⁠—
  - A `@书社:media-type` attribute, giving the identified media type of
      the plaintext node.
  
+### Parsed metadata
+
+It is possible to extract metadata from a document at the same time as
+  it is being parsed.
+This is done by creating result elements in the `书社:about` mode;
+  these should be R·D·F property elements which apply to the conceptual
+  entity that is the document being parsed.
+
+During transformation, metadata for the file with identifier `$FILE`
+  can be read from the children of
+  `$书社:about//*[@rdf:about=$FILE]/nie:interpretedAs/*`.
+
  ## Output Redirection
  
  By default, ⛩📰 书社 installs files to the same location in `DESTDIR`
@@ -585,7 +630,7 @@ Transforms are used to convert X·M·L files into their final output,
      media types into the appropriate H·T·M·L elements, and deletes
      `<html:style>` elements from the body of the document and moves
      them to the head.
-  This conversion happens during the application phase, after the main
+  This conversion happens during the finalization phase, after the main
      transformation.
  
  - **`transforms/metadata.xslt`:**
@@ -604,7 +649,7 @@ Transforms are used to convert X·M·L files into their final output,
  - **`transforms/serialization.xslt`:**
    Replaces `<书社:serialize-xml>` elements with the (escaped)
      serialized X·M·L of their contents.
-  This replacement happens during the application phase, after most
+  This replacement happens during the finalization phase, after most
      other transformations have taken place.
  
    If a `@with-namespaces` attribute is provided, any name·space nodes
@@ -658,6 +703,25 @@ The following params are made available globally in parsers and
  - **`THISREV`:**
    The value of the `THISREV` variable (if present).
  
+In transforms, the following params are additionally available :⁠—
+
+- **`书社:about`:**
+  R·D·F metadata about all of the documents ⛩📰 书社 knows about.
+  Use `$书社:about//*[@rdf:about=$IDENTIFIER]` to get the metadata for
+    the current document.
+
+- **`书社:source`:**
+  The parsed source document being transformed, prior to any expansion.
+
+- **`书社:expansion`:**
+  The document after all embeds have been expanded.
+  Unavailable during the `书社:expand` stage.
+
+- **`书社:result`:**
+  The document after the main set of transformations has been applied.
+  Only available during the `书社:finalize` stage, where it is used to
+    apply output wrapping and other clean·up.
+
  ## Output Wrapping
  
  Provided at least one toplevel result element belongs to the H·T·M·L
@@ -730,8 +794,8 @@ It is especially useful in combination with output wrapping.
  
  In both cases, attributes from various sources are combined with
    white·space between them.
-Attribute application takes place after all ordinary transforms have
-  completed.
+Attribute application takes place after each stage of the
+  transformation, including after the initial embedding phase.
  
  Both elements ignore attributes in the `xml:` name·space, except for
    `@xml:lang`, which ignores all but the first definition (including