SPDX-FileCopyrightText: 2024 Lady <https://www.ladys.computer/about/#lady>
SPDX-License-Identifier: CC0-1.0
 
Each language is given a directory inside of `data/`, named by its
  primary language subtag.
Within this directory the following subdirectories may exist :—
 
  An X·M·L file providing basic information about the language, its
    encoding, and its variants.
 
  Codex entries for the language, in the manner of
    [Caudex].
 
  Prose documentation for the language.
 
  Files from which those in `VARIANT/` and `docs/` were derived.
  By convention, all X·M·L files have an `encoding` component to their
    X·M·L declaration, which is used to identify them as “assets” and
    avoid further processing.
 
  Extant texts written in the language.
 
  A directory of lexemes for a given `VARIANT`, which has the form of a
    variant subtag (a digit followed by three to seven characters, or
    a letter followed by four to seven characters).
  When formulating language tags using these variants, they must be
    preceded by the singleton `x`, as they are not registered.
 
  Variants which begin with the string `block` are _blocks_, intended
    to partition the language semantically to make it easier to work
    with.
  This is the common form of variant for actively‐developed languages.
 
  Other variants are not partitions, and instead denote different
    versions of the language thru time or space.
  For example, the variant `qho-0001` denotes the version of the
    language which precedes `qho-0002`.
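
As a rough illustration of the subtag shape described above, the following Python sketch checks whether a directory name could be a `VARIANT` and whether it would count as a block. The function names are hypothetical, and the sketch assumes that "characters" means Ascii letters and digits (as for BCP 47 variant subtags); it is not taken from this repository's tooling.

```python
import re

# A digit followed by three to seven characters, or a letter followed by
# four to seven characters. Assumption: "characters" are Ascii letters
# and digits only.
VARIANT_RE = re.compile(
    r"(?:[0-9][0-9A-Za-z]{3,7}|[A-Za-z][0-9A-Za-z]{4,7})"
)


def is_variant_subtag(name: str) -> bool:
    """Return True if `name` has the shape of a variant subtag."""
    return VARIANT_RE.fullmatch(name) is not None


def is_block(name: str) -> bool:
    """Return True if `name` is a variant subtag beginning with `block`."""
    return is_variant_subtag(name) and name.startswith("block")
```

Under these assumptions, `0001` and `block1` are both accepted as variants, and only the latter is treated as a block.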
 
Each variant directory, itself, contains the following files :—
 
  A single lexeme with·in the variant.
  `LEXEME` is an Ascii representation of the lemma form of the lexeme
    which matches the following regular expression :—
 
      [&'=@0-9A-Za-z~-]\.?(_?[&'=@0-9A-Za-z~-]\.?)*__[1-9][0-9]*
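
A minimal Python sketch of how a candidate `LEXEME` might be checked against the expression above. It assumes the entire name must match (hence `fullmatch`); `is_lexeme_name` is a hypothetical helper, and the sample names mentioned below are invented for illustration.

```python
import re

# The expression from the text, copied verbatim.
LEXEME_RE = re.compile(
    r"[&'=@0-9A-Za-z~-]\.?(_?[&'=@0-9A-Za-z~-]\.?)*__[1-9][0-9]*"
)


def is_lexeme_name(name: str) -> bool:
    """Return True if the whole of `name` matches the lexeme pattern."""
    return LEXEME_RE.fullmatch(name) is not None
```

Under this reading the trailing `__` followed by a positive integer is mandatory: `kala__1` and `ka.la_kala__12` pass, while `kala`, `kala__0`, and `_kala__1` do not.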
 
## Languages, Scripts, and Tags

Each language developed in this repository is assigned a (private·use)
  primary language subtag in the range `qga`‥`qpz`.
This is outside of the range reserved by Unicode (`qaa`‥`qfy`) and
  leaves the tags `qfz` and `qqa`‥`qtz` for implementations.
The current list of assigned primary language subtags is as
  follows :—
 
| Language Subtag | Language Name |
| :-------------: | ------------- |
| `qjx` | Pre‐Zheshwi |
 
This repository also reserves the script subtags `Qaaq`‥`Qabp`,
  leaving aside `Qaaa`‥`Qaap` for Unicode and `Qabq`‥`Qabx` for
  implementations.
The current list of assigned script tags is as follows :—
 
| Script Subtag | Script Name |
| :-----------: | ----------- |
| `Qabj` | Jastugay Syllables |
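
The private·use ranges described in this section can be sketched as a small classifier. This is only an illustration: the function names and the returned labels (`"unicode"`, `"repository"`, `"implementation"`, `"other"`) are hypothetical, and the comparison assumes Ascii letters compared case‐insensitively.

```python
def _in_range(subtag: str, low: str, high: str) -> bool:
    """True if `subtag` is Ascii-alphabetic and lies within low..high."""
    s = subtag.lower()
    return (
        len(s) == len(low)
        and s.isascii()
        and s.isalpha()
        and low <= s <= high
    )


def classify_language_subtag(subtag: str) -> str:
    """Say which party a private-use primary language subtag is left to."""
    if _in_range(subtag, "qaa", "qfy"):
        return "unicode"
    if _in_range(subtag, "qga", "qpz"):
        return "repository"
    if _in_range(subtag, "qfz", "qfz") or _in_range(subtag, "qqa", "qtz"):
        return "implementation"
    return "other"


def classify_script_subtag(subtag: str) -> str:
    """Say which party a private-use script subtag is left to."""
    if _in_range(subtag, "qaaa", "qaap"):
        return "unicode"
    if _in_range(subtag, "qaaq", "qabp"):
        return "repository"
    if _in_range(subtag, "qabq", "qabx"):
        return "implementation"
    return "other"
```

With these definitions, `qjx` and `Qabj` both fall in the range assigned by this repository.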
 
## Crossreferences and Identifiers

This repository assigns identifiers in the
  `urn:fdc:langdev.ladys.computer:2024:` namespace.
Most of these identifiers can be dereferenced on the Web by prepending
  `https://langdev.ladys.computer/` to them.
(Identifier resolution is handled thru server redirects, not as part of
 
### Codex Entry Identifiers

Codex entries for a language with primary language subtag `PLS` are
  assigned identifiers of the form :—

    urn:fdc:langdev.ladys.computer:2024:PLS:cdex:ENTRYID

—: where `ENTRYID` is the identifier of the entry within the codex.

These identifiers resolve to the files at `/PLS/cdex/ENTRYID.xhtml`.
 
### Documentation Identifiers

Documentation files for a language with primary language subtag `PLS`
  are assigned identifiers of the form :—

    urn:fdc:langdev.ladys.computer:2024:PLS:docs:DOCID

—: where `DOCID` is some local identifier for the documentation.

These identifiers resolve to the files at `/PLS/docs/DOCID/`.
 
### Source Identifiers

Source entries for a language with primary language subtag `PLS` are
  assigned identifiers of the form :—

    urn:fdc:langdev.ladys.computer:2024:PLS:srcs:SOURCEID

—: where `SOURCEID` is some local identifier for the source.

These identifiers resolve to the files at `/PLS/srcs/SOURCEID/`.
 
### Text Identifiers

Texts written in a language with primary language subtag `PLS` are
  assigned identifiers of the form :—

    urn:fdc:langdev.ladys.computer:2024:PLS:txts:TEXTID

—: where `TEXTID` is some local identifier for the text.

These identifiers resolve to the files at `/PLS/txts/TEXTID/`.
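
The identifier forms for codex entries, documentation, sources, and texts can be sketched together in Python. The helper names are hypothetical and any local identifiers passed in are placeholders; the actual resolution is done by server redirects, so this only mirrors the mapping stated in the text.

```python
NAMESPACE = "urn:fdc:langdev.ladys.computer:2024:"
WEB_ROOT = "https://langdev.ladys.computer/"

# The four kinds of identifier described in the text.
KINDS = {"cdex", "docs", "srcs", "txts"}


def make_identifier(pls: str, kind: str, local_id: str) -> str:
    """Build an identifier of the form NAMESPACE + PLS:kind:local_id."""
    if kind not in KINDS:
        raise ValueError(f"unknown identifier kind: {kind!r}")
    return f"{NAMESPACE}{pls}:{kind}:{local_id}"


def resolved_path(identifier: str) -> str:
    """Return the path the text says the identifier resolves to."""
    if not identifier.startswith(NAMESPACE):
        raise ValueError("identifier is outside the namespace")
    pls, kind, local_id = identifier[len(NAMESPACE):].split(":", 2)
    if kind == "cdex":
        # Codex entries resolve to a single X.H.T.M.L file.
        return f"/{pls}/cdex/{local_id}.xhtml"
    if kind in KINDS:
        # The other kinds resolve to a directory-style path.
        return f"/{pls}/{kind}/{local_id}/"
    raise ValueError(f"unknown identifier kind: {kind!r}")


def dereference_url(identifier: str) -> str:
    """Prepend the website root, as described for Web dereferencing."""
    return WEB_ROOT + identifier
```

For instance, `resolved_path(make_identifier("qjx", "docs", "intro"))` gives `/qjx/docs/intro/`, where `intro` is an invented `DOCID`.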
 
### Lexeme Identifiers

An identifier for a given lexeme can be constructed from its language,
  variant, and Ascii representation.
Given primary language subtag `PLS`, variant subtag `VARIANT`, and
  Ascii representation `LEXEME`, the resulting identifier is as
  follows :—

    urn:fdc:langdev.ladys.computer:2024:PLS:VARIANT:LEXEME

Within this repository, lexemes reference each other according to the
These identifiers resolve to the files at
  `/PLS/#PLS-VARIANT--LEXEME`.
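
The construction of a lexeme identifier and the location it resolves to might be sketched as follows. Both function names are hypothetical, and the sample values mentioned afterwards are invented; the sketch does not validate its inputs.

```python
NAMESPACE = "urn:fdc:langdev.ladys.computer:2024:"


def lexeme_identifier(pls: str, variant: str, lexeme: str) -> str:
    """Build the identifier NAMESPACE + PLS:VARIANT:LEXEME."""
    return f"{NAMESPACE}{pls}:{variant}:{lexeme}"


def lexeme_path(pls: str, variant: str, lexeme: str) -> str:
    """Return the path and fragment the identifier resolves to."""
    return f"/{pls}/#{pls}-{variant}--{lexeme}"
```

For a lexeme `kala__1` in variant `block1` of `qjx`, these give `urn:fdc:langdev.ladys.computer:2024:qjx:block1:kala__1` and `/qjx/#qjx-block1--kala__1` respectively.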
 
A less‐universal identifier, suitable for use as an X·M·L `ID`, is :—
 
## Encoding Principles

Dictionary information is expressed in a constrained R·D·F format which
  conforms to the `DTD` in this repository.
This D·T·D should not be considered stable and should be inspected for
  changes when·ever pulling in new data.
 
The `site/` directory contains documentation and data used for building
  the Langdev website (<https://langdev.ladys.computer/>).
Files and directories which are not meant to reference language subtags
  will be given names which are either :—
 
- Of any length, containing at least one apostrophe, hyphen,
    underscore, or period.

- Exactly four lowercase alphabetic letters (distinguishable from a
    script subtag, which conventionally begins with a capital letter).

- More than four letters or numbers, starting with a capital letter
    (distinguishable from a variant subtag, which conventionally begins
    with a lowercase letter).

- More than eight characters (all language subtags are eight or fewer
    characters in length).
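
The four naming rules above could be checked with a short Python function. This is a sketch under assumptions: `is_non_subtag_name` is a hypothetical name, "letters or numbers" is taken to mean Ascii letters and digits, and "apostrophe" is taken to mean the Ascii apostrophe.

```python
def is_non_subtag_name(name: str) -> bool:
    """Return True if `name` satisfies at least one of the four rules."""
    # Rule 1: any length, with an apostrophe, hyphen, underscore, or period.
    if any(ch in "'-_." for ch in name):
        return True
    # Rule 2: exactly four lowercase alphabetic letters.
    if len(name) == 4 and name.isascii() and name.isalpha() and name.islower():
        return True
    # Rule 3: more than four letters or numbers, starting with a capital.
    if (
        len(name) > 4
        and name.isascii()
        and name.isalnum()
        and name[0].isupper()
    ):
        return True
    # Rule 4: more than eight characters.
    return len(name) > 8
```

Under these assumptions `site` and `data` satisfy the second rule, whereas `qjx`, `Qabj`, and `block1` satisfy none of them and so remain available as subtag names.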
 
The site is built using [⛩📰 书社][Shushe].
 
[Caudex]: <https://git.ladys.computer/Caudex>
[Shushe]: <https://git.ladys.computer/Shushe>