SPDX-FileCopyrightText: 2024 Lady <https://www.ladys.computer/about/#lady>
SPDX-License-Identifier: CC0-1.0

Each language is given a directory inside `data/`, named by its primary
language subtag.

Within this directory, the following files and subdirectories may
exist :—

An X·M·L file providing basic information about the language, its
encoding, and its variants.

Codex entries for the language, in the manner of [Caudex].

Prose documentation for the language.

Files from which those in `VARIANT/` and `docs/` were derived.
By convention, all X·M·L files have an `encoding` component in their
X·M·L declaration, which is used to identify them as “assets” and
exempt them from further processing.

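
The `encoding` convention can be checked mechanically. The following is a minimal sketch, not the repository's actual build logic: it assumes the X·M·L declaration is the first thing in the file (after an optional byte order mark), and the helper name `is_asset` is hypothetical.

```python
import re

# An X·M·L declaration may only appear at the very start of a document.
XML_DECLARATION = re.compile(r"<\?xml\s(?P<body>.*?)\?>", re.DOTALL)

# The `encoding` pseudo-attribute, quoted with either ' or ".
ENCODING_DECL = re.compile(r"""\sencoding\s*=\s*(["'])[A-Za-z][A-Za-z0-9._-]*\1""")


def is_asset(xml_text: str) -> bool:
    """Return True if the X·M·L declaration has an `encoding` component.

    Hypothetical helper: under the convention above, such a file would be
    treated as an asset and skipped by further processing.
    """
    text = xml_text.removeprefix("\ufeff")  # ignore a leading byte order mark
    declaration = XML_DECLARATION.match(text)
    if declaration is None:
        return False  # no declaration at all, hence no encoding component
    return ENCODING_DECL.search(declaration.group("body")) is not None


print(is_asset('<?xml version="1.0" encoding="UTF-8"?>\n<root/>'))  # True
print(is_asset('<?xml version="1.0"?>\n<root/>'))  # False
```

Matching only the declaration (rather than searching the whole file) keeps an `encoding=` string elsewhere in the document from being mistaken for the marker.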
Extant texts written in the language.

A directory of lexemes for a given `VARIANT`, whose name has the form of
a variant subtag (a digit followed by three to seven alphanumeric
characters, or a letter followed by four to seven alphanumeric
characters).
When formulating language tags using these variants, they must be
preceded by the private·use singleton `x`, as they are not registered.

Variants which begin with the string `block` are _blocks_, intended
to partition the language semantically to make it easier to work
with.
This is the common form of variant for actively‐developed languages.

Other variants are not partitions, and instead denote different
versions of the language thru time or space.
For example, the variant `qho-0001` denotes the version of the
language which precedes `qho-0002`.

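
The variant rules above can be summarized in a few lines of code. This is a sketch under stated assumptions: variant names are compared case‐insensitively (as B·C·P 47 subtags are), a block is any variant whose lowercased name starts with `block`, and the function names are hypothetical.

```python
import re

# A digit plus three to seven alphanumerics, or a letter plus four to seven.
VARIANT_SUBTAG = re.compile(r"(?:[0-9][0-9A-Za-z]{3,7}|[A-Za-z][0-9A-Za-z]{4,7})")


def is_variant_subtag(name: str) -> bool:
    """Return True if `name` has the form of a variant subtag."""
    return VARIANT_SUBTAG.fullmatch(name) is not None


def is_block(variant: str) -> bool:
    """Blocks are variants whose names begin with the string `block`."""
    return variant.lower().startswith("block")


def private_use_tag(pls: str, variant: str) -> str:
    """Build a language tag; unregistered variants follow the singleton `x`."""
    if not is_variant_subtag(variant):
        raise ValueError(f"not a variant subtag: {variant!r}")
    return f"{pls}-x-{variant}"


print(private_use_tag("qho", "0001"))  # qho-x-0001
print(is_block("0001"))  # False
```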
Each variant directory in turn contains the following files :—

A single lexeme with·in the variant.
`LEXEME` is an Ascii representation of the lemma form of the lexeme;
it matches the following regular expression :—

    [&'=@0-9A-Za-z~-]\.?(_?[&'=@0-9A-Za-z~-]\.?)*__[1-9][0-9]*

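
To see which names the pattern accepts, the sketch below compiles it verbatim with Python's `re` module. Requiring the pattern to match the *whole* name (hence `fullmatch`) is an assumption, and the sample names are made up for illustration.

```python
import re

# The pattern exactly as given above.
LEXEME_PATTERN = re.compile(
    r"[&'=@0-9A-Za-z~-]\.?(_?[&'=@0-9A-Za-z~-]\.?)*__[1-9][0-9]*"
)


def is_lexeme_name(name: str) -> bool:
    """Return True if the entire `name` matches the lexeme pattern."""
    return LEXEME_PATTERN.fullmatch(name) is not None


# Hypothetical sample names, chosen only to exercise the pattern.
print(is_lexeme_name("tak__1"))           # True
print(is_lexeme_name("e.g__2"))           # True: one period may follow a character
print(is_lexeme_name("hello_world__10"))  # True: single underscores join characters
print(is_lexeme_name("a___1"))            # False: the separator is exactly two underscores
print(is_lexeme_name("a__0"))             # False: the number may not start with 0
print(is_lexeme_name("tak"))              # False: the `__` plus number suffix is required
```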
## Languages, Scripts, and Tags

Each language developed in this repository is assigned a (private·use)
primary language subtag in the range `qga`‥`qpz`.
This is outside the range reserved by Unicode (`qaa`‥`qfy`) and
leaves the subtags `qfz` and `qqa`‥`qtz` for implementations.
The current list of assigned primary language subtags is as
follows :—

| Language Subtag | Language Name |
| :-------------: | ------------- |
| `qjx`           | Pre‐Zheshwi   |

This repository also reserves the script subtags `Qaaq`‥`Qabp`,
leaving aside `Qaaa`‥`Qaap` for Unicode and `Qabq`‥`Qabx` for
implementations.
The current list of assigned script subtags is as follows :—

| Script Subtag | Script Name        |
| :-----------: | ------------------ |
| `Qabj`        | Jastugay Syllables |

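
Both reserved ranges above can be checked with ordinary string comparison, because subtags of equal length and consistent case sort alphabetically. This sketch is illustrative only: the range tables are transcribed from the prose, while the `owner` helper and its labels are hypothetical.

```python
from typing import Callable, Optional

# (first, last, owner) triples transcribed from the prose above.
LANGUAGE_RANGES = [
    ("qaa", "qfy", "Unicode"),
    ("qfz", "qfz", "implementations"),
    ("qga", "qpz", "this repository"),
    ("qqa", "qtz", "implementations"),
]
SCRIPT_RANGES = [
    ("Qaaa", "Qaap", "Unicode"),
    ("Qaaq", "Qabp", "this repository"),
    ("Qabq", "Qabx", "implementations"),
]


def owner(
    subtag: str,
    ranges: list[tuple[str, str, str]],
    normalize: Callable[[str], str],
) -> Optional[str]:
    """Return whom a private·use subtag is set aside for, or None."""
    tag = normalize(subtag)  # str.lower for languages, str.title for scripts
    if not (tag.isascii() and tag.isalpha()):
        return None
    for first, last, who in ranges:
        # Same-length, same-case strings compare alphabetically.
        if len(tag) == len(first) and first <= tag <= last:
            return who
    return None


print(owner("qjx", LANGUAGE_RANGES, str.lower))  # this repository
print(owner("Qabj", SCRIPT_RANGES, str.title))   # this repository
print(owner("qqa", LANGUAGE_RANGES, str.lower))  # implementations
```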
## Crossreferences and Identifiers

This repository assigns identifiers in the
`urn:fdc:langdev.ladys.computer:2024:` namespace.
Most of these identifiers can be dereferenced on the Web by prepending
`https://langdev.ladys.computer/` to them.
(Identifier resolution is handled thru server redirects, not as part of

### Codex Entry Identifiers

Codex entries for a language with primary language subtag `PLS` are
assigned identifiers of the form :—

    urn:fdc:langdev.ladys.computer:2024:PLS:cdex:ENTRYID

—: where `ENTRYID` is the identifier of the entry within the codex.

These identifiers resolve to the files at `/PLS/cdex/ENTRYID.xhtml`.

### Documentation Identifiers

Documentation files for a language with primary language subtag `PLS`
are assigned identifiers of the form :—

    urn:fdc:langdev.ladys.computer:2024:PLS:docs:DOCID

—: where `DOCID` is some local identifier for the documentation.

These identifiers resolve to the files at `/PLS/docs/DOCID/`.

### Source Identifiers

Source entries for a language with primary language subtag `PLS` are
assigned identifiers of the form :—

    urn:fdc:langdev.ladys.computer:2024:PLS:srcs:SOURCEID

—: where `SOURCEID` is some local identifier for the source.

These identifiers resolve to the files at `/PLS/srcs/SOURCEID/`.

### Text Identifiers

Texts written in a language with primary language subtag `PLS` are
assigned identifiers of the form :—

    urn:fdc:langdev.ladys.computer:2024:PLS:txts:TEXTID

—: where `TEXTID` is some local identifier for the text.

These identifiers resolve to the files at `/PLS/txts/TEXTID/`.

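
The four identifier kinds above (codex entries, documentation, sources, and texts) differ only in a segment name and a path template, so dereferencing them can be sketched as a small lookup. This assumes the local identifier is everything after the segment name; the function names are hypothetical, and real resolution happens thru server redirects rather than code like this.

```python
NAMESPACE = "urn:fdc:langdev.ladys.computer:2024:"
WEB_BASE = "https://langdev.ladys.computer/"

# Path templates transcribed from the sections above.
PATH_TEMPLATES = {
    "cdex": "/{pls}/cdex/{local}.xhtml",
    "docs": "/{pls}/docs/{local}/",
    "srcs": "/{pls}/srcs/{local}/",
    "txts": "/{pls}/txts/{local}/",
}


def web_url(identifier: str) -> str:
    """Dereferenceable Web address: the base is prepended to the URN."""
    if not identifier.startswith(NAMESPACE):
        raise ValueError(f"not in the langdev namespace: {identifier!r}")
    return WEB_BASE + identifier


def resolved_path(identifier: str) -> str:
    """Site path that an identifier of one of these four kinds resolves to."""
    if not identifier.startswith(NAMESPACE):
        raise ValueError(f"not in the langdev namespace: {identifier!r}")
    parts = identifier[len(NAMESPACE):].split(":", 2)
    if len(parts) != 3 or parts[1] not in PATH_TEMPLATES:
        raise ValueError(f"unsupported identifier: {identifier!r}")
    pls, kind, local = parts
    return PATH_TEMPLATES[kind].format(pls=pls, local=local)


urn = NAMESPACE + "qjx:docs:intro"  # "intro" is a made-up DOCID
print(web_url(urn))
# https://langdev.ladys.computer/urn:fdc:langdev.ladys.computer:2024:qjx:docs:intro
print(resolved_path(urn))  # /qjx/docs/intro/
```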
### Lexeme Identifiers

An identifier for a given lexeme can be constructed from its language,
variant, and Ascii representation.
Given primary language subtag `PLS`, variant subtag `VARIANT`, and
Ascii representation `LEXEME`, the resulting identifier is as
follows :—

    urn:fdc:langdev.ladys.computer:2024:PLS:VARIANT:LEXEME

Within this repository, lexemes reference each other according to the
These identifiers resolve to the files at
`/PLS/#PLS-VARIANT--LEXEME`.

A less‐universal identifier, suitable for use as an X·M·L `ID`, is :—

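
Both the lexeme identifier and the location it resolves to are plain concatenations of their parts. The sketch below uses the `qho` variant `0001` mentioned earlier with a made‐up lexeme `tak__1`, and assumes each component has already been validated; the function names are hypothetical.

```python
NAMESPACE = "urn:fdc:langdev.ladys.computer:2024:"


def lexeme_identifier(pls: str, variant: str, lexeme: str) -> str:
    """Build urn:fdc:langdev.ladys.computer:2024:PLS:VARIANT:LEXEME."""
    return f"{NAMESPACE}{pls}:{variant}:{lexeme}"


def lexeme_location(pls: str, variant: str, lexeme: str) -> str:
    """Build /PLS/#PLS-VARIANT--LEXEME, a fragment of the language page."""
    return f"/{pls}/#{pls}-{variant}--{lexeme}"


print(lexeme_identifier("qho", "0001", "tak__1"))
# urn:fdc:langdev.ladys.computer:2024:qho:0001:tak__1
print(lexeme_location("qho", "0001", "tak__1"))
# /qho/#qho-0001--tak__1
```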
## Encoding Principles

Dictionary information is expressed in a constrained R·D·F format which
conforms to the `DTD` in this repository.
This D·T·D should not be considered stable, and should be inspected for
changes when·ever pulling in new data.

The `site/` directory contains documentation and data used for building
the Langdev website (<https://langdev.ladys.computer/>).
Files and directories which are not meant to reference language subtags
are given names of one of the following forms :—

- Of any length, containing at least one apostrophe, hyphen,
  underscore, or period.

- Exactly four lowercase letters (distinguishable from a script
  subtag, which conventionally begins with a capital letter).

- More than four letters or digits, starting with a capital letter
  (distinguishable from a variant subtag, which conventionally begins
  with a lowercase letter or a digit).

- More than eight characters (all language subtags are eight or fewer
  characters in length).

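
The four naming rules can be combined into one predicate, which may help when checking new files under `site/`. This is a sketch that assumes only Ascii punctuation counts (so `'` but not `’`) and that the full file name, extension included, is checked; `follows_site_naming` is a hypothetical name.

```python
import re


def follows_site_naming(name: str) -> bool:
    """True if `name` has one of the four forms reserved for non-subtag names."""
    # Any length, with at least one apostrophe, hyphen, underscore, or period.
    if re.search(r"['\-_.]", name):
        return True
    # Exactly four lowercase letters (script subtags start with a capital).
    if re.fullmatch(r"[a-z]{4}", name):
        return True
    # More than four letters or digits, starting with a capital letter.
    if re.fullmatch(r"[A-Z][0-9A-Za-z]{4,}", name):
        return True
    # More than eight characters, longer than any subtag.
    return len(name) > 8


for name in ["style.css", "docs", "About", "Qabj", "qjx", "blockaa"]:
    print(name, follows_site_naming(name))
```

Here `Qabj`, `qjx`, and `blockaa` are rejected because they could be read as a script, language, or variant subtag respectively.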
The site is built using [⛩📰 书社][Shushe].

[Caudex]: <https://git.ladys.computer/Caudex>
[Shushe]: <https://git.ladys.computer/Shushe>