1 <!--
2 SPDX-FileCopyrightText: 2024 Lady <https://www.ladys.computer/about/#lady>
3 SPDX-License-Identifier: CC0-1.0
4 -->
5 # Langdev
6
7 ## Directory Structure
8
9 Each language is given a directory inside of `data/`, named by language
10 code.
11 Within this directory the following files and subdirectories may exist :⁠—
12
13 - **`info`:**
14 An X·M·L file providing basic information about the language, its
15 encoding, and its variants.
16
17 - **`cdex/`:**
18 Codex entries for the language, in the manner of
19 [🪾📰 Caudex][Caudex].
20
21 - **`docs/`:**
22 Prose documentation for the language.
23
24 - **`srcs/`:**
25 Files from which those in `VARIANT/` and `docs/` were derived.
26 By convention, all X·M·L files have an `encoding` component to their
27 X·M·L declaration, which is used to identify them as “assets” and
28 avoid further processing.
29
30 - **`txts/`:**
31 Extant texts written in the language.
32
33 - **`VARIANT/`:**
34 A directory of lexemes for a given `VARIANT`, which has the form of a
35 variant subtag (a digit followed by three to seven characters, or
36 a letter followed by four to seven characters).
37 When formulating language tags using these variants, they must be
38 preceded by the singleton `x`, as they are not registered.
39
40 Variants which begin with the string `block` are _blocks_, intended
41 to partition the language semantically to make it easier to work
42 with.
43 This is the common form of variant for actively‐developed languages.
44
45 Other variants are not partitions, and instead denote different
46 versions of the language thru time or space.
47 For example, the variant `qho-0001` denotes the version of the
48 language which precedes `qho-0002`.
49
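As a rough illustration of the variant naming rules above, the following Python sketch checks the shape of a `VARIANT` name and forms a language tag from it. The function names are hypothetical, and two things are assumed rather than stated in this README: that "characters" means Ascii letters and digits, and that the variant directly follows the `x` singleton in the resulting tag.

```python
import re

# A digit followed by three to seven characters, or a letter followed by
# four to seven characters. Restricting "characters" to Ascii
# alphanumerics is an assumption.
VARIANT_RE = re.compile(r"[0-9][0-9A-Za-z]{3,7}|[A-Za-z][0-9A-Za-z]{4,7}")


def is_variant(name: str) -> bool:
    """Return True if `name` has the shape of a variant subtag."""
    return VARIANT_RE.fullmatch(name) is not None


def is_block(name: str) -> bool:
    """Return True if `name` is a variant which begins with `block`."""
    return is_variant(name) and name.startswith("block")


def language_tag(pls: str, variant: str) -> str:
    """Form a tag with the variant preceded by the singleton `x`.

    Placing the variant directly after `x` is an assumption.
    """
    if not is_variant(variant):
        raise ValueError(f"not a variant subtag: {variant!r}")
    return f"{pls}-x-{variant}"


print(language_tag("qho", "0001"))
```

Under these assumptions, `0001` and a made-up block name such as `block01` are accepted, while `abcd` (a letter followed by only three characters) is not.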
50 Each variant directory, itself, contains the following files :⁠—
51
52 - **`LEXEME`:**
53 A single lexeme with·in the variant.
54 `LEXEME` is an Ascii representation of the lemma form of the lexeme
55 which matches the following regular expression :⁠—
56
57 [&'=@0-9A-Za-z~-]\.?(_?[&'=@0-9A-Za-z~-]\.?)*__[1-9][0-9]*
58
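The regular expression above can be tried out directly. This is a minimal Python sketch which assumes that the whole file name must match (hence `fullmatch`); the sample names are made up for illustration and are not lexemes from this repository.

```python
import re

# The expression is copied verbatim from the README.
LEXEME_RE = re.compile(
    r"[&'=@0-9A-Za-z~-]\.?(_?[&'=@0-9A-Za-z~-]\.?)*__[1-9][0-9]*"
)


def is_lexeme_name(name: str) -> bool:
    """Return True if the whole of `name` matches the LEXEME pattern."""
    return LEXEME_RE.fullmatch(name) is not None


# Hypothetical file names, for illustration only.
for candidate in ["abc__1", "a_b__2", "a.b.__10", "abc", "abc__0"]:
    print(candidate, is_lexeme_name(candidate))
```

The last two samples fail because the pattern requires a trailing `__` followed by a positive integer with no leading zero.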
59 ## Languages, Scripts, and Tags
60
61 Each language developed in this repository is assigned a (private·use)
62 primary language subtag in the range `qga`‥`qpz`.
63 This is outside of the range reserved by Unicode (`qaa`‥`qfy`) and
64 leaves the tags `qfz` and `qqa`‥`qtz` for implementations.
65 The current list of assigned primary language subtags is as
66 follows :⁠—
67
68 | Language Subtag | Language Name |
69 | :-------------: | ------------- |
70 | `qho` | Eho |
71 | `qjl` | Jastulae |
72 | `qjt` | Jastugay |
73 | `qjx` | Pre‐Zheshwi |
74 | `qjz` | Zheshwi |
75 | `qlr` | Elrex |
76 | `qpt` | Fizonal |
77
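The ranges described above can be checked mechanically, since three‐letter lowercase subtags sort lexicographically. The sketch below is illustrative only; the function names and the returned labels are hypothetical.

```python
def in_range(subtag: str, first: str, last: str) -> bool:
    """Return True if `subtag` is a three-letter lowercase Ascii subtag
    lying between `first` and `last`, inclusive."""
    return (
        len(subtag) == 3
        and subtag.isascii()
        and subtag.isalpha()
        and subtag.islower()
        and first <= subtag <= last
    )


def classify_language_subtag(subtag: str) -> str:
    """Label a subtag according to the ranges given in the README."""
    if in_range(subtag, "qaa", "qfy"):
        return "unicode"
    if in_range(subtag, "qga", "qpz"):
        return "langdev"
    if subtag == "qfz" or in_range(subtag, "qqa", "qtz"):
        return "implementation"
    return "other"


# Every subtag in the table above falls inside the Langdev range.
assigned = ["qho", "qjl", "qjt", "qjx", "qjz", "qlr", "qpt"]
print(all(classify_language_subtag(s) == "langdev" for s in assigned))
```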
78 This repository also reserves the script subtags `Qaaq`‥`Qabp`,
79 leaving aside `Qaaa`‥`Qaap` for Unicode and `Qabq`‥`Qabx` for
80 implementations.
81 The current list of assigned script subtags is as follows :⁠—
82
83 | Script Subtag | Script Name |
84 | :-----------: | ----------- |
85 | `Qabj` | Jastugay Syllables |
86
87 ## Crossreferences and Identifiers
88
89 This repository assigns identifiers in the
90 `urn:fdc:langdev.ladys.computer:2024:` namespace.
91 Most of these identifiers can be dereferenced on the Web by prepending
92 `https://langdev.ladys.computer/` to them.
93 (Identifier resolution is handled thru server redirects, not as part of
94 the build process.)
95
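As a small illustration of the prepending rule, the following Python sketch turns an identifier into the address to request. The function name and the sample `DOCID` (`grammar`) are hypothetical; since only "most" identifiers can be dereferenced, a successful redirect is not guaranteed for every result.

```python
NAMESPACE = "urn:fdc:langdev.ladys.computer:2024:"
BASE = "https://langdev.ladys.computer/"


def dereference_url(identifier: str) -> str:
    """Prepend the site address to an identifier in the Langdev namespace."""
    if not identifier.startswith(NAMESPACE):
        raise ValueError(f"not a Langdev identifier: {identifier!r}")
    return BASE + identifier


# `grammar` is a made-up DOCID used only for illustration.
print(dereference_url(NAMESPACE + "qho:docs:grammar"))
```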
96 ### Codex Entry Identifiers
97
98 Codex entries for a language with primary language subtag `PLS` are
99 assigned identifiers of the form :⁠—
100
101 urn:fdc:langdev.ladys.computer:2024:PLS:cdex:ENTRYID
102
103 —⁠: where `ENTRYID` is the identifier of the entry within the codex.
104
105 These identifiers resolve to the files at `/PLS/cdex/ENTRYID.xhtml`.
106
107 ### Documentation Identifiers
108
109 Documentation files for a language with primary language subtag `PLS`
110 are assigned identifiers of the form :⁠—
111
112 urn:fdc:langdev.ladys.computer:2024:PLS:docs:DOCID
113
114 —⁠: where `DOCID` is some local identifier for the documentation.
115
116 These identifiers resolve to the files at `/PLS/docs/DOCID/`.
117
118 ### Source Identifiers
119
120 Source entries for a language with primary language subtag `PLS` are
121 assigned identifiers of the form :⁠—
122
123 urn:fdc:langdev.ladys.computer:2024:PLS:srcs:SOURCEID
124
125 —⁠: where `SOURCEID` is some local identifier for the source.
126
127 These identifiers resolve to the files at `/PLS/srcs/SOURCEID/`.
128
129 ### Text Identifiers
130
131 Texts written in a language with primary language subtag `PLS` are
132 assigned identifiers of the form :⁠—
133
134 urn:fdc:langdev.ladys.computer:2024:PLS:txts:TEXTID
135
136 —⁠: where `TEXTID` is some local identifier for the text.
137
138 These identifiers resolve to the files at `/PLS/txts/TEXTID/`.
139
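The four identifier forms above share one shape, so the paths they resolve to can be computed with a single lookup table. This Python sketch only mirrors the mapping described in this README; it is not the server's redirect configuration, and the function name and sample identifiers are hypothetical.

```python
NAMESPACE = "urn:fdc:langdev.ladys.computer:2024:"

# Path templates as described in the sections above.
PATH_TEMPLATES = {
    "cdex": "/{pls}/cdex/{local}.xhtml",
    "docs": "/{pls}/docs/{local}/",
    "srcs": "/{pls}/srcs/{local}/",
    "txts": "/{pls}/txts/{local}/",
}


def resolved_path(identifier: str) -> str:
    """Map a codex, documentation, source, or text identifier to its path."""
    if not identifier.startswith(NAMESPACE):
        raise ValueError(f"not a Langdev identifier: {identifier!r}")
    parts = identifier[len(NAMESPACE):].split(":", 2)
    if len(parts) != 3 or parts[1] not in PATH_TEMPLATES:
        raise ValueError(f"unrecognized identifier form: {identifier!r}")
    pls, kind, local = parts
    return PATH_TEMPLATES[kind].format(pls=pls, local=local)


# `entry1` and `grammar` are made-up local identifiers.
print(resolved_path(NAMESPACE + "qho:cdex:entry1"))
print(resolved_path(NAMESPACE + "qho:docs:grammar"))
```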
140 ### Lexeme Identifiers
141
142 An identifier for a given lexeme can be constructed from its language,
143 variant, and Ascii representation.
144 Given primary language subtag `PLS`, variant subtag `VARIANT`, and
145 Ascii representation `LEXEME`, the resulting identifier is as
146 follows :⁠—
147
148 urn:fdc:langdev.ladys.computer:2024:PLS:VARIANT:LEXEME
149
150 Within this repository, lexemes reference each other according to the
151 U·R·I scheme above.
152 These identifiers resolve to the fragments at
153 `/PLS/#PLS-VARIANT--LEXEME`.
154
155 A less‐universal identifier, suitable for use as an X·M·L `ID`, is :⁠—
156
157 PLS-VARIANT--LEXEME
158
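The three forms of lexeme reference described above differ only in punctuation, as this short Python sketch shows. The function names are hypothetical, and `abc__1` is a made-up Ascii representation rather than a lexeme from this repository.

```python
NAMESPACE = "urn:fdc:langdev.ladys.computer:2024:"


def lexeme_urn(pls: str, variant: str, lexeme: str) -> str:
    """Build the universal identifier for a lexeme."""
    return f"{NAMESPACE}{pls}:{variant}:{lexeme}"


def lexeme_xml_id(pls: str, variant: str, lexeme: str) -> str:
    """Build the less-universal identifier for use as an `ID`."""
    return f"{pls}-{variant}--{lexeme}"


def lexeme_path(pls: str, variant: str, lexeme: str) -> str:
    """Build the path to which the universal identifier resolves."""
    return f"/{pls}/#{lexeme_xml_id(pls, variant, lexeme)}"


print(lexeme_urn("qho", "0001", "abc__1"))
print(lexeme_xml_id("qho", "0001", "abc__1"))
print(lexeme_path("qho", "0001", "abc__1"))
```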
159 ## Encoding Principles
160
161 Dictionary information is expressed in a constrained R·D·F format which
162 conforms to the `DTD` in this repository.
163 This D·T·D should not be considered stable and should be inspected for
164 changes when·ever pulling in new data.
165
166 ## Website
167
168 The `site/` directory contains documentation and data used for building
169 the Langdev website (<https://langdev.ladys.computer/>).
170 Files and directories which are not meant to reference language subtags
171 will be given names which are either :⁠—
172
173 - Of any length, containing at least one apostrophe, hyphen,
174 underscore, or period.
175
176 - Exactly four lowercase alphabetic letters (distinguishable from a
177 script subtag, which conventionally begins with a capital letter).
178
179 - More than four letters or numbers, starting with a capital letter
180 (distinguishable from a variant subtag, which conventionally begins
181 with a lowercase letter).
182
183 - More than eight characters (all language subtags are eight or fewer
184 characters in length).
185
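The four naming rules above can be read as a single predicate: a name is safe if any one rule holds. The Python sketch below is one possible reading, assuming Ascii letters and digits and the Ascii apostrophe; the function name and the sample names are hypothetical.

```python
import re


def is_safe_site_name(name: str) -> bool:
    """Return True if `name` cannot be mistaken for a language subtag,
    according to any one of the four rules in the README."""
    # Any length, with an apostrophe, hyphen, underscore, or period.
    if any(char in name for char in "'-_."):
        return True
    # Exactly four lowercase alphabetic letters.
    if re.fullmatch(r"[a-z]{4}", name):
        return True
    # More than four letters or numbers, starting with a capital letter.
    if re.fullmatch(r"[A-Z][0-9A-Za-z]{4,}", name):
        return True
    # More than eight characters.
    return len(name) > 8


for candidate in ["index.xhtml", "site", "Assets", "qho", "Qabj", "block01"]:
    print(candidate, is_safe_site_name(candidate))
```

Names shaped like a language subtag (`qho`), a script subtag (`Qabj`), or a variant subtag (`block01`) are rejected, which is the point of the rules.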
186 The site is built using [⛩📰 书社][Shushe].
187
188 [Caudex]: <https://git.ladys.computer/Caudex>
189 [Shushe]: <https://git.ladys.computer/Shushe>