AdvancedUcdCowan

See UcdCowan for basic UCD procedures.

It is an error to mutate any objects returned by these procedures.

Blocks

Blocks are disjoint objects that represent the allocation blocks into which the Unicode code point space is divided for administrative purposes. Typically most of a block is allocated at once and contains characters from a single script, but there is often more than one block per script, some blocks contain characters from multiple scripts, and some characters in a block are allocated much later than the rest. The list of blocks provided is implementation-dependent. Since it is not possible to create new ones, eqv? may be used to compare them.

(blocks)

Returns a list of all blocks known to the implementation.

(block-name block)

Returns a string naming block.

(block-first block)

Returns an exact integer representing the first (smallest) code point in the block.

(block-last block)

Returns an exact integer representing the last (largest) code point in the block.
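As a rough illustration of how these accessors might be combined, the following sketch finds the block that contains a given code point. Because the procedures above are only proposed here, the sketch supplies stand-in definitions backed by a record type and two sample blocks. The record type, the sample list, and the helper block-containing are assumptions made for illustration and are not part of this proposal.

```scheme
(import (scheme base) (scheme write))

;; Stand-ins for the proposed procedures (assumption: a real
;; implementation would provide these itself).
(define-record-type <block>
  (make-block name first last)
  block?
  (name block-name)
  (first block-first)
  (last block-last))

;; Two sample blocks; a real implementation would list many more.
(define sample-blocks
  (list (make-block "Basic Latin" #x0000 #x007F)
        (make-block "Greek and Coptic" #x0370 #x03FF)))

(define (blocks) sample-blocks)

;; Hypothetical helper: return the block whose range includes
;; code-point, or #f if no known block contains it.
(define (block-containing code-point)
  (let loop ((bs (blocks)))
    (cond ((null? bs) #f)
          ((<= (block-first (car bs)) code-point (block-last (car bs)))
           (car bs))
          (else (loop (cdr bs))))))

;; U+03BB lies between U+0370 and U+03FF.
(display (block-name (block-containing #x03BB)))
(newline)
```

A linear search is used only for brevity; since blocks do not overlap, an implementation with many blocks could sort them by block-first and use binary search instead.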

Named Sequences

Named sequences are disjoint objects which represent a sequence of Unicode code points that has a name specified by the Unicode Standard. Named sequences may be provisional in one version of the UCD and then non-provisional in later versions. The list of named sequences provided is implementation-dependent. Since it is not possible to create new ones, eqv? may be used to compare them.

(named-sequences)

Returns a list of all named sequence objects known to the implementation.

(named-sequence-name named-sequence)

Returns a string naming named-sequence.

(named-sequence-code-points named-sequence)

Returns a list of exact integers representing the code points of the named-sequence.

(named-sequence-provisional? named-sequence)

Returns #t if the named-sequence is provisional, or #f if not.
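The sketch below shows one way these accessors might be used: looking up a non-provisional named sequence by name and converting its code points to a string. The stand-in record type, the made-up sample entries (which are not taken from the UCD), and the helpers stable-named-sequence and named-sequence->string are all assumptions for illustration.

```scheme
(import (scheme base))

;; Stand-in for the proposed named-sequence objects (assumption).
(define-record-type <named-sequence>
  (make-named-sequence name code-points provisional?)
  named-sequence?
  (name named-sequence-name)
  (code-points named-sequence-code-points)
  (provisional? named-sequence-provisional?))

;; Made-up sample entries; these names do not come from the UCD.
(define sample-named-sequences
  (list (make-named-sequence "EXAMPLE SEQUENCE ONE" '(#x0061 #x0300) #f)
        (make-named-sequence "EXAMPLE SEQUENCE TWO" '(#x0062 #x0301) #t)))

(define (named-sequences) sample-named-sequences)

;; Hypothetical helper: return the non-provisional named sequence
;; with the given name, or #f if there is none.
(define (stable-named-sequence name)
  (let loop ((ns (named-sequences)))
    (cond ((null? ns) #f)
          ((and (string=? (named-sequence-name (car ns)) name)
                (not (named-sequence-provisional? (car ns))))
           (car ns))
          (else (loop (cdr ns))))))

;; Hypothetical helper: build a fresh string from the code points.
;; The list returned by the accessor is only read, never mutated.
(define (named-sequence->string ns)
  (list->string (map integer->char (named-sequence-code-points ns))))
```

Skipping provisional entries is a design choice of this sketch: a provisional sequence may change before it becomes non-provisional, so a program that stores names persistently may prefer to ignore it.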

Normalization corrections

Normalization-corrections are disjoint objects that represent official corrections to the UCD normalization tables. The list of normalization-corrections provided is implementation-dependent. Since it is not possible to create new ones, eqv? may be used to compare them.

(normalization-corrections)

Returns a list of all normalization-corrections known to the implementation.

(normalization-correction-description normalization-correction)

Returns a string describing normalization-correction. Note that normalization-corrections don't have names.

(normalization-correction-codepoint normalization-correction)

Returns an exact integer specifying the code point of the character whose normalization is being corrected.

(normalization-correction-old normalization-correction)

Returns a list of exact integers specifying the normalization of the corrected code point (the value returned by normalization-correction-codepoint) before this normalization correction is applied.

(normalization-correction-new normalization-correction)

Returns a list of exact integers specifying the normalization of the corrected code point (the value returned by normalization-correction-codepoint) after this normalization correction is applied.

(normalization-correction-version normalization-correction)

Returns a list of three exact integers specifying the version of the UCD (in the format of ucd-version) in which this normalization-correction was applied.
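One plausible use of these accessors is to ask which normalization was in force for a given UCD version. The sketch below assumes that the corrected value applies from the correction's version onward and the old value before it. The stand-in record type, the made-up sample record, and the helpers version<? and normalization-as-of are assumptions for illustration only.

```scheme
(import (scheme base))

;; Stand-in for the proposed normalization-correction objects (assumption).
(define-record-type <normalization-correction>
  (make-normalization-correction description codepoint old new version)
  normalization-correction?
  (description normalization-correction-description)
  (codepoint normalization-correction-codepoint)
  (old normalization-correction-old)
  (new normalization-correction-new)
  (version normalization-correction-version))

;; Made-up sample record using a private-use code point; it does not
;; correspond to any real entry in the UCD.
(define sample-correction
  (make-normalization-correction
   "made-up example" #xE000 '(#x0041) '(#x0042) '(3 2 0)))

;; Hypothetical helper: #t if version a precedes version b, comparing
;; the components from left to right.
(define (version<? a b)
  (cond ((or (null? a) (null? b)) #f)
        ((< (car a) (car b)) #t)
        ((> (car a) (car b)) #f)
        (else (version<? (cdr a) (cdr b)))))

;; Hypothetical helper: the normalization in force for the corrected
;; code point as of the given UCD version.
(define (normalization-as-of correction version)
  (if (version<? version (normalization-correction-version correction))
      (normalization-correction-old correction)
      (normalization-correction-new correction)))
```

With the sample record, (normalization-as-of sample-correction '(3 1 0)) returns the old value and (normalization-as-of sample-correction '(3 2 0)) returns the new one.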

Standardized variants

Standardized-variants are disjoint objects that represent standardized variants of base characters. The list of standardized-variants provided is implementation-dependent. Since it is not possible to create new ones, eqv? may be used to compare them.

(standardized-variants)

Returns a list of all standardized-variants known to the implementation.

(standardized-variant-description standardized-variant)

Returns a string describing standardized-variant. Note that standardized-variants don't have names.

(standardized-variant-when standardized-variant)

Returns a string specifying the shaping environment under which standardized-variant is applied.

(standardized-variant-base-codepoint standardized-variant)

Returns an exact integer specifying the code point of the base character of the standardized variant.

(standardized-variant-variant-codepoint standardized-variant)

Returns an exact integer specifying the code point of the variation selector of the standardized variant. Issue: this name is regrettable.
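As a minimal sketch of how the two code point accessors might be used together, the following collects the variation selectors registered for a given base code point. It assumes that standardized-variant-variant-codepoint yields the variation selector. The stand-in record type, the illustrative sample entries (not an authoritative extract from the UCD), and the helper variation-selectors-for are assumptions.

```scheme
(import (scheme base))

;; Stand-in for the proposed standardized-variant objects (assumption);
;; only the two code point fields are modelled here.
(define-record-type <standardized-variant>
  (make-standardized-variant base variant)
  standardized-variant?
  (base standardized-variant-base-codepoint)
  (variant standardized-variant-variant-codepoint))

;; Illustrative sample entries; consult the UCD for real data.
(define sample-variants
  (list (make-standardized-variant #x2268 #xFE00)
        (make-standardized-variant #x2269 #xFE00)))

(define (standardized-variants) sample-variants)

;; Hypothetical helper: list the variation selectors of all known
;; standardized variants with the given base code point.
(define (variation-selectors-for base)
  (let loop ((vs (standardized-variants)) (acc '()))
    (cond ((null? vs) (reverse acc))
          ((= (standardized-variant-base-codepoint (car vs)) base)
           (loop (cdr vs)
                 (cons (standardized-variant-variant-codepoint (car vs))
                       acc)))
          (else (loop (cdr vs) acc)))))
```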

Undigested stuff from UAX #42

CJK radicals

The cjk-radicals child of the ucd describes the CJK radicals. It has one child element cjk-radical per radical. The attributes on that last element capture the radical number, the corresponding CJK radical character, and the corresponding CJK unified ideograph.

[cjk radicals, 50] =
  ucd.content &=
    element cjk-radicals {
      element cjk-radical {
        attribute number { xsd:string { pattern = "[0-9]{1,3}'?" } },
        attribute radical { single-code-point },
        attribute ideograph { single-code-point }
      } +
    }?
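The pattern on the number attribute accepts one to three ASCII digits optionally followed by an apostrophe, and an XSD pattern must match the whole string. Since R7RS small has no regular expression library, the sketch below checks the same condition by hand. The procedure names ascii-digit? and radical-number? are hypothetical and not part of UAX #42 or this proposal.

```scheme
(import (scheme base))

;; Hypothetical helper: #t if c is one of the ASCII digits 0 to 9.
(define (ascii-digit? c)
  (and (char<=? #\0 c) (char<=? c #\9)))

;; Hypothetical predicate: #t if the whole string s consists of one to
;; three ASCII digits, optionally followed by a single apostrophe.
(define (radical-number? s)
  (let* ((n (string-length s))
         (primed? (and (> n 0) (char=? (string-ref s (- n 1)) #\')))
         (digits (if primed? (- n 1) n)))
    (and (<= 1 digits 3)
         (let loop ((i 0))
           (or (= i digits)
               (and (ascii-digit? (string-ref s i))
                    (loop (+ i 1))))))))
```

For example, (radical-number? "214") and (radical-number? "90'") are true, while (radical-number? "1234") and (radical-number? "'") are false.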

Emoji sources

The emoji-sources child of the ucd describes the emoji sources.

[datatype for code points, 51] = jis-code-point = xsd:string { pattern = "[0-9A-F]{4}" }

[emoji sources, 52] =
  ucd.content &=
    element emoji-sources {
      element emoji-source {
        attribute unicode { one-or-more-code-points },
        attribute docomo { jis-code-point? },
        attribute kddi { jis-code-point? },
        attribute softbank { jis-code-point? }
      } +
    }?
