25: char and string ordering

R6RS provides comparison procedures for characters and strings that use locale-independent Unicode algorithms. How should we order our textual data?

cowan

2010-03-02 01:34:21

R5RS ordering is very undemanding: it requires only that the ASCII uppercase letters sort correctly, that the ASCII lowercase letters sort correctly, that the digits sort correctly, and that digits and letters are not interleaved. On the other hand, it also requires that string ordering be lexicographic with respect to character ordering: that is, if two (sub)strings are the same except for the last character, they sort in the same order as their last characters.
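To make the lexicographic rule concrete, here is a minimal sketch in Python (used only for illustration; the names `string_less` and `codepoint_less` are hypothetical and do not come from any Scheme report). It extends an arbitrary character ordering to strings, with the usual convention that a proper prefix sorts first.

```python
def string_less(a, b, char_less):
    """Return True if a sorts before b under the lexicographic
    extension of the character ordering char_less."""
    for x, y in zip(a, b):
        if char_less(x, y):
            return True
        if char_less(y, x):
            return False
    # Every shared position compared equal: the shorter string sorts first.
    return len(a) < len(b)


def codepoint_less(x, y):
    """One possible character ordering: compare by code point."""
    return ord(x) < ord(y)
```

With this definition, `string_less("abc", "abd", codepoint_less)` is decided entirely by how `c` and `d` compare, which is the property described above.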

I'd like to apply R6RS rules to characters: they sort in Unicode order, and char->integer and integer->char (which are arbitrary mappings in R5RS) map characters to and from Unicode code points. But Thing One implementations shouldn't be required to represent all of Unicode, nor to use UTF-8 or UTF-32 for strings internally (the native code-unit ordering of UTF-16 does not match code point order).
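The UTF-16 remark can be checked with a small sketch, again in Python for illustration only (`utf16_units` is a hypothetical helper). Characters above U+FFFF are encoded as surrogate pairs in the range D800 to DFFF, so they compare below BMP characters in the range E000 to FFFF when strings are compared code unit by code unit.

```python
def utf16_units(s):
    """Return the list of 16-bit UTF-16 code units of s."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


a = "\uFF5E"      # U+FF5E, a single UTF-16 code unit
b = "\U00010000"  # U+10000, encoded as the surrogate pair D800 DC00

# Python compares str values by code point, so a sorts before b here.
by_codepoint = a < b

# Comparing code unit sequences gives the opposite answer,
# because 0xFF5E is greater than the lead surrogate 0xD800.
by_utf16 = utf16_units(a) < utf16_units(b)
```

An implementation that stores strings as UTF-16 and compares them with a raw code-unit comparison would therefore disagree with code point order for this pair.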

Instead, I believe we should drop the lexicographic rule and allow an implementation to order strings however it pleases, as long as the R5RS rules are kept (all modern encodings, including EBCDIC, do keep them) and the usual rules for comparison functions are preserved (consistency, trichotomy, etc.).
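The claim about encodings can be spot-checked. The sketch below, in Python for illustration only, tests the R5RS character constraints against a single-byte encoding; `keeps_r5rs_rules` is a hypothetical name, and `cp037` is assumed here as a representative EBCDIC code page.

```python
import string


def keeps_r5rs_rules(encoding):
    """Check the R5RS character-ordering constraints for a single-byte encoding."""

    def code(ch):
        # Assumes the encoding maps each of these characters to one byte.
        return ch.encode(encoding)[0]

    def ascending(chars):
        codes = [code(c) for c in chars]
        return all(x < y for x, y in zip(codes, codes[1:]))

    def not_interleaved(digits, letters):
        d = [code(c) for c in digits]
        l = [code(c) for c in letters]
        # Either all digits precede all letters, or the reverse.
        return max(d) < min(l) or max(l) < min(d)

    return (ascending(string.ascii_uppercase)
            and ascending(string.ascii_lowercase)
            and ascending(string.digits)
            and not_interleaved(string.digits, string.ascii_uppercase)
            and not_interleaved(string.digits, string.ascii_lowercase))
```

Under these assumptions, both `keeps_r5rs_rules("ascii")` and `keeps_r5rs_rules("cp037")` hold: EBCDIC leaves gaps inside the alphabet and places the digits after the letters, but neither violates the constraints.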

alexshinn

2010-11-18 13:55:26

resolution: fixed

status: changed from new to closed

This has been subsumed by #23.
