This site is a static rendering of the Trac instance that was used by R7RS-WG1 for its work on R7RS-small (PDF), which was ratified in 2013. For more information, see Home.
R6RS provides comparator routines for characters and strings using locale-independent Unicode algorithms. How do we order our textual data?
cowan
2010-03-02 01:34:21
R5RS ordering is very undemanding: it requires only that the ASCII uppercase letters sort correctly, that the ASCII lowercase letters sort correctly, that the digits sort correctly, and that digits and letters are not interleaved. On the other hand, it also requires that string sorting be lexicographic with respect to character sorting: that is, if two (sub)strings are the same except for the last character, they are sorted in the same way that the last character is sorted.
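The character constraints above can be sketched as a small check. This is an illustration in Python rather than Scheme, and the names `satisfies_r5rs_char_order` and `ebcdic_code` are hypothetical helpers, not part of any report. `key` stands in for whatever char->integer mapping an implementation uses, and EBCDIC is approximated here by Python's `cp037` codec.

```python
import string

def satisfies_r5rs_char_order(key):
    """Return True if `key` (a char -> int mapping) meets the R5RS constraints."""
    def ascending(chars):
        # Each character must map strictly below its successor.
        codes = [key(c) for c in chars]
        return all(a < b for a, b in zip(codes, codes[1:]))

    def not_interleaved(xs, ys):
        # Either every x precedes every y, or every y precedes every x.
        kx = [key(c) for c in xs]
        ky = [key(c) for c in ys]
        return max(kx) < min(ky) or max(ky) < min(kx)

    upper = string.ascii_uppercase
    lower = string.ascii_lowercase
    digits = string.digits
    return (ascending(upper) and ascending(lower) and ascending(digits)
            and not_interleaved(digits, upper)
            and not_interleaved(digits, lower))

def ebcdic_code(c):
    # Assumption: code page 037 is used as a representative EBCDIC variant.
    return c.encode("cp037")[0]

ascii_ok = satisfies_r5rs_char_order(ord)
ebcdic_ok = satisfies_r5rs_char_order(ebcdic_code)
reversed_ok = satisfies_r5rs_char_order(lambda c: -ord(c))
```

Under these assumptions both the ASCII and the EBCDIC mappings pass, while a mapping that reverses the alphabet does not.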
I'd like to apply R6RS rules to characters: they sort in Unicode code point order, and char->integer and integer->char (which are arbitrary mappings in R5RS) map characters to and from Unicode code points. But Thing One implementations shouldn't be required to represent all of Unicode, nor to use UTF-8 or UTF-32 for strings internally (the native ordering of UTF-16, by code unit, is not code point order).
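The UTF-16 remark can be seen with two characters, one near the top of the Basic Multilingual Plane and one just outside it. The sketch below is in Python for illustration only; `less_by_encoding` is a hypothetical helper, and comparing big-endian encoded bytes is used as a stand-in for comparing an implementation's internal code units.

```python
def less_by_encoding(a, b, encoding):
    """Compare two strings by the byte order of their encoded forms."""
    return a.encode(encoding) < b.encode(encoding)

bmp_char = "\ue000"         # U+E000, a single UTF-16 code unit (0xE000)
astral_char = "\U00010000"  # U+10000, a surrogate pair (0xD800 0xDC00)

# Python compares str values by code point, so this is code point order.
by_code_point = bmp_char < astral_char

# UTF-8 and UTF-32 agree with code point order; UTF-16 does not, because
# the lead surrogate 0xD800 is smaller than 0xE000.
by_utf8 = less_by_encoding(bmp_char, astral_char, "utf-8")
by_utf32 = less_by_encoding(bmp_char, astral_char, "utf-32-be")
by_utf16 = less_by_encoding(bmp_char, astral_char, "utf-16-be")

print(by_code_point, by_utf8, by_utf32, by_utf16)  # True True True False
```

Only characters outside the BMP are affected, so an implementation that compares UTF-16 code units directly still orders all BMP characters by code point.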
Instead, I believe we should break the lexicographic rule and allow string ordering to be done however an implementation pleases, as long as the R5RS rules are kept (all modern encodings, including EBCDIC, do keep them), and as long as the regular rules of comparison functions are preserved (consistency, trichotomy, etc.).
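What "the regular rules of comparison functions" might mean in practice can be sketched as a check over a sample of strings. This Python illustration assumes the rules are trichotomy and transitivity; `is_strict_total_order`, `utf16_less`, and `length_less` are hypothetical names. `utf16_less` models an implementation that orders strings by UTF-16 code units, which is not lexicographic by code point but is still a well-behaved ordering.

```python
from itertools import product

def is_strict_total_order(less, samples):
    """Check trichotomy and transitivity of `less` over `samples`."""
    for a, b in product(samples, repeat=2):
        # Trichotomy: exactly one of a < b, a == b, b < a holds.
        if [less(a, b), a == b, less(b, a)].count(True) != 1:
            return False
    for a, b, c in product(samples, repeat=3):
        # Transitivity: a < b and b < c imply a < c.
        if less(a, b) and less(b, c) and not less(a, c):
            return False
    return True

def utf16_less(a, b):
    # Big-endian bytes compare the same way as the 16-bit code units.
    return a.encode("utf-16-be") < b.encode("utf-16-be")

def length_less(a, b):
    # A deliberately broken ordering: distinct strings of equal length
    # are neither less, equal, nor greater.
    return len(a) < len(b)

samples = ["", "A", "Z", "a", "z", "0", "9", "a0", "\ue000", "\U00010000"]

utf16_is_valid = is_strict_total_order(utf16_less, samples)
length_is_valid = is_strict_total_order(length_less, samples)
```

A check over a finite sample can only refute an ordering, not prove it correct, so it is closer to a test-suite sanity check than a conformance criterion.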