This site is a static rendering of the Trac instance that was used by R7RS-WG1 for its work on R7RS-small (PDF), which was ratified in 2013. For more information, see Home.

Source for ticket #25

cc


    

changetime

2010-11-18 13:55:26

component

WG1 - Strings and Chars

description

R6RS provides comparator routines for
characters and strings using locale-independent
Unicode algorithms.  How do we order our
textual data?

id

25

keywords


    

milestone


    

owner

alexshinn

priority

major

reporter

alexshinn

resolution

fixed

severity


    

status

closed

summary

char and string ordering

time

2010-02-23 16:55:35

type

defect

Changes

Change at time 2010-11-18 13:55:26

author

alexshinn

field

comment

newvalue

This has been subsumed by #23.

oldvalue

2

raw-time

1290059726000000

ticket

25

time

2010-11-18 13:55:26

Change at time 2010-11-18 13:55:26

author

alexshinn

field

milestone

newvalue


    

oldvalue


    

raw-time

1290059726000000

ticket

25

time

2010-11-18 13:55:26

Change at time 2010-11-18 13:55:26

author

alexshinn

field

resolution

newvalue

fixed

oldvalue


    

raw-time

1290059726000000

ticket

25

time

2010-11-18 13:55:26

Change at time 2010-11-18 13:55:26

author

alexshinn

field

status

newvalue

closed

oldvalue

new

raw-time

1290059726000000

ticket

25

time

2010-11-18 13:55:26

Change at time 2010-03-02 01:34:21

author

cowan

field

comment

newvalue

R5RS ordering is very undemanding: it requires only that the ASCII uppercase letters sort correctly, that the ASCII lowercase letters sort correctly, that the digits sort correctly, and that digits and letters are not interleaved. On the other hand, it also requires that string sorting be lexicographic with respect to character sorting: that is, if two (sub)strings are the same except for the last character, they are sorted in the same way that the last character is sorted.
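
As a rough illustration of how little the R5RS character rules demand, the following Python sketch checks an arbitrary character ranking against them. The function name `satisfies_r5rs_char_order` and the `key` parameter are hypothetical, introduced only for this example. The EBCDIC ranking assumes Python's `cp037` codec as a representative EBCDIC code page.

```python
import string

def satisfies_r5rs_char_order(key):
    """Return True if `key` (one-character string -> comparable rank)
    meets the R5RS constraints on char<? for ASCII letters and digits.
    Hypothetical helper, not part of any Scheme implementation."""
    def increasing(chars):
        ranks = [key(c) for c in chars]
        return all(a < b for a, b in zip(ranks, ranks[1:]))

    upper = string.ascii_uppercase
    lower = string.ascii_lowercase
    digits = string.digits

    # Each of the three groups must sort in its usual order.
    if not (increasing(upper) and increasing(lower) and increasing(digits)):
        return False

    # Digits must lie entirely before or entirely after each letter case.
    d_lo, d_hi = key("0"), key("9")
    for letters in (upper, lower):
        l_lo, l_hi = key(letters[0]), key(letters[-1])
        if not (d_hi < l_lo or l_hi < d_lo):
            return False
    return True

# ASCII/Unicode code point ranking.
ascii_ok = satisfies_r5rs_char_order(ord)
# EBCDIC ranking, assuming code page 037.
ebcdic_ok = satisfies_r5rs_char_order(lambda c: c.encode("cp037")[0])
```

Both rankings pass even though they disagree on whether digits come before or after letters, which is exactly the latitude R5RS leaves.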

I'd like to apply R6RS rules to characters: they sort in Unicode order and char->integer and integer->char (which are arbitrary mappings in R5RS) map characters to Unicode code points.  But Thing One implementations shouldn't be required to represent all of Unicode, nor to use UTF-8 or UTF-32 for strings internally (the native sort of UTF-16 is ''not'' codepoint order).
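
The UTF-16 remark can be made concrete with a small Python sketch. It compares one character from the upper part of the Basic Multilingual Plane with one supplementary character, first by code point and then by UTF-16 code units. The helper `utf16_units` is a name chosen for this example only, and Python's `ord` stands in for `char->integer` under the R6RS mapping.

```python
def utf16_units(s):
    """Return the UTF-16 code units of `s` as a list of integers."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

bmp = "\uE000"         # U+E000, a single code unit 0xE000
astral = "\U00010000"  # U+10000, the surrogate pair 0xD800 0xDC00

# Code point order: U+E000 sorts before U+10000.
assert ord(bmp) < ord(astral)
assert bmp < astral  # Python compares strings by code point

# UTF-16 code unit order: the surrogate 0xD800 sorts before 0xE000,
# so the two strings come out the other way round.
assert utf16_units(astral) < utf16_units(bmp)

# UTF-8 byte order agrees with code point order for this pair.
assert bmp.encode("utf-8") < astral.encode("utf-8")
```

An implementation that stores strings as UTF-16 and compares code units directly would therefore order these two strings differently from one that compares code points.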

Instead, I believe we should break the lexicographic rule and allow string ordering to be done however an implementation pleases, as long as the R5RS rules are kept (all modern encodings, including EBCDIC, do keep them), and as long as the regular rules of comparison functions are preserved (consistency, trichotomy, etc.).
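
To show what "the regular rules of comparison functions" might look like as a check, here is a minimal Python sketch that tests trichotomy and transitivity of a less-than predicate over a sample of strings. The names `is_strict_total_order` and `utf16_less` are hypothetical. The UTF-16 predicate is used only as an example of an ordering that is not code point order yet still behaves as a proper comparison.

```python
from itertools import product

def is_strict_total_order(less, items):
    """Check trichotomy and transitivity of `less` on `items` (brute force)."""
    for a, b in product(items, repeat=2):
        # Trichotomy: exactly one of a < b, a == b, b < a holds.
        if (less(a, b), a == b, less(b, a)).count(True) != 1:
            return False
    for a, b, c in product(items, repeat=3):
        # Transitivity: a < b and b < c imply a < c.
        if less(a, b) and less(b, c) and not less(a, c):
            return False
    return True

def utf16_less(a, b):
    """Order strings by their big-endian UTF-16 encoding, i.e. by code units."""
    return a.encode("utf-16-be") < b.encode("utf-16-be")

sample = ["", "a", "ab", "b", "\uE000", "\U00010000"]
utf16_is_consistent = is_strict_total_order(utf16_less, sample)
```

A brute-force check like this only covers the sample it is given, so it can refute a candidate ordering but not prove one correct in general.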

oldvalue

1

raw-time

1267464861000000

ticket

25

time

2010-03-02 01:34:21