This site is a static rendering of the Trac instance that was used by R7RS-WG1 for its work on R7RS-small (PDF), which was ratified in 2013. For more information, see Home.

Source for wiki StringSlicesCowan version 12

author

cowan

comment

name

StringSlicesCowan

readonly

text

== Character span library ==

This is a library for manipulating textual content based on ''character spans'', also known as just ''spans''.  These are conceptually references to a part of a Scheme string.  It is not defined whether the character span type is disjoint from strings.  Character spans are immutable, and except as noted below, it is an error to mutate the string(s) that underlie a span.

''String cursors'' are pointers into strings, and are not necessarily disjoint from other Scheme types.  For example, they may be exact integers that are character-based indexes into strings.  Alternatively, in an implementation whose internal representation of strings is UTF-8, string cursors may be indexes of individual bytes in the string.  It is also possible to implement string cursors as objects of a disjoint type.
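
As an illustration of the UTF-8 case, the following sketch treats a cursor as a byte index and advances it to the start of the next character by skipping continuation bytes.  The name `demo-cursor-next` and the use of a bytevector to stand in for a string's internal representation are assumptions made for this example only; the proposal does not prescribe any representation.

{{{
(import (scheme base))

;; Assumption: the "string" is modelled by its UTF-8 bytes, and a cursor
;; is an exact integer byte index into that bytevector.
(define (demo-cursor-next bytes cursor)
  (let loop ((i (+ cursor 1)))
    (if (and (< i (bytevector-length bytes))
             ;; Continuation bytes have the form 10xxxxxx (128..191).
             (= (quotient (bytevector-u8-ref bytes i) 64) 2))
        (loop (+ i 1))
        i)))

;; The UTF-8 encoding of "a", GREEK SMALL LETTER LAMDA (2 bytes), "b".
(define demo-bytes (bytevector #x61 #xCE #xBB #x62))

(demo-cursor-next demo-bytes 0) ; => 1
(demo-cursor-next demo-bytes 1) ; => 3, skipping the continuation byte
(demo-cursor-next demo-bytes 3) ; => 4, the end cursor
}}}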

This proposal also contains a useful subset of [http://srfi.schemers.org/srfi-13/srfi-13.html SRFI 13], which manipulates strings directly with some allowances for shared substrings (which are provided only by Guile).  Unlike SRFI 13, the string procedures of this proposal do not have ''start'' and ''end'' arguments, as their function is subsumed by spans.  In addition, the low-level procedures are not provided, nor are there any mutation operations.  Procedures with the same names and basic functions as SRFI 13 procedures are marked [SRFI 13].

The operations provided here (with the exception of those in the Compatibility section) are entirely independent of the character repertoire supported by the implementation.

Procedures marked [R7RS-small] are available in the small language, and are not exported by implementations of this proposal.  They are included only for clarity and completeness.

== Issues ==

1. Allow negative indices in constructors?

2. Titlecase doesn't really fit; keep it?

3. Keep functional update?

4. Keep string trees?

5. Keep compatibility routines, possibly in a different package?

6. I have made the argument order of `string-tabulate` compatible with SRFI 1 `list-tabulate` rather than SRFI 13's `string-tabulate`; the discrepancy was accidental.  Revert to SRFI 13 argument order?

7. Chibi provides `string-mismatch-right`, but the cursor returned is not necessarily valid; in particular, it returns -1 on identical strings.  I have left it out because of this.  Include it?

== Specification ==

With the exception of the constructors, all the procedures in this proposal exist in pairs: one that accepts and produces character spans and one that accepts and produces strings.  Only the character span version is documented in full; the string version should be understood as accepting the same non-span arguments, performing the same operations, and providing the same non-span results.

All predicates passed to procedures defined in this proposal may be called in any order and any number of times, except as otherwise noted.

== Character span constructors ==

`(make-span `''string start end''`)`

Returns a character span which contains the characters of ''string'' in order from ''start'' (inclusive) to ''end'' (exclusive).

`(span `''char'' ...`)`

`(subspan `''span start end''`)`

`(string->span `''string''`)`

Returns a character span which contains the characters of ''string'' in order.  Later mutation of ''string'' will not affect the contents of the returned span.

`(span/cursors `''string start-cursor end-cursor''`)`

`(string-subspan/cursors `''string start-cursor end-cursor''`)`

`(span-transform `''proc span obj'' ...`)`

''Proc'' is a procedure which accepts a string as its first argument and returns a string.  It is invoked with a string containing the characters of ''span'' in order as its first argument, followed by the ''obj'' arguments, if any.  The resulting string is returned as a character span by `span-transform`.  This procedure allows string-based procedures to be easily used in an environment that provides and expects spans.
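
The following is a minimal sketch of one way the constructors could be modelled, assuming a span is a record holding a string and a pair of exact integer indexes.  The record type `demo-span` and every `demo-` procedure are hypothetical names introduced here; the proposal does not require this representation.

{{{
(import (scheme base))

;; Hypothetical representation: a string plus [start, end) indexes.
(define-record-type demo-span
  (raw-demo-span str start end)
  demo-span?
  (str demo-span-str)
  (start demo-span-start)
  (end demo-span-end))

;; Analogue of make-span: shares the string rather than copying it.
(define (demo-make-span str start end)
  (raw-demo-span str start end))

;; Analogue of subspan: indexes are relative to the span, not the string.
(define (demo-subspan sp start end)
  (let ((base (demo-span-start sp)))
    (raw-demo-span (demo-span-str sp) (+ base start) (+ base end))))

;; Analogue of span->string: copies out the selected characters.
(define (demo-span->string sp)
  (string-copy (demo-span-str sp) (demo-span-start sp) (demo-span-end sp)))

;; Analogue of span-transform: apply a string procedure, wrap the result.
(define (demo-span-transform proc sp . objs)
  (let ((result (apply proc (demo-span->string sp) objs)))
    (raw-demo-span result 0 (string-length result))))

(demo-span->string
 (demo-subspan (demo-make-span "hello world" 6 11) 1 4))
;; => "orl"

(demo-span->string
 (demo-span-transform string-append (demo-make-span "hello world" 0 5) "!"))
;; => "hello!"
}}}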

== String constructors ==

`(make-string `''k'' [ ''char'' ]`)` [R7RS-small]

Returns a string containing ''k'' characters, all of which are ''char''.  If ''char'' is omitted, the contents of the string are implementation-dependent.

`(string `''char'' ...`)` [R7RS-small]

Returns a string consisting of the ''char'' arguments.

`(string-unfold `''stop? mapper successor'' [ ''seed'' ]`)`

`(string-unfold-right `''stop? mapper successor'' [ ''seed'' ]`)`

`(span->string `''span''`)`

Returns a newly allocated string which contains the characters of ''span'' in order.

`(string-tabulate `''len proc''`)`

Invokes ''proc'' for all exact integers between 0 (inclusive) and ''len'' (exclusive), and returns a newly allocated string containing the characters returned by the invocations.

Compatibility note:  The argument order here agrees with the `list-tabulate` procedure of [http://srfi.schemers.org/srfi-1/srfi-1.html SRFI 1] rather than SRFI 13's `string-tabulate` procedure.  The discrepancy was [http://srfi.schemers.org/srfi-13/mail-archive/msg00143.html unintentional], but was [http://srfi.schemers.org/srfi-13/mail-archive/msg00144.html discovered too late to fix].
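
A minimal sketch of the ''len proc'' argument order, using only R7RS-small procedures.  The name `demo-string-tabulate` is hypothetical, and the left-to-right invocation order shown here is a choice of this sketch, not something the text above guarantees.

{{{
(import (scheme base))

;; Hypothetical helper: length first, then the index->char procedure.
(define (demo-string-tabulate len proc)
  (let ((result (make-string len)))
    (do ((i 0 (+ i 1)))
        ((= i len) result)
      (string-set! result i (proc i)))))

(demo-string-tabulate
 5
 (lambda (i) (integer->char (+ i (char->integer #\a)))))
;; => "abcde"
}}}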

== Predicates ==

`(span? `''obj''`)`

`(string? `''obj''`)`  [R7RS-small]

Returns `#t` if ''obj'' is a character span, and `#f` otherwise.

`(span-null? `''span''`)`

`(string-null? `''string''`)`  [SRFI 13]

Returns `#t` if ''span'' contains zero characters, and `#f` otherwise.

`(span-every `''pred span''`)`

`(string-every `''pred string''`)`  [SRFI 13]

Returns `#t` if ''pred'' returns true for every character in ''span'', and `#f` otherwise.

`(span-any `''pred span''`)`

`(string-any `''pred string''`)`  [SRFI 13]

Returns `#t` if ''pred'' returns true for at least one character in ''span'', and `#f` otherwise.

`(is-char? `''char''`)`

Returns a predicate which accepts one argument, and returns `#t` if the argument is the same as ''char'' (in the sense of `char=?`) and `#f` otherwise.

`(in-char-set? `''char-set''`)`

Returns a predicate which accepts one argument, and returns `#t` if the argument is an element of ''char-set'', a SRFI 14 character set, and `#f` otherwise.
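
The predicate constructors are meant to be combined with the predicate-taking procedures.  The sketch below shows the idea for strings; `demo-is-char?` and `demo-string-any` are hypothetical stand-ins written for this example, not the procedures of this proposal.

{{{
(import (scheme base))

;; Stand-in for is-char?: returns a one-argument predicate.
(define (demo-is-char? char)
  (lambda (c) (char=? c char)))

;; Stand-in for string-any: true if pred holds for at least one character.
(define (demo-string-any pred str)
  (let loop ((i 0))
    (cond ((= i (string-length str)) #f)
          ((pred (string-ref str i)) #t)
          (else (loop (+ i 1))))))

(demo-string-any (demo-is-char? #\x) "xyz") ; => #t
(demo-string-any (demo-is-char? #\x) "abc") ; => #f
}}}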

== Selection ==

`(span-ref `''span k''`)`

`(string-ref `''string k''`)` [R7RS-small]

Returns the ''k''th character of ''span'', counting from 0.  It is an error if ''k'' is not a non-negative exact integer less than the length of ''span''.

`(span-take `''span n''`)`

`(string-take `''string n''`)` [SRFI 13]

Returns a character span which contains the first ''n'' characters of ''span''.

`(span-take-right `''span n''`)`

`(string-take-right `''string n''`)` [SRFI 13]

Returns a character span which contains the last ''n'' characters of ''span''.

`(span-drop `''span n''`)`

`(string-drop `''string n''`)` [SRFI 13]

Returns a character span which contains all but the first ''n'' characters of ''span''.

`(span-drop-right `''span n''`)`

`(string-drop-right `''string n''`)` [SRFI 13]

Returns a character span which contains all but the last ''n'' characters of ''span''.

`(span-split-at `''span n''`)`

`(string-split-at `''string n''`)` [SRFI 13]

Returns two values, a character span containing the first ''n'' characters of ''span'', and another character span containing the remaining characters of ''span''.

`(span-replicate `''span from to''`)`

`(string-replicate `''string from to''`)`

''Span'' is conceptually replicated an infinite number of times to both left and right, and this doubly infinite sequence is then truncated to form a span starting at index ''from'' (inclusive) and ending at index ''to'' (exclusive).  Negative indexes are allowed in order to access the infinite left extension.

{{{
(string-replicate "abcdef" 2 8) => "cdefab" ; rotate left

(string-replicate "abcdef" -2 4) => "efabcd" ; rotate right

(string-replicate "abc" 0 7) => "abcabca" ; replicate
}}}

This procedure is the same as the SRFI 13 procedure `xsubstring`, except that the ''to'' argument is required.
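
The doubly infinite sequence can be modelled by mapping each requested index onto the original string with a floored remainder.  The sketch below assumes a non-empty string and ''from'' not greater than ''to''; `demo-string-replicate` is a hypothetical name, not the procedure specified above.

{{{
(import (scheme base))

;; Index i of the infinite sequence corresponds to index
;; (floor-remainder i n) of the original string, which is also
;; well defined for negative i.
(define (demo-string-replicate str from to)
  (let* ((n (string-length str))
         (result (make-string (- to from))))
    (do ((i from (+ i 1)))
        ((>= i to) result)
      (string-set! result
                   (- i from)
                   (string-ref str (floor-remainder i n))))))

(demo-string-replicate "abcdef" -2 4) ; => "efabcd"
(demo-string-replicate "abc" 0 7)     ; => "abcabca"
}}}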

== Padding, trimming, and compressing ==

`(span-pad `''span len'' [ ''char'' ]`)`

`(string-pad `''string len'' [ ''char'' ]`)` [SRFI 13]

`(span-pad-right `''span len'' [ ''char'' ]`)`

`(string-pad-right `''string len'' [ ''char'' ]`)` [SRFI 13]

Returns a span of length ''len'' consisting of ''span'' padded on the left (right) by as many occurrences of the character ''char'' as needed.  If ''span'' has more than ''len'' characters, it is truncated on the left (right) to length ''len''.  If ''char'' is omitted, `#\space` is used.
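
A sketch of the left-padding variant for strings, written with R7RS-small procedures only.  `demo-string-pad` is a hypothetical name; the right-padding variant would mirror it.

{{{
(import (scheme base))

;; Pads on the left, or truncates on the left when str is too long.
(define (demo-string-pad str len . maybe-char)
  (let ((char (if (null? maybe-char) #\space (car maybe-char)))
        (n (string-length str)))
    (if (>= n len)
        (string-copy str (- n len) n)   ; keep the last len characters
        (string-append (make-string (- len n) char) str))))

(demo-string-pad "42" 5)     ; => "   42"
(demo-string-pad "42" 5 #\0) ; => "00042"
(demo-string-pad "12345" 3)  ; => "345"
}}}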

`(span-trim `''span'' [ ''pred'' ]`)`

`(string-trim `''string'' [ ''pred'' ]`)` [SRFI 13]

`(span-trim-right `''span'' [ ''pred'' ]`)`

`(string-trim-right `''string'' [ ''pred'' ]`)` [SRFI 13]

`(span-trim-both `''span'' [ ''pred'' ]`)`

`(string-trim-both `''string'' [ ''pred'' ]`)` [SRFI 13]

Trim ''span'' by skipping over all characters on the left / on the right / on both sides that satisfy ''pred'' and returning the resulting span.

`(span-compress `''span'' [ ''char'' ]`)`

`(string-compress `''string'' [ ''char'' ]`)`

Returns a span which differs from ''span'' in that every sequence of consecutive occurrences of ''char'' has been replaced by a single ''char''.  If ''char'' is omitted, `#\space` is used.
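
A sketch of the compression rule for strings: each character equal to ''char'' is kept only when the previous character was not.  `demo-string-compress` is a hypothetical name used for illustration.

{{{
(import (scheme base))

(define (demo-string-compress str . maybe-char)
  (let ((char (if (null? maybe-char) #\space (car maybe-char))))
    (let loop ((chars (string->list str))
               (prev-matched? #f)   ; was the previous character char?
               (acc '()))
      (cond ((null? chars) (list->string (reverse acc)))
            ((char=? (car chars) char)
             (loop (cdr chars)
                   #t
                   (if prev-matched? acc (cons char acc))))
            (else
             (loop (cdr chars) #f (cons (car chars) acc)))))))

(demo-string-compress "a   b  c")     ; => "a b c"
(demo-string-compress "aaxxbxxx" #\x) ; => "aaxbx"
}}}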

== Prefixes and suffixes ==

`(span-prefix `''span,,1,,'' ''span,,2,,''`)`

`(string-prefix `''string,,1,,'' ''string,,2,,''`)`

`(span-suffix `''span,,1,,'' ''span,,2,,''`)`

`(string-suffix `''string,,1,,'' ''string,,2,,''`)` 

Returns a span containing the characters in the common prefix/suffix of ''span,,1,,'' and ''span,,2,,''.  If there is no common prefix/suffix, returns an empty span.

`(span-prefix-length `''span,,1,,'' ''span,,2,,''`)`

`(string-prefix-length `''string,,1,,'' ''string,,2,,''`)` [SRFI 13]

`(span-suffix-length `''span,,1,,'' ''span,,2,,''`)`

`(string-suffix-length `''string,,1,,'' ''string,,2,,''`)` [SRFI 13]

Returns the length of the span that would be returned by `span-prefix` and friends.

`(span-prefix? `''span,,1,,'' ''span,,2,,''`)`

`(string-prefix? `''string,,1,,'' ''string,,2,,''`)` [SRFI 13]

`(span-suffix? `''span,,1,, span,,2,,''`)`

`(string-suffix? `''string,,1,,'' ''string,,2,,''`)` [SRFI 13]

Returns `#t` if ''span,,1,,'' is a prefix/suffix of ''span,,2,,'', and `#f` otherwise.
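
The three prefix procedures are closely related, as the following string-based sketch suggests: the common prefix and the prefix test can both be derived from the prefix length.  The `demo-` names are hypothetical, and the suffix variants would scan from the end instead.

{{{
(import (scheme base))

;; Number of leading characters on which s1 and s2 agree.
(define (demo-string-prefix-length s1 s2)
  (let ((limit (min (string-length s1) (string-length s2))))
    (let loop ((i 0))
      (if (and (< i limit)
               (char=? (string-ref s1 i) (string-ref s2 i)))
          (loop (+ i 1))
          i))))

;; The common prefix itself.
(define (demo-string-prefix s1 s2)
  (string-copy s1 0 (demo-string-prefix-length s1 s2)))

;; s1 is a prefix of s2 when the common prefix covers all of s1.
(define (demo-string-prefix? s1 s2)
  (= (demo-string-prefix-length s1 s2) (string-length s1)))

(demo-string-prefix-length "spanner" "spandex") ; => 4
(demo-string-prefix "spanner" "spandex")        ; => "span"
(demo-string-prefix? "span" "spanner")          ; => #t
(demo-string-prefix? "spanner" "span")          ; => #f
}}}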

== Searching ==

`(span-count `''pred span''`)`

`(string-count `''pred string''`)` [SRFI 13]

Returns an exact integer, the number of characters in ''span'' which satisfy ''pred''.

`(span-take-while `''pred span''`)`

`(string-take-while `''pred string''`)`

Returns the characters of the longest initial prefix of ''span'' whose characters all satisfy ''pred''.

`(span-drop-while `''pred span''`)`

`(string-drop-while `''pred string''`)`

`(span-break `''pred span''`)`

`(string-break `''pred string''`)` [SRFI 13]

`(span-span `''pred span''`)`

`(string-span `''pred string''`)` [SRFI 13]

`(span-contains `''span,,1,, span,,2,,''`)`

`(string-contains `''string,,1,, string,,2,,''`)` [SRFI 13]

== The whole character span or string ==

`(span-length `''span''`)`

`(string-length `''string''`)` [R7RS-small]

Returns the number of characters in ''span''.

`(span-copy `''span''`)`

`(string-copy `''string'' [ ''start'' [ ''end'' ] ]`)` [R7RS-small]

Returns a copy of ''span'' such that any future mutation of any string underlying ''span'' does not affect the characters of the copy.

`(span-reverse `''span''`)`

`(string-reverse `''string''`)` [SRFI 13]

Returns a span containing the characters of ''span'' in reverse order.

`(span-append `''span'' ...`)`

`(string-append `''string'' ...`)` [R7RS-small]

Returns a span containing the characters of the ''span'' arguments in order.

`(span-concatenate `''list-of-spans''`)`

`(string-concatenate `''list-of-strings''`)` [SRFI 13]

Returns a span containing the characters of the spans enumerated in ''list-of-spans'' in order.  This procedure will succeed even if `(apply string-append list-of-strings)` fails due to an implementation limit on the number of arguments a procedure may receive.

`(span-concatenate-reverse `''list-of-spans''`)`

`(string-concatenate-reverse `''list-of-strings''`)` [SRFI 13]

The same as `span-concatenate`, except that ''list-of-spans'' is processed in reverse order.  Note that the individual spans are ''not'' processed in reverse order.

== Folding and mapping ==

`(span-map `''proc span'' ...`)`

`(string-map `''proc string'' ...`)` [R7RS-small]

`(span-for-each `''proc span'' ...`)`

`(string-for-each `''proc string'' ...`)` [R7RS-small]

`(span-fold `''proc nil span''`)`

`(string-fold `''proc nil string''`)` [SRFI 13]

`(span-fold-right `''proc nil span''`)`

`(string-fold-right `''proc nil string''`)` [SRFI 13]

== Parsing ==

`(span-split `''span'' [ ''sep'' [ ''limit'' ] ]`)`

`(string-split `''string'' [ ''sep'' [ ''limit'' ] ]`)`

Returns a list of the words contained in ''span''.  If ''sep'' (which is also a character span) is omitted, then the words are separated by one or more whitespace characters (those on which `char-whitespace?` returns `#t`).  If ''sep'' is supplied, it specifies the span to be used as the word separator.  The returned list will then have one more element than the number of non-overlapping occurrences of the separator in ''span''.  If ''sep'' is an empty span, then the returned list contains the individual characters of ''span''.

If ''limit'' is provided, at most that many splits occur, and the remainder of ''span'' is returned as the final element of the list (thus, the result will have at most ''limit'' + 1 elements). If ''limit'' is not specified, then as many splits as possible are made.  It is an error if ''limit'' is not a positive exact integer.
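
The interaction between the separator and ''limit'' can be sketched for strings as follows.  This sketch covers only the case of an explicit, non-empty separator, and it takes ''limit'' as a required argument that is either `#f` (no limit) or a positive exact integer; both simplifications, and the `demo-` names, are assumptions of the example.

{{{
(import (scheme base))

;; Does sep occur in str starting at index i?
(define (demo-match-at? str sep i)
  (let ((n (string-length sep)))
    (and (<= (+ i n) (string-length str))
         (string=? (string-copy str i (+ i n)) sep))))

;; Assumes sep is non-empty; an empty sep would never advance.
(define (demo-string-split str sep limit)
  (let loop ((start 0) (i 0) (splits 0) (acc '()))
    (cond ((or (> i (- (string-length str) (string-length sep)))
               (and limit (= splits limit)))
           ;; No further split: the remainder is the final element.
           (reverse (cons (string-copy str start) acc)))
          ((demo-match-at? str sep i)
           (let ((next (+ i (string-length sep))))
             (loop next next (+ splits 1)
                   (cons (string-copy str start i) acc))))
          (else (loop start (+ i 1) splits acc)))))

(demo-string-split "a,b,c" "," #f) ; => ("a" "b" "c")
(demo-string-split "a,b,c" "," 1)  ; => ("a" "b,c")
(demo-string-split "a," "," #f)    ; => ("a" "")
}}}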

== Filtering and partitioning ==

`(span-filter `''pred span''`)`

`(string-filter `''pred string''`)` [SRFI 13]

Returns a span containing the characters of ''span'' which satisfy ''pred''.

`(span-remove `''pred span''`)`

`(string-remove `''pred string''`)`

Returns a span containing the characters of ''span'' which do not satisfy ''pred''.

`(span-partition `''pred span''`)`

`(string-partition `''pred string''`)`

Returns two values, a span containing the characters of ''span'' which satisfy ''pred'' and another span containing those which do not.

== Conversion ==

`(span->list `''span''`)`

`(span->vector `''span''`)`

`(string->list `''string'' [ ''start'' [ ''end'' ] ]`)` [R7RS-small]

`(list->string `''list''`)` [R7RS-small]

`(string->vector `''string''[ ''start'' [ ''end'' ] ]`)` [R7RS-small]

`(vector->string `''vector''[ ''start'' [ ''end'' ] ]`)` [R7RS-small]

== String cursors ==

{{{
string-cursor-start
string-cursor-end

string-cursor-ref

string-cursor-next
string-cursor-previous

string-cursor-forward
string-cursor-backward

string-cursor-forward-until
string-cursor-backward-until

string-cursor=?
string-cursor<?
string-cursor>?
string-cursor<=?
string-cursor>=?

string-cursor->index
string-index->cursor

string-cursors->string

string-cursor-difference
}}}

== Functional update ==

{{{
span-replace
string-replace

span-insert
string-insert

span-delete
string-delete
}}}

== Output ==

`(write-string-tree `''obj'' [ ''port'' ]`)`

It is an error if ''port'' is not a textual output port.  If ''port'' is omitted, the value of `(current-output-port)` is used.

If ''obj'' is a string or character span, its characters are output to ''port''.  If ''obj'' is a character, it is output to ''port''.  If ''obj'' is a number, it is converted to a string as if by `number->string` and the characters of the string are output to ''port''.  If ''obj'' is a pair or vector, its components are processed recursively by `write-string-tree`.  Otherwise, `write-string-tree` does nothing.

`(tree->span `''obj''`)`

`(tree->string `''obj''`)`

Behaves as if `write-string-tree` were applied to ''obj'' and a newly allocated string output port.  When ''obj'' has been completely output, the port's string is returned as a span or a string.
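
The recursive traversal can be sketched with R7RS-small string ports.  The sketch handles strings, characters, numbers, pairs, and vectors, and omits character spans because no span type is assumed here; `demo-write-string-tree` and `demo-tree->string` are hypothetical names.

{{{
(import (scheme base))

(define (demo-write-string-tree obj port)
  (cond ((string? obj) (write-string obj port))
        ((char? obj) (write-char obj port))
        ((number? obj) (write-string (number->string obj) port))
        ((pair? obj)
         (demo-write-string-tree (car obj) port)
         (demo-write-string-tree (cdr obj) port))
        ((vector? obj)
         (vector-for-each
          (lambda (x) (demo-write-string-tree x port))
          obj))
        (else #f)))   ; anything else, including (), outputs nothing

(define (demo-tree->string obj)
  (let ((port (open-output-string)))
    (demo-write-string-tree obj port)
    (get-output-string port)))

(demo-tree->string '("ab" (#\c 12) #("d" sym) "e"))
;; => "abc12de"
}}}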

== Compatibility ==

`(span-upcase `''span''`)`

`(string-upcase `''string''`)` [R7RS-small]

`(span-downcase `''span''`)`

`(string-downcase `''string''`)` [R7RS-small]

`(span-foldcase `''span''`)`

`(string-foldcase `''string''`)` [R7RS-small]

For the behavior of the string procedures, see R7RS-small.  In any implementation of this proposal based on R7RS, the span procedures must behave analogously to the string procedures.  That is, if a call to string procedure ''x'' on a string containing characters ''y,,0,, ... y,,n,,'' produces a string containing characters ''z,,0,, ... z,,n,,'', then a call to the analogous span procedure ''x′'' on a span containing characters ''y,,0,, ... y,,n,,'' must produce a span containing characters ''z,,0,, ... z,,n,,''.

`(span=? `''span,,1,, span,,2,, span'' ...`)`

`(string=? `''string,,1,, string,,2,, string'' ...`)` [R7RS-small]

`(span<? `''span,,1,, span,,2,, span'' ...`)`

`(string<? `''string,,1,, string,,2,, string'' ...`)` [R7RS-small]

`(span>? `''span,,1,, span,,2,, span'' ...`)`

`(string>? `''string,,1,, string,,2,, string'' ...`)` [R7RS-small]

`(span<=? `''span,,1,, span,,2,, span'' ...`)`

`(string<=? `''string,,1,, string,,2,, string'' ...`)` [R7RS-small]

`(span>=? `''span,,1,, span,,2,, span'' ...`)`

`(string>=? `''string,,1,, string,,2,, string'' ...`)` [R7RS-small]

`(span-ci=? `''span,,1,, span,,2,, span'' ...`)`

`(string-ci=? `''string,,1,, string,,2,, string'' ...`)` [R7RS-small]

`(span-ci<? `''span,,1,, span,,2,, span'' ...`)`

`(string-ci<? `''string,,1,, string,,2,, string'' ...`)` [R7RS-small]

`(span-ci>? `''span,,1,, span,,2,, span'' ...`)`

`(string-ci>? `''string,,1,, string,,2,, string'' ...`)` [R7RS-small]

`(span-ci<=? `''span,,1,, span,,2,, span'' ...`)`

`(string-ci<=? `''string,,1,, string,,2,, string'' ...`)` [R7RS-small]

`(span-ci>=? `''span,,1,, span,,2,, span'' ...`)`

`(string-ci>=? `''string,,1,, string,,2,, string'' ...`)` [R7RS-small]

For the behavior of the string procedures, see R7RS-small.  In any implementation of this proposal based on R7RS, the span procedures must behave analogously to the string procedures.

`(span-titlecase `''span''`)`

`(string-titlecase `''string''`)` [SRFI 13]

For every character ''c'' in ''span'': if ''c'' is preceded by a character with case, it is downcased; otherwise it is replaced by its titlecase equivalent, if any.  Other characters are unchanged.  Note that most lowercase characters have the same character as both uppercase and titlecase equivalents.

{{{
(string-titlecase "--capitalize tHIS sentence.") =>
  "--Capitalize This Sentence."

(string-titlecase "see Spot run. see Nix run.") =>
  "See Spot Run. See Nix Run."

(string-titlecase "3com makes routers.") =>
  "3Com Makes Routers."
}}}
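
The rule can be approximated in R7RS-small as shown below.  Two assumptions are made and should be kept in mind: `char-alphabetic?` is used as a stand-in for "a character with case", and `char-upcase` stands in for the titlecase mapping, since R7RS-small provides no titlecase procedure.  `demo-string-titlecase` is a hypothetical name.

{{{
(import (scheme base) (scheme char))

(define (demo-string-titlecase str)
  (let loop ((chars (string->list str))
             (prev-cased? #f)   ; was the previous character cased?
             (acc '()))
    (if (null? chars)
        (list->string (reverse acc))
        (let ((c (car chars)))
          (loop (cdr chars)
                (char-alphabetic? c)
                (cons (if prev-cased?
                          (char-downcase c)
                          (char-upcase c)) ; approximates titlecase
                      acc))))))

(demo-string-titlecase "--capitalize tHIS sentence.")
;; => "--Capitalize This Sentence."

(demo-string-titlecase "3com makes routers.")
;; => "3Com Makes Routers."
}}}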

time

2014-12-18 00:22:03