This site is a static rendering of the Trac instance that was used by R7RS-WG1 for its work on R7RS-small (PDF), which was ratified in 2013. For more information, see Home. For a version of this page that may be more recent, see StringSlicesCowan in WG2's repo for R7RS-large.

StringSlicesCowan

cowan
2014-12-18 00:53:45

Character span library

This is a library for manipulating textual content based on character spans, also known as just spans. These are conceptually references to a part of a Scheme string. It is not defined whether the character span type is disjoint from strings. Character spans are immutable, and except as noted below, it is an error to mutate the string(s) that underlie a span.

String cursors are pointers into strings, and are not necessarily disjoint from other Scheme types. For example, they may be exact integers that are character-based indexes into strings. Alternatively, in an implementation whose internal representation of strings is UTF-8, string cursors may be indexes of individual bytes in the string. It is also possible to implement string cursors as objects of a disjoint type.

Most of the character span procedures in this proposal also have string equivalents. In order to make the specification more concise, the string procedures are listed but don't have detailed explanations, except for the constructors. Procedures with the same names and basic functions as SRFI 13 procedures are marked [SRFI 13]. However, this proposal contains only a subset of SRFI 13. In particular, the string procedures of this proposal do not accept start and end arguments, as their function is subsumed by spans. In addition, the SRFI 13 low-level procedures and macros are not provided, nor are there any mutation operations.

The operations provided here (with the exception of those in the Compatibility section) are entirely independent of the character repertoire supported by the implementation.

Procedures marked [R7RS-small] are available in the small language, and are not exported by implementations of this proposal. They are included only for clarity and completeness.

Issues

  1. Allow negative indices in constructors?
  2. Titlecase doesn't really fit; keep it?
  3. Keep functional update?
  4. Keep string trees?
  5. Keep compatibility routines, possibly in a different package?
  6. I have made the argument order of string-tabulate compatible with SRFI 1 list-tabulate rather than SRFI 13's string-tabulate; the discrepancy was accidental. Revert to SRFI 13 argument order?
  7. Chibi provides string-mismatch-right, but the cursor returned is not necessarily valid; in particular, it returns -1 on identical strings. I have left it out because of this. Include it?

Specification

With the exception of the constructors, all the procedures in this proposal exist in pairs: one that accepts and produces character spans and one that accepts and produces strings. Only the character span version is documented in full; the string version should be understood as accepting the same non-span arguments, performing the same operations, and providing the same non-span results.

All predicates passed to procedures defined in this proposal may be called in any order and any number of times, except as otherwise noted.

Character span constructors

(make-span string start end)

Returns a character span which contains the characters of string in order from start (inclusive) to end (exclusive).

(span char ...)

(subspan span start end)

(string->span string)

Returns a character span which contains the characters of string in order. Later mutation of string will not affect the value of the returned span.

(span/cursors string start-cursor end-cursor)

(string-subspan/cursors string start-cursor end-cursor)

(span-transform proc span obj ...)

Proc is a procedure which accepts a string as its first argument and returns a string. It is invoked with a string containing the characters of span in order, followed by the obj arguments, if any. The resulting string is returned as a character span by span-transform. This procedure allows string-based procedures to be easily used in an environment that provides and expects spans.
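The following is a minimal sketch of how span-transform could be layered on string->span and span->string. The record type <span>, its constructor raw-span, and the accessor span-chars are hypothetical names introduced only for illustration; the sketch assumes the simplest possible representation, in which every span holds a private copy of its characters. A real implementation would more likely share storage with the underlying string.

```scheme
(import (scheme base))

;; Hypothetical model: a span owns a private copy of its characters.
(define-record-type <span>
  (raw-span chars)
  span?
  (chars span-chars))

(define (string->span string)
  (raw-span (string-copy string)))

(define (make-span string start end)
  (raw-span (string-copy string start end)))

(define (span->string span)
  (string-copy (span-chars span)))

;; Convert to a string, apply proc to it and the extra arguments,
;; then wrap the resulting string as a span again.
(define (span-transform proc span . objs)
  (string->span (apply proc (span->string span) objs)))
```

Under this model, (span->string (span-transform string-append (make-span "hello world" 0 5) "!" "?")) yields "hello!?".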

String constructors

(make-string k [ char ]) [R7RS-small]

Returns a string containing k characters, all of which are char. If char is omitted, the contents of the string are implementation-dependent.

(string char ...) [R7RS-small]

Returns a string consisting of the char arguments.

(string-unfold stop? mapper successor [ seed ])

(string-unfold-right stop? mapper successor [ seed ])

(span->string span)

Returns a newly allocated string which contains the characters of span in order.

(string-tabulate len proc)

Invokes proc for all exact integers between 0 (inclusive) and len (exclusive), and returns a newly allocated string containing the characters returned by the invocations.

Compatibility note: The argument order here agrees with the list-tabulate procedure of SRFI 1 rather than SRFI 13's string-tabulate procedure. The discrepancy was unintentional, but was discovered too late to fix.
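As an illustration of the argument order described above, string-tabulate could be written in R7RS-small along these lines. This is a sketch, not a reference implementation; it calls proc in increasing index order, which the text does not require.

```scheme
(import (scheme base))

(define (string-tabulate len proc)
  (let ((result (make-string len)))
    ;; Fill position i with the character returned by (proc i).
    (do ((i 0 (+ i 1)))
        ((= i len) result)
      (string-set! result i (proc i)))))
```

For example, (string-tabulate 5 (lambda (i) (integer->char (+ i (char->integer #\a))))) yields "abcde".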

Predicates

(span? obj)

(string? obj) [R7RS-small]

Returns #t if obj is a character span, and #f otherwise.

(span-null? span)

(string-null? string) [SRFI 13]

Returns #t if span contains zero characters, and #f otherwise.

(span-every pred span)

(string-every pred string) [SRFI 13]

Returns #t if pred returns true for every character in span, and #f otherwise.

(span-any pred span)

(string-any pred string) [SRFI 13]

Returns #t if pred returns true for at least one character in span, and #f otherwise.

(is-char? char)

Returns a predicate which accepts one argument, and returns #t if the argument is the same as char (in the sense of char=?) and #f otherwise.

(in-char-set? char-set)

Returns a predicate which accepts one argument, and returns #t if the argument is an element of char-set, a SRFI 14 character set, and #f otherwise.
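A possible sketch of the string versions of these predicates, together with is-char?, using only R7RS-small. It scans left to right and stops early, which is one permitted choice since predicates may be called in any order. Here string-any is taken to return #t when at least one character satisfies pred.

```scheme
(import (scheme base))

(define (string-every pred string)
  (let ((len (string-length string)))
    (let loop ((i 0))
      (cond ((= i len) #t)                              ; no counterexample found
            ((pred (string-ref string i)) (loop (+ i 1)))
            (else #f)))))

(define (string-any pred string)
  (let ((len (string-length string)))
    (let loop ((i 0))
      (cond ((= i len) #f)                              ; no witness found
            ((pred (string-ref string i)) #t)
            (else (loop (+ i 1)))))))

;; Returns a one-argument predicate that compares against char.
(define (is-char? char)
  (lambda (c) (char=? c char)))
```

With these definitions, (string-any (is-char? #\x) "xyz") is #t and (string-every (is-char? #\x) "xyz") is #f. On the empty string, string-every returns #t and string-any returns #f.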

Selection

(span-ref span k)

(string-ref string k) [R7RS-small]

Returns the kth character of span, starting with 0. It is an error if k is not a non-negative exact integer less than the length of span.

(span-take span n)

(string-take string n) [SRFI 13]

Returns a character span which contains the first n characters of span.

(span-take-right span n)

(string-take-right string n) [SRFI 13]

Returns a character span which contains the last n characters of span.

(span-drop span n)

(string-drop string n) [SRFI 13]

Returns a character span which contains all but the first n characters of span.

(span-drop-right span n)

(string-drop-right string n) [SRFI 13]

Returns a character span which contains all but the last n characters of span.

(span-split-at span n)

(string-split-at string n) [SRFI 13]

Returns two values, a character span containing the first n characters of span, and another character span containing the remaining characters of span.

(span-replicate span from to)

(string-replicate string from to)

Span is conceptually replicated an infinite number of times to both left and right, and this doubly infinite sequence is then truncated to form a span starting at index from (inclusive) and ending at index to (exclusive). Negative indexes are allowed in order to access the infinite left extension.

(string-replicate "abcdef" 2 8) => "cdefab" ; rotate left
(string-replicate "abcdef" -2 4) => "efabcd" ; rotate right
(string-replicate "abc" 0 7) => "abcabca" ; replicate

This procedure is the same as the SRFI 13 procedure xsubstring, except that the to argument is required.
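The doubly infinite replication can be modelled by reducing each index modulo the length of the string. The sketch below uses floor-remainder from R7RS-small, which always returns a non-negative result for a positive divisor, so negative indexes wrap correctly. It assumes a non-empty string and from <= to.

```scheme
(import (scheme base))

(define (string-replicate string from to)
  (let ((len (string-length string))
        (result (make-string (- to from))))
    ;; Index i of the infinite sequence maps to index (i mod len) of string.
    (do ((i from (+ i 1)))
        ((= i to) result)
      (string-set! result
                   (- i from)
                   (string-ref string (floor-remainder i len))))))
```

For example, (string-replicate "abcdef" -2 4) yields "efabcd" and (string-replicate "abc" 0 7) yields "abcabca".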

Padding, trimming, and compressing

(span-pad span len [ char ])

(string-pad string len [ char ]) [SRFI 13]

(span-pad-right span len [ char ])

(string-pad-right string len [ char ]) [SRFI 13]

Returns a span of length len consisting of span padded on the left (right) by as many occurrences of the character char as needed. If span has more than len characters, it is truncated on the left (right) to length len. If char is omitted, #\space is used.
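A sketch of the string versions in R7RS-small, showing that padding and truncation happen on the same side: string-pad keeps the rightmost len characters, while string-pad-right keeps the leftmost.

```scheme
(import (scheme base))

(define (string-pad string len . maybe-char)
  (let ((char (if (null? maybe-char) #\space (car maybe-char)))
        (n (string-length string)))
    (if (>= n len)
        (string-copy string (- n len) n)                       ; truncate on the left
        (string-append (make-string (- len n) char) string)))) ; pad on the left

(define (string-pad-right string len . maybe-char)
  (let ((char (if (null? maybe-char) #\space (car maybe-char)))
        (n (string-length string)))
    (if (>= n len)
        (string-copy string 0 len)                             ; truncate on the right
        (string-append string (make-string (- len n) char))))) ; pad on the right
```

For example, (string-pad "325" 5) yields "  325", (string-pad "8675309" 5) yields "75309", and (string-pad-right "325" 5 #\0) yields "32500".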

(span-trim span [ pred ])

(string-trim string [ pred ]) [SRFI 13]

(span-trim-right span [ pred ])

(string-trim-right string [ pred ]) [SRFI 13]

(span-trim-both span [ pred ])

(string-trim-both string [ pred ]) [SRFI 13]

Trim span by skipping over all characters on the left / on the right / on both sides that satisfy pred and returning the resulting span.

(span-compress span [ char ])

(string-compress string [ char ])

Return a span which differs from span in that every sequence of consecutive occurrences of char has been replaced by a single char. If char is omitted, #\space is used.
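One way to sketch string-compress in R7RS-small is to walk the string once, remembering whether the previous character matched char and skipping any match that immediately follows another.

```scheme
(import (scheme base))

(define (string-compress string . maybe-char)
  (let ((char (if (null? maybe-char) #\space (car maybe-char)))
        (len (string-length string)))
    (let loop ((i 0) (previous-matched? #f) (acc '()))
      (if (= i len)
          (list->string (reverse acc))
          (let* ((c (string-ref string i))
                 (match? (char=? c char)))
            (loop (+ i 1)
                  match?
                  ;; Drop c only when it repeats the character being compressed.
                  (if (and match? previous-matched?) acc (cons c acc))))))))
```

For example, (string-compress "a   b  c") yields "a b c", and (string-compress "aabbbcaa" #\b) yields "aabcaa".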

Prefixes and suffixes

(span-prefix span1 span2)

(string-prefix string1 string2)

(span-suffix span1 span2)

(string-suffix string1 string2)

Returns a span containing the characters in the common prefix/suffix of span1 and span2. If there is no common prefix/suffix, returns an empty span.

(span-prefix-length span1 span2)

(string-prefix-length string1 string2) [SRFI 13]

(span-suffix-length span1 span2)

(string-suffix-length string1 string2) [SRFI 13]

Returns the length of the span that would be returned by span-prefix and friends.

(span-prefix? span1 span2)

(string-prefix? string1 string2) [SRFI 13]

(span-suffix? span1 span2)

(string-suffix? string1 string2) [SRFI 13]

Returns #t if span1 is a prefix/suffix of span2, and #f otherwise.
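The three prefix procedures are closely related, as the following sketch of the string versions suggests: both string-prefix and string-prefix? can be derived from string-prefix-length. The suffix versions would scan from the right instead.

```scheme
(import (scheme base))

(define (string-prefix-length string1 string2)
  (let ((limit (min (string-length string1) (string-length string2))))
    ;; Advance while both strings agree; i is the length of the common prefix.
    (let loop ((i 0))
      (if (and (< i limit)
               (char=? (string-ref string1 i) (string-ref string2 i)))
          (loop (+ i 1))
          i))))

(define (string-prefix string1 string2)
  (string-copy string1 0 (string-prefix-length string1 string2)))

(define (string-prefix? string1 string2)
  (= (string-prefix-length string1 string2) (string-length string1)))
```

For example, (string-prefix "prefix" "preface") yields "pref", (string-prefix-length "prefix" "preface") yields 4, and (string-prefix? "pre" "prefix") yields #t.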

Searching

(span-count pred span)

(string-count pred string) [SRFI 13]

Returns an exact integer, the number of characters in span which satisfy pred.

(span-find pred span)

(string-find pred string)

(span-find-right pred span)

(string-find-right pred string)

Returns a cursor pointing to the first/last character in span that satisfies pred, or the end/start cursor if there is none.

Compatibility note: These procedures are analogous to SRFI 13's string-index procedures, but return cursors rather than indexes or #f on failure.

(span-bypass pred span)

(string-bypass pred string)

(span-bypass-right pred span)

(string-bypass-right pred string)

Returns a cursor pointing to the first/last character in span that does not satisfy pred, or the end/start cursor if there is none.

Compatibility note: These procedures are analogous to SRFI 13's string-skip procedures, but return cursors rather than indexes or #f on failure.
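A sketch of the left-to-right string versions, under the assumption that string cursors are exact integer indexes (one of the representations this proposal permits). With that representation the end cursor is simply the length of the string, so failure is reported as a valid cursor rather than #f.

```scheme
(import (scheme base) (scheme char))

;; Assumption: cursors are exact integer indexes; the end cursor is the length.
(define (string-find pred string)
  (let ((len (string-length string)))
    (let loop ((i 0))
      (if (or (= i len) (pred (string-ref string i)))
          i
          (loop (+ i 1))))))

;; Bypassing is finding with the sense of the predicate inverted.
(define (string-bypass pred string)
  (string-find (lambda (c) (not (pred c))) string))
```

For example, (string-find char-numeric? "ab1c") yields 2, (string-find char-numeric? "abc") yields the end cursor 3, and (string-bypass char-whitespace? "  x") yields 2.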

(span-take-while pred span)

(string-take-while pred string)

Returns the longest initial prefix of span whose characters all satisfy pred.

(span-drop-while pred span)

(string-drop-while pred string)

Drops the longest initial prefix of span whose characters all satisfy pred, and returns the rest of span.

(span-span pred span)

(string-span pred string) [SRFI 13]

(span-break pred span)

(string-break pred string) [SRFI 13]

The span-span procedure splits span, returning two values: the longest initial prefix whose characters all satisfy pred, and the remainder of span. The span-break procedure inverts the sense of the predicate: the remainder commences with the first character of span that satisfies pred. In other words: span-span finds the initial span of characters satisfying pred, and span-break breaks span at the first character satisfying pred. The name "span-span" is unfortunate but unavoidable.
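The relationship between the two procedures can be sketched with the string versions, where string-break is string-span applied to the negated predicate. Both return two values.

```scheme
(import (scheme base) (scheme char))

(define (string-span pred string)
  (let ((len (string-length string)))
    ;; Find the first index whose character fails pred, then cut there.
    (let loop ((i 0))
      (if (and (< i len) (pred (string-ref string i)))
          (loop (+ i 1))
          (values (string-copy string 0 i)
                  (string-copy string i len))))))

(define (string-break pred string)
  (string-span (lambda (c) (not (pred c))) string))
```

For example, (string-span char-numeric? "123abc") returns "123" and "abc", while (string-break char-numeric? "abc123") returns "abc" and "123".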

(span-contains span1 span2)

(string-contains string1 string2) [SRFI 13]

The whole character span or string

(span-length span)

(string-length string) [R7RS-small]

Returns the number of characters in span.

(span-copy span)

(string-copy string [ start [ end ] ]) [R7RS-small]

Makes a copy of span such that any future mutation of any string underlying span does not affect the characters of span.

(span-reverse span)

(string-reverse string) [SRFI 13]

Returns a span containing the characters of span in reverse order.

(span-append span ...)

(string-append string ...) [R7RS-small]

Returns a span containing the characters of the spans in order.

(span-concatenate list-of-spans)

(string-concatenate list-of-strings) [SRFI 13]

Returns a span containing the characters of the spans enumerated in list-of-spans in order. This procedure will succeed even if (apply span-append list-of-spans) fails due to an implementation limit on the number of arguments a procedure may receive.

(span-concatenate-reverse list-of-spans)

(string-concatenate-reverse list-of-strings) [SRFI 13]

The same as span-concatenate, except that list-of-spans is processed in reverse order. Note that the individual spans are not processed in reverse order.
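A sketch of the string versions that avoids apply altogether by accumulating into a string output port, so no limit on argument count is involved. Only the order of the list is reversed by string-concatenate-reverse, not the characters inside each element.

```scheme
(import (scheme base))

(define (string-concatenate list-of-strings)
  (let ((out (open-output-string)))
    (for-each (lambda (s) (write-string s out)) list-of-strings)
    (get-output-string out)))

(define (string-concatenate-reverse list-of-strings)
  (string-concatenate (reverse list-of-strings)))
```

For example, (string-concatenate-reverse '("abc" "def" "ghi")) yields "ghidefabc". This is convenient when pieces have been accumulated with cons in a loop.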

Folding and mapping

(span-map proc span ...)

(string-map proc string ...) [R7RS-small]

(span-for-each proc span ...)

(string-for-each proc string ...) [R7RS-small]

(span-fold proc nil span)

(string-fold proc nil string) [SRFI 13]

(span-fold-right proc nil span)

(string-fold-right proc nil string) [SRFI 13]

Parsing

(span-split span [ sep [ limit ] ])

(string-split string [ sep [ limit ] ])

Returns a list of the words contained in span. If sep (which is also a character span) is omitted, then the words are separated by one or more whitespace characters (those on which char-whitespace? returns #t). If sep is supplied, it specifies a string to be used as the word separator. The returned list will then have one more item than the number of non-overlapping occurrences of the separator in the string. If sep is an empty span, then the returned list contains a list of the characters in span.

If limit is provided, at most that many splits occur, and the remainder of span is returned as the final element of the list (thus, the result will have at most limit + 1 elements). If limit is not specified, then as many splits as possible are made. It is an error if limit is not a positive exact integer.
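The interaction between sep and limit can be sketched for strings as follows. This sketch covers only the case where a non-empty sep is supplied; the whitespace default and the empty-separator case are deliberately left out. The helper separator-at? is a hypothetical name used only here.

```scheme
(import (scheme base))

;; Hypothetical helper: does sep occur in string starting at index i?
(define (separator-at? string sep i)
  (let ((n (string-length sep)))
    (and (<= (+ i n) (string-length string))
         (string=? (string-copy string i (+ i n)) sep))))

(define (string-split string sep . maybe-limit)
  (let ((limit (if (null? maybe-limit) #f (car maybe-limit)))
        (len (string-length string))
        (n (string-length sep)))
    (when (zero? n)
      (error "string-split sketch: empty separator is not handled"))
    (let loop ((start 0) (i 0) (splits 0) (acc '()))
      (cond ((or (> (+ i n) len) (and limit (= splits limit)))
             ;; No further split is possible or allowed: emit the remainder.
             (reverse (cons (string-copy string start len) acc)))
            ((separator-at? string sep i)
             ;; Emit the word before the separator and resume after it.
             (loop (+ i n) (+ i n) (+ splits 1)
                   (cons (string-copy string start i) acc)))
            (else (loop start (+ i 1) splits acc))))))
```

For example, (string-split "a,b,c" ",") yields ("a" "b" "c"), (string-split "a,b,c" "," 1) yields ("a" "b,c"), and (string-split "a,,b" ",") yields ("a" "" "b").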

Filtering and partitioning

(span-filter pred span)

(string-filter pred string) [SRFI 13]

Returns a span containing the characters of span which satisfy pred.

(span-remove pred span)

(string-remove pred string)

Returns a span containing the characters of span which do not satisfy pred.

(span-partition pred span)

(string-partition pred string)

Returns two values, a span containing the characters of span which satisfy pred and another span containing those which do not.
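Filtering and removing can both be viewed as projections of partitioning. The sketch below, for the string versions, implements string-partition directly and derives the other two from it. A production version would probably filter directly to avoid building the unused half.

```scheme
(import (scheme base) (scheme char))

(define (string-partition pred string)
  (let loop ((chars (string->list string)) (in '()) (out '()))
    (cond ((null? chars)
           (values (list->string (reverse in))
                   (list->string (reverse out))))
          ((pred (car chars))
           (loop (cdr chars) (cons (car chars) in) out))
          (else
           (loop (cdr chars) in (cons (car chars) out))))))

;; Keep only the first value of the partition.
(define (string-filter pred string)
  (call-with-values (lambda () (string-partition pred string))
    (lambda (in out) in)))

;; Keep only the second value of the partition.
(define (string-remove pred string)
  (call-with-values (lambda () (string-partition pred string))
    (lambda (in out) out)))
```

For example, (string-partition char-numeric? "a1b2c3") returns "123" and "abc", so (string-filter char-numeric? "a1b2c3") yields "123" and (string-remove char-numeric? "a1b2c3") yields "abc".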

Conversion

(span->list span)

(span->vector span)

(string->list string [ start [ end ] ]) [R7RS-small]

(list->string list) [R7RS-small]

(string->vector string [ start [ end ] ]) [R7RS-small]

(vector->string vector [ start [ end ] ]) [R7RS-small]

String cursors

string-cursor-start string-cursor-end string-cursor-ref string-cursor-next string-cursor-previous string-cursor-forward string-cursor-backward string-cursor-forward-until string-cursor-backward-until string-cursor=? string-cursor<? string-cursor>? string-cursor<=? string-cursor>=? string-cursor->index string-index->cursor string-cursors->string string-cursor-difference

Functional update

span-replace string-replace span-insert string-insert span-delete string-delete

Output

(write-string-tree obj [ port ])

It is an error if port is not a textual output port. If port is omitted, the value of (current-output-port) is used.

If obj is a string or character span, its characters are output to port. If obj is a character, it is output to port. If obj is a number, it is converted to a string as if by number->string and the characters of the string are output to port. If obj is a pair or vector, its components are processed recursively by write-string-tree. Otherwise, write-string-tree does nothing.

(tree->span obj)

(tree->string obj)

Behaves as if write-string-tree were applied to obj and a newly allocated string output port. When obj has been completely output, the port's string is returned as a span or a string.
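The recursive traversal can be sketched in R7RS-small as below. Since this sketch defines no span type, the character span case is omitted; everything else follows the cases listed above, with the empty list and any other unrecognized object silently ignored.

```scheme
(import (scheme base))

(define (write-string-tree obj . maybe-port)
  (let ((port (if (null? maybe-port) (current-output-port) (car maybe-port))))
    (let walk ((obj obj))
      (cond ((string? obj) (write-string obj port))
            ((char? obj) (write-char obj port))
            ((number? obj) (write-string (number->string obj) port))
            ((pair? obj) (walk (car obj)) (walk (cdr obj)))
            ((vector? obj) (vector-for-each walk obj))
            (else #f)))))          ; anything else, including (), is ignored

(define (tree->string obj)
  (let ((port (open-output-string)))
    (write-string-tree obj port)
    (get-output-string port)))
```

For example, (tree->string '("abc" #\d (1 2) #(#\e "f") sym)) yields "abcd12ef". The symbol sym contributes nothing.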

Compatibility

(span-upcase span)

(string-upcase string) [R7RS-small]

(span-downcase span)

(string-downcase string) [R7RS-small]

(span-foldcase span)

(string-foldcase string) [R7RS-small]

For the behavior of the string procedures, see R7RS-small. In any implementation of this proposal based on R7RS, the span procedures must behave analogously to the string procedures. That is, if a call to string procedure x on a string containing characters y0 ... yn produces a string containing characters z0 ... zn, then a call to the analogous span procedure x′ on a span containing characters y0 ... yn must produce a span containing characters z0 ... zn.

(span=? span1 span2 span ...)

(string=? string1 string2 string ...) [R7RS-small]

(span<? span1 span2 span ...)

(string<? string1 string2 string ...) [R7RS-small]

(span>? span1 span2 span ...)

(string>? string1 string2 string ...) [R7RS-small]

(span<=? span1 span2 span ...)

(string<=? string1 string2 string ...) [R7RS-small]

(span>=? span1 span2 span ...)

(string>=? string1 string2 string ...) [R7RS-small]

(span-ci=? span1 span2 span ...)

(string-ci=? string1 string2 string ...) [R7RS-small]

(span-ci<? span1 span2 span ...)

(string-ci<? string1 string2 string ...) [R7RS-small]

(span-ci>? span1 span2 span ...)

(string-ci>? string1 string2 string ...) [R7RS-small]

(span-ci<=? span1 span2 span ...)

(string-ci<=? string1 string2 string ...) [R7RS-small]

(span-ci>=? span1 span2 span ...)

(string-ci>=? string1 string2 string ...) [R7RS-small]

For the behavior of the string procedures, see R7RS-small. In any implementation of this proposal based on R7RS, the span procedures must behave analogously to the string procedures.

(span-titlecase span)

(string-titlecase string) [SRFI 13]

For every character c in span: if c is preceded by a character with case, it is downcased; otherwise it is replaced by its titlecase equivalent, if any. Other characters are unchanged. Note that most lowercase characters have the same character as both uppercase and titlecase equivalents.

(string-titlecase "--capitalize tHIS sentence.") => "--Capitalize This Sentence."
(string-titlecase "see Spot run. see Nix run.") => "See Spot Run. See Nix Run."
(string-titlecase "3com makes routers.") => "3Com Makes Routers."
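The rule can be sketched in R7RS-small under two simplifying assumptions, both of which are approximations rather than part of this proposal: char-upcase stands in for the titlecase mapping (R7RS-small has no char-titlecase), and char-alphabetic? stands in for "has case". These agree for ASCII text but not for every Unicode character.

```scheme
(import (scheme base) (scheme char))

(define (string-titlecase string)
  (let ((len (string-length string))
        (result (string-copy string)))
    (let loop ((i 0) (previous-cased? #f))
      (if (= i len)
          result
          (let ((c (string-ref string i)))
            ;; Downcase after a cased character; otherwise use the
            ;; (approximated) titlecase form. Uncased characters map to themselves.
            (string-set! result i
                         (if previous-cased? (char-downcase c) (char-upcase c)))
            (loop (+ i 1) (char-alphabetic? c)))))))
```

Note how the digit in "3com" does not count as a cased character, so the following "c" begins a new word and is capitalized.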