This site is a static rendering of the Trac instance that was used by R7RS-WG1 for its work on R7RS-small (PDF), which was ratified in 2013. For more information, see Home. For a version of this page that may be more recent, see StringSlicesCowan in WG2's repo for R7RS-large.

StringSlicesCowan

cowan
2014-12-16 01:04:55

String span library

This is a library for manipulating textual content based on string spans, which are conceptually references to parts of one or more Scheme strings, and string cursors, which are pointers into strings. It is not defined whether the string span type is disjoint from strings, or whether string cursors are a disjoint type at all. String spans are immutable, and it is an error to mutate the underlying string.

In addition, string cursors may or may not be the same as character-based indexes into strings. For example, in an implementation whose internal representation of strings is UTF-8, string cursors may be indexes of individual bytes in the string. However, the operations provided here (with the exception of those in the Compatibility section) are entirely independent of the character repertoire supported by the implementation.
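The difference between a character index and a byte-based cursor can be illustrated with R7RS-small alone. This is only a sketch: index->byte-offset is a hypothetical helper, not part of this proposal, and the example assumes an implementation that supports the character U+03BB.

```scheme
(import (scheme base))

;; Hypothetical helper (not part of the proposal): the byte offset that
;; a UTF-8 based implementation might use as the cursor corresponding
;; to character index k.
(define (index->byte-offset str k)
  (bytevector-length (string->utf8 str 0 k)))

;; A two-character string: GREEK SMALL LETTER LAMDA followed by x.
;; Assumes the implementation supports U+03BB.
(define s (string (integer->char 955) #\x))

(index->byte-offset s 0) ; => 0
(index->byte-offset s 1) ; => 2, since U+03BB occupies two bytes in UTF-8
(index->byte-offset s 2) ; => 3
```

Here the cursor for the second character is 2 while its character index is 1, which is why cursors and indexes cannot be assumed to be interchangeable.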

This proposal also contains a useful subset of SRFI 13, which manipulates strings directly with some allowances for shared substrings (which are provided only by Guile). The string operations of this proposal are defined in terms of the string span operations. Unlike SRFI 13, this proposal does not provide start and end arguments, as their functionality is subsumed by spans. In addition, the low-level procedures are not provided, nor are any mutation operations. Procedures with the same names and functions as SRFI 13 procedures are marked [SRFI 13]; note that they don't necessarily support all of the arguments of the SRFI 13 versions.

Procedures marked [R7RS-small] are available in the small language, and are not exported by implementations of this proposal. They are included only for completeness.

All predicates passed to procedures defined in this proposal may be called in any order and any number of times, except as otherwise noted.

Issues

  1. Allow negative indices in constructors?
  2. Titlecase doesn't really fit; keep it?
  3. Keep functional update?
  4. Keep string trees?
  5. Keep compatibility routines, possibly in a different package?

Specification

With the exception of the constructors, all the procedures in this proposal exist in pairs: one that accepts and produces string spans and one that accepts and produces strings. Only the string span version is documented in full; the string version should be understood as accepting the same non-span arguments, performing the same operations, and providing the same non-span results.

String constructors

(make-string k [ char ]) [R7RS-small]

Returns a string containing k characters, all of which are char. If char is omitted, the contents of the string are implementation-dependent.

(string char ...) [R7RS-small]

(string-unfold stop? mapper successor [ seed ])

(string-unfold-right stop? mapper successor [ seed ])

(span->string span)

Returns a string which contains the characters of span in order.

String span constructors

(make-span string start end)

Returns a string span which contains the characters of string in order from start (inclusive) to end (exclusive).

(span char ...)

(subspan span start end)

(string->span string)

Returns a string span which contains the characters of string in order.
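One possible way to model these constructors is a record holding the underlying string and a half-open range. The sketch below is only an illustration under that assumption: the proposal does not mandate any representation, and the record type <span>, the constructor raw-span, and the accessors span-str, span-start, and span-end are hypothetical names.

```scheme
(import (scheme base))

;; Hypothetical representation: a span is a record holding the
;; underlying string plus a half-open character range [start, end).
;; The proposal itself does not prescribe this layout.
(define-record-type <span>
  (raw-span str start end)
  span?
  (str span-str)
  (start span-start)
  (end span-end))

;; Checks the range, then shares the string rather than copying it.
(define (make-span string start end)
  (unless (<= 0 start end (string-length string))
    (error "make-span: range out of bounds" start end))
  (raw-span string start end))

(define (string->span string)
  (make-span string 0 (string-length string)))

;; Copies the referenced characters out into a fresh string.
(define (span->string span)
  (substring (span-str span) (span-start span) (span-end span)))

(span->string (make-span "hello, world" 7 12)) ; => "world"
```

Because the record shares the underlying string instead of copying it, this model also shows why mutating that string after a span has been created is an error.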

(span/cursors string start-cursor end-cursor)

(string-subspan/cursors string start-cursor end-cursor)

(span-transform proc span obj ...)

Proc is a procedure which accepts a string as its first argument and returns a string. It is invoked on a string which contains the characters of span in order plus the obj arguments, if any. The resulting string is returned as a string span by span-transform. This procedure allows string-based procedures to be easily used in an environment that provides and expects spans.
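A minimal sketch of span-transform follows. To stay self-contained it assumes the degenerate representation in which a span is simply a string, which the proposal permits by leaving disjointness unspecified; the definitions of string->span and span->string here are stand-ins for that assumption only.

```scheme
(import (scheme base) (scheme char))

;; Assumption: spans are represented directly as strings.
(define (string->span string) (string-copy string))
(define (span->string span) (string-copy span))

;; Convert the span to a string, apply proc to it followed by the
;; extra arguments, and convert the resulting string back to a span.
(define (span-transform proc span . objs)
  (string->span (apply proc (span->string span) objs)))

;; string-upcase comes from (scheme char) and takes a single string.
(span->string (span-transform string-upcase (string->span "abc")))
; => "ABC"

;; Extra obj arguments are passed to proc after the string.
(span->string
 (span-transform string-append (string->span "foo") "bar" "baz"))
; => "foobarbaz"
```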

Predicates

(span? obj)

(string? obj) [R7RS-small]

Returns #t if obj is a string span, and #f otherwise.

(span-null? span)

(string-null? string) [SRFI 13]

Returns #t if span contains zero characters, and #f otherwise.

(span-every pred span)

(string-every pred string) [SRFI 13]

Returns #t if pred returns true for every character in span, and #f otherwise.

(span-any pred span)

(string-any pred string) [SRFI 13]

Returns #t if pred returns true for any character in span, and #f otherwise.
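The string versions of these two predicates can be sketched in R7RS-small as follows. This is an illustration rather than a reference implementation: it happens to scan left to right and stop early, whereas the proposal allows predicates to be called in any order and any number of times.

```scheme
(import (scheme base) (scheme char))

;; #t as soon as pred accepts some character, #f if none is accepted.
(define (string-any pred string)
  (let loop ((i 0))
    (cond ((= i (string-length string)) #f)
          ((pred (string-ref string i)) #t)
          (else (loop (+ i 1))))))

;; #f as soon as pred rejects some character, #t if none is rejected.
(define (string-every pred string)
  (let loop ((i 0))
    (cond ((= i (string-length string)) #t)
          ((pred (string-ref string i)) (loop (+ i 1)))
          (else #f))))

(string-every char-alphabetic? "abc") ; => #t
(string-every char-alphabetic? "ab1") ; => #f
(string-any char-numeric? "ab1")      ; => #t
(string-any char-numeric? "abc")      ; => #f
(string-every char-numeric? "")       ; => #t (vacuously true)
```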

(is-char? char)

Returns a predicate which accepts one argument, and returns #t if the argument is the same as char (in the sense of char=?) and #f otherwise.
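As a sketch, is-char? can be written as a closure over char. The char? guard is an added assumption so that non-character arguments yield #f rather than an error; the proposal does not say how such arguments are treated.

```scheme
(import (scheme base))

;; Returns a one-argument predicate that compares against char.
(define (is-char? char)
  (lambda (obj)
    (and (char? obj) (char=? obj char))))

(define comma? (is-char? #\,))

(comma? #\,) ; => #t
(comma? #\a) ; => #f
```

A predicate built this way can be passed wherever the proposal expects a pred or proc argument, for example to span-any or span-count.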

(in-char-set? char-set)

Returns a predicate which accepts one argument, and returns #t if the argument is an element of char-set, a SRFI 14 character set, and #f otherwise.

Selection

(span-ref span k)

(string-ref string k) [R7RS-small]

Returns the kth character of span, starting with 0. It is an error if k is not a non-negative exact integer less than the length of span.

(span-take span n)

(string-take string n) [SRFI 13]

Returns a string span which contains the first n characters of span.

(span-take-right span n)

(string-take-right string n) [SRFI 13]

Returns a string span which contains the last n characters of span.

(span-drop span n)

(string-drop string n) [SRFI 13]

Returns a string span which contains all but the first n characters of span.

(span-drop-right span n)

(string-drop-right string n) [SRFI 13]

Returns a string span which contains all but the last n characters of span.

(span-split-at span n)

(string-split-at string n) [SRFI 13]

Returns two values, a string span containing the first n characters of span, and another string span containing the remaining characters of span.
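The string versions of the take, drop, and split-at procedures above can be sketched with substring from R7RS-small. These definitions are illustrative only; a span-based implementation would presumably adjust the span's bounds instead of copying characters.

```scheme
(import (scheme base))

(define (string-take string n)
  (substring string 0 n))

(define (string-drop string n)
  (substring string n (string-length string)))

(define (string-take-right string n)
  (let ((len (string-length string)))
    (substring string (- len n) len)))

(define (string-drop-right string n)
  (substring string 0 (- (string-length string) n)))

;; Returns two values: the first n characters and the remainder.
(define (string-split-at string n)
  (values (string-take string n) (string-drop string n)))

(string-take "scheme" 2)       ; => "sc"
(string-take-right "scheme" 2) ; => "me"
(string-drop "scheme" 2)       ; => "heme"
(string-drop-right "scheme" 2) ; => "sche"
(call-with-values
  (lambda () (string-split-at "scheme" 3))
  list)                        ; => ("sch" "eme")
```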

(span-replicate span from to)

(string-replicate string from to)

Padding, trimming, and compressing

(span-pad span len [ char ])

(string-pad string len [ char ]) [SRFI 13]

(span-trim span len [ char ])

(string-trim string len [ char ]) [SRFI 13]

(span-trim-right span len [ char ])

(string-trim-right string len [ char ]) [SRFI 13]

(span-trim-both span len [ char ])

(string-trim-both string len [ char ]) [SRFI 13]

(span-compress span [ char ])

(string-compress string [ char ])

Prefixes and suffixes

(span-prefix span1 span2)

(string-prefix string1 string2)

(span-suffix span1 span2)

(string-suffix string1 string2) [SRFI 13]

(span-prefix-length span1 span2)

(string-prefix-length string1 string2) [SRFI 13]

(span-suffix-length span1 span2)

(string-suffix-length string1 string2) [SRFI 13]

(span-prefix? span1 span2)

(string-prefix? string1 string2) [SRFI 13]

(span-suffix? span1 span2)

(string-suffix? string1 string2) [SRFI 13]

Searching

(span-count proc span)

(string-count proc string) [SRFI 13]

(span-take-while proc span)

(string-take-while proc string) [SRFI 13]

(span-drop-while proc span)

(string-drop-while proc string) [SRFI 13]

(span-break proc span)

(string-break proc string) [SRFI 13]

(span-drop proc span)

(string-drop proc string) [SRFI 13]

(span-contains span1 span2)

(string-contains string1 string2) [SRFI 13]

The whole string span or string

(span-length span)

(string-length string) [R7RS-small]

(span-copy span)

(string-copy string [ start [ end ] ]) [R7RS-small]

(span-reverse span)

(string-reverse string) [SRFI 13]

(span-append span ...)

(string-append string ...) [R7RS-small]

(span-concatenate list-of-spans)

(string-concatenate list-of-strings) [SRFI 13]

(span-concatenate-reverse list-of-spans)

(string-concatenate-reverse list-of-strings) [SRFI 13]

Folding and mapping

(span-map proc span ...)

(string-map proc string ...) [R7RS-small]

(span-for-each proc span ...)

(string-for-each proc string ...) [R7RS-small]

(span-fold proc nil span)

(string-fold proc nil string) [SRFI 13]

(span-fold-right proc nil span)

(string-fold-right proc nil string) [SRFI 13]

Parsing

(span-split span [ sep [ limit ] ])

(string-split string [ sep [ limit ] ])

Returns a list of the words contained in span. If sep (which is also a string span) is omitted, then the words are separated by arbitrary strings of whitespace characters (those on which char-whitespace? returns #t). If sep is supplied, it specifies a string to be used as the word separator. The returned list will then have one more item than the number of non-overlapping occurrences of the separator in the string. If sep is empty, then the returned list contains one element for each character in span.

If limit is provided, at most that many splits occur, and the remainder of span is returned as the final element of the list (thus, the result will have at most limit + 1 elements). If limit is not specified, then all possible splits are made. It is an error if limit is not a positive exact integer.
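The separator and limit rules can be sketched on plain strings as follows. This is a partial illustration, not the proposal's definition: split-on and find-sep are hypothetical helpers, only the case of a supplied, non-empty sep is covered, and #f is used here to mean that no limit was given.

```scheme
(import (scheme base))

;; Index of the first occurrence of sep in str at or after start, or #f.
(define (find-sep str sep start)
  (let ((n (string-length str))
        (m (string-length sep)))
    (let loop ((i start))
      (cond ((> (+ i m) n) #f)
            ((string=? (substring str i (+ i m)) sep) i)
            (else (loop (+ i 1)))))))

;; Hypothetical helper: splits str on a non-empty sep, making at most
;; limit splits (limit is #f for "no limit").
(define (split-on str sep limit)
  (when (= 0 (string-length sep))
    (error "split-on: empty separator is not handled by this sketch"))
  (let loop ((start 0) (splits 0) (acc '()))
    (let ((hit (and (or (not limit) (< splits limit))
                    (find-sep str sep start))))
      (if hit
          (loop (+ hit (string-length sep))
                (+ splits 1)
                (cons (substring str start hit) acc))
          ;; No more splits: the remainder becomes the final element.
          (reverse (cons (substring str start (string-length str)) acc))))))

(split-on "a,b,,c" "," #f) ; => ("a" "b" "" "c")
(split-on "a,b,,c" "," 2)  ; => ("a" "b" ",c")
(split-on "abc" "," #f)    ; => ("abc")
```

The first call finds three occurrences of the separator and returns four items; the second stops after two splits and returns the unsplit remainder as the third and final element.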

Filtering and partitioning

(span-filter proc span)

(string-filter proc string) [SRFI 13]

(span-remove proc span)

(string-remove proc string)

(span-partition proc span)

(string-partition proc string)

Conversion

(span->list span)

(span->vector span)

(string->list string [ start [ end ] ]) [R7RS-small]

(list->string list) [R7RS-small]

(string->vector string [ start [ end ] ]) [R7RS-small]

(vector->string vector [ start [ end ] ]) [R7RS-small]

String cursors

string-cursor-start string-cursor-end string-cursor-ref string-cursor-next string-cursor-previous string-cursor-forward string-cursor-backward string-cursor-forward-until string-cursor-backward-until string-cursor=? string-cursor<? string-cursor>? string-cursor<=? string-cursor>=? string-cursor->index string-index->cursor string-cursors->string string-cursor-difference

Functional update

span-replace string-replace span-insert string-insert span-delete string-delete

String trees

string-tree->string write-string-tree

Compatibility

span-upcase span-downcase span-foldcase span=? span<? span>? span<=? span>=? span-ci=? span-ci<? span-ci>? span-ci<=? span-ci>=? span-titlecase string-titlecase [SRFI 13]