This is a proposal for a WG2 blob (bytevector) API. Blobs are a disjoint type. They have no native interpretation, but parts of them can be interpreted as one of a variety of binary types. The conceit is that everything is a separate procedure with minimal arguments; this makes for a lot of procedures, but each one can be easily inlined by even a very dumb compiler, providing high efficiency.
Basic procedures
- (blob? obj)
- Returns #t if obj is a blob, and #f otherwise.
- (make-blob n)
- Returns a newly allocated blob containing n bytes.
- (blob-length blob)
- Returns the length of blob in bytes.
- (copy-blob blob)
- Returns a newly allocated blob containing the same bytes as blob.
- (subblob blob start end)
- Returns a newly allocated blob containing the bytes in blob starting at start (inclusive) and ending at end (exclusive).
- (copy-blob! from to)
- Copies the bytes of blob from into blob to, overwriting its contents; to must not be shorter than from.
- (copy-partial-blob! from start end to at)
- Copies the part of blob from that starts at start and ends at end into blob to, beginning at offset at.
Because there is no preferred way to interpret the data in a blob, there is no blob procedure analogous to list or vector, and make-blob takes no second (fill) argument.
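As a rough illustration of the basic procedures, here is a minimal sketch that models blobs as R7RS-small bytevectors. That modeling is an assumption made only for this example; the proposal does not say how blobs are represented, and the definitions below are not a reference implementation.

```scheme
(import (scheme base))

;; Assumption: a blob is modeled as an R7RS-small bytevector.
(define (blob? obj) (bytevector? obj))
(define (make-blob n) (make-bytevector n))
(define (blob-length blob) (bytevector-length blob))
(define (copy-blob blob) (bytevector-copy blob))
(define (subblob blob start end) (bytevector-copy blob start end))

;; R7RS-small argument order is (bytevector-copy! to at from start end).
(define (copy-blob! from to)
  (bytevector-copy! to 0 from))
(define (copy-partial-blob! from start end to at)
  (bytevector-copy! to at from start end))

;; Usage
(define src (bytevector 1 2 3 4 5))
(define dst (make-blob 8))          ; initial contents are unspecified
(copy-blob! src dst)                ; bytes 0..4 of dst are now 1 2 3 4 5
(copy-partial-blob! src 1 3 dst 5)  ; bytes 5..6 of dst are now 2 3
(subblob dst 0 7)                   ; => #u8(1 2 3 4 5 2 3)
```

Note that the source blob comes first in the proposed copy procedures, whereas bytevector-copy! puts the destination first.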
Numeric procedures
- (blob-<type><endian>-ref blob n)
- Returns a Scheme number corresponding to the binary value, encoded according to type and endian, that begins at offset n in blob.
- (blob-<type><endian>-set! blob n v)
- Converts v to a binary value encoded according to type and endian, and places it into blob beginning at offset n.
The types are:
- u8
- unsigned 8-bit integer
- s8
- signed 8-bit integer
- u16
- unsigned 16-bit integer
- s16
- signed 16-bit integer
- u32
- unsigned 32-bit integer
- s32
- signed 32-bit integer
- u64
- unsigned 64-bit integer
- s64
- signed 64-bit integer
- u128
- unsigned 128-bit integer
- s128
- signed 128-bit integer
- f32
- 32-bit IEEE float
- fn32
- 32-bit native float (may not be IEEE)
- f64
- 64-bit IEEE float
- fn64
- 64-bit native float (may not be IEEE)
- c64
- 64-bit complex number (two 32-bit IEEE floats)
- cn64
- 64-bit complex number (two 32-bit native floats, may not be IEEE)
- c128
- 128-bit complex number (two 64-bit IEEE floats)
- cn128
- 128-bit complex number (two 64-bit native floats, may not be IEEE)
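The endianness values are:
- (empty)
- Native endianness (system-dependent)
- le
- Little-endian
- be
- Big-endian
Endianness does not apply to the following types: s8, u8, fn32, fn64, cn64, cn128. Single-byte values have no byte order, and the native float and complex types always use the native representation.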
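To make the naming scheme concrete, here is a sketch of a few of the 16-bit accessors, again assuming that blobs are modeled as R7RS-small bytevectors. The two's-complement reading of signed values is also an assumption; the proposal only says "signed 16-bit integer".

```scheme
(import (scheme base))

;; Assumption: a blob is modeled as an R7RS-small bytevector.

;; Unsigned 16-bit, little-endian: low byte first.
(define (blob-u16le-ref blob n)
  (+ (bytevector-u8-ref blob n)
     (* 256 (bytevector-u8-ref blob (+ n 1)))))

;; Unsigned 16-bit, big-endian: high byte first.
(define (blob-u16be-ref blob n)
  (+ (* 256 (bytevector-u8-ref blob n))
     (bytevector-u8-ref blob (+ n 1))))

;; Signed variant (assumed two's complement): values >= 2^15 are negative.
(define (blob-s16le-ref blob n)
  (let ((u (blob-u16le-ref blob n)))
    (if (>= u 32768) (- u 65536) u)))

;; Setter: store the low byte at n and the high byte at n+1.
(define (blob-u16le-set! blob n v)
  (bytevector-u8-set! blob n (modulo v 256))
  (bytevector-u8-set! blob (+ n 1) (quotient v 256)))

;; Usage
(define b (bytevector #x34 #x12 #xFE #xFF))
(blob-u16le-ref b 0)   ; => 4660  (#x1234)
(blob-u16be-ref b 0)   ; => 13330 (#x3412)
(blob-s16le-ref b 2)   ; => -2
(blob-u16le-set! b 0 #xBEEF)
(blob-u16be-ref b 0)   ; => 61374 (#xEFBE)
```

Each accessor takes only a blob and an offset, which is what makes the one-procedure-per-type-and-byte-order design easy to inline.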
String procedures
- (blob-<encoding>-ref blob n l)
- Returns a Scheme string corresponding to the binary value encoded according to encoding beginning at offset n in blob and continuing for l bytes.
- (blob-<encoding>-set! blob n v)
- Converts v to a binary string encoded according to encoding and places it into blob beginning at offset n. Returns the number of bytes encoded.
The encodings are:
- utf8
- UTF-8 encoding
- utf16
- UTF-16 encoding (respects BOM if present, defaults to native byte order otherwise)
- utf16be
- UTF-16BE encoding (treats BOM as a normal character)
- utf16le
- UTF-16LE encoding (treats BOM as a normal character)
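The UTF-8 pair could be sketched as follows, assuming blobs are modeled as R7RS-small bytevectors and using the standard utf8->string and string->utf8 conversions. This is an illustration of the intended behavior, not part of the proposal.

```scheme
(import (scheme base))

;; Assumption: a blob is modeled as an R7RS-small bytevector.

;; Decode l bytes of UTF-8 starting at byte offset n.
(define (blob-utf8-ref blob n l)
  (utf8->string blob n (+ n l)))

;; Encode v as UTF-8 into blob at offset n; return the byte count.
(define (blob-utf8-set! blob n v)
  (let ((bytes (string->utf8 v)))
    (bytevector-copy! blob n bytes)
    (bytevector-length bytes)))

;; Usage ("\xe9;" is the character e-acute, two bytes in UTF-8)
(define b (make-bytevector 16 0))
(blob-utf8-set! b 0 "h\xe9;llo")   ; => 6
(blob-utf8-ref b 0 6)              ; => the same five-character string
```

Returning the byte count from the setter matters because the encoded length generally differs from the string length, and the caller needs it to read the string back.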
Issues
Pick one:
- Offsets are in bytes and can be arbitrary
- Offsets are in bytes but must be naturally aligned (divisible by n for an n-byte value)
- Offsets count n-byte units rather than bytes (forces natural alignment, SRFI 4 style)
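The three options can be compared side by side with a 32-bit little-endian reader. The procedure names below are hypothetical and exist only for this comparison, and blobs are again assumed to be R7RS-small bytevectors.

```scheme
(import (scheme base))

;; Assumption: a blob is modeled as an R7RS-small bytevector.
;; All three procedure names are hypothetical.

;; Option 1: k is an arbitrary byte offset.
(define (u32le-at blob k)
  (let loop ((i 3) (acc 0))
    (if (< i 0)
        acc
        (loop (- i 1)
              (+ (* acc 256) (bytevector-u8-ref blob (+ k i)))))))

;; Option 2: k is a byte offset that must be divisible by the value size.
(define (u32le-at/aligned blob k)
  (if (zero? (remainder k 4))
      (u32le-at blob k)
      (error "unaligned offset" k)))

;; Option 3: k counts 4-byte units, so the byte offset is 4k.
(define (u32le-at/index blob k)
  (u32le-at blob (* 4 k)))

;; Usage
(define b (bytevector 0 0 0 0 1 0 0 0))
(u32le-at b 4)          ; => 1
(u32le-at/aligned b 4)  ; => 1
(u32le-at/index b 1)    ; => 1
(u32le-at b 1)          ; => 16777216, an unaligned read of bytes 1..4
```

Only the first option can express the last call; the second signals an error for it, and the third has no way to name that offset at all.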
Should blob=? be provided?
I've trashed the UTF-32 conversions because nobody uses UTF-32. They can come back if somebody needs them.
WG1
I propose that WG1 provide blob?, make-blob, blob-length, copy-blob, copy-blob!, blob-u8-ref, blob-u8-set!, and possibly subblob and copy-partial-blob! as well.