16 Library introduction [library]

16.3 Method of description [description]

16.3.3 Other conventions [conventions]

Type descriptions [type.descriptions]

Character sequences [character.seq]

General [character.seq.general]

The C standard library makes widespread use of characters and character sequences that follow a few uniform conventions:
  • Properties specified as locale-specific may change during program execution by a call to setlocale(int, const char*) ([clocale.syn]), or by a change to a locale object, as described in [locales] and [input.output].
  • The execution character set and the execution wide-character set are supersets of the basic literal character set ([lex.charset]).
    The encodings of the execution character sets and the sets of additional elements (if any) are locale-specific.
    Each element of the execution wide-character set is encoded as a single code unit representable by a value of type wchar_t.
    [Note 1: 
    The encodings of the execution character sets can be unrelated to any literal encoding.
    — end note]
  • A letter is any of the 26 lowercase or 26 uppercase letters in the basic character set.
  • The decimal-point character is the locale-specific (single-byte) character used by functions that convert between a (single-byte) character sequence and a value of one of the floating-point types.
    It is used in the character sequence to denote the beginning of a fractional part.
    It is represented in [support] through [thread] and [depr] by a period, '.', which is also its value in the "C" locale.
  • A character sequence is an array object A that can be declared as T A[N], where T is any of the types char, unsigned char, or signed char ([basic.fundamental]), optionally qualified by any combination of const or volatile.
    The initial elements of the array have defined contents up to and including an element determined by some predicate.
    A character sequence can be designated by a pointer value S that points to its first element.

Byte strings [byte.strings]

A null-terminated byte string, or ntbs, is a character sequence whose highest-addressed element with defined content has the value zero (the terminating null character); no other element in the sequence has the value zero.
The length of an ntbs is the number of elements that precede the terminating null character.
An empty ntbs has a length of zero.
The value of an ntbs is the sequence of values of the elements up to and including the terminating null character.
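
The definitions of length and value above can be illustrated with a minimal sketch. The helper names ntbs_length and same_ntbs_value are hypothetical and are not part of the library; for a valid ntbs the first should agree with std::strlen, and the second simply wraps std::strcmp.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: the length of an ntbs is the number of elements
// that precede the terminating null character.
std::size_t ntbs_length(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}

// Hypothetical helper: two ntbss have the same value when their elements
// match up to and including the terminating null character.
bool same_ntbs_value(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}
```

Under these assumptions, ntbs_length("") yields zero, which matches the definition of an empty ntbs.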
A static ntbs is an ntbs with static storage duration.
Many of the objects manipulated by function signatures declared in <cstring> are character sequences or ntbss.
The size of some of these character sequences is limited by a length value, maintained separately from the character sequence.
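
As a sketch of a character sequence whose size is carried by a separate length value, the following uses std::memchr from <cstring>, which takes the number of bytes to examine as an explicit argument. The type bounded_seq and the function find_in are hypothetical names introduced only for illustration.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper type: a character sequence that need not be
// null-terminated, paired with a separately maintained length.
struct bounded_seq {
    const char* data;
    std::size_t size;
};

// Returns the index of the first element equal to c, or s.size if no
// element within the first s.size elements matches.
std::size_t find_in(bounded_seq s, char c) {
    const void* p = std::memchr(s.data, c, s.size);
    if (p == nullptr) {
        return s.size;
    }
    return static_cast<std::size_t>(static_cast<const char*>(p) - s.data);
}
```

Because the length is passed explicitly, the array does not have to contain a terminating null character for the search to stay within bounds.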
A string-literal, such as "abc", is a static ntbs.

Multibyte strings [multibyte.strings]

A multibyte character is a sequence of one or more bytes representing the code unit sequence for an encoded character of the execution character set.
A null-terminated multibyte string, or ntmbs, is an ntbs that constitutes a sequence of valid multibyte characters, beginning and ending in the initial shift state.
A static ntmbs is an ntmbs with static storage duration.
An ntbs that contains characters only from the basic literal character set is also an ntmbs.
Each multibyte character then consists of a single byte.
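
The relationship between bytes and multibyte characters can be sketched with std::mbrlen from <cwchar>. The function ntmbs_char_count is a hypothetical helper; it assumes the argument is an ntbs and interprets it under whatever C locale is current when it is called. In the "C" locale that is in effect at program startup, a string drawn from the basic literal character set should yield one multibyte character per byte.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical helper: counts the multibyte characters in an ntbs,
// starting from the initial shift state. Returns (size_t)-1 if the
// bytes do not form a sequence of valid, complete multibyte characters.
std::size_t ntmbs_char_count(const char* s) {
    const std::size_t error = static_cast<std::size_t>(-1);
    const std::size_t incomplete = static_cast<std::size_t>(-2);

    std::mbstate_t state{};                  // zero-initialized: initial shift state
    std::size_t remaining = std::strlen(s);  // bytes before the terminating null
    std::size_t count = 0;

    while (remaining > 0) {
        const std::size_t len = std::mbrlen(s, remaining, &state);
        if (len == error || len == incomplete) {
            return error;
        }
        if (len == 0) {
            break;  // defensive: a null character ends the string
        }
        s += len;
        remaining -= len;
        ++count;
    }
    return count;
}
```

This sketch does not verify that the string also ends in the initial shift state; a stricter check could additionally test the final state with std::mbsinit.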