mbchar(S-osr5)

mbchar: mbtowc, wctomb, mblen, mbrtowc, wcrtomb, mbrlen -- multibyte character handling

Syntax

cc . . . -lc

#include <stdlib.h>

int mbtowc(wchar_t *pwc, const char *s, size_t n);
int wctomb(char *s, wchar_t wchar);
int mblen(const char *s, size_t n);

#include <wchar.h>

int mbrtowc(wchar_t *pwc, const char *s, size_t n, mbstate_t *ps);
int wcrtomb(char *s, wchar_t wc, mbstate_t *ps);
int mbrlen(const char *s, size_t n, mbstate_t *ps);

Description

mbtowc- convert a multibyte character to a wide character

wctomb- convert a wide character to a multibyte character

mblen- determine the number of bytes in a multibyte character

mbrtowc- convert a multibyte character to a wide character (restartable)

wcrtomb- convert a wide character to a multibyte character (restartable)

mbrlen- determine the number of bytes in a multibyte character (restartable)

Traditional computer systems assumed that a character of a natural language can be represented in one byte of storage. However, languages such as Japanese, Korean, and Chinese require more than one byte of storage to represent a character. These characters are called ``multibyte characters''. Character sets that contain them are often called ``extended character sets''.

The number of bytes of storage required by a character in a given locale is defined in the LC_CTYPE category of the locale (see setlocale(S-osr5)). The maximum number of bytes in a multibyte character in an extended character set in the current locale is given by the macro, MB_CUR_MAX, defined in stdlib.h.
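As a minimal sketch of how MB_CUR_MAX follows the LC_CTYPE category, the helper below selects a locale and reports the resulting limit. The name max_bytes_in_locale is hypothetical and not part of the library; the locale names a caller may pass depend on what is installed on the system.

```c
#include <locale.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not a library function): select the LC_CTYPE
 * category of the named locale and return the resulting MB_CUR_MAX.
 * Returns 0 if the locale cannot be selected.
 */
size_t max_bytes_in_locale(const char *locale_name)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return 0;               /* locale not available */

    /* MB_CUR_MAX is evaluated at run time and reflects LC_CTYPE. */
    return (size_t)MB_CUR_MAX;
}
```

A caller might use max_bytes_in_locale("C") to query the default locale, or pass the empty string to use the locale named by the environment.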

The multibyte character handling functions translate between multibyte characters and wide characters. A wide character is a fixed-size code that is stored in an object of type wchar_t.

mbtowc(S-osr5) determines the number of bytes that comprise the multibyte character pointed to by s. If pwc is not a null pointer, mbtowc( ) converts the multibyte character to a wide character and places the result in the object pointed to by pwc. (The value of the wide character corresponding to the null character is zero.) At most n bytes are examined, starting at the byte pointed to by s.
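The following sketch shows one way mbtowc( ) might be used to decode a whole string. The helper name decode_string and its error convention are assumptions made for illustration; the call with a null s at the start resets any internal shift state.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: decode the multibyte string src into dst, which
 * has room for cap wide characters including the terminator.
 * Returns the number of wide characters stored (terminator excluded),
 * or -1 on an invalid sequence or insufficient space.
 */
long decode_string(wchar_t *dst, size_t cap, const char *src)
{
    size_t left = strlen(src);
    size_t count = 0;

    if (cap == 0)
        return -1;

    mbtowc(NULL, NULL, 0);              /* reset internal shift state */

    while (left > 0) {
        int len;

        if (count + 1 >= cap)
            return -1;                  /* no room for char + terminator */

        len = mbtowc(&dst[count], src, left);
        if (len <= 0)
            return -1;                  /* invalid or unexpected result */

        src += len;
        left -= (size_t)len;
        count++;
    }

    dst[count] = L'\0';
    return (long)count;
}
```

The n argument is always the number of bytes remaining, so mbtowc( ) never reads past the end of the string.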

wctomb(S-osr5) determines the number of bytes needed to represent the multibyte character corresponding to the code whose value is wchar, and, if s is not a null pointer, stores the multibyte character representation in the array pointed to by s. At most MB_CUR_MAX bytes are stored.
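Because wctomb( ) may store up to MB_CUR_MAX bytes, the destination must have that much room. The sketch below converts into a scratch array sized with MB_LEN_MAX (the compile-time upper bound from limits.h) before copying into a bounded buffer. The helper name append_wide is hypothetical.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: append the multibyte form of wc to buf.
 * cap is the total size of buf and *used is the number of bytes
 * already filled. Returns 0 on success, -1 if wc has no multibyte
 * representation or if the bytes do not fit.
 */
int append_wide(char *buf, size_t cap, size_t *used, wchar_t wc)
{
    char tmp[MB_LEN_MAX];               /* always >= MB_CUR_MAX bytes */
    int len = wctomb(tmp, wc);

    if (len < 0)
        return -1;                      /* wc is not representable */
    if (*used > cap || (size_t)len > cap - *used)
        return -1;                      /* would overflow buf */

    memcpy(buf + *used, tmp, (size_t)len);
    *used += (size_t)len;
    return 0;
}
```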

mblen(S-osr5) determines the number of bytes comprising the multibyte character pointed to by s. It is equivalent to:

mbtowc((wchar_t *)0, s, n)
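Since mblen( ) reports only a length, it is convenient for counting characters rather than bytes. The sketch below is one possible approach; count_chars is a hypothetical name, and the function treats any non-positive result inside the string as an error.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: count the multibyte characters in s.
 * Returns the character count, or -1 if s contains an invalid
 * multibyte sequence in the current locale.
 */
long count_chars(const char *s)
{
    size_t left = strlen(s);
    long count = 0;

    mblen(NULL, 0);                     /* reset internal shift state */

    while (left > 0) {
        int len = mblen(s, left);

        if (len <= 0)
            return -1;                  /* invalid sequence */

        s += len;
        left -= (size_t)len;
        count++;
    }
    return count;
}
```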

The functions mbrtowc( ), wcrtomb( ), and mbrlen( ) are essentially the same as the above three functions, except that the conversion state on entry is specified by the mbstate_t object pointed to by ps:

If s is a null pointer, mbrtowc( ) and wcrtomb( ) determine the number of bytes necessary to enter the initial shift state (zero if encodings are not state-dependent or if the initial conversion state is described). The resulting state described is the initial conversion state. In this case, the value of pwc is ignored.
If s is not a null pointer, mbrtowc( ) determines the number of bytes that are contained in the multibyte character (plus any leading shift sequences) pointed to by s. It produces the value of the corresponding wide character. Then, if pwc is not a null pointer, it stores that value in the object pointed to by pwc. If the corresponding wide character is the null wide character, the resulting state described is the initial conversion state.
If s is not a null pointer, wcrtomb( ) determines the number of bytes needed to represent the multibyte character that corresponds to the wide character given by wc (including any shift sequences). It stores the resulting bytes in the array whose first element is pointed to by s. At most MB_CUR_MAX bytes are stored. If wc is a null wide character, the resulting state described is the initial conversion state.

mbrlen( ) is equivalent to the following call:

mbrtowc((wchar_t *)0, s, n, ps != 0 ? ps : &internal)

where internal is the internal mbstate_t object for mbrlen( ). ps can also be a null pointer for mbrtowc( ) and wcrtomb( ).
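The benefit of an explicit mbstate_t object is that the conversion can be suspended and resumed, for example when input arrives in pieces. The sketch below counts characters across several calls. The struct mb_counter type and its functions are hypothetical. The result is held in a size_t and compared against (size_t)-1 and (size_t)-2, as ISO C declares these functions; the comparison also behaves correctly if the function is declared to return int, as in the synopsis above. A zero-filled mbstate_t is assumed to describe the initial conversion state, as ISO C specifies.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical streaming counter that keeps its own conversion state. */
struct mb_counter {
    mbstate_t state;
    size_t chars;
};

void mb_counter_init(struct mb_counter *c)
{
    /* A zero-filled mbstate_t describes the initial conversion state. */
    memset(&c->state, 0, sizeof c->state);
    c->chars = 0;
}

/*
 * Feed len bytes to the counter.
 * Returns 0 on success, -1 on an encoding error.
 */
int mb_counter_feed(struct mb_counter *c, const char *data, size_t len)
{
    while (len > 0) {
        size_t r = mbrtowc(NULL, data, len, &c->state);

        if (r == (size_t)-1)
            return -1;                  /* invalid sequence */
        if (r == (size_t)-2)
            return 0;                   /* character continues in next chunk */
        if (r == 0)
            r = 1;                      /* null character occupies one byte */

        data += r;
        len -= r;
        c->chars++;
    }
    return 0;
}
```

Each input stream would have its own struct mb_counter, so that the state of one stream never affects another.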

Return values

mbtowc( ) returns zero if s is a null pointer or if s is not a null pointer but points to the null character. If s is not a null pointer and the next n or fewer bytes form a valid multibyte character, mbtowc( ) returns the number of bytes that comprise the converted multibyte character; otherwise, s does not point to a valid multibyte character and mbtowc( ) returns -1.

If s is a null pointer, wctomb( ) returns zero. If s is not a null pointer, wctomb( ) returns -1 if the value of wchar does not correspond to a valid multibyte character. Otherwise it returns the number of bytes that comprise the multibyte character corresponding to the value of wchar.

mbrlen( ) returns a value between -2 and n, inclusive; see mbrtowc( ).

If s is a null pointer, mbrtowc( ) and wcrtomb( ) return the number of bytes necessary to enter the initial shift state. The value returned cannot be greater than MB_CUR_MAX.

If s is not a null pointer, wcrtomb( ) returns the number of bytes stored in the array object (including any shift sequences) when wc is a valid wide character; otherwise (when wc is not a valid wide character), an encoding error occurs, the value of the macro [EILSEQ] is stored in errno and -1 is returned, but the conversion state is unchanged.

If s is not a null pointer, mbrtowc( ) returns the first of the following that applies:

0: if s points to the null character.
positive: if the next n or fewer bytes form a valid multibyte character; the value returned is the number of bytes that constitute that multibyte character.
-2: if the next n bytes form an incomplete (but potentially valid) multibyte character, and all n bytes have been processed; this situation does not apply since the multibyte encoding is stateless.
-1: if an encoding error occurs (when the next n or fewer bytes do not form a complete and valid multibyte character); the value of the macro [EILSEQ] is stored in errno, but the conversion state is unchanged.
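The four outcomes listed above can be sketched as a small classification function. The enum next_result and the function classify_next are hypothetical names introduced for illustration. The result is held in a size_t, which gives the same comparisons whether the function is declared to return size_t, as in ISO C, or int.

```c
#include <wchar.h>

/* Hypothetical result codes, one per mbrtowc( ) outcome. */
enum next_result { NEXT_END, NEXT_CHAR, NEXT_INCOMPLETE, NEXT_INVALID };

/*
 * Hypothetical helper: examine at most n bytes at s. On NEXT_CHAR,
 * *out receives the wide character and *used the number of bytes
 * that formed it. *used is set to 0 for every other outcome.
 */
enum next_result classify_next(const char *s, size_t n, mbstate_t *ps,
                               wchar_t *out, size_t *used)
{
    size_t r = mbrtowc(out, s, n, ps);

    *used = 0;
    if (r == 0)
        return NEXT_END;                /* s points to the null character */
    if (r == (size_t)-1)
        return NEXT_INVALID;            /* encoding error, errno is EILSEQ */
    if (r == (size_t)-2)
        return NEXT_INCOMPLETE;         /* more bytes are needed */

    *used = r;
    return NEXT_CHAR;
}
```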

Diagnostics

If the following condition occurs, mbrtowc( ) or wcrtomb( ) returns -1 and sets errno to the corresponding value:

[EILSEQ]: the last character processed was not complete and valid.
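A minimal sketch of checking this diagnostic with wcrtomb( ) follows. The helper name encode_checked and the err output parameter are assumptions made for illustration; the caller is expected to supply a buffer of at least MB_LEN_MAX bytes.

```c
#include <errno.h>
#include <limits.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert wc into out, which must hold at least
 * MB_LEN_MAX bytes. Returns the number of bytes stored. On failure it
 * returns 0 and stores the errno value (expected to be EILSEQ) in *err;
 * on success *err is set to 0.
 */
size_t encode_checked(char *out, wchar_t wc, mbstate_t *ps, int *err)
{
    size_t r;

    errno = 0;
    r = wcrtomb(out, wc, ps);
    if (r == (size_t)-1) {
        *err = errno;                   /* save before another call changes it */
        return 0;
    }
    *err = 0;
    return r;
}
```

Saving errno immediately after the failing call keeps the value from being overwritten by later library calls.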

Standards conformance

mbtowc(S-osr5), wctomb(S-osr5), and mblen(S-osr5) are conformant with:

ANSI X3.159-1989 Programming Language -- C,
X/Open CAE Specification, System Interfaces and Headers, Issue 4, 1992,
and IEEE POSIX Std 1003.1-1990 System Application Program Interface (API) [C Language] (ISO/IEC 9945-1).