#include <stdlib.h>

int mbtowc(wchar_t *pwc, const char *s, size_t n);
int wctomb(char *s, wchar_t wchar);
int mblen(const char *s, size_t n);

#include <wchar.h>

int mbrtowc(wchar_t *pwc, const char *s, size_t n, mbstate_t *ps);
int wcrtomb(char *s, wchar_t wc, mbstate_t *ps);
int mbrlen(const char *s, size_t n, mbstate_t *ps);

mbtowc - convert a multibyte character to a wide character
wctomb - convert a wide character to a multibyte character
mblen - determine the number of bytes in a multibyte character
mbrtowc - convert a multibyte character to a wide character (restartable)
wcrtomb - convert a wide character to a multibyte character (restartable)
mbrlen - determine the number of bytes in a multibyte character (restartable)
Traditional computer systems assumed that a character of a natural language can be represented in one byte of storage. However, languages such as Japanese, Korean, and Chinese require more than one byte of storage to represent a character. These characters are called "multibyte characters", and the character sets that contain them are often called "extended character sets".
The number of bytes of storage required by a character in a given locale is defined in the LC_CTYPE category of the locale (see setlocale(S-osr5)). The maximum number of bytes in a multibyte character in the current locale is given by the macro MB_CUR_MAX, defined in stdlib.h.
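As an illustration of how MB_CUR_MAX follows the LC_CTYPE category, the sketch below selects a locale and reports the resulting value. The helper name max_bytes_for_locale is hypothetical and not part of this interface; which locale names exist depends on the system.

```c
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: select the LC_CTYPE locale `name` and return the
 * value of MB_CUR_MAX in that locale. Returns 0 if the locale cannot
 * be selected. */
size_t max_bytes_for_locale(const char *name)
{
    if (setlocale(LC_CTYPE, name) == NULL)
        return 0;                 /* locale not available */
    return (size_t)MB_CUR_MAX;    /* evaluated after the locale change */
}
```

In the "C" locale every character occupies one byte, so max_bytes_for_locale("C") is expected to return 1. Because MB_CUR_MAX can change whenever the locale changes, it should be read after the call to setlocale( ), not cached beforehand.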
Multibyte character handling functions provide the means of translating between multibyte characters and wide characters, which are bit patterns stored in objects of type wchar_t.
mbtowc(S-osr5) determines the number of bytes that comprise the multibyte character pointed to by s. If pwc is not a null pointer, mbtowc( ) converts the multibyte character to a wide character and places the result in the object pointed to by pwc. (The value of the wide character corresponding to the null character is zero.) At most n bytes are examined, starting at the byte pointed to by s.
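A minimal sketch of the usual calling pattern for mbtowc( ): step through a multibyte string one character at a time, advancing by the returned byte count. The helper name mb_to_wide and its return convention are assumptions made for illustration only.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert the multibyte string `src` into `dst`,
 * which has room for `dstlen` wide characters including the terminator.
 * Returns the number of wide characters stored (excluding the
 * terminator), or -1 on an invalid sequence or if `dst` is too small. */
long mb_to_wide(wchar_t *dst, size_t dstlen, const char *src)
{
    size_t remaining = strlen(src) + 1;   /* include the null byte */
    size_t count = 0;

    mbtowc(NULL, NULL, 0);                /* reset to the initial shift state */
    while (count < dstlen) {
        int n = mbtowc(&dst[count], src, remaining);
        if (n < 0)
            return -1;                    /* invalid multibyte character */
        if (n == 0)
            return (long)count;           /* null character stored; done */
        src += n;
        remaining -= (size_t)n;
        count++;
    }
    return -1;                            /* dst filled before the terminator */
}
```

Passing the remaining byte count as n keeps mbtowc( ) from examining bytes past the end of the string.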
wctomb(S-osr5) determines the number of bytes needed to represent the multibyte character corresponding to the code whose value is wchar, and, if s is not a null pointer, stores the multibyte character representation in the array pointed to by s. At most MB_CUR_MAX bytes are stored.
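Because wctomb( ) may store up to MB_CUR_MAX bytes, a caller typically converts into a scratch buffer first and then copies the result. The sketch below assumes a hypothetical helper, append_wide, and sizes the scratch buffer with MB_LEN_MAX from limits.h, which is never smaller than MB_CUR_MAX.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append the multibyte form of `wc` to `buf` at
 * offset `*pos`, where `buf` holds `cap` bytes in total.
 * Returns 0 on success, or -1 if `wc` has no multibyte representation
 * or the result does not fit. */
int append_wide(char *buf, size_t cap, size_t *pos, wchar_t wc)
{
    char tmp[MB_LEN_MAX];          /* large enough in every locale */
    int n = wctomb(tmp, wc);

    if (n < 0)
        return -1;                 /* wc is not a valid character */
    if (*pos + (size_t)n > cap)
        return -1;                 /* not enough room in buf */
    memcpy(buf + *pos, tmp, (size_t)n);
    *pos += (size_t)n;
    return 0;
}
```

Converting the null wide character stores a single null byte, so a string can be terminated by appending L'\0' in the same way as any other character.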
mblen(S-osr5) determines the number of bytes comprising the multibyte character pointed to by s. It is equivalent to:

   mbtowc((wchar_t *)0, s, n)
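One common use of mblen( ) is counting characters, as opposed to bytes, in a multibyte string. The following is a sketch only; count_mb_chars is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the number of multibyte characters in
 * the string `s`, or -1 if it contains an invalid sequence. */
long count_mb_chars(const char *s)
{
    size_t remaining = strlen(s);   /* bytes left, excluding the null byte */
    long count = 0;

    mblen(NULL, 0);                 /* reset to the initial shift state */
    while (remaining > 0) {
        int n = mblen(s, remaining);
        if (n < 0)
            return -1;              /* invalid multibyte character */
        if (n == 0)
            break;                  /* null character; not expected here */
        s += n;
        remaining -= (size_t)n;
        count++;
    }
    return count;
}
```

In a single-byte locale such as "C" the result equals strlen(s); in a multibyte locale it can be smaller.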
The functions mbrtowc( ), wcrtomb( ), and mbrlen( ) are essentially the same as the above three functions, except that the conversion state on entry is specified by the mbstate_t object pointed to by ps. If ps is a null pointer, mbrlen( ) uses its own internal mbstate_t object. ps can also be a null pointer for mbrtowc( ) and wcrtomb( ).
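The point of the restartable functions is that a caller-owned mbstate_t can carry a partially read character from one call to the next, for example when input arrives in chunks. The sketch below assumes the ISO C declaration of mbrtowc( ), which returns size_t; the synopsis on this page declares int, so the types and comparisons may need adjusting on this system. The helper name feed_chunk is hypothetical.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode one chunk of input, keeping conversion
 * state in *ps between calls. Returns the number of characters
 * completed within this chunk, or -1 on an encoding error.
 * Assumes the ISO C size_t return type for mbrtowc(). */
long feed_chunk(const char *chunk, size_t len, mbstate_t *ps)
{
    long count = 0;

    while (len > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, chunk, len, ps);

        if (n == (size_t)-1)
            return -1;      /* invalid sequence */
        if (n == (size_t)-2)
            break;          /* rest of chunk is an incomplete character,
                               now remembered in *ps */
        if (n == 0)
            n = 1;          /* simplification: treat a null character as
                               one byte (ignores any shift sequence) */
        chunk += n;
        len -= n;
        count++;
    }
    return count;
}
```

Before the first call the state object should describe the initial conversion state, for example by clearing it with memset(&st, 0, sizeof st). The same object is then passed to every subsequent call for that input stream.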
If s is a null pointer, wctomb( ) returns zero. If s is not a null pointer, wctomb( ) returns -1 if the value of wchar does not correspond to a valid multibyte character. Otherwise it returns the number of bytes that comprise the multibyte character corresponding to the value of wchar.
mbrlen( ) returns a value between -2 and n, inclusive; see mbrtowc( ).
If s is a null pointer, mbrtowc( ) and wcrtomb( ) return the number of bytes necessary to enter the initial shift state. The value returned cannot be greater than MB_CUR_MAX.
If s is not a null pointer, wcrtomb( ) returns the number of bytes stored in the array object (including any shift sequences) when wc is a valid wide character; otherwise (when wc is not a valid wide character), an encoding error occurs, the value of the macro [EILSEQ] is stored in errno and -1 is returned, but the conversion state is unchanged.
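If s is not a null pointer, mbrtowc( ) returns the first of the following that applies:

   0 if the next n or fewer bytes complete the multibyte character that corresponds to the null wide character;
   a value between 1 and n, inclusive, if the next n or fewer bytes complete a valid multibyte character; the value is the number of bytes that complete the character;
   -2 if the next n bytes form an incomplete, but potentially valid, multibyte character;
   -1 if an encoding error occurs; the value of the macro [EILSEQ] is stored in errno.
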
mbtowc(S-osr5), wctomb(S-osr5), and mblen(S-osr5) conform to:
ANSI X3.159-1989 Programming Language -- C;
X/Open CAE Specification, System Interfaces and Headers, Issue 4, 1992;
and
IEEE POSIX Std 1003.1-1990 System Application Program Interface (API) [C Language] (ISO/IEC 9945-1).
mbrtowc(S-osr5), wcrtomb(S-osr5), and mbrlen(S-osr5) are not part of any currently supported standard; they were developed by UNIX System Laboratories, Inc. and are maintained by The SCO Group.