
mbchar(S-osr5)


mbchar: mbtowc, wctomb, mblen, mbrtowc, wcrtomb, mbrlen -- multibyte character handling

Syntax

cc ...-lc

#include <stdlib.h>

int mbtowc(wchar_t *pwc, const char *s, size_t n);
int wctomb(char *s, wchar_t wchar);
int mblen(const char *s, size_t n);

#include <wchar.h>

int mbrtowc(wchar_t *pwc, const char *s, size_t n, mbstate_t *ps);
int wcrtomb(char *s, wchar_t wc, mbstate_t *ps);
int mbrlen(const char *s, size_t n, mbstate_t *ps);

Description

mbtowc- convert a multibyte character to a wide character

wctomb- convert a wide character to a multibyte character

mblen- determine the number of bytes in a multibyte character

mbrtowc- convert a multibyte character to a wide character (restartable)

wcrtomb- convert a wide character to a multibyte character (restartable)

mbrlen- determine the number of bytes in a multibyte character (restartable)

Traditional computer systems assumed that a character of a natural language can be represented in one byte of storage. However, languages such as Japanese, Korean, and Chinese require more than one byte of storage to represent a character. These characters are called ``multibyte characters'', and the character sets that contain them are often called ``extended character sets''.

The number of bytes of storage required by a character in a given locale is defined in the LC_CTYPE category of the locale (see setlocale(S-osr5)). The maximum number of bytes in a multibyte character in an extended character set in the current locale is given by the macro, MB_CUR_MAX, defined in stdlib.h.
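As a minimal sketch of the paragraph above, the following selects the LC_CTYPE category from the environment and then reads MB_CUR_MAX. The helper name max_bytes_per_char is hypothetical and is not part of the library.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: report the largest number of bytes that one
   multibyte character can occupy in the current LC_CTYPE locale. */
size_t max_bytes_per_char(void)
{
    /* "" selects the locale named by the environment. If the call
       fails, the program stays in the default "C" locale. */
    if (setlocale(LC_CTYPE, "") == NULL)
        fprintf(stderr, "setlocale failed; staying in the C locale\n");

    /* MB_CUR_MAX is evaluated at run time and follows LC_CTYPE. */
    return (size_t)MB_CUR_MAX;
}
```

The value is only meaningful after the locale has been set, so a buffer sized from it should be allocated after the setlocale( ) call.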

Multibyte character handling functions provide the means of translating multibyte characters into a bit pattern which is stored in a data type, wchar_t.

mbtowc(S-osr5) determines the number of bytes that comprise the multibyte character pointed to by s. If pwc is not a null pointer, mbtowc( ) converts the multibyte character to a wide character and places the result in the object pointed to by pwc. (The value of the wide character corresponding to the null character is zero.) At most n bytes are examined, starting at the byte pointed to by s.
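The following sketch shows one way to walk a multibyte string with mbtowc( ), one character at a time. The helper mb_to_wide and its parameters are assumptions made for illustration. It relies on the stateless encoding described on this page, so no shift state is reset before the loop.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: convert the multibyte string src into wide
   characters in dst, which has room for max elements including the
   terminator. Returns the number of wide characters stored (not
   counting the terminator), or (size_t)-1 on an invalid sequence. */
size_t mb_to_wide(wchar_t *dst, size_t max, const char *src)
{
    size_t count = 0;

    if (max == 0)
        return 0;

    while (count + 1 < max) {
        /* Examine at most MB_CUR_MAX bytes starting at src. */
        int len = mbtowc(&dst[count], src, MB_CUR_MAX);

        if (len < 0)
            return (size_t)-1;   /* not a valid multibyte character */
        if (len == 0)
            break;               /* reached the null character */

        src += len;              /* advance by the bytes consumed */
        count++;
    }
    dst[count] = L'\0';
    return count;
}
```

If dst fills up before the end of src, the result is truncated and still null-terminated.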

wctomb(S-osr5) determines the number of bytes needed to represent the multibyte character corresponding to the code whose value is wchar, and, if s is not a null pointer, stores the multibyte character representation in the array pointed to by s. At most MB_CUR_MAX bytes are stored.
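A possible use of wctomb( ) is sketched below: each wide character is converted into a scratch buffer first, so the destination is never overrun. The helper append_wide is hypothetical. The scratch buffer is sized with MB_LEN_MAX from limits.h, the compile-time upper bound on MB_CUR_MAX in ISO C, which this page does not itself mention.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append the multibyte form of wc to the string
   in out. cap is the total size of out, used is its current length.
   Returns the new length, or (size_t)-1 if wc has no multibyte
   representation or the result would not fit. */
size_t append_wide(char *out, size_t cap, size_t used, wchar_t wc)
{
    char tmp[MB_LEN_MAX];        /* always >= MB_CUR_MAX bytes */
    int len = wctomb(tmp, wc);

    if (len < 0)
        return (size_t)-1;       /* wc is not a valid character */
    if (used + (size_t)len + 1 > cap)
        return (size_t)-1;       /* no room for bytes plus terminator */

    memcpy(out + used, tmp, (size_t)len);
    out[used + (size_t)len] = '\0';
    return used + (size_t)len;
}
```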

mblen(S-osr5) determines the number of bytes comprising the multibyte character pointed to by s. It is equivalent to:

mbtowc((wchar_t *)0, s, n)
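Because mblen( ) reports how many bytes the next character occupies, it can be used to count characters rather than bytes. The sketch below assumes a null-terminated string; the name mb_char_count is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: count the characters (not bytes) in the
   multibyte string s. Returns -1 if an invalid sequence is found. */
long mb_char_count(const char *s)
{
    long count = 0;
    int len;

    /* mblen() returns 0 at the null character and -1 on error. */
    while ((len = mblen(s, MB_CUR_MAX)) > 0) {
        s += len;
        count++;
    }
    return len < 0 ? -1 : count;
}
```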

The functions mbrtowc( ), wcrtomb( ), and mbrlen( ) are essentially the same as the above three functions, except that the conversion state on entry is specified by the mbstate_t object pointed to by ps.

mbrlen( ) is equivalent to the following call:

mbrtowc((wchar_t *)0, s, n, ps != 0 ? ps : &internal)

where internal is the internal mbstate_t object for mbrlen( ). ps can also be a null pointer for mbrtowc( ) and wcrtomb( ), in which case each function uses its own internal mbstate_t object.

Return values

mbtowc( ) returns zero if s is a null pointer or if s is not a null pointer but points to the null character. If s is not a null pointer and the next n or fewer bytes form a valid multibyte character, mbtowc( ) returns the number of bytes that comprise the converted multibyte character; otherwise, s does not point to a valid multibyte character and mbtowc( ) returns -1.

If s is a null pointer, wctomb( ) returns zero. If s is not a null pointer, wctomb( ) returns -1 if the value of wchar does not correspond to a valid multibyte character. Otherwise it returns the number of bytes that comprise the multibyte character corresponding to the value of wchar.

mbrlen( ) returns a value between -2 and n, inclusive; see mbrtowc( ).

If s is a null pointer, mbrtowc( ) and wcrtomb( ) return the number of bytes necessary to enter the initial shift state. The value returned cannot be greater than MB_CUR_MAX.

If s is not a null pointer, wcrtomb( ) returns the number of bytes stored in the array object (including any shift sequences) when wc is a valid wide character; otherwise (when wc is not a valid wide character), an encoding error occurs, the value of the macro [EILSEQ] is stored in errno and -1 is returned, but the conversion state is unchanged.

If s is not a null pointer, mbrtowc( ) returns the first of the following that applies:


0
if s points to the null character.

positive
if the next n or fewer bytes form a valid multibyte character; the value returned is the number of bytes that constitute that multibyte character.

-2
if the next n bytes form an incomplete (but potentially valid) multibyte character, and all n bytes have been processed; this situation does not apply since the multibyte encoding is stateless.

-1
if an encoding error occurs (when the next n or fewer bytes do not form a complete and valid multibyte character); the value of the macro [EILSEQ] is stored in errno, but the conversion state is unchanged.
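The return values listed above can be handled in a loop such as the following sketch. The helper decode_buffer is hypothetical. The prototypes on this page return int, whereas ISO C declares mbrtowc( ) as returning size_t; the sketch assumes the ISO C form and compares against (size_t)-1 and (size_t)-2. With the int prototypes the comparisons would be against -1 and -2.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode up to n bytes starting at s into dst,
   which has room for max wide characters. Returns the number of wide
   characters decoded, or -1 on an encoding error. Decoding stops at
   a null character or at an incomplete trailing character. */
long decode_buffer(wchar_t *dst, size_t max, const char *s, size_t n)
{
    mbstate_t st;
    long count = 0;

    memset(&st, 0, sizeof st);   /* zeroed state is the initial state */

    while (n > 0 && (size_t)count < max) {
        size_t r = mbrtowc(&dst[count], s, n, &st);

        if (r == (size_t)-1)
            return -1;           /* encoding error, errno is EILSEQ */
        if (r == (size_t)-2)
            break;               /* incomplete character, need more bytes */
        if (r == 0)
            break;               /* null character */

        s += r;                  /* r bytes formed one character */
        n -= r;
        count++;
    }
    return count;
}
```

Keeping the mbstate_t object in the caller, rather than passing a null ps, avoids sharing the function's internal state between unrelated conversions.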

Diagnostics

If the following condition occurs, mbrtowc( ) or wcrtomb( ) returns -1 and sets errno to the corresponding value:

[EILSEQ]
the last character processed was not complete and valid.

See also

environ(M), mbstring(S-osr5), setlocale(S-osr5), iswctype(S-osr5)

Standards conformance

mbtowc(S-osr5), wctomb(S-osr5), and mblen(S-osr5) are conformant with:

ANSI X3.159-1989 Programming Language -- C,
X/Open CAE Specification, System Interfaces and Headers, Issue 4, 1992,
and IEEE POSIX Std 1003.1-1990 System Application Program Interface (API) [C Language] (ISO/IEC 9945-1).

mbrtowc(S-osr5), wcrtomb(S-osr5), and mbrlen(S-osr5) are not part of any currently supported standard; they were developed by UNIX System Laboratories, Inc. and are maintained by The SCO Group.


© 2005 The SCO Group, Inc. All rights reserved.
SCO OpenServer Release 6.0.0 -- 02 June 2005