beginning at <VAR>string</VAR> to its corresponding wide character code. It
stores the result in <CODE>*<VAR>result</VAR></CODE>.
<P>
<CODE>mbtowc</CODE> never examines more than <VAR>size</VAR> bytes. (The idea is
to supply for <VAR>size</VAR> the number of bytes of data you have in hand.)
<P>
<CODE>mbtowc</CODE> with non-null <VAR>string</VAR> distinguishes three
possibilities: the first <VAR>size</VAR> bytes at <VAR>string</VAR> start with
a valid multibyte character, they start with an invalid byte sequence or
just part of a character, or <VAR>string</VAR> points to an empty string (a
null character).
<P>
For a valid multibyte character, <CODE>mbtowc</CODE> converts it to a wide
character and stores that in <CODE>*<VAR>result</VAR></CODE>, and returns the
number of bytes in that character (always at least <CODE>1</CODE>, and never
more than <VAR>size</VAR>).
<P>
For an invalid byte sequence, <CODE>mbtowc</CODE> returns <CODE>-1</CODE>. For an
empty string, it returns <CODE>0</CODE>, also storing <CODE>0</CODE> in
<CODE>*<VAR>result</VAR></CODE>.
<P>
If the multibyte character code uses shift characters, then
<CODE>mbtowc</CODE> maintains and updates a shift state as it scans. If you
call <CODE>mbtowc</CODE> with a null pointer for <VAR>string</VAR>, that
initializes the shift state to its standard initial value. It also
returns nonzero if the multibyte character code in use actually has a
shift state. See section <A HREF="library_6.html#SEC75" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_6.html#SEC75">Multibyte Codes Using Shift Sequences</A>.
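<P>
The return values described above can be put together in a short loop. The following is a minimal sketch, not part of the library: <CODE>count_chars</CODE> is a hypothetical helper name, and the policy of stopping at the first null character is an assumption made for the illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: count the multibyte characters in the first
   SIZE bytes at S, stopping early at a null character.  Returns -1
   if mbtowc reports an invalid or incomplete byte sequence.  */
int
count_chars (const char *s, size_t size)
{
  int count = 0;

  assert (s != NULL);

  /* A null STRING argument resets the shift state.  */
  mbtowc (NULL, NULL, 0);

  while (size > 0)
    {
      wchar_t wc;
      int len = mbtowc (&wc, s, size);

      if (len == -1)
        return -1;              /* Invalid or truncated sequence.  */
      if (len == 0)
        break;                  /* Null character: end of string.  */

      /* LEN is at least 1 and never more than SIZE.  */
      s += len;
      size -= len;
      count++;
    }
  return count;
}
```

<P>
Passing the number of bytes actually in hand as <VAR>size</VAR> keeps <CODE>mbtowc</CODE> from reading past the end of the data.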
<P>
<A NAME="IDX355"></A>
<U>Function:</U> int <B>wctomb</B> <I>(char *<VAR>string</VAR>, wchar_t <VAR>wchar</VAR>)</I><P>
The <CODE>wctomb</CODE> ("wide character to multibyte") function converts
the wide character code <VAR>wchar</VAR> to its corresponding multibyte
character sequence, and stores the result in bytes starting at
<VAR>string</VAR>. At most <CODE>MB_CUR_MAX</CODE> bytes are stored.
<P>
<CODE>wctomb</CODE> with non-null <VAR>string</VAR> distinguishes three
possibilities for <VAR>wchar</VAR>: a valid wide character code (one that can
be translated to a multibyte character), an invalid code, and <CODE>0</CODE>.
<P>
Given a valid code, <CODE>wctomb</CODE> converts it to a multibyte character,
storing the bytes starting at <VAR>string</VAR>. Then it returns the number
of bytes in that character (always at least <CODE>1</CODE>, and never more
than <CODE>MB_CUR_MAX</CODE>).
<P>
If <VAR>wchar</VAR> is an invalid wide character code, <CODE>wctomb</CODE> returns
<CODE>-1</CODE>. If <VAR>wchar</VAR> is <CODE>0</CODE>, it stores the null character
<CODE>0</CODE> in <CODE>*<VAR>string</VAR></CODE> and returns the number of bytes
stored (<CODE>1</CODE> when no shift sequence is needed).
<P>
If the multibyte character code uses shift characters, then
<CODE>wctomb</CODE> maintains and updates a shift state as it converts. If you
call <CODE>wctomb</CODE> with a null pointer for <VAR>string</VAR>, that
initializes the shift state to its standard initial value. It also
returns nonzero if the multibyte character code in use actually has a
shift state. See section <A HREF="library_6.html#SEC75" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_6.html#SEC75">Multibyte Codes Using Shift Sequences</A>.
<P>
Calling this function with a <VAR>wchar</VAR> argument of zero when
<VAR>string</VAR> is not null has the side-effect of reinitializing the
stored shift state <EM>as well as</EM> storing the multibyte character
<CODE>0</CODE> and returning the number of bytes stored.
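<P>
As an illustration of the <CODE>wctomb</CODE> calling pattern, here is a minimal sketch that converts a whole wide string one character at a time. The name <CODE>wide_to_multibyte</CODE> and its buffer-size convention are hypothetical. For simplicity the sketch appends the terminating null byte directly, so it does not emit any shift sequence that a stateful encoding might need to return to its initial state.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert the null-terminated wide string WS to
   multibyte form in OUT, which has room for OUTSIZE bytes.  Returns
   the number of bytes stored, not counting the terminating null, or
   -1 if a wide character is invalid or OUT is too small.  */
int
wide_to_multibyte (char *out, size_t outsize, const wchar_t *ws)
{
  size_t used = 0;

  assert (out != NULL && ws != NULL);

  /* A null STRING argument resets the shift state.  */
  wctomb (NULL, 0);

  for (; *ws != 0; ws++)
    {
      /* MB_LEN_MAX bounds MB_CUR_MAX in every locale.  */
      char tmp[MB_LEN_MAX];
      int len = wctomb (tmp, *ws);

      if (len == -1)
        return -1;              /* Invalid wide character code.  */
      if (used + len >= outsize)
        return -1;              /* Keep one byte for the null.  */

      memcpy (out + used, tmp, len);
      used += len;
    }

  if (used >= outsize)
    return -1;
  out[used] = '\0';
  return (int) used;
}
```

<P>
Converting into a temporary <CODE>MB_LEN_MAX</CODE>-byte buffer first means the space check can be made before anything is written to the caller's buffer.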
<P>
<H2><A NAME="SEC74" HREF="library_toc.html#SEC74" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC74">Example of Character-by-Character Conversion</A></H2>
<P>
Here is an example that reads multibyte character text from descriptor
<CODE>input</CODE> and writes the corresponding wide characters to descriptor
<CODE>output</CODE>. We need to convert characters one by one for this
example because <CODE>mbstowcs</CODE> is unable to continue past a null
character, and cannot cope with an apparently invalid partial character
by reading more input.
<P>
<PRE>
#include &lt;limits.h&gt;
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;
#include &lt;unistd.h&gt;

int
file_mbstowcs (int input, int output)
{
  char buffer[BUFSIZ + MB_LEN_MAX];
  int filled = 0;
  int eof = 0;

  while (!eof)
    {
      int nread;
      int nwrite;
      char *inp = buffer;
      /* Room for one wide character per byte in <CODE>buffer</CODE>. */
      wchar_t outbuf[BUFSIZ + MB_LEN_MAX];
      wchar_t *outp = outbuf;

      /* Fill up the buffer from the input file. */
      nread = read (input, buffer + filled, BUFSIZ);
      if (nread &lt; 0)
        {
          perror ("read");
          return 0;
        }
      /* If we reach end of file, make a note to read no more. */
      if (nread == 0)
        eof = 1;

      /* <CODE>filled</CODE> is now the number of bytes in <CODE>buffer</CODE>. */
      filled += nread;

      /* Convert those bytes to wide characters--as many as we can. */
      while (filled &gt; 0)
        {
          int thislen = mbtowc (outp, inp, filled);
          /* Stop converting at invalid character;
             this can mean we have read just the first part
             of a valid character. */
          if (thislen == -1)
            break;
          /* Treat null character like any other,
             but also reset shift state. */
          if (thislen == 0)
            {
              thislen = 1;
              mbtowc (NULL, NULL, 0);
            }
          /* Advance past this character. */
          inp += thislen;
          filled -= thislen;
          outp++;
        }

      /* Write the wide characters we just made. */
      nwrite = write (output, outbuf,
                      (outp - outbuf) * sizeof (wchar_t));
      if (nwrite &lt; 0)
        {
          perror ("write");
          return 0;
        }

      /* See if we have a <EM>real</EM> invalid character. */
      if ((eof &amp;&amp; filled &gt; 0) || (size_t) filled &gt;= MB_CUR_MAX)
        {
          fprintf (stderr, "invalid multibyte character\n");
          return 0;
        }

      /* If any characters must be carried forward,
         put them at the beginning of <CODE>buffer</CODE>.
         The regions may overlap, so use memmove. */
      if (filled &gt; 0)
        memmove (buffer, inp, filled);
    }

  return 1;
}
</PRE>
<P>
<H2><A NAME="SEC75" HREF="library_toc.html#SEC75" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC75">Multibyte Codes Using Shift Sequences</A></H2>
<P>
In some multibyte character codes, the <EM>meaning</EM> of any particular
byte sequence is not fixed; it depends on what other sequences have come
earlier in the same string. Typically there are just a few sequences
that can change the meaning of other sequences; these few are called
<DFN>shift sequences</DFN> and we say that they set the <DFN>shift state</DFN> for
other sequences that follow.
<P>
To illustrate shift state and shift sequences, suppose we decide that
the sequence <CODE>0200</CODE> (just one byte) enters Japanese mode, in which
pairs of bytes in the range from <CODE>0240</CODE> to <CODE>0377</CODE> are single
characters, while <CODE>0201</CODE> enters Latin-1 mode, in which single bytes
in the range from <CODE>0240</CODE> to <CODE>0377</CODE> are characters, and
interpreted according to the ISO Latin-1 character set. This is a
multibyte code which has two alternative shift states ("Japanese mode"
and "Latin-1 mode"), and two shift sequences that specify particular
shift states.
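<P>
To make the idea concrete, here is a minimal sketch of a decoder for the illustrative code just described. It counts characters while tracking the shift state. The name <CODE>toy_count_chars</CODE> is hypothetical, and the sketch makes several assumptions the description leaves open: decoding starts in Latin-1 mode, bytes below <CODE>0200</CODE> are single-byte characters in either mode, and bytes from <CODE>0202</CODE> to <CODE>0237</CODE> are invalid.

```c
#include <assert.h>
#include <stddef.h>

enum toy_mode { TOY_LATIN1, TOY_JAPANESE };

/* Count the characters in the LEN bytes at S under the toy encoding.
   Returns -1 on a malformed sequence.  The initial mode and the
   treatment of bytes below 0240 are assumptions of this sketch.  */
int
toy_count_chars (const unsigned char *s, size_t len)
{
  enum toy_mode mode = TOY_LATIN1;      /* Assumed initial state.  */
  int count = 0;
  size_t i = 0;

  assert (s != NULL);

  while (i < len)
    {
      unsigned char c = s[i];

      if (c == 0200)
        {
          mode = TOY_JAPANESE;          /* Shift sequence, no character.  */
          i++;
        }
      else if (c == 0201)
        {
          mode = TOY_LATIN1;            /* Shift sequence, no character.  */
          i++;
        }
      else if (c < 0200)
        {
          count++;                      /* Assumed single-byte character.  */
          i++;
        }
      else if (c < 0240)
        return -1;                      /* 0202..0237: assumed invalid.  */
      else if (mode == TOY_LATIN1)
        {
          count++;                      /* One byte, one character.  */
          i++;
        }
      else
        {
          /* Japanese mode: a pair of bytes in 0240..0377.  */
          if (i + 1 >= len || s[i + 1] < 0240)
            return -1;
          count++;
          i += 2;
        }
    }
  return count;
}
```

<P>
The bytes <CODE>0241 0242</CODE> count as two characters in Latin-1 mode but as one character when preceded by <CODE>0200</CODE>, which is exactly why the meaning of a sequence depends on what came before it.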
<P>
When the multibyte character code in use has shift states, then
<CODE>mblen</CODE>, <CODE>mbtowc</CODE> and <CODE>wctomb</CODE> must maintain and update
the current shift state as they scan the string. To make this work
properly, you must follow these rules:
<P>
<UL>
<LI>
Before starting to scan a string, call the function with a null pointer
for the multibyte character address--for example, <CODE>mblen (NULL,
0)</CODE>. This initializes the shift state to its standard initial value.
<P>
<LI>
Scan the string one character at a time, in order. Do not "back up"
and rescan characters already scanned, and do not intersperse the
processing of different strings.
</UL>
<P>
Here is an example of using <CODE>mblen</CODE> following these rules:
<P>
<PRE>
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

void
scan_string (char *s)
{
  int length = strlen (s);

  /* Initialize shift state. */
  mblen (NULL, 0);

  /* Stop once every byte has been consumed. */
  while (length &gt; 0)
    {
      int thischar = mblen (s, length);
      /* Deal with end of string and invalid characters. */
      if (thischar == 0)
        break;
      if (thischar == -1)
        {
          fprintf (stderr, "invalid multibyte character\n");
          break;
        }
      /* Advance past this character. */
      s += thischar;
      length -= thischar;
    }
}
</PRE>
<P>
The functions <CODE>mblen</CODE>, <CODE>mbtowc</CODE> and <CODE>wctomb</CODE> are not
reentrant when using a multibyte code that uses a shift state. However,
no other library functions call these functions, so you don't have to
worry that the shift state will be changed mysteriously.