<!-- This HTML file has been created by texi2html 1.27
from library.texinfo on 3 March 1994 -->
<TITLE>The GNU C Library - String and Array Utilities</TITLE>
<P>Go to the <A HREF="library_4.html" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_4.html">previous</A>, <A HREF="library_6.html" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_6.html">next</A> section.<P>
<H1><A NAME="SEC57" HREF="library_toc.html#SEC57" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC57">String and Array Utilities</A></H1>
<P>
Operations on strings (or arrays of characters) are an important part of
many programs. The GNU C library provides an extensive set of string
utility functions, including functions for copying, concatenating,
comparing, and searching strings. Many of these functions can also
operate on arbitrary regions of storage; for example, the <CODE>memcpy</CODE>
function can be used to copy the contents of any kind of array.
<P>
It's fairly common for beginning C programmers to "reinvent the wheel"
by duplicating this functionality in their own code, but it pays to
become familiar with the library functions and to make use of them,
since this offers benefits in maintenance, efficiency, and portability.
<P>
For instance, you could easily compare one string to another in two
lines of C code, but if you use the built-in <CODE>strcmp</CODE> function,
you're less likely to make a mistake. And, since these library
functions are typically highly optimized, your program may run faster
too.
<P>
<A NAME="IDX267"></A>
<H2><A NAME="SEC58" HREF="library_toc.html#SEC58" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC58">Representation of Strings</A></H2>
<P>
This section is a quick summary of string concepts for beginning C
programmers. It describes how character strings are represented in C
and some common pitfalls. If you are already familiar with this
material, you can skip this section.
<A NAME="IDX268"></A>
<A NAME="IDX269"></A>
<P>
A <DFN>string</DFN> is an array of <CODE>char</CODE> objects. But string-valued
variables are usually declared to be pointers of type <CODE>char *</CODE>.
Such variables do not include space for the text of a string; that has
to be stored somewhere else--in an array variable, a string constant,
or dynamically allocated memory (see section <A HREF="library_3.html#SEC18" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_3.html#SEC18">Memory Allocation</A>). It's up to
you to store the address of the chosen memory space into the pointer
variable.  Alternatively, you can store a <DFN>null pointer</DFN> in the
pointer variable.  A null pointer does not point anywhere, so
attempting to access a string through it is an error (typically the
program crashes).
<P>
By convention, a <DFN>null character</DFN>, <CODE>'\0'</CODE>, marks the end of a
string. For example, in testing to see whether the <CODE>char *</CODE>
variable <VAR>p</VAR> points to a null character marking the end of a string,
you can write <CODE>!*<VAR>p</VAR></CODE> or <CODE>*<VAR>p</VAR> == '\0'</CODE>.
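<P>
As a rough illustration of this end-of-string test, the following sketch
walks along a string until it reaches the null character.  The name
<CODE>my_length</CODE> is hypothetical and exists only for this example; in
real code you would call the library function <CODE>strlen</CODE> instead.

```c
#include <stddef.h>

/* Hypothetical helper: count the characters that precede the
   terminating null character of the string P.  */
size_t
my_length (const char *p)
{
  size_t n = 0;

  /* Stop when P points to the null character; the condition could
     equally be written as `while (*p)'.  */
  while (*p != '\0')
    {
      p++;
      n++;
    }
  return n;
}
```

Calling <CODE>my_length ("")</CODE> returns zero, because the first
character examined is already the terminating null character.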
<P>
A null character is quite different conceptually from a null pointer,
although both can be written in source code as the integer constant
<CODE>0</CODE>.
<A NAME="IDX270"></A>
<P>
<DFN>String literals</DFN> appear in C program source as strings of
characters between double-quote characters (<SAMP>`"'</SAMP>). In ANSI C,
string literals can also be formed by <DFN>string concatenation</DFN>:
<CODE>"a" "b"</CODE> is the same as <CODE>"ab"</CODE>.  You should not modify
string literals: the effect is undefined in ANSI C, and the GNU C
compiler normally places literals in read-only storage, so an attempt
to write to one typically crashes the program.
<P>
Character arrays that are declared <CODE>const</CODE> cannot be modified
either. It's generally good style to declare non-modifiable string
pointers to be of type <CODE>const char *</CODE>, since this often allows the
C compiler to detect accidental modifications as well as providing some
amount of documentation about what your program intends to do with the
string.
<P>
The amount of memory allocated for the character array may extend past
the null character that normally marks the end of the string. In this
document, the term <DFN>allocation size</DFN> is always used to refer to the
total amount of memory allocated for the string, while the term
<DFN>length</DFN> refers to the number of characters up to (but not
including) the terminating null character.
<A NAME="IDX272"></A>
<A NAME="IDX273"></A>
<A NAME="IDX274"></A>
<A NAME="IDX275"></A>
<A NAME="IDX271"></A>
<P>
A notorious source of program bugs is trying to put more characters in a
string than fit in its allocated size. When writing code that extends
strings or moves characters into a pre-allocated array, you should be
very careful to keep track of the length of the text and make explicit
checks for overflowing the array. Many of the library functions
<EM>do not</EM> do this for you! Remember also that you need to allocate
an extra byte to hold the null character that marks the end of the
string.
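<P>
The sketch below shows one way to budget for that extra byte when
building a new string from two existing ones.  The function name
<CODE>join_strings</CODE> is an assumption made for this example; it is not a
library function.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated string holding A
   followed by B, or a null pointer if malloc fails.  The caller is
   responsible for freeing the result.  */
char *
join_strings (const char *a, const char *b)
{
  size_t alen = strlen (a);
  size_t blen = strlen (b);
  /* The lengths do not count the null character, so add one byte.  */
  char *result = malloc (alen + blen + 1);

  if (result == NULL)
    return NULL;
  memcpy (result, a, alen);
  /* Copying blen + 1 bytes also copies B's terminating null.  */
  memcpy (result + alen, b, blen + 1);
  return result;
}
```

Here the allocation size of the result is exactly its length plus one.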
<P>
<H2><A NAME="SEC59" HREF="library_toc.html#SEC59" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC59">String/Array Conventions</A></H2>
<P>
This chapter describes both functions that work on arbitrary arrays or
blocks of memory, and functions that are specific to null-terminated
arrays of characters.
<P>
Functions that operate on arbitrary blocks of memory have names
beginning with <SAMP>`mem'</SAMP> (such as <CODE>memcpy</CODE>) and invariably take an
argument which specifies the size (in bytes) of the block of memory to
operate on. The array arguments and return values for these functions
have type <CODE>void *</CODE>, and as a matter of style, the elements of these
arrays are referred to as "bytes". You can pass any kind of pointer
to these functions, and the <CODE>sizeof</CODE> operator is useful in
computing the value for the size argument.
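<P>
For instance, a size argument for an array of <CODE>int</CODE> might be
computed as in this sketch.  The names <CODE>NVALUES</CODE> and
<CODE>copy_values</CODE> are hypothetical and chosen only for illustration.

```c
#include <string.h>

#define NVALUES 4   /* hypothetical element count */

/* Hypothetical helper: copy NVALUES ints from FROM into TO.
   Inside the function the parameters are pointers, so the byte
   count is computed from the element count rather than with
   sizeof (to).  */
void
copy_values (int to[NVALUES], const int from[NVALUES])
{
  memcpy (to, from, NVALUES * sizeof (int));
}
```

When the array itself is in scope (not a parameter), <CODE>sizeof</CODE>
applied to the array gives the same byte count directly.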
<P>
In contrast, functions that operate specifically on strings have names
beginning with <SAMP>`str'</SAMP> (such as <CODE>strcpy</CODE>) and look for a null
character to terminate the string instead of requiring an explicit size
argument to be passed. (Some of these functions accept a specified
maximum length, but they also check for premature termination with a
null character.) The array arguments and return values for these
functions have type <CODE>char *</CODE>, and the array elements are referred
to as "characters".
<P>
In many cases, there are both <SAMP>`mem'</SAMP> and <SAMP>`str'</SAMP> versions of a
function. The one that is more appropriate to use depends on the exact
situation. When your program is manipulating arbitrary arrays or blocks of
storage, then you should always use the <SAMP>`mem'</SAMP> functions. On the
other hand, when you are manipulating null-terminated strings it is
usually more convenient to use the <SAMP>`str'</SAMP> functions, unless you
already know the length of the string in advance.
<P>
<H2><A NAME="SEC60" HREF="library_toc.html#SEC60" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC60">String Length</A></H2>
<P>
You can get the length of a string using the <CODE>strlen</CODE> function.
This function is declared in the header file <TT>`string.h'</TT>.
<A NAME="IDX276"></A>
<P>
<A NAME="IDX277"></A>
<U>Function:</U> size_t <B>strlen</B> <I>(const char *<VAR>s</VAR>)</I><P>
The <CODE>strlen</CODE> function returns the length of the null-terminated
string <VAR>s</VAR>. (In other words, it returns the offset of the terminating
null character within the array.)
<P>
For example,
<PRE>
strlen ("hello, world")
=> 12
</PRE>
<P>
When applied to a character array, the <CODE>strlen</CODE> function returns
the length of the string stored there, not its allocation size. You can
get the allocation size of the character array that holds a string using
the <CODE>sizeof</CODE> operator:
<P>
<PRE>
char string[32] = "hello, world";
sizeof (string)
=> 32
strlen (string)
=> 12
</PRE>
<P>
<H2><A NAME="SEC61" HREF="library_toc.html#SEC61" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC61">Copying and Concatenation</A></H2>
<P>
You can use the functions described in this section to copy the contents
of strings and arrays, or to append the contents of one string to
another. These functions are declared in the header file
<TT>`string.h'</TT>.
<A NAME="IDX279"></A>
<A NAME="IDX280"></A>
<A NAME="IDX281"></A>
<A NAME="IDX282"></A>
<A NAME="IDX283"></A>
<A NAME="IDX278"></A>
<P>
A helpful way to remember the ordering of the arguments to the functions
in this section is that it corresponds to an assignment expression, with
the destination array specified to the left of the source array. All
of these functions return the address of the destination array.
<P>
Most of these functions do not work properly if the source and
destination arrays overlap. For example, if the beginning of the
destination array overlaps the end of the source array, the original
contents of that part of the source array may get overwritten before it
is copied. Even worse, in the case of the string functions, the null
character marking the end of the string may be lost, and the copy
function might get stuck in a loop trashing all the memory allocated to
your program.
<P>
All functions that have problems copying between overlapping arrays are
explicitly identified in this manual. In addition to functions in this
section, there are a few others like <CODE>sprintf</CODE> (see section <A HREF="library_11.html#SEC135" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_11.html#SEC135">Formatted Output Functions</A>) and <CODE>scanf</CODE> (see section <A HREF="library_11.html#SEC153" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_11.html#SEC153">Formatted Input Functions</A>).
<P>
<A NAME="IDX284"></A>
<U>Function:</U> void * <B>memcpy</B> <I>(void *<VAR>to</VAR>, const void *<VAR>from</VAR>, size_t <VAR>size</VAR>)</I><P>
The <CODE>memcpy</CODE> function copies <VAR>size</VAR> bytes from the object
beginning at <VAR>from</VAR> into the object beginning at <VAR>to</VAR>. The
behavior of this function is undefined if the two arrays <VAR>to</VAR> and
<VAR>from</VAR> overlap; use <CODE>memmove</CODE> instead if overlapping is possible.
<P>
The value returned by <CODE>memcpy</CODE> is the value of <VAR>to</VAR>.
<P>
Here is an example of how you might use <CODE>memcpy</CODE> to copy the
contents of a <CODE>struct</CODE>:
<P>
<PRE>
struct foo *old, *new;
...
memcpy (new, old, sizeof(struct foo));
</PRE>
<P>
<A NAME="IDX285"></A>
<U>Function:</U> void * <B>memmove</B> <I>(void *<VAR>to</VAR>, const void *<VAR>from</VAR>, size_t <VAR>size</VAR>)</I><P>
<CODE>memmove</CODE> copies the <VAR>size</VAR> bytes at <VAR>from</VAR> into the
<VAR>size</VAR> bytes at <VAR>to</VAR>, even if those two blocks of space
overlap. In the case of overlap, <CODE>memmove</CODE> is careful to copy the
original values of the bytes in the block at <VAR>from</VAR>, including those
bytes which also belong to the block at <VAR>to</VAR>.
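<P>
A typical case of overlap is shifting text within a single buffer.  The
following sketch, built around a hypothetical function named
<CODE>drop_prefix</CODE>, removes leading characters from a string in place;
it relies on <CODE>memmove</CODE> because the source and destination share
storage.

```c
#include <string.h>

/* Hypothetical helper: delete the first N characters of the string S
   in place.  If N exceeds the length of S, S becomes empty.  */
void
drop_prefix (char *s, size_t n)
{
  size_t len = strlen (s);

  if (n > len)
    n = len;
  /* The blocks at S and S + N overlap; the count includes the
     terminating null character.  */
  memmove (s, s + n, len - n + 1);
}
```

Using <CODE>memcpy</CODE> here would have undefined behavior, since the two
blocks overlap whenever the remaining text is longer than the prefix
being removed.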
<P>
<A NAME="IDX286"></A>
<U>Function:</U> void * <B>memccpy</B> <I>(void *<VAR>to</VAR>, const void *<VAR>from</VAR>, int <VAR>c</VAR>, size_t <VAR>size</VAR>)</I><P>
This function copies no more than <VAR>size</VAR> bytes from <VAR>from</VAR> to
<VAR>to</VAR>, stopping if a byte matching <VAR>c</VAR> is found. The return
value is a pointer into <VAR>to</VAR> one byte past where <VAR>c</VAR> was copied,
or a null pointer if no byte matching <VAR>c</VAR> appeared in the first
<VAR>size</VAR> bytes of <VAR>from</VAR>.
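<P>
One possible use of <CODE>memccpy</CODE> is extracting a delimited field.  The
sketch below is only an illustration: <CODE>first_field</CODE> is a
hypothetical name, and the feature-test macro on the first line is an
assumption about the build environment (it should not be needed with
the default settings of the GNU C compiler).

```c
#define _XOPEN_SOURCE 700   /* assumption: request the memccpy declaration */
#include <string.h>

/* Hypothetical helper: copy the text of LINE that precedes the first
   ':' into FIELD, which has room for SIZE bytes.  Returns 1 on
   success.  Returns 0 if no ':' was found within the bytes examined;
   the contents of FIELD are then unspecified.  */
int
first_field (char *field, size_t size, const char *line)
{
  size_t n = strlen (line) + 1;   /* never read past LINE's null */
  char *end;

  if (n > size)
    n = size;
  end = memccpy (field, line, ':', n);
  if (end == NULL)
    return 0;
  end[-1] = '\0';   /* overwrite the copied ':' with a null character */
  return 1;
}
```

The return value points one byte past the copied delimiter, which is
why the sketch writes the null character at <CODE>end[-1]</CODE>.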
<P>
<A NAME="IDX287"></A>
<U>Function:</U> void * <B>memset</B> <I>(void *<VAR>block</VAR>, int <VAR>c</VAR>, size_t <VAR>size</VAR>)</I><P>
This function copies the value of <VAR>c</VAR> (converted to an
<CODE>unsigned char</CODE>) into each of the first <VAR>size</VAR> bytes of the
object beginning at <VAR>block</VAR>. It returns the value of <VAR>block</VAR>.
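<P>
A common use of <CODE>memset</CODE> is clearing an array before reuse.  In
this sketch the structure type <CODE>struct point</CODE> and the function
<CODE>clear_points</CODE> are hypothetical, invented for the example.

```c
#include <string.h>

/* Hypothetical structure used only for this example.  */
struct point
{
  int x;
  int y;
};

/* Hypothetical helper: set every byte of COUNT points to zero, which
   makes each int member zero.  */
void
clear_points (struct point *pts, size_t count)
{
  memset (pts, 0, count * sizeof (struct point));
}
```

Note that the size argument is a number of bytes, not a number of
elements, so the element count is multiplied by the element size.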
<P>
<A NAME="IDX288"></A>
<U>Function:</U> char * <B>strcpy</B> <I>(char *<VAR>to</VAR>, const char *<VAR>from</VAR>)</I><P>
This copies characters from the string <VAR>from</VAR> (up to and including
the terminating null character) into the string <VAR>to</VAR>. Like
<CODE>memcpy</CODE>, this function has undefined results if the strings
overlap. The return value is the value of <VAR>to</VAR>.
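<P>
Because <CODE>strcpy</CODE> does not know how much space is available at
<VAR>to</VAR>, the caller has to check.  One way to do so is sketched below;
<CODE>checked_copy</CODE> is a hypothetical wrapper, not a library function.

```c
#include <string.h>

/* Hypothetical helper: copy FROM into TO only if it fits in SIZE
   bytes, counting the terminating null character.  Returns TO on
   success and a null pointer (leaving TO untouched) otherwise.  */
char *
checked_copy (char *to, size_t size, const char *from)
{
  if (strlen (from) >= size)
    return NULL;
  return strcpy (to, from);
}
```

The comparison uses <CODE>&gt;=</CODE> rather than <CODE>&gt;</CODE> because a
string whose length equals <VAR>size</VAR> would leave no room for the null
character.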
<P>
<A NAME="IDX289"></A>
<U>Function:</U> char * <B>strncpy</B> <I>(char *<VAR>to</VAR>, const char *<VAR>from</VAR>, size_t <VAR>size</VAR>)</I><P>
This function is similar to <CODE>strcpy</CODE> but always copies exactly
<VAR>size</VAR> characters into <VAR>to</VAR>.
<P>
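If the length of <VAR>from</VAR> is <VAR>size</VAR> or more, then <CODE>strncpy</CODE>
copies just the first <VAR>size</VAR> characters.  In this case no null
character is written to <VAR>to</VAR>, so the result is not a null-terminated
string.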
<P>
If the length of <VAR>from</VAR> is less than <VAR>size</VAR>, then <CODE>strncpy</CODE>
copies all of <VAR>from</VAR>, followed by enough null characters to add up
to <VAR>size</VAR> characters in all. This behavior is rarely useful, but it
is specified by the ANSI C standard.
<P>
The behavior of <CODE>strncpy</CODE> is undefined if the strings overlap.
<P>
Using <CODE>strncpy</CODE> as opposed to <CODE>strcpy</CODE> is a way to avoid bugs
relating to writing past the end of the allocated space for <VAR>to</VAR>.
However, it can also make your program much slower in one common case:
copying a string which is probably small into a potentially large buffer.
In this case, <VAR>size</VAR> may be large, and when it is, <CODE>strncpy</CODE> will
waste a considerable amount of time copying null characters.
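<P>
When the result must be a proper string, a frequent idiom is to copy at
most one character fewer than the buffer holds and then store the null
character explicitly.  The sketch below shows that idiom under the
assumption that truncating the text is acceptable; the name
<CODE>copy_truncated</CODE> is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: copy at most SIZE - 1 characters of FROM into
   TO and always null-terminate the result.  SIZE is the allocation
   size of TO and must be at least 1.  */
char *
copy_truncated (char *to, size_t size, const char *from)
{
  strncpy (to, from, size - 1);
  /* strncpy writes no null character if FROM has SIZE - 1 or more
     characters, so store one in the last byte explicitly.  */
  to[size - 1] = '\0';
  return to;
}
```

With a four-byte buffer, copying <SAMP>`hello'</SAMP> this way leaves
<SAMP>`hel'</SAMP> in the buffer.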
<P>
<A NAME="IDX290"></A>
<U>Function:</U> char * <B>strdup</B> <I>(const char *<VAR>s</VAR>)</I><P>
This function copies the null-terminated string <VAR>s</VAR> into a newly
allocated string. The string is allocated using <CODE>malloc</CODE>; see
section <A HREF="library_3.html#SEC21" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_3.html#SEC21">Unconstrained Allocation</A>. If <CODE>malloc</CODE> cannot allocate space
for the new string, <CODE>strdup</CODE> returns a null pointer. Otherwise it
returns a pointer to the new string.
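<P>
Since the copy is separate, modifiable storage, <CODE>strdup</CODE> is handy
when you need to alter text without touching the original.  The
following is a sketch only: <CODE>upcase_copy</CODE> is a hypothetical
function, and the feature-test macro on the first line is an assumption
about the build environment.

```c
#define _XOPEN_SOURCE 700   /* assumption: request the strdup declaration */
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated upper-case copy of S,
   or a null pointer if the allocation fails.  The caller must release
   the result with free.  */
char *
upcase_copy (const char *s)
{
  char *copy = strdup (s);
  char *p;

  if (copy == NULL)
    return NULL;
  for (p = copy; *p != '\0'; p++)
    *p = toupper ((unsigned char) *p);
  return copy;
}
```

Remember to check the return value for a null pointer and to
<CODE>free</CODE> the string when you are done with it.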
<P>
<A NAME="IDX291"></A>
<U>Function:</U> char * <B>stpcpy</B> <I>(char *<VAR>to</VAR>, const char *<VAR>from</VAR>)</I><P>
This function is like <CODE>strcpy</CODE>, except that it returns a pointer to
the end of the string <VAR>to</VAR> (that is, the address of the terminating
null character) rather than the beginning.
<P>
For example, this program uses <CODE>stpcpy</CODE> to concatenate <SAMP>`foo'</SAMP>
and <SAMP>`bar'</SAMP> to produce <SAMP>`foobar'</SAMP>, which it then prints.
<P>
<PRE>
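#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

int
main (void)
{
  char buffer[10];      /* room for "foobar" plus the null character */
  char *to = buffer;

  to = stpcpy (to, "foo");   /* to now points at the null after "foo" */
  to = stpcpy (to, "bar");
  printf ("%s\n", buffer);
  return 0;
}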
</PRE>
<P>
This function is not part of the ANSI or POSIX standards, and is not
customary on Unix systems, but we did not invent it either. Perhaps it
comes from MS-DOG.
<P>
Its behavior is undefined if the strings overlap.
<P>
<A NAME="IDX292"></A>
<U>Function:</U> char * <B>strcat</B> <I>(char *<VAR>to</VAR>, const char *<VAR>from</VAR>)</I><P>