@c pattern.texi
@node Pattern Matching, I/O Overview, Searching and Sorting, Top
@chapter Pattern Matching

The GNU C Library provides pattern matching facilities for two kinds of
patterns: regular expressions and file-name wildcards.  The library also
provides a facility for expanding variable and command references and
parsing text into words in the way the shell does.

@menu
* Wildcard Matching::    Matching a wildcard pattern against a single string.
* Globbing::             Finding the files that match a wildcard pattern.
* Regular Expressions::  Matching regular expressions against strings.
* Word Expansion::       Expanding shell variables, nested commands,
                          arithmetic, and wildcards.
                          This is what the shell does with shell commands.
@end menu

@node Wildcard Matching
@section Wildcard Matching

@pindex fnmatch.h
This section describes how to match a wildcard pattern against a
particular string.  The result is a yes or no answer: does the
string fit the pattern or not.  The symbols described here are all
declared in @file{fnmatch.h}.

@comment fnmatch.h
@comment POSIX.2
@deftypefun int fnmatch (const char *@var{pattern}, const char *@var{string}, int @var{flags})
This function tests whether the string @var{string} matches the pattern
@var{pattern}.  It returns @code{0} if they do match; otherwise, it
returns the nonzero value @code{FNM_NOMATCH}.  The arguments
@var{pattern} and @var{string} are both strings.

The argument @var{flags} is a combination of flag bits that alter the
details of matching.  See below for a list of the defined flags.

In the GNU C Library, @code{fnmatch} cannot experience an ``error''---it
always returns an answer for whether the match succeeds.  However, other
implementations of @code{fnmatch} might sometimes report ``errors''.
They would do so by returning nonzero values that are not equal to
@code{FNM_NOMATCH}.
@end deftypefun

These are the available flags for the @var{flags} argument:

@table @code
@comment fnmatch.h
@comment GNU
@item FNM_FILE_NAME
Treat the @samp{/} character specially, for matching file names.
If this flag is set, wildcard constructs in @var{pattern} cannot match
@samp{/} in @var{string}.  Thus, the only way to match @samp{/} is with
an explicit @samp{/} in @var{pattern}.

@comment fnmatch.h
@comment POSIX.2
@item FNM_PATHNAME
This is an alias for @code{FNM_FILE_NAME}; it comes from POSIX.2.  We
don't recommend this name because we don't use the term ``pathname'' for
file names.

@comment fnmatch.h
@comment POSIX.2
@item FNM_PERIOD
Treat the @samp{.} character specially if it appears at the beginning of
@var{string}.  If this flag is set, wildcard constructs in @var{pattern}
cannot match @samp{.} as the first character of @var{string}.

If you set both @code{FNM_PERIOD} and @code{FNM_FILE_NAME}, then the
special treatment applies to @samp{.} following @samp{/} as well as to
@samp{.} at the beginning of @var{string}.  (The shell uses the
@code{FNM_PERIOD} and @code{FNM_FILE_NAME} flags together for matching
file names.)

@comment fnmatch.h
@comment POSIX.2
@item FNM_NOESCAPE
Don't treat the @samp{\} character specially in patterns.  Normally,
@samp{\} quotes the following character, turning off its special meaning
(if any) so that it matches only itself.  When quoting is enabled, the
pattern @samp{\?} matches only the string @samp{?}, because the question
mark in the pattern acts like an ordinary character.

If you use @code{FNM_NOESCAPE}, then @samp{\} is an ordinary character.

@comment fnmatch.h
@comment GNU
@item FNM_LEADING_DIR
Ignore a trailing sequence of characters starting with a @samp{/} in
@var{string}; that is to say, test whether @var{string} starts with a
directory name that @var{pattern} matches.

If this flag is set, either @samp{foo*} or @samp{foobar} as a pattern
would match the string @samp{foobar/frobozz}.

@comment fnmatch.h
@comment GNU
@item FNM_CASEFOLD
Ignore case in comparing @var{string} to @var{pattern}.
@end table

@node Globbing
@section Globbing

@cindex globbing
The archetypal use of wildcards is for matching against the files in a
directory, and making a list of all the matches.
This is called @dfn{globbing}.

You could do this using @code{fnmatch}, by reading the directory entries
one by one and testing each one with @code{fnmatch}.  But that would be
slow (and complex, since you would have to handle subdirectories by
hand).

The library provides a function @code{glob} to make this particular use
of wildcards convenient.  @code{glob} and the other symbols in this
section are declared in @file{glob.h}.

@menu
* Calling Glob::         Basic use of @code{glob}.
* Flags for Globbing::   Flags that enable various options in @code{glob}.
@end menu

@node Calling Glob
@subsection Calling @code{glob}

The result of globbing is a vector of file names (strings).  To return
this vector, @code{glob} uses a special data type, @code{glob_t}, which
is a structure.  You pass @code{glob} the address of the structure, and
it fills in the structure's fields to tell you about the results.

@comment glob.h
@comment POSIX.2
@deftp {Data Type} glob_t
This data type holds a pointer to a word vector.  More precisely, it
records both the address of the word vector and its size.

@table @code
@item gl_pathc
The number of elements in the vector.

@item gl_pathv
The address of the vector.  This field has type @w{@code{char **}}.

@item gl_offs
The offset of the first real element of the vector, from its nominal
address in the @code{gl_pathv} field.  Unlike the other fields, this
is always an input to @code{glob}, rather than an output from it.

If you use a nonzero offset, then that many elements at the beginning of
the vector are left empty.  (The @code{glob} function fills them with
null pointers.)

The @code{gl_offs} field is meaningful only if you use the
@code{GLOB_DOOFFS} flag.
Otherwise, the offset is always zero
regardless of what is in this field, and the first real element comes at
the beginning of the vector.
@end table
@end deftp

@comment glob.h
@comment POSIX.2
@deftypefun int glob (const char *@var{pattern}, int @var{flags}, int (*@var{errfunc}) (const char *@var{filename}, int @var{error-code}), glob_t *@var{vector-ptr})
The function @code{glob} does globbing using the pattern @var{pattern}
in the current directory.  It puts the result in a newly allocated
vector, and stores the size and address of this vector into
@code{*@var{vector-ptr}}.  The argument @var{flags} is a combination of
bit flags; see @ref{Flags for Globbing}, for details of the flags.

The result of globbing is a sequence of file names.  The function
@code{glob} allocates a string for each resulting word, then
allocates a vector of type @code{char **} to store the addresses of
these strings.  The last element of the vector is a null pointer.
This vector is called the @dfn{word vector}.

To return this vector, @code{glob} stores both its address and its
length (number of elements, not counting the terminating null pointer)
into @code{*@var{vector-ptr}}.

Normally, @code{glob} sorts the file names alphabetically before
returning them.  You can turn this off with the flag @code{GLOB_NOSORT}
if you want to get the information as fast as possible.  Usually it's
a good idea to let @code{glob} sort them---if you process the files in
alphabetical order, the users will have a feel for the rate of progress
that your application is making.

If @code{glob} succeeds, it returns 0.
Otherwise, it returns one
of these error codes:

@table @code
@comment glob.h
@comment POSIX.2
@item GLOB_ABORTED
There was an error opening a directory, and you used the flag
@code{GLOB_ERR} or your specified @var{errfunc} returned a nonzero
value.
@iftex
See below
@end iftex
@ifinfo
@xref{Flags for Globbing},
@end ifinfo
for an explanation of the @code{GLOB_ERR} flag and @var{errfunc}.

@comment glob.h
@comment POSIX.2
@item GLOB_NOMATCH
The pattern didn't match any existing files.  If you use the
@code{GLOB_NOCHECK} flag, then you never get this error code, because
that flag tells @code{glob} to @emph{pretend} that the pattern matched
at least one file.

@comment glob.h
@comment POSIX.2
@item GLOB_NOSPACE
It was impossible to allocate memory to hold the result.
@end table

In the event of an error, @code{glob} stores information in
@code{*@var{vector-ptr}} about all the matches it has found so far.
@end deftypefun

@node Flags for Globbing
@subsection Flags for Globbing

This section describes the flags that you can specify in the
@var{flags} argument to @code{glob}.  Choose the flags you want,
and combine them with the C bitwise OR operator @code{|}.

@table @code
@comment glob.h
@comment POSIX.2
@item GLOB_APPEND
Append the words from this expansion to the vector of words produced by
previous calls to @code{glob}.  This way you can effectively expand
several words as if they were concatenated with spaces between them.

In order for appending to work, you must not modify the contents of the
word vector structure between calls to @code{glob}.  And, if you set
@code{GLOB_DOOFFS} in the first call to @code{glob}, you must also
set it when you append to the results.

Note that the pointer stored in @code{gl_pathv} may no longer be valid
after you call @code{glob} the second time, because @code{glob} might
have relocated the vector.
So always fetch @code{gl_pathv} from the
@code{glob_t} structure after each @code{glob} call; @strong{never} save
the pointer across calls.

@comment glob.h
@comment POSIX.2
@item GLOB_DOOFFS
Leave blank slots at the beginning of the vector of words.
The @code{gl_offs} field says how many slots to leave.
The blank slots contain null pointers.

@comment glob.h
@comment POSIX.2
@item GLOB_ERR
Give up right away and report an error if there is any difficulty
reading the directories that must be read in order to expand @var{pattern}
fully.  Such difficulties might include a directory in which you don't
have the requisite access.  Normally, @code{glob} tries its best to keep
on going despite any errors, reading whatever directories it can.

You can exercise even more control than this by specifying an
error-handler function @var{errfunc} when you call @code{glob}.  If
@var{errfunc} is not a null pointer, then @code{glob} doesn't give up
right away when it can't read a directory; instead, it calls
@var{errfunc} with two arguments, like this:

@smallexample
(*@var{errfunc}) (@var{filename}, @var{error-code})
@end smallexample

@noindent
The argument @var{filename} is the name of the directory that
@code{glob} couldn't open or couldn't read, and @var{error-code} is the
@code{errno} value that was reported to @code{glob}.

If the error handler function returns nonzero, then @code{glob} gives up
right away.  Otherwise, it continues.

@comment glob.h
@comment POSIX.2
@item GLOB_MARK
If the pattern matches the name of a directory, append @samp{/} to the
directory's name when returning it.

@comment glob.h
@comment POSIX.2
@item GLOB_NOCHECK
If the pattern doesn't match any file names, return the pattern itself
as if it were a file name that had been matched.
(Normally, when the
pattern doesn't match anything, @code{glob} returns that there were no
matches.)

@comment glob.h
@comment POSIX.2
@item GLOB_NOSORT
Don't sort the file names; return them in no particular order.
(In practice, the order will depend on the order of the entries in
the directory.)  The only reason @emph{not} to sort is to save time.

@comment glob.h
@comment POSIX.2
@item GLOB_NOESCAPE
Don't treat the @samp{\} character specially in patterns.  Normally,
@samp{\} quotes the following character, turning off its special meaning
(if any) so that it matches only itself.  When quoting is enabled, the
pattern @samp{\?} matches only the string @samp{?}, because the question
mark in the pattern acts like an ordinary character.

If you use @code{GLOB_NOESCAPE}, then @samp{\} is an ordinary character.

@code{glob} does its work by calling the function @code{fnmatch}
repeatedly.  It handles the flag @code{GLOB_NOESCAPE} by turning on the
@code{FNM_NOESCAPE} flag in calls to @code{fnmatch}.
@end table

@node Regular Expressions
@section Regular Expression Matching

The GNU C library supports two interfaces for matching regular
expressions.  One is the standard POSIX.2 interface, and the other is
what the GNU system has had for many years.

Both interfaces are declared in the header file @file{regex.h}.
If you define @w{@code{_POSIX_C_SOURCE}}, then only the POSIX.2
functions, structures, and constants are declared.
@c !!!
@c we only document the POSIX.2 interface here!!

@menu
* POSIX Regexp Compilation::    Using @code{regcomp} to prepare to match.
* Flags for POSIX Regexps::     Syntax variations for @code{regcomp}.
* Matching POSIX Regexps::      Using @code{regexec} to match the compiled
                                 pattern that you get from @code{regcomp}.
* Regexp Subexpressions::       Finding which parts of the string were matched.
* Subexpression Complications:: Find points of which parts were matched.
* Regexp Cleanup::              Freeing storage; reporting errors.
@end menu

@node POSIX Regexp Compilation
@subsection POSIX Regular Expression Compilation

Before you can actually match a regular expression, you must
@dfn{compile} it.  This is not true compilation---it produces a special
data structure, not machine instructions.  But it is like ordinary
compilation in that its purpose is to enable you to ``execute'' the
pattern fast.  (@xref{Matching POSIX Regexps}, for how to use the
compiled regular expression for matching.)

There is a special data type for compiled regular expressions:

@comment regex.h
@comment POSIX.2
@deftp {Data Type} regex_t
This type of object holds a compiled regular expression.
It is actually a structure.  It has just one field that your programs
should look at:

@table @code
@item re_nsub
This field holds the number of parenthetical subexpressions in the
regular expression that was compiled.
@end table

There are several other fields, but we don't describe them here, because
only the functions in the library should use them.
@end deftp

After you create a @code{regex_t} object, you can compile a regular
expression into it by calling @code{regcomp}.

@comment regex.h
@comment POSIX.2
@deftypefun int regcomp (regex_t *@var{compiled}, const char *@var{pattern}, int @var{cflags})
The function @code{regcomp} ``compiles'' a regular expression into a
data structure that you can use with @code{regexec} to match against a
string.  The compiled regular expression format is designed for
efficient matching.
@code{regcomp} stores it into @code{*@var{compiled}}.

It's up to you to allocate an object of type @code{regex_t} and pass its
address to @code{regcomp}.

The argument @var{cflags} lets you specify various options that control
the syntax and semantics of regular expressions.
@xref{Flags for POSIX Regexps}.

If you use the flag @code{REG_NOSUB}, then @code{regcomp} omits from
the compiled regular expression the information necessary to record
how subexpressions actually match.  In this case, you might as well
pass @code{0} for the @var{matchptr} and @var{nmatch} arguments when
you call @code{regexec}.

If you don't use @code{REG_NOSUB}, then the compiled regular expression
does have the capacity to record how subexpressions match.  Also,