@example
DIGIT    [0-9]
ID       [a-z][a-z0-9]*
@end example

@noindent
defines "DIGIT" to be a regular expression which matches a
single digit, and "ID" to be a regular expression which
matches a letter followed by zero-or-more
letters-or-digits.  A subsequent reference to

@example
@{DIGIT@}+"."@{DIGIT@}*
@end example

@noindent
is identical to

@example
([0-9])+"."([0-9])*
@end example

@noindent
and matches one-or-more digits followed by a '.' followed
by zero-or-more digits.

The @var{rules} section of the @code{flex} input contains a series of
rules of the form:

@example
pattern   action
@end example

@noindent
where the pattern must be unindented and the action must
begin on the same line.

See below for a further description of patterns and
actions.

Finally, the user code section is simply copied to
@file{lex.yy.c} verbatim.  It is used for companion routines
which call or are called by the scanner.  The presence of
this section is optional; if it is missing, the second @samp{%%}
in the input file may be skipped, too.

In the definitions and rules sections, any @emph{indented} text or
text enclosed in @samp{%@{} and @samp{%@}} is copied verbatim to the
output (with the @samp{%@{@}}'s removed).  The @samp{%@{@}}'s must
appear unindented on lines by themselves.

In the rules section, any indented or %@{@} text appearing
before the first rule may be used to declare variables
which are local to the scanning routine and (after the
declarations) code which is to be executed whenever the
scanning routine is entered.
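
The layout described so far can be sketched as one small input file.
This is only an illustration: the counters @code{num_ids} and
@code{num_calls}, the local variable @code{ids_this_call}, and the
@code{main()} routine are names assumed for the example and are not
part of @code{flex} itself.

@example
%@{
/* copied verbatim to the output; hypothetical counters */
#include <stdio.h>
int num_ids = 0;
int num_calls = 0;
%@}

DIGIT    [0-9]
ID       [a-z][a-z0-9]*

%%
    /* indented text before the first rule: a declaration local
       to the scanning routine, then code run on each entry */
    int ids_this_call = 0;
    num_calls++;

@{ID@}                     @{ ids_this_call++; num_ids++; @}
@{DIGIT@}+"."@{DIGIT@}*      /* discard numbers such as 12. or 3.14 */
.|\n                     /* discard everything else */

%%
/* user code section, copied verbatim to lex.yy.c */
int main( void )
    @{
    yylex();
    printf( "identifiers seen: %d\n", num_ids );
    return 0;
    @}
@end example
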
Other indented or %@{@} text
in the rule section is still copied to the output, but its
meaning is not well-defined and it may well cause
compile-time errors (this feature is present for @code{POSIX} compliance;
see below for other such features).

In the definitions section (but not in the rules section),
an unindented comment (i.e., a line beginning with "/*")
is also copied verbatim to the output up to the next "*/".

@node Patterns, Matching, Format, Top
@section Patterns

The patterns in the input are written using an extended
set of regular expressions.  These are:

@table @samp
@item x
match the character @samp{x}
@item .
any character (byte) except newline
@item [xyz]
a "character class"; in this case, the pattern
matches either an @samp{x}, a @samp{y}, or a @samp{z}
@item [abj-oZ]
a "character class" with a range in it; matches
an @samp{a}, a @samp{b}, any letter from @samp{j} through @samp{o},
or a @samp{Z}
@item [^A-Z]
a "negated character class", i.e., any character
but those in the class.  In this case, any
character EXCEPT an uppercase letter.
@item [^A-Z\n]
any character EXCEPT an uppercase letter or
a newline
@item @var{r}*
zero or more @var{r}'s, where @var{r} is any regular expression
@item @var{r}+
one or more @var{r}'s
@item @var{r}?
zero or one @var{r}'s (that is, "an optional @var{r}")
@item @var{r}@{2,5@}
anywhere from two to five @var{r}'s
@item @var{r}@{2,@}
two or more @var{r}'s
@item @var{r}@{4@}
exactly 4 @var{r}'s
@item @{@var{name}@}
the expansion of the "@var{name}" definition
(see above)
@item "[xyz]\"foo"
the literal string: @samp{[xyz]"foo}
@item \@var{x}
if @var{x} is an @samp{a}, @samp{b}, @samp{f}, @samp{n}, @samp{r}, @samp{t}, or @samp{v},
then the ANSI-C interpretation of \@var{x}.
Otherwise, a literal @samp{@var{x}} (used to escape
operators such as @samp{*})
@item \0
a NUL character (ASCII code 0)
@item \123
the character with octal value 123
@item \x2a
the character with hexadecimal value @code{2a}
@item (@var{r})
match an @var{r}; parentheses are used to override
precedence (see
below)
@item @var{r}@var{s}
the regular expression @var{r} followed by the
regular expression @var{s}; called "concatenation"
@item @var{r}|@var{s}
either an @var{r} or an @var{s}
@item @var{r}/@var{s}
an @var{r} but only if it is followed by an @var{s}.  The text
matched by @var{s} is included when determining whether this rule is
the @dfn{longest match}, but is then returned to the input before
the action is executed.  So the action only sees the text matched
by @var{r}.  This type of pattern is called @dfn{trailing context}.
(There are some combinations of @samp{@var{r}/@var{s}} that @code{flex}
cannot match correctly; see notes in the Deficiencies / Bugs section
below regarding "dangerous trailing context".)
@item ^@var{r}
an @var{r}, but only at the beginning of a line (i.e.,
when just starting to scan, or right after a
newline has been scanned).
@item @var{r}$
an @var{r}, but only at the end of a line (i.e., just
before a newline).  Equivalent to "@var{r}/\n".

Note that flex's notion of "newline" is exactly
whatever the C compiler used to compile flex
interprets '\n' as; in particular, on some DOS
systems you must either filter out \r's in the
input yourself, or explicitly use @var{r}/\r\n for "@var{r}$".
@item <@var{s}>@var{r}
an @var{r}, but only in start condition @var{s} (see
below for discussion of start conditions)
@item <@var{s1},@var{s2},@var{s3}>@var{r}
same, but in any of start conditions @var{s1},
@var{s2}, or @var{s3}
@item <*>@var{r}
an @var{r} in any start condition, even an exclusive one.
@item <<EOF>>
an end-of-file
@item <@var{s1},@var{s2}><<EOF>>
an end-of-file when in start condition @var{s1} or @var{s2}
@end table

Note that inside of a character class, all regular
expression operators lose their special meaning except escape
('\') and the character class operators, '-', ']', and, at
the beginning of the class, '^'.

The regular expressions listed above are grouped according
to precedence, from highest precedence at the top to
lowest at the bottom.  Those grouped together have equal
precedence.
For example,

@example
foo|bar*
@end example

@noindent
is the same as

@example
(foo)|(ba(r*))
@end example

@noindent
since the '*' operator has higher precedence than
concatenation, and concatenation higher than alternation ('|').
This pattern therefore matches @emph{either} the string "foo" @emph{or}
the string "ba" followed by zero-or-more r's.  To match
"foo" or zero-or-more "bar"'s, use:

@example
foo|(bar)*
@end example

@noindent
and to match zero-or-more "foo"'s-or-"bar"'s:

@example
(foo|bar)*
@end example

In addition to characters and ranges of characters,
character classes can also contain character class
@dfn{expressions}.  These are expressions enclosed inside @samp{[:} and @samp{:]}
delimiters (which themselves must appear between the '['
and ']' of the character class; other elements may occur
inside the character class, too).  The valid expressions
are:

@example
[:alnum:] [:alpha:] [:blank:]
[:cntrl:] [:digit:] [:graph:]
[:lower:] [:print:] [:punct:]
[:space:] [:upper:] [:xdigit:]
@end example

These expressions all designate a set of characters
equivalent to the corresponding standard C @samp{isXXX} function.  For
example, @samp{[:alnum:]} designates those characters for which
@samp{isalnum()} returns true, i.e., any alphabetic or numeric character.
Some systems don't provide @samp{isblank()}, so flex defines
@samp{[:blank:]} as a blank or a tab.

For example, the following character classes are all
equivalent:

@example
[[:alnum:]]
[[:alpha:][:digit:]]
[[:alpha:]0-9]
[a-zA-Z0-9]
@end example

If your scanner is case-insensitive (the @samp{-i} flag), then
@samp{[:upper:]} and @samp{[:lower:]} are equivalent to @samp{[:alpha:]}.

Some notes on patterns:

@itemize -
@item
A negated character class such as the example
"[^A-Z]" above @emph{will match a newline} unless "\n" (or an
equivalent escape sequence) is one of the
characters explicitly present in the negated character
class (e.g., "[^A-Z\n]").
This is unlike how many
other regular expression tools treat negated
character classes, but unfortunately the inconsistency
is historically entrenched.  Matching newlines
means that a pattern like [^"]* can match the
entire input unless there's another quote in the
input.
@item
A rule can have at most one instance of trailing
context (the '/' operator or the '$' operator).
The start condition, '^', and "<<EOF>>" patterns
can only occur at the beginning of a pattern, and,
as well as with '/' and '$', cannot be grouped
inside parentheses.  A '^' which does not occur at
the beginning of a rule or a '$' which does not
occur at the end of a rule loses its special
properties and is treated as a normal character.

The following are illegal:

@example
foo/bar$
<sc1>foo<sc2>bar
@end example

Note that the first of these can be written
"foo/bar\n".

The following will result in '$' or '^' being
treated as a normal character:

@example
foo|(bar$)
foo|^bar
@end example

If what's wanted is a "foo" or a
bar-followed-by-a-newline, the following could be used (the special
'|' action is explained below):

@example
foo      |
bar$     /* action goes here */
@end example

A similar trick will work for matching a foo or a
bar-at-the-beginning-of-a-line.
@end itemize

@node Matching, Actions, Patterns, Top
@section How the input is matched

When the generated scanner is run, it analyzes its input
looking for strings which match any of its patterns.  If
it finds more than one match, it takes the one matching
the most text (for trailing context rules, this includes
the length of the trailing part, even though it will then
be returned to the input).  If it finds two or more
matches of the same length, the rule listed first in the
@code{flex} input file is chosen.

Once the match is determined, the text corresponding to
the match (called the @var{token}) is made available in the
global character pointer @code{yytext}, and its length in the
global integer @code{yyleng}.
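
The two selection rules above (the longest match wins; on a tie, the
rule listed first wins) can be seen in a small sketch.  The rule set
and the messages it prints are assumptions made for illustration, not
taken from any particular scanner.

@example
%%
if          printf( "keyword, %d chars: %s\n", (int) yyleng, yytext );
[a-z]+      printf( "identifier, %d chars: %s\n", (int) yyleng, yytext );
[ \t\n]+    /* discard whitespace */
@end example

@noindent
On the input "if", both of the first two patterns match two
characters, so the rule listed first (the keyword rule) is chosen.
On the input "iffy", the second pattern matches four characters
against two for the first, so the identifier rule is chosen.
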
The @var{action} corresponding to the
matched pattern is then executed (a more detailed
description of actions follows), and then the remaining input is
scanned for another match.

If no match is found, then the @dfn{default rule} is executed:
the next character in the input is considered matched and
copied to the standard output.  Thus, the simplest legal
@code{flex} input is:

@example
%%
@end example

@noindent
which generates a scanner that simply copies its input
(one character at a time) to its output.

Note that @code{yytext} can be defined in two different ways:
either as a character @emph{pointer} or as a character @emph{array}.
You can control which definition @code{flex} uses by including
one of the special directives @samp{%pointer} or @samp{%array} in the
first (definitions) section of your flex input.  The
default is @samp{%pointer}, unless you use the @samp{-l} lex
compatibility option, in which case @code{yytext} will be an array.  The
advantage of using @samp{%pointer} is substantially faster
scanning and no buffer overflow when matching very large
tokens (unless you run out of dynamic memory).  The
disadvantage is that you are restricted in how your actions can
modify @code{yytext} (see the next section), and calls to the
@samp{unput()} function destroy the present contents of @code{yytext},
which can be a considerable porting headache when moving
between different @code{lex} versions.

The advantage of @samp{%array} is that you can then modify @code{yytext}
to your heart's content, and calls to @samp{unput()} do not
destroy @code{yytext} (see below).  Furthermore, existing @code{lex}
programs sometimes access @code{yytext} externally using
declarations of the form:

@example
extern char yytext[];
@end example

This definition is erroneous when used with @samp{%pointer}, but
correct for @samp{%array}.

@samp{%array} defines @code{yytext} to be an array of @code{YYLMAX} characters,
which defaults to a fairly large value.
You can change
the size by simply #define'ing @code{YYLMAX} to a different value
in the first section of your @code{flex} input.  As mentioned
above, with @samp{%pointer} @code{yytext} grows dynamically to
accommodate large tokens.  While this means your @samp{%pointer} scanner
can accommodate very large tokens (such as matching entire
blocks of comments), bear in mind that each time the
scanner must resize @code{yytext} it also must rescan the entire
token from the beginning, so matching such tokens can
prove slow.  @code{yytext} presently does @emph{not} dynamically grow if
a call to @samp{unput()} results in too much text being pushed
back; instead, a run-time error results.

Also note that you cannot use @samp{%array} with C++ scanner
classes (the @code{c++} option; see below).

@node Actions, Generated scanner, Matching, Top
@section Actions

Each pattern in a rule has a corresponding action, which
can be any arbitrary C statement.  The pattern ends at the
first non-escaped whitespace character; the remainder of
the line is its action.  If the action is empty, then when
the pattern is matched the input token is simply
discarded.  For example, here is the specification for a
program which deletes all occurrences of "zap me" from its
input:

@example
%%
"zap me"
@end example

(It will copy all other characters in the input to the
output since they will be matched by the default rule.)

Here is a program which compresses multiple blanks and
tabs down to a single blank, and throws away whitespace
found at the end of a line:

@example
%%
[ \t]+        putchar( ' ' );
[ \t]+$       /* ignore this token */
@end example

If the action contains a '@{', then the action spans till
the balancing '@}' is found, and the action may cross
multiple lines.  @code{flex} knows about C strings and comments and
won't be fooled by braces found within them, but also