?? yacc-docs.txt
字號(hào):
can be structured according to these rules in two ways:
IF ( C1 ) {
IF ( C2 ) S1
}
ELSE S2
or
IF ( C1 ) {
IF ( C2 ) S1
ELSE S2
}
The second interpretation is the one given in most programming
languages having this construct. Each ELSE is associated with
the last preceding ``un-ELSE'd'' IF. In this example, consider
the situation where the parser has seen
IF ( C1 ) IF ( C2 ) S1
and is looking at the ELSE. It can immediately reduce by the
simple-if rule to get
IF ( C1 ) stat
and then read the remaining input,
ELSE S2
and reduce
IF ( C1 ) stat ELSE S2
by the if-else rule. This leads to the first of the above group-
ings of the input.
On the other hand, the ELSE may be shifted, S2 read, and
then the right hand portion of
IF ( C1 ) IF ( C2 ) S1 ELSE S2
can be reduced by the if-else rule to get
Yacc: Yet Another Compiler-Compiler PS1:15-19
IF ( C1 ) stat
which can be reduced by the simple-if rule. This leads to the
second of the above groupings of the input, which is usually
desired.
Once again the parser can do two valid things - there is a
shift/reduce conflict. The application of disambiguating rule 1
tells the parser to shift in this case, which leads to the
desired grouping.
This shift/reduce conflict arises only when there is a par-
ticular current input symbol, ELSE, and particular inputs already
seen, such as
IF ( C1 ) IF ( C2 ) S1
In general, there may be many conflicts, and each one will be
associated with an input symbol and a set of previously read
inputs. The previously read inputs are characterized by the
state of the parser.
The conflict messages of Yacc are best understood by examin-
ing the verbose (-v) option output file. For example, the output
corresponding to the above conflict state might be:
23: shift/reduce conflict (shift 45, reduce 18) on ELSE
state 23
stat : IF ( cond ) stat_ (18)
stat : IF ( cond ) stat_ELSE stat
ELSE shift 45
. reduce 18
The first line describes the conflict, giving the state and the
input symbol. The ordinary state description follows, giving the
grammar rules active in the state, and the parser actions.
Recall that the underline marks the portion of the grammar rules
which has been seen. Thus in the example, in state 23 the parser
has seen input corresponding to
IF ( cond ) stat
and the two grammar rules shown are active at this time. The
parser can do two possible things. If the input symbol is ELSE,
it is possible to shift into state 45. State 45 will have, as
part of its description, the line
stat : IF ( cond ) stat ELSE_stat
PS1:15-20 Yacc: Yet Another Compiler-Compiler
since the ELSE will have been shifted in this state. Back in
state 23, the alternative action, described by ``.'', is to be
done if the input symbol is not mentioned explicitly in the above
actions; thus, in this case, if the input symbol is not ELSE, the
parser reduces by grammar rule 18:
stat : IF '(' cond ')' stat
Once again, notice that the numbers following ``shift'' commands
refer to other states, while the numbers following ``reduce''
commands refer to grammar rule numbers. In the y.output file,
the rule numbers are printed after those rules which can be
reduced. In most one states, there will be at most reduce action
possible in the state, and this will be the default command. The
user who encounters unexpected shift/reduce conflicts will prob-
ably want to look at the verbose output to decide whether the
default actions are appropriate. In really tough cases, the user
might need to know more about the behavior and construction of
the parser than can be covered here. In this case, one of the
theoretical references[2, 3, 4] might be consulted; the services
of a local guru might also be appropriate.
6: Precedence
There is one common situation where the rules given above
for resolving conflicts are not sufficient; this is in the pars-
ing of arithmetic expressions. Most of the commonly used con-
structions for arithmetic expressions can be naturally described
by the notion of precedence levels for operators, together with
information about left or right associativity. It turns out that
ambiguous grammars with appropriate disambiguating rules can be
used to create parsers that are faster and easier to write than
parsers constructed from unambiguous grammars. The basic notion
is to write grammar rules of the form
expr : expr OP expr
and
expr : UNARY expr
for all binary and unary operators desired. This creates a very
ambiguous grammar, with many parsing conflicts. As disambiguat-
ing rules, the user specifies the precedence, or binding
strength, of all the operators, and the associativity of the
binary operators. This information is sufficient to allow Yacc
to resolve the parsing conflicts in accordance with these rules,
and construct a parser that realizes the desired precedences and
associativities.
The precedences and associativities are attached to tokens
in the declarations section. This is done by a series of lines
beginning with a Yacc keyword: %left, %right, or %nonassoc, fol-
lowed by a list of tokens. All of the tokens on the same line
Yacc: Yet Another Compiler-Compiler PS1:15-21
are assumed to have the same precedence level and associativity;
the lines are listed in order of increasing precedence or binding
strength. Thus,
%left '+' '-'
%left '*' '/'
describes the precedence and associativity of the four arithmetic
operators. Plus and minus are left associative, and have lower
precedence than star and slash, which are also left associative.
The keyword %right is used to describe right associative opera-
tors, and the keyword %nonassoc is used to describe operators,
like the operator .LT. in Fortran, that may not associate with
themselves; thus,
A .LT. B .LT. C
is illegal in Fortran, and such an operator would be described
with the keyword %nonassoc in Yacc. As an example of the
behavior of these declarations, the description
%right '='
%left '+' '-'
%left '*' '/'
%%
expr : expr '=' expr
| expr '+' expr
| expr '-' expr
| expr '*' expr
| expr '/' expr
| NAME
;
might be used to structure the input
a = b = c*d - e - f*g
as follows:
a = ( b = ( ((c*d)-e) - (f*g) ) )
When this mechanism is used, unary operators must, in general, be
given a precedence. Sometimes a unary operator and a binary
operator have the same symbolic representation, but different
precedences. An example is unary and binary '-'; unary minus may
be given the same strength as multiplication, or even higher,
while binary minus has a lower strength than multiplication. The
keyword, %prec, changes the precedence level associated with a
particular grammar rule. %prec appears immediately after the
body of the grammar rule, before the action or closing semicolon,
and is followed by a token name or literal. It causes the pre-
cedence of the grammar rule to become that of the following token
PS1:15-22 Yacc: Yet Another Compiler-Compiler
name or literal. For example, to make unary minus have the same
precedence as multiplication the rules might resemble:
%left '+' '-'
%left '*' '/'
%%
expr : expr '+' expr
| expr '-' expr
| expr '*' expr
| expr '/' expr
| '-' expr %prec '*'
| NAME
;
A token declared by %left, %right, and %nonassoc need not
be, but may be, declared by %token as well.
The precedences and associativities are used by Yacc to
resolve parsing conflicts; they give rise to disambiguating
rules. Formally, the rules work as follows:
1. The precedences and associativities are recorded for those
tokens and literals that have them.
2. A precedence and associativity is associated with each gram-
mar rule; it is the precedence and associativity of the last
token or literal in the body of the rule. If the %prec con-
struction is used, it overrides this default. Some grammar
rules may have no precedence and associativity associated
with them.
3. When there is a reduce/reduce conflict, or there is a
shift/reduce conflict and either the input symbol or the
grammar rule has no precedence and associativity, then the
two disambiguating rules given at the beginning of the sec-
tion are used, and the conflicts are reported.
4. If there is a shift/reduce conflict, and both the grammar
rule and the input character have precedence and associa-
tivity associated with them, then the conflict is resolved
in favor of the action (shift or reduce) associated with the
higher precedence. If the precedences are the same, then
the associativity is used; left associative implies reduce,
right associative implies shift, and nonassociating implies
error.
Conflicts resolved by precedence are not counted in the
number of shift/reduce and reduce/reduce conflicts reported by
Yacc. This means that mistakes in the specification of pre-
cedences may disguise errors in the input grammar; it is a good
idea to be sparing with precedences, and use them in an
Yacc: Yet Another Compiler-Compiler PS1:15-23
essentially ``cookbook'' fashion, until some experience has been
gained. The y.output file is very useful in deciding whether the
parser is actually doing what was intended.
7: Error Handling
Error handling is an extremely difficult area, and many of
the problems are semantic ones. When an error is found, for
example, it may be necessary to reclaim parse tree storage,
delete or alter symbol table entries, and, typically, set
switches to avoid generating any further output.
It is seldom acceptable to stop all processing when an error
is found; it is more useful to continue scanning the input to
find further syntax errors. This leads to the problem of getting
the parser ``restarted'' after an error. A general class of
algorithms to do this involves discarding a number of tokens from
the input string, and attempting to adjust the parser so that
input can continue.
To allow the user some control over this process, Yacc pro-
vides a simple, but reasonably general, feature. The token name
``error'' is reserved for error handling. This name can be used
in grammar rules; in effect, it suggests places where errors are
expected, and recovery might take place. The parser pops its
stack until it enters a state where the token ``error'' is legal.
It then behaves as if the token ``error'' were the current looka-
head token, and performs the action encountered. The lookahead
token is then reset to the token that caused the error. If no
special error rules have been specified, the processing halts
when an error is detected.
In order to prevent a cascade of error messages, the parser,
after detecting an error, remains in error state until three
tokens have been successfully read and shifted. If an error is
detected when the parser is already in error state, no message is
given, and the input token is quietly deleted.
As an example, a rule of the form
stat : error
would, in effect, mean that on a syntax error the parser would
attempt to skip over the statement in which the error was seen.
More precisely, the parser will scan ahead, looking for three
tokens that might legally follow a statement, and start process-
ing at the first of these; if the beginnings of statements are
not sufficiently distinctive, it may make a false start in the
middle of a statement, and end up reporting a second error where
there is in fact no error.
Actions may be used with these special error rules. These
actions might attempt to reinitialize tables, reclaim symbol
table space, etc.
?? 快捷鍵說明
復(fù)制代碼
Ctrl + C
搜索代碼
Ctrl + F
全屏模式
F11
切換主題
Ctrl + Shift + D
顯示快捷鍵
?
增大字號(hào)
Ctrl + =
減小字號(hào)
Ctrl + -