?? pat5c.htm
字號:
<A NAME="auto1055"></A><LI><EM>Complex grammars are hard to maintain.</EM>The Interpreter pattern defines at least one class for every rulein the grammar (grammar rules defined using BNF may require multipleclasses). Hence grammars containing many rules can be hard tomanage and maintain. Other design patterns can be applied tomitigate the problem (see <A HREF="#implementation">Implementation</A>).But when the grammar is very complex, other techniques such asparser or compiler generators are more appropriate.</LI><A NAME="auto1056"></A><P></P><A NAME="auto1057"></A><LI><EM>Adding new ways to interpret expressions.</EM>The Interpreter pattern makes it easier to evaluate an expression in anew way. For example, you can support pretty printing ortype-checking an expression by defining a new operation on theexpression classes. If you keep creating new ways of interpreting anexpression, then consider using the <A HREF="pat5kfs.htm" TARGET="_mainDisplayFrame">Visitor (331)</A>pattern to avoid changing the grammar classes.</LI></OL><A NAME="implementation"></A><H2><A HREF="#samplecode"><IMG SRC="gifsb/down3.gif" BORDER=0 ALT="next: Sample Code"></A> Implementation</H2> <A NAME="auto1058"></A><P>The Interpreter and <A HREF="pat4cfs.htm" TARGET="_mainDisplayFrame">Composite (163)</A>patterns share many implementation issues. The following issuesare specific to Interpreter:</P><OL><A NAME="auto1059"></A><LI><EM>Creating the abstract syntax tree.</EM>The Interpreter pattern doesn't explain how to <EM>create</EM> anabstract syntax tree. In other words, it doesn't address parsing.The abstract syntax tree can be created by a table-driven parser, by ahand-crafted (usually recursive descent) parser, or directly by theclient.</LI><A NAME="auto1060"></A><P></P><A NAME="variable-w-interp"></A><LI><EM>Defining the Interpret operation.</EM>You don't have to define the Interpret operation in the expressionclasses. If it's common to create a new interpreter, then it's betterto use the <A HREF="pat5kfs.htm" TARGET="_mainDisplayFrame">Visitor (331)</A> pattern to put Interpret in aseparate "visitor" object. For example, a grammar for a programminglanguage will have many operations on abstract syntax trees, such asas type-checking, optimization, code generation, and so on. It will bemore likely to use a visitor to avoid defining these operations onevery grammar class.</LI><A NAME="auto1061"></A><P></P><A NAME="flywt-w-interp"></A><A NAME="term-symb-w-flywt"></A><LI><EM>Sharing terminal symbols with the Flyweight pattern.</EM>Grammars whose sentences contain many occurrences of a terminal symbolmight benefit from sharing a single copy of that symbol. Grammars forcomputer programs are good examples—each program variable willappear in many places throughout the code. In the Motivation example,a sentence can have the terminal symbol <CODE>dog</CODE> (modeled by theLiteralExpression class) appearing many times.<A NAME="auto1062"></A><P>Terminal nodes generally don't store information about their positionin the abstract syntax tree. Parent nodes pass them whatever contextthey need during interpretation. Hence there is a distinction betweenshared (intrinsic) state and passed-in (extrinsic) state, and the<A HREF="pat4ffs.htm" TARGET="_mainDisplayFrame">Flyweight (195</A>) pattern applies.</P><A NAME="auto1063"></A><P>For example, each instance of LiteralExpression for <CODE>dog</CODE>receives a context containing the substring matched so far. And everysuch LiteralExpression does the same thing in its Interpretoperation—it checks whether the next part of the input contains a<CODE>dog</CODE>—no matter where the instance appears in the tree.</P></LI></OL><A NAME="samplecode"><A><H2><A HREF="#knownuses"><IMG SRC="gifsb/down3.gif" BORDER=0 ALT="next: Known Uses"></A> Sample Code</H2> <A NAME="auto1064"></A><P>Here are two examples. The first is a complete example in Smalltalkfor checking whether a sequence matches a regular expression. Thesecond is a C++ program for evaluating Boolean expressions.</P><A NAME="Smalltalk_example_in_Interpreter"></A><P>The regular expression matcher tests whether a string is in thelanguage defined by the regular expression. The regular expression isdefined by the following grammar:</P><A NAME="auto1065"></A><PRE> expression ::= literal | alternation | sequence | repetition | '(' expression ')' alternation ::= expression '|' expression sequence ::= expression '&' expression repetition ::= expression 'repeat' literal ::= 'a' | 'b' | 'c' | ... { 'a' | 'b' | 'c' | ... }*</PRE><A NAME="auto1066"></A><P>This grammar is a slight modification of the Motivation example. Wechanged the concrete syntax of regular expressions a little, becausesymbol "<CODE>*</CODE>" can't be a postfix operation in Smalltalk. Sowe use <CODE>repeat</CODE> instead. For example, the regular expression</P><A NAME="auto1067"></A><PRE> (('dog ' | 'cat ') repeat & 'weather')</PRE><A NAME="auto1068"></A><P>matches the input string "<CODE>dog dog cat weather</CODE>".</P><A NAME="auto1069"></A><P>To implement the matcher, we define the five classes described on<A HREF="#auto1006">page 243</A>. The class<CODE>SequenceExpression</CODE> has instance variables<CODE>expression1</CODE> and <CODE>expression2</CODE> for its childrenin the abstract syntax tree. <CODE>AlternationExpression</CODE>stores its alternatives in the instance variables<CODE>alternative1</CODE> and <CODE>alternative2</CODE>, while<CODE>RepetitionExpression</CODE> holds the expression it repeats in its<CODE>repetition</CODE> instance variable.LiteralExpression has a <CODE>components</CODE> instance variable thatholds a list of objects (probably characters). These represent the literalstring that must match the input sequence.</P><A NAME="auto1070"></A><P>The <CODE>match:</CODE> operation implements an interpreter for theregular expression. Each of the classes defining the abstract syntaxtree implements this operation. It takes<CODE>inputState</CODE> as an argument representing the current stateof the matching process, having read part of the input string.</P><A NAME="auto1071"></A><P>This current state is characterized by a set of input streamsrepresenting the set of inputs that the regular expression could haveaccepted so far. (This is roughly equivalent to recording all statesthat the equivalent finite state automata would be in, havingrecognized the input stream to this point).</P><A NAME="auto1072"></A><P>The current state is most important to the <CODE>repeat</CODE> operation.For example, if the regular expression were</P><A NAME="auto1073"></A><PRE> 'a' repeat</PRE><A NAME="auto1074"></A><P>then the interpreter could match "<CODE>a</CODE>", "<CODE>aa</CODE>","<CODE>aaa</CODE>", and so on. If it were</P><A NAME="auto1075"></A><PRE> 'a' repeat & 'bc'</PRE><A NAME="auto1076"></A><P>then it could match "<CODE>abc</CODE>", "<CODE>aabc</CODE>","<CODE>aaabc</CODE>", and so on. But if the regular expression were</P><A NAME="auto1077"></A><PRE> 'a' repeat & 'abc'</PRE><A NAME="auto1078"></A><P>then matching the input "<CODE>aabc</CODE>" against the subexpression"<CODE>'a' repeat</CODE>" would yield two input streams, one having matchedone character of the input, and the other having matched twocharacters. Only the stream that has accepted one character willmatch the remaining "<CODE>abc</CODE>".</P><A NAME="seqexp-smalltk"></A><P>Now we consider the definitions of <CODE>match:</CODE> for each classdefining the regular expression. The definition for<CODE>SequenceExpression</CODE> matches each of its subexpressions insequence. Usually it will eliminate input streams from its<CODE>inputState</CODE>.</P><A NAME="auto1079"></A><PRE> match: inputState ^ expression2 match: (expression1 match: inputState).</PRE><A NAME="smallaltexp"></A><P>An <CODE>AlternationExpression</CODE> will return a state that consistsof the union of states from either alternative. The definition of<CODE>match:</CODE> for <CODE>AlternationExpression</CODE> is</P><A NAME="auto1080"></A><PRE> match: inputState | finalState | finalState := alternative1 match: inputState. finalState addAll: (alternative2 match: inputState). ^ finalState</PRE><A NAME="repeatexp-smalltk"></A><P>The <CODE>match:</CODE> operation for <CODE>RepetitionExpression</CODE>tries to find as many states that could match as possible:</P><A NAME="auto1081"></A><PRE> match: inputState | aState finalState | aState := inputState. finalState := inputState copy. [aState isEmpty] whileFalse: [aState := repetition match: aState. finalState addAll: aState]. ^ finalState</PRE><A NAME="auto1082"></A><P>Its output state usually contains more states than its input state,because a <CODE>RepetitionExpression</CODE> can match one, two, or manyoccurrences of <CODE>repetition</CODE> on the input state. The outputstates represent all these possibilities, allowing subsequent elementsof the regular expression to decide which state is the correct one.</P><A NAME="literal-small"></A><P>Finally, the definition of <CODE>match:</CODE> for<CODE>LiteralExpression</CODE> tries to match its components against eachpossible input stream. It keeps only those input streams that have amatch:</P><A NAME="auto1083"></A><PRE> match: inputState | finalState tStream | finalState := Set new. inputState do: [:stream | tStream := stream copy. (tStream nextAvailable: components size ) = components ifTrue: [finalState add: tStream] ]. ^ finalState</PRE><A NAME="auto1084"></A><P>The <CODE>nextAvailable:</CODE> message advances the input stream. Thisis the only <CODE>match:</CODE> operation that advances the stream.Notice how the state that's returned contains a copy of the inputstream, thereby ensuring that matching a literal never changes theinput stream. This is important because each alternative of an<CODE>AlternationExpression</CODE> should see identical copies ofthe input stream.</P><A NAME="abssyntree2"></A><P>Now that we've defined the classes that make up an abstract syntaxtree, we can describe how to build it.Rather than write a parser for regular expressions, we'll definesome operations on the <CODE>RegularExpression</CODE> classes so thatevaluating a Smalltalk expression will produce an abstract syntax treefor the corresponding regular expression. That lets us use thebuilt-in Smalltalk compiler as if it were a parser for regularexpressions.</P><A NAME="buildabssyn"></A><P>To build the abstract syntax tree, we'll need to define"<CODE>|</CODE>", "<CODE>repeat</CODE>", and "<CODE>&</CODE>" asoperations on <CODE>RegularExpression</CODE>. These operations aredefined in class <CODE>RegularExpression</CODE> like this:</P><A NAME="auto1085"></A><PRE> & aNode ^ SequenceExpression new expression1: self expression2: aNode asRExp repeat ^ RepetitionExpression new repetition: self | aNode ^ AlternationExpression new alternative1: self alternative2: aNode asRExp asRExp ^ self</PRE><A NAME="auto1086"></A><P>The <CODE>asRExp</CODE> operation will convert literals into<CODE>RegularExpressions</CODE>. These operations are defined in class<CODE>String</CODE>:</P><A NAME="auto1087"></A><PRE> & aNode ^ SequenceExpression new expression1: self asRExp expression2: aNode asRExp repeat ^ RepetitionExpression new repetition: self | aNode ^ AlternationExpression new alternative1: self asRExp alternative2: aNode asRExp asRExp ^ LiteralExpression new components: self</PRE><A NAME="smalltkv-use-interp"></A><P>If we defined these operations higher up in the class hierarchy(<CODE>SequenceableCollection</CODE> in Smalltalk-80,<CODE>IndexedCollection</CODE> in Smalltalk/V), then they wouldalso be defined for classes such as <CODE>Array</CODE> and<CODE>OrderedCollection</CODE>. This would letregular expressions match sequences of any kind of object.</P><A NAME="auto1088"></A><P>The second example is a system for manipulating and evaluatingBoolean expressions implemented in C++. The terminal symbols in thislanguage are Boolean variables, that is, the constants
?? 快捷鍵說明
復制代碼
Ctrl + C
搜索代碼
Ctrl + F
全屏模式
F11
切換主題
Ctrl + Shift + D
顯示快捷鍵
?
增大字號
Ctrl + =
減小字號
Ctrl + -