?? buzzard2.des
字號:
FIRST & THIRD almost FORTH FORTH is a language mostly familiar to users of "small" machines.FORTH programs are small because they are interpreted--a functioncall in FORTH takes two bytes. FORTH is an extendable language--built-in primitives are indistinguishable from user-defined_words_. FORTH interpreters are small because much of the systemcan be coded in FORTH--only a small number of primitives need tobe implemented. Some FORTH interpreters can also compile definedwords into machine code, resulting in a fast system. FIRST is an incredibly small language which is sufficient fordefining the language THIRD, which is mostly like FORTH. There aresome differences, and THIRD is probably just enough like FORTH forthose differences to be disturbing to regular FORTH users. The only existing FIRST interpreter is written in obfuscated C,and rings in at under 800 bytes of source code, although throughdeletion of whitespace and unobfuscation it can be brought to about650 bytes. This document FIRST defines the FIRST environment and primitives,with relevent design decision explanations. It secondly documentsthe general strategies we will use to implement THIRD. The THIRDsection demonstrates how the complete THIRD system is built upusing FIRST.Section 1: FIRSTEnvironment FIRST implements a virtual machine. The machine has three chunksof memory: "main memory", "the stack", and "string storage". Whenthe virtual machine wishes to do random memory accesses, they comeout of main memory--it cannot access the stack or string storage. The stack is simply a standard LIFO data structure that is usedimplicitly by most of the FIRST primitives. The stack is made upof ints, whatever size they are on the host machine. String storage is used to store the names of built-in and definedprimitives. Separate storage is used for these because it allowsthe C code to use C string operations, reducing C source code size. Main memory is a large array of ints. When we speak ofaddresses, we actually mean indices into main memory. Main memoryis used for two things, primarily: the return stack and the dictionary. The return stack is a LIFO data structure, independent ofthe abovementioned "the stack", which is used by FIRST to keeptrack of function call return addresses. The dictionary is a list of words. Each word contains a headerand a data field. In the header is the address of the previous word,an index into the string storage indicating where the name of thisword is stored, and a "code pointer". The code pointer is simplyan integer which names which "machine-language-primitive" implementsthis instruction. For example, for defined words the code pointernames the "run some code" primitive, which pushes the current programcounter onto the return stack and sets the counter to the address ofthe data field for this word. There are several important pointers into main memory. There isa pointer to the most recently defined word, which is used to startsearches back through memory when compiling words. There is a pointerto the top of the return stack. There is a pointer to the currentend of the dictionary, used while compiling. For the last two pointers, namely the return stack pointer andthe dictionary pointer, there is an important distinction: the pointersthemselves are stored in main memory (in FIRST's main memory). Thisis critical, because it means FIRST programs can get at them withoutany further primitives needing to be defined.Instructions There are two kinds of FIRST instructions, normal instructions andimmediate instructions. Immediate instructions do something significantwhen they are used. Normal instructions compile a pointer to theirexecutable part onto the end of the dictionary. As we will see, thismeans that by default FIRST simply compiles things. Integer OperationsSymbol Name Function - binary minus pop top 2 elements of stack, subtract, push * multiply pop top 2 elements of stack, multiply, push / divide pop top 2 elements of stack, divide, push <0 less than 0 pop top element of stack, push 1 if < 0 else 0Note that we can synthesize addition and negation from binary minus,but we cannot synthesize a time efficient divide or multiply from it.<0 is synthesizable, but only nonportably. Memory OperationsSymbol Name Function @ fetch pop top of stack, treat as address to push contents of ! store top of stack is address, 2nd is value; store to memory and pop both off the stack Input/Output OperationsName Functionecho output top of stack through C's putchar()key push C's getchar() onto top of stack_read read a space-delimited word, find it in the dictionary, and compile a pointer to that word's code pointer onto the current end of the dictionaryAlthough _read could be synthesized from key, we need _read to be ableto compile words to be able to start any syntheses. Execution OperationsName Functionexit leave the current function: pop the return stack into the program counter Immediate (compilation) OperationsSymbol Name Function : define read in the next space-delimited word, add it to the end of our string storage, and generate a header for the new word so that when it is typed it compiles a pointer to itself so that it can be executed.immediate immediate when used immediately after a name following a ':', makes the word being defined run whenever it is typed.: cannot be synthesized, because we could not synthesize anything.immediate has to be an immediate operation, so it could not besynthesized unless by default operations were immediate; but thatwould preclude our being able to do any useful compilation. Stack OperationsName Functionpick pop top of stack, use as index into stack and copy up that elementIf the data stack were stored in main memory, we could synthesize pick;but putting the stack and stack pointer in main memory would significantlyincrease the C source code size. There are three more primitives, but they are "internal only"--they have no names and no dictionary entries. The first is"pushint". It takes the next integer out of the instruction streamand pushes it on the stack. This could be synthesized, but probablynot without using integer constants. It is generated by _read whenthe input is not a known word. The second is "compile me". Whenthis instruction is executed, a pointer to the word's data field isappended to the dictionary. The third is "run me"--the word's datafield is taken to be a stream of pointers to words, and is executed. One last note about the environment: FIRST builds a very smallword internally that it executes as its main loop. This word calls_read and then calls itself. Each time it calls itself, it usesup a word on the return stack, so it will eventually trash things.This is discussed some more in section 2.Here's a handy summary of all the FIRST words: - * / binary integer operations on the stack <0 is top of stack less than 0? @ ! read from or write to memory echo key output or input one character _read read a word from input and compile a pointer to it exit stop running the current function : compile the header of a definition immediate modify the header to create an immediate word Here is a sample FIRST program. I'm assuming you're usingthe ASCII character set. FIRST does not depend upon ASCII, butsince FIRST has no syntax for character constants, one normally hasto use decimal values. This can be gotten around using getchar, though.Oh. One other odd thing. FIRST initially builds its symbol tableby calling : several times, so it needs to get the names of the basesymbols as its first 13 words of input. You could even name themdifferently if you wanted. These FIRST programs have FORTH comments in them: they are containedinside parentheses. FIRST programs cannot have FORTH comments; but I needsome device to indicate what's going on. (THIRD programs are an entirelydifferent subject.) ( Our first line gives the symbols for the built-ins ): immediate _read @ ! - * / <0 exit echo key _pick ( now we define a simple word that will print out a couple characters ): L ( define a word named 'L' ) 108 echo ( output an ascii 'l' ) exit: hello ( define a word named 'hello') 72 echo ( output an ascii 'H' ) 101 echo ( output an ascii 'e' ) 111 ( push ascii 'o' onto the stack ) L L ( output two ascii 'l's ) echo ( output the 'o' we pushed on the stack before ) 10 echo ( print a newline ) exit ( stop running this routine ): test immediate ( define a word named 'test' that runs whenever typed ) hello ( call hello ) exittest( The result of running this program should be: Hello)Section 2: Motivating THIRD What is missing from FIRST? There are a large number ofimportant primitives that aren't implemented, but which areeasy to implement. drop , which throws away the top of thestack, can be implemented as { 0 * + } -- that is, multiplythe top of the stack by 0 (which turns the top of the stackinto a 0), and then add the top two elements of the stack. dup , which copies the top of the stack, can be easilyimplemented using temporary storage locations. Conveniently,FIRST leaves memory locations 3, 4, and 5 unused. So we canimplement dup by writing the top of stack into 3, and thenreading it out twice: { 3 ! 3 @ 3 @ }. we will never use the FIRST primitive 'pick' in building THIRD,just to show that it can be done; 'pick' is only provided becausepick itself cannot be built out of the rest of FIRST's buildingblocks. So, instead of worrying about stack primitives and thelike, what else is missing from FIRST? We get recursion, butno control flow--no conditional operations. We cannot at themoment write a looping routine which terminates. Another glaring dissimilarity between FIRST and FORTH isthat there is no "command mode"--you cannot be outside of a: definition and issue some straight commands to be executed.Also, as we noted above, we cannot do comments. FORTH also provides a system for defining new data types,using the words [in one version of FORTH] <builds and does> .We would like to implement these words as well. As the highest priority thing, we will build control flowstructures first. Once we have control structures, we canwrite recursive routines that terminate, and we are ready totackle tasks like parsing, and the building of a command mode. By the way, location 0 holds the dictionary pointer, location1 holds the return stack pointer, and location 2 should alwaysbe 0--it's a fake dictionary entry that means "pushint".Section 3: Building THIRD In this section, I'm going to keep my conversation indented to this depth, rather than using fake comments-- because we'll have real comments eventually. The first thing we have to do is give the symbols for our built-ins.: immediate _read @ ! - * / < exit echo key _pick Next we want to be mildly self commenting, so we define the word 'r' to push the *address of the return stack pointer* onto the stack--NOT the value of the return stack pointer. (In fact, when we run r, the value of the return stack pointer is temporarily changed.): r 1 exit Next, we're currently executing a short loop that contains _read and recursion, which is slowly blowing up the return stack. So let's define a new word, from which you can never return. What it does is drops the top value off the return stack, calls _read, then calls itself. Because it kills the top of the return stack, it can recurse indefinitely.: ] r @ Get the value of the return stack pointer 1 - Subtract one r ! Store it back into the return stack pointer _read Read and compile one word ] Start over Notice that we don't need to exit, since we never come back. Also, it's possible that an immediate word may get run during _read, and that _read will never return! Now let's get compile running.: main immediate ]main Next off, I'm going to do this the easy but non-portable way, and put some character constant definitions in. I wanted them at the top of the file, but that would have burned too much of the return stack.: '"' 34 exit: ')' 41 exit: '\n' 10 exit: 'space' 32 exit: '0' 48 exit: '-' 45 exit: cr '\n' echo exit Next, we want to define some temporary variables for locations 3, 4, and 5, since this'll make our code look clearer.: _x 3 @ exit: _x! 3 ! exit: _y 4 @ exit: _y! 4 ! exit Ok. Now, we want to make THIRD look vaguely like FORTH, so we're going to define ';'. What ; ought to do is terminate a compilation, and turn control over to the command-mode handler. We don't have one, so all we want ';' to do for now is compile 'exit' at the end of the current word. To do this we'll need several other words. Swap by writing out the top two elements into temps, and then reading them back in the other order.: swap _x! _y! _x _y exit Take another look and make sure you see why that works, since it LOOKS like I'm reading them back in the same order--in fact, it not only looks like it, but I AM! Addition might be nice to have. To add, we need to negate the top element of the stack, and then subtract. To negate, we subtract from 0.: + 0 swap - - exit Create a copy of the top of stack: dup _x! _x _x exit Get a mnemonic name for our dictionary pointer--we need to compile stuff, so it goes through this.: h 0 exit We're going to need to advance that pointer, so let's make a generic pointer-advancing function. Given a pointer to a memory location, increment the value at that memory location.: inc dup @ Get another copy of the address, and get the value so now we have value, address on top of stack. 1 + Add one to the value swap Swap to put the address on top of the stack ! exit Write it to memory , is a standard FORTH word. It should write the top of stack into the dictionary, and advance the pointer: , h @ Get the value of the dictionary pointer ! Write the top of stack there h inc And increment the dictionary pointer exit ' is a standard FORTH word. It should push the address of the word that follows it onto the stack. We could do this by making ' immediate, but then it'd need to parse the next word. Instead, we compile the next word as normal. When ' is executed, the top of the return stack will point into the instruction stream immediately after the ' . We push the word there, and advance the return stack pointer so that we don't execute it.: ' r @ Get the address of the top of return stack We currently have a pointer to the top of return stack @ Get the value from there We currently have a pointer to the instruction stream dup Get another copy of it--the bottom copy will stick around until the end of this word 1 + Increment the pointer, pointing to the NEXT instruction r @ ! Write it back onto the top of the return stack We currently have our first copy of the old pointer to the instruction stream
?? 快捷鍵說明
復制代碼
Ctrl + C
搜索代碼
Ctrl + F
全屏模式
F11
切換主題
Ctrl + Shift + D
顯示快捷鍵
?
增大字號
Ctrl + =
減小字號
Ctrl + -