<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
 <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.20">
 <TITLE>cc65 coding hints</TITLE>
</HEAD>
<BODY>
<H1>cc65 coding hints</H1>
<H2>Ullrich von Bassewitz, <A HREF="mailto:uz@cc65.org">uz@cc65.org</A></H2>03.12.2000
<HR>
<EM>How to generate the most effective code with cc65.</EM>
<HR>
<H2><A NAME="s1">1. Use prototypes</A></H2>
<P>This will not only help to find errors between separate modules, it will also
generate better code, since the compiler does not have to assume that a
variable-sized parameter list is in place and does not have to pass the
argument count to the called function. This will lead to shorter and faster
code.</P>
<H2><A NAME="s2">2. Don't declare auto variables in nested function blocks</A></H2>
<P>Variable declarations in nested blocks are usually a good thing. But with
cc65, there is a drawback: Since the compiler generates code in one pass, it
must create the variables on the stack each time the block is entered and
destroy them when the block is left. This causes a speed penalty and larger
code.</P>
<H2><A NAME="s3">3. Remember that the compiler does not optimize</A></H2>
<P>The compiler needs hints from you about the code to generate. When accessing
indexed data structures, get a pointer to the element and use this pointer
instead of calculating the index again and again. If you want to have your
loops unrolled, or loop invariant code moved outside the loop, you have to do
that yourself.</P>
<H2><A NAME="s4">4. Longs are slow!</A></H2>
<P>While long support is necessary for some things, it's really, really slow on
the 6502. Remember that any long variable will use 4 bytes of memory, and every
operation works on twice as much data as the same operation on an int.</P>
<H2><A NAME="s5">5. Use unsigned types wherever possible</A></H2>
<P>The CPU has no opcodes to handle signed values greater than 8 bits. So sign
extension, tests of signedness etc. have to be done by hand.
The code to handle
signed operations is usually a bit slower than the same code for unsigned
types.</P>
<H2><A NAME="s6">6. Use chars instead of ints if possible</A></H2>
<P>While in arithmetic operations, chars are immediately promoted to ints, they
are passed as chars in parameter lists and are accessed as chars in variables.
The code generated is usually not much smaller, but it is faster, since
accessing chars is faster. For several operations, the generated code may be
better if intermediate results that are known not to be larger than 8 bits are
cast to chars.</P>
<P>When doing</P>
<P><BLOCKQUOTE><CODE><PRE>
unsigned char a;
...
if ((a &amp; 0x0F) == 0)
</PRE></CODE></BLOCKQUOTE></P>
<P>the result of the &amp; operator is an int because of the int promotion rules of
the language. So the compare is also done with 16 bits. When using</P>
<P><BLOCKQUOTE><CODE><PRE>
unsigned char a;
...
if ((unsigned char)(a &amp; 0x0F) == 0)
</PRE></CODE></BLOCKQUOTE></P>
<P>the generated code is much shorter, since the operation is done with 8 bits
instead of 16.</P>
<H2><A NAME="s7">7. Make the size of your array elements one of 1, 2, 4, 8</A></H2>
<P>When indexing into an array, the compiler has to calculate the byte offset
into the array, which is the index multiplied by the size of one element. When
doing the multiplication, the compiler will do a strength reduction, that is,
replace the multiplication by a shift if possible. For the values 2, 4 and 8,
there are even more specialized subroutines available. So, array access is
fastest when using one of these sizes.</P>
<H2><A NAME="s8">8. Expressions are evaluated from left to right</A></H2>
<P>Since cc65 does not build an explicit expression tree when parsing an
expression, constant subexpressions may not be detected and optimized properly
if you don't help.
Look at this example:</P>
<P><BLOCKQUOTE><CODE><PRE>
#define OFFS   4
int i;
i = i + OFFS + 3;
</PRE></CODE></BLOCKQUOTE></P>
<P>The expression is parsed from left to right, that means, the compiler sees
'i', and puts its contents into the secondary register. Next is OFFS, which is
constant. The compiler emits code to add a constant to the secondary register.
Same thing again for the constant 3. So the code produced contains a fetch of
'i', two additions of constants, and a store (into 'i'). Unfortunately, the
compiler does not see that "OFFS + 3" is a constant in itself, since it does
its evaluation from left to right. There are some ways to help the compiler
to recognize expressions like this:</P>
<P><OL>
<LI>Write "i = OFFS + 3 + i;". Since the first and second operands are
constant, the compiler will evaluate them at compile time, reducing the code to
a fetch, one addition (secondary + constant) and one store.</LI>
<LI>Write "i = i + (OFFS + 3);". When seeing the opening parenthesis, the
compiler will start a new expression evaluation for the contents of the
parentheses, and since all operands in the subexpression are constant, it will
detect this and reduce the code to one fetch, one addition and one store.</LI>
</OL></P>
<H2><A NAME="s9">9. Use the preincrement and predecrement operators</A></H2>
<P>The compiler is not always smart enough to figure out whether the rvalue of an
increment is used or not. So it has to save and restore that value when
producing code for the postincrement and postdecrement operators, even if this
value is never used. To avoid the additional overhead, use the preincrement
and predecrement operators if you don't need the resulting value. That means,
use</P>
<P><BLOCKQUOTE><CODE><PRE>
...
++i;
...
</PRE></CODE></BLOCKQUOTE></P>
<P>instead of</P>
<P><BLOCKQUOTE><CODE><PRE>
...
i++;
...
</PRE></CODE></BLOCKQUOTE></P>
<H2><A NAME="s10">10. 
Use constants to access absolute memory locations</A></H2>
<P>The compiler produces optimized code if the value of a pointer is a constant.
So, to access direct memory locations, use</P>
<P><BLOCKQUOTE><CODE><PRE>
#define VDC_STATUS 0xD600
*(char*)VDC_STATUS = 0x01;
</PRE></CODE></BLOCKQUOTE></P>
<P>That will be translated to</P>
<P><BLOCKQUOTE><CODE><PRE>
lda     #$01
sta     $D600
</PRE></CODE></BLOCKQUOTE></P>
<P>The constant value detection also works for struct pointers and arrays, if the
subscript is a constant. So</P>
<P><BLOCKQUOTE><CODE><PRE>
#define VDC     ((unsigned char*)0xD600)
#define STATUS  0x01
VDC [STATUS] = 0x01;
</PRE></CODE></BLOCKQUOTE></P>
<P>will also work.</P>
<P>If you first load the constant into a variable and use that variable to access
an absolute memory location, the generated code will be much slower, since the
compiler does not know anything about the contents of the variable.</P>
<H2><A NAME="s11">11. Use initialized local variables - but use them with care</A></H2>
<P>Initialization of local variables when declaring them gives shorter and faster
code. So, use</P>
<P><BLOCKQUOTE><CODE><PRE>
int i = 1;
</PRE></CODE></BLOCKQUOTE></P>
<P>instead of</P>
<P><BLOCKQUOTE><CODE><PRE>
int i;
i = 1;
</PRE></CODE></BLOCKQUOTE></P>
<P>But beware: To maximize your savings, don't mix uninitialized and initialized
variables. Create one block of initialized variables and one of uninitialized
ones. The reason for this is that the compiler will sum up the space needed
for uninitialized variables as long as possible, and then allocate the space
once for all these variables. If you mix uninitialized and initialized
variables, you force the compiler to allocate space for the uninitialized
variables each time it parses an initialized one.
So do this:</P>
<P><BLOCKQUOTE><CODE><PRE>
int i, j;
int a = 3;
int b = 0;
</PRE></CODE></BLOCKQUOTE></P>
<P>instead of</P>
<P><BLOCKQUOTE><CODE><PRE>
int i;
int a = 3;
int j;
int b = 0;
</PRE></CODE></BLOCKQUOTE></P>
<P>The latter will work, but will create larger and slower code.</P>
<H2><A NAME="s12">12. When using the ternary operator, cast values that are not ints</A></H2>
<P>The result type of the <CODE>?:</CODE> operator is a long if either the second or
the third operand is a long. If the second operand has been evaluated and it was
of type int, and the compiler detects that the third operand is a long, it has
to add an additional <CODE>int</CODE> -&gt; <CODE>long</CODE> conversion for the second
operand. However, since the code for the second operand has already been
emitted, this gives much worse code.</P>
<P>Look at this:</P>
<P><BLOCKQUOTE><CODE><PRE>
long f (long a)
{
    return (a != 0)? 1 : a;
}
</PRE></CODE></BLOCKQUOTE></P>
<P>When the compiler sees the literal "1", it does not know that the result type
of the <CODE>?:</CODE> operator is a long, so it will emit code to load an integer
constant 1. After parsing "a", which is a long, an <CODE>int</CODE> -&gt; <CODE>long</CODE>
conversion has to be applied to the second operand. This creates one
additional jump, and additional code for the conversion.</P>
<P>A better way would have been to write:</P>
<P><BLOCKQUOTE><CODE><PRE>
long f (long a)
{
    return (a != 0)? 1L : a;
}
</PRE></CODE></BLOCKQUOTE></P>
<P>By forcing the literal "1" to be of type long, the correct code is created in
the first place, and no additional conversion code is needed.</P>
<H2><A NAME="s13">13. Use the array operator [] even for pointers</A></H2>
<P>When addressing an array via a pointer, don't use the plus and dereference
operators, but the array operator.
This will generate better code in some
common cases.</P>
<P>Don't use</P>
<P><BLOCKQUOTE><CODE><PRE>
char* a;
char b, c;
b = *(a + c);
</PRE></CODE></BLOCKQUOTE></P>
<P>Use</P>
<P><BLOCKQUOTE><CODE><PRE>
char* a;
char b, c;
b = a[c];
</PRE></CODE></BLOCKQUOTE></P>
<P>instead.</P>
<H2><A NAME="s14">14. Use register variables with care</A></H2>
<P>Register variables may give faster and shorter code, but they also have an
overhead. Register variables are actually zero page locations, so using them
saves roughly one cycle per access. Since the old values have to be saved and
restored, there is an overhead of about 70 cycles per 2 byte variable. It is
easy to see that - apart from the additional code that is needed to save and
restore the values - you need to make heavy use of a variable to justify the
overhead.</P>
<P>As a general rule: Use register variables only for pointers that are
dereferenced several times in your function, or for heavily used induction
variables in a loop (with several hundred accesses).</P>
<P>When declaring register variables, try to keep them together, because this
will allow the compiler to save and restore the old values in one chunk, and
not in several.</P>
<P>And remember: Register variables must be enabled with <CODE>-r</CODE> or <CODE>-Or</CODE>.</P>
<H2><A NAME="s15">15. Decimal constants greater than 0x7FFF are actually long ints</A></H2>
<P>The language rules for constant numeric values specify that decimal constants
without a type suffix that are not in integer range must be of type long int
or unsigned long int. This means that a simple constant like 40000 is of type
long int, and may cause an expression to be evaluated with 32 bits.</P>
<P>An example is:</P>
<P><BLOCKQUOTE><CODE><PRE>
unsigned val;
...
if (val &lt; 65535) {
    ...
}
</PRE></CODE></BLOCKQUOTE></P>
<P>Here, the compare is evaluated using 32 bit precision. This makes the code
larger and a lot slower.</P>
<P>Using</P>
<P><BLOCKQUOTE><CODE><PRE>
unsigned val;
...
if (val &lt; 0xFFFF) {
    ...
}
</PRE></CODE></BLOCKQUOTE></P>
<P>or</P>
<P><BLOCKQUOTE><CODE><PRE>
unsigned val;
...
if (val &lt; 65535U) {
    ...
}
</PRE></CODE></BLOCKQUOTE></P>
<P>instead will give shorter and faster code.</P>
<H2><A NAME="s16">16. Access to parameters in variadic functions is expensive</A></H2>
<P>Since cc65 has the "wrong" calling order, the location of the fixed parameters
in a variadic function (a function with a variable parameter list) depends on
the number and size of variable arguments passed. Since this number and size
is unknown at compile time, the compiler will generate code to calculate the
location on the stack when needed.</P>
<P>Because of this additional code, accessing the fixed parameters in a variadic
function is much more expensive than access to parameters in a "normal"
function. Unfortunately, this additional code is also invisible to the
programmer, so it is easy to forget.</P>
<P>As a rule of thumb, if you access such a parameter more than once, you should
think about copying it into a normal variable and using this variable instead.</P>
</BODY>
</HTML>