<?xml version="1.0" encoding="gb2312"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=gb2312"/><title>linux for ppc chapter 19 jacobw </title></head><body><center><h1>BBS 水木清華站: Digest Area</h1></center><a name="top"></a>From: plato (純真年代), Board: Embedded <br />Subject: linux for ppc chapter 19 <br />Site: BBS 水木清華站 (Wed May 30 23:23:31 2001) <br /> <br />---------------------------------------------------------------------------- <br />19. Performance <br />19.1 CPU core <br />Cache <br />First, make sure that both the I and D caches are enabled. Also make sure that serialization is disabled (set ICTRL to 0x7). <br />To get maximum performance, you need to enable the copyback data cache. Copyback can be left disabled so that the standard Linux/PPC libraries work without recompiling. If you build your own glibc as described under Runtime Library, you can enable copyback. Look for a "make config" option, or grep for DC_SFWT in arch/ppc/kernel/head.S and change the #if 0 to #if 1. <br />BogoMIPS <br />The BogoMIPS value on 8xx processors should be within 1% or so of the actual CPU core frequency, allowing for rounding and minor timing calculation errors. This makes it a useful sanity check that the internal clock multiplier is set correctly and that the I-cache is turned on. Note, however, that the BogoMIPS calculation is still tied to the external clock source and internal prescaler settings, so it should not be relied on alone to verify that the core frequency really is what you think it should be. 
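<br />As a rough illustration of the 1% rule of thumb above, the following Python sketch compares a BogoMIPS reading against the expected core clock. The function name, the sample readings and the default tolerance are assumptions made for this example and are not part of any kernel interface; on a real board you would take the figure from the boot messages. <br />

```python
import math


def bogomips_plausible(bogomips: float, core_mhz: float, tolerance: float = 0.01) -> bool:
    """Return True if the BogoMIPS reading is within `tolerance` of the core clock in MHz.

    Hypothetical helper: the 1% default mirrors the rule of thumb quoted for 8xx parts.
    """
    return math.isclose(bogomips, core_mhz, rel_tol=tolerance)


# Assumed sample readings for a board expected to run at 50 MHz.
print(bogomips_plausible(49.76, 50.0))  # reading close to the expected core clock
print(bogomips_plausible(24.88, 50.0))  # far off: multiplier or I-cache worth checking
```

<br />A reading that passes this check is only a first indication, since the BogoMIPS calculation depends on the same clock source it is meant to verify. <br />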
A simple cross-check is to run 'sleep 10' at the shell prompt and time it with a watch to confirm that you are at least in the ballpark. It is wise to measure your system more accurately than this with a CRO (oscilloscope) at least once. <br />Also, beware that the BogoMIPS rating should not be used as a general CPU performance measure; see: <a href="http://linuxdoc.org/HOWTO/mini/BogoMips.html">http://linuxdoc.org/HOWTO/mini/BogoMips.html</a> <br />19.2 Profiling <br />There are numerous options available for system profiling, depending on what you wish to measure and how invasive you are prepared to be. <br />/proc/profile <br />/proc/profile is a standard kernel feature which provides simple kernel profiling based on Instruction Pointer sampling in the periodic timer interrupt routine. It is simplistic but effective, and has low overhead since the interrupt is going to happen anyway. The data is processed with readprofile, which looks up System.map to show which kernel functions are using the most CPU time. It does not work for modules yet, so at present you need to compile them into the kernel for profiling. <br />You need to enable this at boot time by passing profile=2 on the kernel command line. The number gives the power-of-2 granularity used for the counters: 2 gives you a separate counter for each PowerPC instruction (each 4 bytes). Higher numbers consume less memory and give less precise results. The data from /proc/profile is in target byte order, so if you are cross-developing you may need to either byte-swap it or compile readprofile to run on your target. <br />The PowerPC branch of the Linux kernel has been slow to implement the Instruction Pointer sampling function necessary to generate the /proc/profile data. If it isn't implemented in your kernel, you'll see that readprofile always shows zero time for every kernel function. 
In this case you need to apply the profile.patch from: <a href="http://members.xoom.com/greyhams/linux/patches/">http://members.xoom.com/greyhams/linux/patches/</a> <br />Linux Trace Toolkit <br /><a href="http://www.opersys.com/LTT">http://www.opersys.com/LTT</a> <br />The Linux Trace Toolkit works with an instrumented Linux kernel by saving time-stamped records of important kernel events to a binary data file. A data decoder converts the binary data to text and calculates statistical summaries, such as percent processor utilization by each process. The toolkit also includes an integrated environment that graphically displays the results and provides search capability. <br />A version for embedded PowerPC targets is now available from: <a href="ftp://ftp.mvista.com/pub/LTT">ftp://ftp.mvista.com/pub/LTT</a> <br />gprof <br />All the usual Linux user mode profiling tools like gprof are available. <br />kernprof <br /><a href="http://oss.sgi.com/projects/kernprof">http://oss.sgi.com/projects/kernprof</a> <br />This project aims to make full gprof profiling available for the kernel. However, it hasn't been ported to the PowerPC architecture yet. <br />19.3 IDMA <br />Beware that IDMA on the 860 is not designed for high performance, and the CPU gets better throughput with explicit cache-bursted programmed I/O. Search for IDMA for more discussion. <br />Confusion sometimes arises because DMA transfers in most systems are faster than CPU transfers, whereas here the reverse is generally true. Furthermore, IDMA transfers eat into CPM processing time, limiting throughput on other communications modules at the same time. <br />19.4 Network <br />To get good TCP/IP performance, you need a fast CPU. Using the FEC, a 50 MHz 860P will run about 30 Mbits/sec TCP/IP, and a 100 MHz 860P will run about 60 Mbits/sec TCP/IP. 
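<br />The two FEC figures above work out to roughly 0.6 Mbits/sec per MHz. A minimal Python sketch of that linear estimate follows; the function name and the constant are assumptions derived from these two data points only, so the result is a ballpark figure and should not be trusted far outside the 50 to 100 MHz range. <br />

```python
# Throughput per MHz implied by the quoted figure of 30 Mbits/sec at 50 MHz.
MBITS_PER_MHZ = 30.0 / 50.0


def estimate_tcp_mbits(core_mhz: float) -> float:
    """Hypothetical linear estimate of 860P TCP/IP throughput over the FEC."""
    return MBITS_PER_MHZ * core_mhz


# Print the estimate for a few assumed core clocks.
for mhz in (50, 80, 100):
    print(f"{mhz} MHz: about {estimate_tcp_mbits(mhz):.0f} Mbits/sec")
```

<br />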
The bottleneck is the protocol and application processing in the PPC core. The performance of a TCP/IP connection scales nearly linearly with the processor speed. <br />If you need to go faster, use the 8260. <br />19.5 Optimisation <br />Optimising everything for space using gcc's -Os option is likely to provide both the smallest code size and the best performance, because it inhibits loop unrolling, which tends to have a negative effect on embedded processors with relatively small caches. Furthermore, PowerPC processors can speculatively execute branches overlapped with other loop instructions, making the branch effectively execute in zero cycles, so loop unrolling is unnecessary in many circumstances. <br />---------------------------------------------------------------------------- <br /> <br />-- <br /> <br />※ Source: BBS 水木清華站 smth.org [FROM: 166.111.161.8] <br /><a href="00000019.htm">Previous</a> <a href="javascript:history.go(-1)">Back</a> <a href="index.htm">Index</a> <a href="#top">Top</a> <a href="00000021.htm">Next</a></body></html>