?? e_log.s
字號:
.file "log.s"// Copyright (c) 2000 - 2005, Intel Corporation// All rights reserved.//// Contributed 2000 by the Intel Numerics Group, Intel Corporation//// Redistribution and use in source and binary forms, with or without// modification, are permitted provided that the following conditions are// met://// * Redistributions of source code must retain the above copyright// notice, this list of conditions and the following disclaimer.//// * Redistributions in binary form must reproduce the above copyright// notice, this list of conditions and the following disclaimer in the// documentation and/or other materials provided with the distribution.//// * The name of Intel Corporation may not be used to endorse or promote// products derived from this software without specific prior written// permission.// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY// OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.//// Intel Corporation is the author of this code, and requests that all// problem reports or change requests be submitted to it directly at// http://www.intel.com/software/products/opensource/libraries/num.htm.//// History//==============================================================// 02/02/00 Initial version// 04/04/00 Unwind support added// 06/16/00 Updated table to be rounded correctly// 08/15/00 Bundle added after call to __libm_error_support to properly// set [the previously overwritten] GR_Parameter_RESULT.// 08/17/00 Improved speed of main path by 5 cycles// Shortened path for x=1.0// 01/09/01 Improved speed, fixed flags for neg denormals// 05/20/02 Cleaned up namespace and sf0 syntax// 05/23/02 Modified algorithm. Now only one polynomial is used// for |x-1| >= 1/256 and for |x-1| < 1/256// 12/11/02 Improved performance for Itanium 2// 03/31/05 Reformatted delimiters between data tables//// API//==============================================================// double log(double)// double log10(double)////// Overview of operation//==============================================================// Background// ----------//// This algorithm is based on fact that// log(a b) = log(a) + log(b).// In our case we have x = 2^N f, where 1 <= f < 2.// So// log(x) = log(2^N f) = log(2^N) + log(f) = n*log(2) + log(f)//// To calculate log(f) we do following// log(f) = log(f * frcpa(f) / frcpa(f)) =// = log(f * frcpa(f)) + log(1/frcpa(f))//// According to definition of IA-64's frcpa instruction it's a// floating point that approximates 1/f using a lookup on the// top of 8 bits of the input number's significand with relative// error < 2^(-8.886). So we have following//// |(1/f - frcpa(f)) / (1/f))| = |1 - f*frcpa(f)| < 1/256//// and//// log(f) = log(f * frcpa(f)) + log(1/frcpa(f)) =// = log(1 + r) + T//// The first value can be computed by polynomial P(r) approximating// log(1 + r) on |r| < 1/256 and the second is precomputed tabular// value defined by top 8 bit of f.//// Finally we have that log(x) ~ (N*log(2) + T) + P(r)//// Note that if input argument is close to 1.0 (in our case it means// that |1 - x| < 1/256) we can use just polynomial approximation// because x = 2^0 * f = f = 1 + r and// log(x) = log(1 + r) ~ P(r)////// To compute log10(x) we use the simple identity//// log10(x) = log(x)/log(10)//// so we have that//// log10(x) = (N*log(2) + T + log(1+r)) / log(10) =// = N*(log(2)/log(10)) + (T/log(10)) + log(1 + r)/log(10)////// Implementation// --------------// It can be seen that formulas for log and log10 differ from one another// only by coefficients and tabular values. Namely as log as log10 are// calculated as (N*L1 + T) + L2*Series(r) where in case of log// L1 = log(2)// T = log(1/frcpa(x))// L2 = 1.0// and in case of log10// L1 = log(2)/log(10)// T = log(1/frcpa(x))/log(10)// L2 = 1.0/log(10)//// So common code with two different entry points those set pointers// to the base address of coresponding data sets containing values// of L2,T and prepare integer representation of L1 needed for following// setf instruction.//// Note that both log and log10 use common approximation polynomial// it means we need only one set of coefficients of approximation.////// 1. |x-1| >= 1/256// InvX = frcpa(x)// r = InvX*x - 1// P(r) = r*((r*A3 - A2) + r^4*((A4 + r*A5) + r^2*(A6 + r*A7)),// all coefficients are calcutated in quad and rounded to double// precision. A7,A6,A5,A4 are stored in memory whereas A3 and A2// created with setf.//// N = float(n) where n is true unbiased exponent of x//// T is tabular value of log(1/frcpa(x)) calculated in quad precision// and represented by two floating-point numbers 64-bit Thi and 32-bit Tlo.// To load Thi,Tlo we get bits from 55 to 62 of register format significand// as index and calculate two addresses// ad_Thi = Thi_table_base_addr + 8 * index// ad_Tlo = Tlo_table_base_addr + 4 * index//// L2 (1.0 or 1.0/log(10) depending on function) is calculated in quad// precision and rounded to double extended; it's loaded from memory.//// L1 (log(2) or log10(2) depending on function) is calculated in quad// precision and represented by two floating-point 64-bit numbers L1hi,L1lo// stored in memory.//// And final result = ((L1hi*N + Thi) + (N*L1lo + Tlo)) + L2*P(r)////// 2. |x-1| < 1/256// r = x - 1// P(r) = r*((r*A3 - A2) + r^4*((A4 + r*A5) + r^2*(A6 + r*A7)),// A7,A6,A5A4,A3,A2 are the same as in case |x-1| >= 1/256//// And final results// log(x) = P(r)// log10(x) = L2*P(r)//// 3. How we define is input argument such that |x-1| < 1/256 or not.//// To do it we analyze biased exponent and integer representation of// input argument//// a) First we test is biased exponent equal to 0xFFFE or 0xFFFF (i.e.// we test is 0.5 <= x < 2). This comparison can be performed using// unsigned version of cmp instruction in such a way// biased_exponent_of_x - 0xFFFE < 2////// b) Second (in case when result of a) is true) we need to compare x// with 1-1/256 and 1+1/256 or in double precision memory representation// with 0x3FEFE00000000000 and 0x3FF0100000000000 correspondingly.// This comparison can be made like in a), using unsigned// version of cmp i.e. ix - 0x3FEFE00000000000 < 0x0000300000000000.// 0x0000300000000000 is difference between 0x3FF0100000000000 and// 0x3FEFE00000000000//// Note: NaT, any NaNs, +/-INF, +/-0, negatives and unnormalized numbers are// filtered and processed on special branches.////// Special values//==============================================================//// log(+0) = -inf// log(-0) = -inf//// log(+qnan) = +qnan// log(-qnan) = -qnan// log(+snan) = +qnan// log(-snan) = -qnan//// log(-n) = QNAN Indefinite// log(-inf) = QNAN Indefinite//// log(+inf) = +inf////// Registers used//==============================================================// Floating Point registers used:// f8, input// f7 -> f15, f32 -> f42//// General registers used:// r8 -> r11// r14 -> r23//// Predicate registers used:// p6 -> p15// Assembly macros//==============================================================GR_TAG = r8GR_ad_1 = r8GR_ad_2 = r9GR_Exp = r10GR_N = r11GR_x = r14GR_dx = r15GR_NearOne = r15GR_xorg = r16GR_mask = r16GR_05 = r17GR_A3 = r18GR_Sig = r19GR_Ind = r19GR_Nm1 = r20GR_bias = r21GR_ad_3 = r22GR_rexp = r23GR_SAVE_B0 = r33GR_SAVE_PFS = r34GR_SAVE_GP = r35GR_SAVE_SP = r36GR_Parameter_X = r37GR_Parameter_Y = r38GR_Parameter_RESULT = r39GR_Parameter_TAG = r40FR_NormX = f7FR_RcpX = f9FR_tmp = f9FR_r = f10FR_r2 = f11FR_r4 = f12FR_N = f13FR_Ln2hi = f14FR_Ln2lo = f15FR_A7 = f32FR_A6 = f33FR_A5 = f34FR_A4 = f35FR_A3 = f36FR_A2 = f37FR_Thi = f38FR_NxLn2hipThi = f38FR_NxLn2pT = f38FR_Tlo = f39FR_NxLn2lopTlo = f39FR_InvLn10 = f40FR_A32 = f41FR_A321 = f42FR_Y = f1FR_X = f10FR_RESULT = f8// Data//==============================================================RODATA.align 16LOCAL_OBJECT_START(log_data)// coefficients of polynomial approximationdata8 0x3FC2494104381A8E // A7data8 0xBFC5556D556BBB69 // A6//// two parts of ln(2)data8 0x3FE62E42FEF00000,0x3DD473DE6AF278ED//data8 0x8000000000000000,0x3FFF // 1.0//data8 0x3FC999999988B5E9 // A5data8 0xBFCFFFFFFFF6FFF5 // A4//// hi parts of ln(1/frcpa(1+i/256)), i=0...255data8 0x3F60040155D5889D // 0data8 0x3F78121214586B54 // 1data8 0x3F841929F96832EF // 2data8 0x3F8C317384C75F06 // 3data8 0x3F91A6B91AC73386 // 4data8 0x3F95BA9A5D9AC039 // 5data8 0x3F99D2A8074325F3 // 6data8 0x3F9D6B2725979802 // 7data8 0x3FA0C58FA19DFAA9 // 8data8 0x3FA2954C78CBCE1A // 9data8 0x3FA4A94D2DA96C56 // 10data8 0x3FA67C94F2D4BB58 // 11data8 0x3FA85188B630F068 // 12data8 0x3FAA6B8ABE73AF4C // 13data8 0x3FAC441E06F72A9E // 14data8 0x3FAE1E6713606D06 // 15data8 0x3FAFFA6911AB9300 // 16data8 0x3FB0EC139C5DA600 // 17data8 0x3FB1DBD2643D190B // 18data8 0x3FB2CC7284FE5F1C // 19data8 0x3FB3BDF5A7D1EE64 // 20data8 0x3FB4B05D7AA012E0 // 21data8 0x3FB580DB7CEB5701 // 22data8 0x3FB674F089365A79 // 23data8 0x3FB769EF2C6B568D // 24data8 0x3FB85FD927506A47 // 25data8 0x3FB9335E5D594988 // 26data8 0x3FBA2B0220C8E5F4 // 27data8 0x3FBB0004AC1A86AB // 28data8 0x3FBBF968769FCA10 // 29data8 0x3FBCCFEDBFEE13A8 // 30data8 0x3FBDA727638446A2 // 31data8 0x3FBEA3257FE10F79 // 32data8 0x3FBF7BE9FEDBFDE5 // 33data8 0x3FC02AB352FF25F3 // 34data8 0x3FC097CE579D204C // 35data8 0x3FC1178E8227E47B // 36data8 0x3FC185747DBECF33 // 37data8 0x3FC1F3B925F25D41 // 38data8 0x3FC2625D1E6DDF56 // 39data8 0x3FC2D1610C868139 // 40data8 0x3FC340C59741142E // 41data8 0x3FC3B08B6757F2A9 // 42data8 0x3FC40DFB08378003 // 43data8 0x3FC47E74E8CA5F7C // 44data8 0x3FC4EF51F6466DE4 // 45data8 0x3FC56092E02BA516 // 46data8 0x3FC5D23857CD74D4 // 47data8 0x3FC6313A37335D76 // 48data8 0x3FC6A399DABBD383 // 49data8 0x3FC70337DD3CE41A // 50data8 0x3FC77654128F6127 // 51data8 0x3FC7E9D82A0B022D // 52data8 0x3FC84A6B759F512E // 53data8 0x3FC8AB47D5F5A30F // 54data8 0x3FC91FE49096581B // 55data8 0x3FC981634011AA75 // 56data8 0x3FC9F6C407089664 // 57data8 0x3FCA58E729348F43 // 58data8 0x3FCABB55C31693AC // 59data8 0x3FCB1E104919EFD0 // 60data8 0x3FCB94EE93E367CA // 61data8 0x3FCBF851C067555E // 62data8 0x3FCC5C0254BF23A5 // 63data8 0x3FCCC000C9DB3C52 // 64data8 0x3FCD244D99C85673 // 65data8 0x3FCD88E93FB2F450 // 66data8 0x3FCDEDD437EAEF00 // 67data8 0x3FCE530EFFE71012 // 68data8 0x3FCEB89A1648B971 // 69data8 0x3FCF1E75FADF9BDE // 70data8 0x3FCF84A32EAD7C35 // 71data8 0x3FCFEB2233EA07CD // 72data8 0x3FD028F9C7035C1C // 73data8 0x3FD05C8BE0D9635A // 74data8 0x3FD085EB8F8AE797 // 75data8 0x3FD0B9C8E32D1911 // 76data8 0x3FD0EDD060B78080 // 77data8 0x3FD122024CF0063F // 78data8 0x3FD14BE2927AECD4 // 79data8 0x3FD180618EF18ADF // 80data8 0x3FD1B50BBE2FC63B // 81data8 0x3FD1DF4CC7CF242D // 82data8 0x3FD214456D0EB8D4 // 83data8 0x3FD23EC5991EBA49 // 84data8 0x3FD2740D9F870AFB // 85data8 0x3FD29ECDABCDFA03 // 86data8 0x3FD2D46602ADCCEE // 87data8 0x3FD2FF66B04EA9D4 // 88data8 0x3FD335504B355A37 // 89data8 0x3FD360925EC44F5C // 90data8 0x3FD38BF1C3337E74 // 91data8 0x3FD3C25277333183 // 92data8 0x3FD3EDF463C1683E // 93data8 0x3FD419B423D5E8C7 // 94data8 0x3FD44591E0539F48 // 95data8 0x3FD47C9175B6F0AD // 96data8 0x3FD4A8B341552B09 // 97data8 0x3FD4D4F39089019F // 98data8 0x3FD501528DA1F967 // 99data8 0x3FD52DD06347D4F6 // 100data8 0x3FD55A6D3C7B8A89 // 101data8 0x3FD5925D2B112A59 // 102data8 0x3FD5BF406B543DB1 // 103data8 0x3FD5EC433D5C35AD // 104data8 0x3FD61965CDB02C1E // 105data8 0x3FD646A84935B2A1 // 106data8 0x3FD6740ADD31DE94 // 107data8 0x3FD6A18DB74A58C5 // 108data8 0x3FD6CF31058670EC // 109data8 0x3FD6F180E852F0B9 // 110data8 0x3FD71F5D71B894EF // 111data8 0x3FD74D5AEFD66D5C // 112data8 0x3FD77B79922BD37D // 113data8 0x3FD7A9B9889F19E2 // 114data8 0x3FD7D81B037EB6A6 // 115data8 0x3FD8069E33827230 // 116data8 0x3FD82996D3EF8BCA // 117data8 0x3FD85855776DCBFA // 118data8 0x3FD8873658327CCE // 119data8 0x3FD8AA75973AB8CE // 120data8 0x3FD8D992DC8824E4 // 121data8 0x3FD908D2EA7D9511 // 122data8 0x3FD92C59E79C0E56 // 123
?? 快捷鍵說明
復制代碼
Ctrl + C
搜索代碼
Ctrl + F
全屏模式
F11
切換主題
Ctrl + Shift + D
顯示快捷鍵
?
增大字號
Ctrl + =
減小字號
Ctrl + -