?? w_tgammaf.s
字號:
.file "tgammaf.s"// Copyright (c) 2001 - 2005, Intel Corporation// All rights reserved.//// Contributed 2001 by the Intel Numerics Group, Intel Corporation//// Redistribution and use in source and binary forms, with or without// modification, are permitted provided that the following conditions are// met://// * Redistributions of source code must retain the above copyright// notice, this list of conditions and the following disclaimer.//// * Redistributions in binary form must reproduce the above copyright// notice, this list of conditions and the following disclaimer in the// documentation and/or other materials provided with the distribution.//// * The name of Intel Corporation may not be used to endorse or promote// products derived from this software without specific prior written// permission.// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,INCLUDING,BUT NOT // LIMITED TO,THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS // CONTRIBUTORS BE LIABLE FOR ANY DIRECT,INDIRECT,INCIDENTAL,SPECIAL,// EXEMPLARY,OR CONSEQUENTIAL DAMAGES (INCLUDING,BUT NOT LIMITED TO,// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,DATA,OR // PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY // OF LIABILITY,WHETHER IN CONTRACT,STRICT LIABILITY OR TORT (INCLUDING// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS // SOFTWARE,EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. // // Intel Corporation is the author of this code,and requests that all// problem reports or change requests be submitted to it directly at // http://www.intel.com/software/products/opensource/libraries/num.htm.////*********************************************************************//// History: // 11/30/01 Initial version// 05/20/02 Cleaned up namespace and sf0 syntax// 02/10/03 Reordered header: .section, .global, .proc, .align// 04/04/03 Changed error codes for overflow and negative integers// 04/10/03 Changed code for overflow near zero handling// 12/16/03 Fixed parameter passing to/from error handling routine// 03/31/05 Reformatted delimiters between data tables////*********************************************************************////*********************************************************************//// Function: tgammaf(x) computes the principle value of the GAMMA// function of x.////*********************************************************************//// Resources Used://// Floating-Point Registers: f8-f15// f33-f75//// General Purpose Registers:// r8-r11// r14-r29// r32-r36// r37-r40 (Used to pass arguments to error handling routine)//// Predicate Registers: p6-p15////*********************************************************************//// IEEE Special Conditions://// tgammaf(+inf) = +inf// tgammaf(-inf) = QNaN // tgammaf(+/-0) = +/-inf // tgammaf(x<0, x - integer) = QNaN// tgammaf(SNaN) = QNaN// tgammaf(QNaN) = QNaN////*********************************************************************//// Overview//// The method consists of three cases.// // If 2 <= x < OVERFLOW_BOUNDARY use case tgamma_regular;// else if 0 < x < 2 use case tgamma_from_0_to_2;// else if -(i+1) < x < -i, i = 0...43 use case tgamma_negatives;//// Case 2 <= x < OVERFLOW_BOUNDARY// -------------------------------// Here we use algorithm based on the recursive formula// GAMMA(x+1) = x*GAMMA(x). For that we subdivide interval// [2; OVERFLOW_BOUNDARY] into intervals [8*n; 8*(n+1)] and// approximate GAMMA(x) by polynomial of 22th degree on each// [8*n; 8*n+1], recursive formula is used to expand GAMMA(x)// to [8*n; 8*n+1]. In other words we need to find n, i and r// such that x = 8 * n + i + r where n and i are integer numbers// and r is fractional part of x. So GAMMA(x) = GAMMA(8*n+i+r) =// = (x-1)*(x-2)*...*(x-i)*GAMMA(x-i) =// = (x-1)*(x-2)*...*(x-i)*GAMMA(8*n+r) ~// ~ (x-1)*(x-2)*...*(x-i)*P12n(r).//// Step 1: Reduction// -----------------// N = [x] with truncate// r = x - N, note 0 <= r < 1//// n = N & ~0xF - index of table that contains coefficient of// polynomial approximation // i = N & 0xF - is used in recursive formula// //// Step 2: Approximation// ---------------------// We use factorized minimax approximation polynomials// P12n(r) = A12*(r^2+C01(n)*r+C00(n))*// *(r^2+C11(n)*r+C10(n))*...*(r^2+C51(n)*r+C50(n))//// Step 3: Recursion// -----------------// In case when i > 0 we need to multiply P12n(r) by product// R(i,x)=(x-1)*(x-2)*...*(x-i). To reduce number of fp-instructions// we can calculate R as follow: // R(i,x) = ((x-1)*(x-2))*((x-3)*(x-4))*...*((x-(i-1))*(x-i)) if i is// even or R = ((x-1)*(x-2))*((x-3)*(x-4))*...*((x-(i-2))*(x-(i-1)))*// *(i-1) if i is odd. In both cases we need to calculate// R2(i,x) = (x^2-3*x+2)*(x^2-7*x+12)*...*(x^2+x+2*j*(2*j-1)) =// = ((x^2-x)+2*(1-x))*((x^2-x)+6*(2-x))*...*((x^2-x)+2*(2*j-1)*(j-x)) =// = (RA+2*RB)*(RA+6*(1-RB))*...*(RA+2*(2*j-1)*(j-1+RB))// where j = 1..[i/2], RA = x^2-x, RB = 1-x.//// Step 4: Reconstruction// ----------------------// Reconstruction is just simple multiplication i.e.// GAMMA(x) = P12n(r)*R(i,x)//// Case 0 < x < 2// --------------// To calculate GAMMA(x) on this interval we do following// if 1.0 <= x < 1.25 than GAMMA(x) = P7(x-1)// if 1.25 <= x < 1.5 than GAMMA(x) = P7(x-x_min) where// x_min is point of local minimum on [1; 2] interval.// if 1.5 <= x < 1.75 than GAMMA(x) = P7(x-1.5)// if 1.75 <= x < 2.0 than GAMMA(x) = P7(x-1.5)// and // if 0 < x < 1 than GAMMA(x) = GAMMA(x+1)/x//// Case -(i+1) < x < -i, i = 0...43// ----------------------------------// Here we use the fact that GAMMA(-x) = PI/(x*GAMMA(x)*sin(PI*x)) and// so we need to calculate GAMMA(x), sin(PI*x)/PI. Calculation of// GAMMA(x) is described above.//// Step 1: Reduction// -----------------// Note that period of sin(PI*x) is 2 and range reduction for // sin(PI*x) is like to range reduction for GAMMA(x) // i.e rs = x - round(x) and |rs| <= 0.5.//// Step 2: Approximation// ---------------------// To approximate sin(PI*x)/PI = sin(PI*(2*n+rs))/PI = // = (-1)^n*sin(PI*rs)/PI Taylor series is used.// sin(PI*rs)/PI ~ S17(rs).//// Step 3: Division// ----------------// To calculate 1/x and 1/(GAMMA(x)*S12(rs)) we use frcpa// instruction with following Newton-Raphson interations.// ////*********************************************************************GR_ad_Data = r8GR_TAG = r8GR_SignExp = r9GR_Sig = r10GR_ArgNz = r10GR_RqDeg = r11GR_NanBound = r14GR_ExpOf025 = r15GR_ExpOf05 = r16GR_ad_Co = r17GR_ad_Ce = r18GR_TblOffs = r19GR_Arg = r20GR_Exp2Ind = r21GR_TblOffsMask = r21GR_Offs = r22GR_OvfNzBound = r23GR_ZeroResBound = r24GR_ad_SinO = r25GR_ad_SinE = r26GR_Correction = r27GR_Tbl12Offs = r28GR_NzBound = r28GR_ExpOf1 = r29GR_fpsr = r29GR_SAVE_B0 = r33GR_SAVE_PFS = r34GR_SAVE_GP = r35GR_SAVE_SP = r36GR_Parameter_X = r37GR_Parameter_Y = r38GR_Parameter_RESULT = r39GR_Parameter_TAG = r40FR_X = f10FR_Y = f1FR_RESULT = f8FR_iXt = f11 FR_Xt = f12FR_r = f13FR_r2 = f14FR_r4 = f15FR_C01 = f33FR_A7 = f33FR_C11 = f34FR_A6 = f34FR_C21 = f35FR_A5 = f35FR_C31 = f36FR_A4 = f36FR_C41 = f37FR_A3 = f37FR_C51 = f38FR_A2 = f38FR_C00 = f39FR_A1 = f39FR_C10 = f40FR_A0 = f40FR_C20 = f41FR_C30 = f42FR_C40 = f43FR_C50 = f44FR_An = f45FR_OvfBound = f46FR_InvAn = f47FR_Multplr = f48FR_NormX = f49FR_X2mX = f50FR_1mX = f51FR_Rq0 = f51FR_Rq1 = f52FR_Rq2 = f53FR_Rq3 = f54FR_Rcp0 = f55FR_Rcp1 = f56FR_Rcp2 = f57FR_InvNormX1 = f58FR_InvNormX2 = f59FR_rs = f60FR_rs2 = f61FR_LocalMin = f62FR_10 = f63FR_05 = f64FR_S32 = f65FR_S31 = f66FR_S01 = f67FR_S11 = f68FR_S21 = f69FR_S00 = f70FR_S10 = f71FR_S20 = f72FR_GAMMA = f73FR_2 = f74FR_6 = f75// Data tables//==============================================================RODATA.align 16LOCAL_OBJECT_START(tgammaf_data)data8 0x3FDD8B618D5AF8FE // local minimum (0.461632144968362356785)data8 0x4024000000000000 // 10.0data8 0x3E90FC992FF39E13 // S32data8 0xBEC144B2760626E2 // S31////[2; 8)data8 0x4009EFD1BA0CB3B4 // C01data8 0x3FFFB35378FF4822 // C11data8 0xC01032270413B896 // C41data8 0xC01F171A4C0D6827 // C51data8 0x40148F8E197396AC // C20data8 0x401C601959F1249C // C30data8 0x3EE21AD881741977 // Andata8 0x4041852200000000 // overflow boundary (35.04010009765625)data8 0x3FD9CE68F695B198 // C21data8 0xBFF8C30AC900DA03 // C31data8 0x400E17D2F0535C02 // C00data8 0x4010689240F7FAC8 // C10data8 0x402563147DDCCF8D // C40data8 0x4033406D0480A21C // C50////[8; 16)data8 0x4006222BAE0B793B // C01data8 0x4002452733473EDA // C11data8 0xC0010EF3326FDDB3 // C41data8 0xC01492B817F99C0F // C51data8 0x40099C905A249B75 // C20data8 0x4012B972AE0E533D // C30data8 0x3FE6F6DB91D0D4CC // Andata8 0x4041852200000000 // overflow boundarydata8 0x3FF545828F7B73C5 // C21data8 0xBFBBD210578764DF // C31data8 0x4000542098F53CFC // C00data8 0x40032C1309AD6C81 // C10data8 0x401D7331E19BD2E1 // C40data8 0x402A06807295EF57 // C50////[16; 24)data8 0x4000131002867596 // C01data8 0x3FFAA362D5D1B6F2 // C11data8 0xBFFCB6985697DB6D // C41data8 0xC0115BEE3BFC3B3B // C51data8 0x3FFE62FF83456F73 // C20data8 0x4007E33478A114C4 // C30data8 0x41E9B2B73795ED57 // Andata8 0x4041852200000000 // overflow boundarydata8 0x3FEEB1F345BC2769 // C21data8 0xBFC3BBE6E7F3316F // C31data8 0x3FF14E07DA5E9983 // C00data8 0x3FF53B76BF81E2C0 // C10data8 0x4014051E0269A3DC // C40data8 0x40229D4227468EDB // C50////[24; 32)data8 0x3FFAF7BD498384DE // C01data8 0x3FF62AD8B4D1C3D2 // C11data8 0xBFFABCADCD004C32 // C41data8 0xC00FADE97C097EC9 // C51data8 0x3FF6DA9ED737707E // C20data8 0x4002A29E9E0C782C // C30data8 0x44329D5B5167C6C3 // Andata8 0x4041852200000000 // overflow boundarydata8 0x3FE8943CBBB4B727 // C21data8 0xBFCB39D466E11756 // C31data8 0x3FE879AF3243D8C1 // C00data8 0x3FEEC7DEBB14CE1E // C10data8 0x401017B79BA80BCB // C40data8 0x401E941DC3C4DE80 // C50////[32; 40)data8 0x3FF7ECB3A0E8FE5C // C01data8 0x3FF3815A8516316B // C11data8 0xBFF9ABD8FCC000C3 // C41data8 0xC00DD89969A4195B // C51data8 0x3FF2E43139CBF563 // C20data8 0x3FFF96DC3474A606 // C30data8 0x46AFF4CA9B0DDDF0 // Andata8 0x4041852200000000 // overflow boundarydata8 0x3FE4CE76DA1B5783 // C21data8 0xBFD0524DB460BC4E // C31data8 0x3FE35852DF14E200 // C00data8 0x3FE8C7610359F642 // C10data8 0x400BCF750EC16173 // C40data8 0x401AC14E02EA701C // C50////[40; 48)data8 0x3FF5DCE4D8193097 // C01data8 0x3FF1B0D8C4974FFA // C11data8 0xBFF8FB450194CAEA // C41data8 0xC00C9658E030A6C4 // C51data8 0x3FF068851118AB46 // C20data8 0x3FFBF7C7BB46BF7D // C30data8 0x3FF0000000000000 // Andata8 0x4041852200000000 // overflow boundarydata8 0x3FE231DEB11D847A // C21data8 0xBFD251ECAFD7E935 // C31data8 0x3FE0368AE288F6BF // C00data8 0x3FE513AE4215A70C // C10data8 0x4008F960F7141B8B // C40data8 0x40183BA08134397B // C50////[1.0; 1.25)data8 0xBFD9909648921868 // A7data8 0x3FE96FFEEEA8520F // A6data8 0xBFED0800D93449B8 // A3data8 0x3FEFA648D144911C // A2data8 0xBFEE3720F7720B4D // A5data8 0x3FEF4857A010CA3B // A4data8 0xBFE2788CCD545AA4 // A1data8 0x3FEFFFFFFFE9209E // A0////[1.25; 1.5)data8 0xBFB421236426936C // A7data8 0x3FAF237514F36691 // A6data8 0xBFC0BADE710A10B9 // A3data8 0x3FDB6C5465BBEF1F // A2data8 0xBFB7E7F83A546EBE // A5data8 0x3FC496A01A545163 // A4data8 0xBDEE86A39D8452EB // A1data8 0x3FEC56DC82A39AA2 // A0////[1.5; 1.75)data8 0xBF94730B51795867 // A7data8 0x3FBF4203E3816C7B // A6data8 0xBFE85B427DBD23E4 // A3data8 0x3FEE65557AB26771 // A2data8 0xBFD59D31BE3AB42A // A5data8 0x3FE3C90CC8F09147 // A4data8 0xBFE245971DF735B8 // A1data8 0x3FEFFC613AE7FBC8 // A0////[1.75; 2.0)data8 0xBF7746A85137617E // A7data8 0x3FA96E37D09735F3 // A6data8 0xBFE3C24AC40AC0BB // A3data8 0x3FEC56A80A977CA5 // A2data8 0xBFC6F0E707560916 // A5data8 0x3FDB262D949175BE // A4data8 0xBFE1C1AEDFB25495 // A1data8 0x3FEFEE1E644B2022 // A0//// sin(pi*x)/pidata8 0xC026FB0D377656CC // S01data8 0x3FFFB15F95A22324 // S11data8 0x406CE58F4A41C6E7 // S10data8 0x404453786302C61E // S20data8 0xC023D59A47DBFCD3 // S21data8 0x405541D7ABECEFCA // S00
?? 快捷鍵說明
復制代碼
Ctrl + C
搜索代碼
Ctrl + F
全屏模式
F11
切換主題
Ctrl + Shift + D
顯示快捷鍵
?
增大字號
Ctrl + =
減小字號
Ctrl + -