?? libm_reduce.s

?? glibc 庫, 不僅可以學習使用庫函數,還可以學習函數的具體實現,是提高功力的好資料
?? S
?? 第 1 頁 / 共 3 頁
字號:
12 3 下一頁
.file "libm_reduce.s"// Copyright (c) 2000 - 2003, Intel Corporation// All rights reserved.//// Contributed 2000 by the Intel Numerics Group, Intel Corporation//// Redistribution and use in source and binary forms, with or without// modification, are permitted provided that the following conditions are// met://// * Redistributions of source code must retain the above copyright// notice, this list of conditions and the following disclaimer.//// * Redistributions in binary form must reproduce the above copyright// notice, this list of conditions and the following disclaimer in the// documentation and/or other materials provided with the distribution.//// * The name of Intel Corporation may not be used to endorse or promote// products derived from this software without specific prior written// permission.// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY// OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.//// Intel Corporation is the author of this code, and requests that all// problem reports or change requests be submitted to it directly at// http://www.intel.com/software/products/opensource/libraries/num.htm.//// History:// 02/02/00 Initial Version// 05/13/02 Rescheduled for speed, changed interface to pass//          parameters in fp registers// 02/10/03 Reordered header: .section, .global, .proc, .align;//          used data8 for long double data storage////*********************************************************************//*********************************************************************//// Function:   __libm_pi_by_two_reduce(x) return r, c, and N where//             x = N * pi/4 + (r+c) , where |r+c| <= pi/4.//             This function is not designed to be used by the//             general user.////*********************************************************************//// Accuracy:       Returns double-precision values////*********************************************************************//// Resources Used:////    Floating-Point Registers://      f8  = Input x, return value r//      f9  = return value c//      f32-f70////    General Purpose Registers://      r8  = return value N//      r34-r64////    Predicate Registers:      p6-p14////*********************************************************************//// IEEE Special Conditions:////    No condions should be raised.////*********************************************************************//// I. Introduction// ===============//// For the forward trigonometric functions sin, cos, sincos, and// tan, the original algorithms for IA 64 handle arguments up to// 1 ulp less than 2^63 in magnitude. For double-extended arguments x,// |x| >= 2^63, this routine returns N and r_hi, r_lo where////    x  is accurately approximated by//    2*K*pi  +  N * pi/2  +  r_hi + r_lo,  |r_hi+r_lo| <= pi/4.//    CASE = 1 or 2.//    CASE is 1 unless |r_hi + r_lo| < 2^(-33).//// The exact value of K is not determined, but that information is// not required in trigonometric function computations.//// We first assume the argument x in question satisfies x >= 2^(63).// In particular, it is positive. Negative x can be handled by symmetry:////   -x  is accurately approximated by//         -2*K*pi  +  (-N) * pi/2  -  (r_hi + r_lo),  |r_hi+r_lo| <= pi/4.//// The idea of the reduction is that////       x  *  2/pi   =   N_big  +  N  +  f,      |f| <= 1/2//// Moreover, for double extended x, |f| >= 2^(-75). (This is an// non-obvious fact found by enumeration using a special algorithm// involving continued fraction.) The algorithm described below// calculates N and an accurate approximation of f.//// Roughly speaking, an appropriate 256-bit (4 X 64) portion of// 2/pi is multiplied with x to give the desired information.//// II. Representation of 2/PI// ==========================//// The value of 2/pi in binary fixed-point is////            .101000101111100110......//// We store 2/pi in a table, starting at the position corresponding// to bit position 63////   bit position  63 62 ... 0   -1 -2 -3 -4 -5 -6 -7  ....  -16576////              0  0  ... 0  . 1  0  1  0  1  0  1  ....    X////                              ^//                               |__ implied binary pt//// III. Algorithm// ==============//// This describes the algorithm in the most natural way using// unsigned interger multiplication. The implementation section// describes how the integer arithmetic is simulated.//// STEP 0. Initialization// ----------------------//// Let the input argument x be////     x = 2^m * ( 1. b_1 b_2 b_3 ... b_63 ),  63 <= m <= 16383.//// The first crucial step is to fetch four 64-bit portions of 2/pi.// To fulfill this goal, we calculate the bit position L of the// beginning of these 256-bit quantity by////     L :=  62 - m.//// Note that -16321 <= L <= -1 because 63 <= m <= 16383; and that// the storage of 2/pi is adequate.//// Fetch P_1, P_2, P_3, P_4 beginning at bit position L thus:////      bit position  L  L-1  L-2    ...  L-63////      P_1    =      b   b    b     ...    b//// each b can be 0 or 1. Also, let P_0 be the two bits correspoding to// bit positions L+2 and L+1. So, when each of the P_j is interpreted// with appropriate scaling, we have////      2/pi  =  P_big  + P_0 + (P_1 + P_2 + P_3 + P_4)  +  P_small//// Note that P_big and P_small can be ignored. The reasons are as follow.// First, consider P_big. If P_big = 0, we can certainly ignore it.// Otherwise, P_big >= 2^(L+3). Now,////        P_big * ulp(x) >=  2^(L+3) * 2^(m-63)//                   >=  2^(65-m  +  m-63 )//                   >=  2^2//// Thus, P_big * x is an integer of the form 4*K. So////       x = 4*K * (pi/2) + x*(P_0 + P_1 + P_2 + P_3 + P_4)*(pi/2)//                + x*P_small*(pi/2).//// Hence, P_big*x corresponds to information that can be ignored for// trigonometic function evaluation.//// Next, we must estimate the effect of ignoring P_small. The absolute// error made by ignoring P_small is bounded by////       |P_small * x|  <=  ulp(P_4) * x//                  <=  2^(L-255) * 2^(m+1)//                  <=  2^(62-m-255 + m + 1)//                  <=  2^(-192)//// Since for double-extended precision, x * 2/pi = integer + f,// 0.5 >= |f| >= 2^(-75), the relative error introduced by ignoring// P_small is bounded by 2^(-192+75) <= 2^(-117), which is acceptable.//// Further note that if x is split into x_hi + x_lo where x_lo is the// two bits corresponding to bit positions 2^(m-62) and 2^(m-63); then////       P_0 * x_hi//// is also an integer of the form 4*K; and thus can also be ignored.// Let M := P_0 * x_lo which is a small integer. The main part of the// calculation is really the multiplication of x with the four pieces// P_1, P_2, P_3, and P_4.//// Unless the reduced argument is extremely small in magnitude, it// suffices to carry out the multiplication of x with P_1, P_2, and// P_3. x*P_4 will be carried out and added on as a correction only// when it is found to be needed. Note also that x*P_4 need not be// computed exactly. A straightforward multiplication suffices since// the rounding error thus produced would be bounded by 2^(-3*64),// that is 2^(-192) which is small enough as the reduced argument// is bounded from below by 2^(-75).//// Now that we have four 64-bit data representing 2/pi and a// 64-bit x. We first need to calculate a highly accurate product// of x and P_1, P_2, P_3. This is best understood as integer// multiplication.////// STEP 1. Multiplication// ----------------------//////                     ---------   ---------   ---------//                    |  P_1  |   |  P_2  |   |  P_3  |//                    ---------   ---------   ---------////                                            ---------//             X                              |   X   |//                                            ---------//      ----------------------------------------------------////                                 ---------   ---------//                               |  A_hi |   |  A_lo |//                               ---------   ---------//////                    ---------   ---------//                   |  B_hi |   |  B_lo |//                   ---------   ---------//////        ---------   ---------//       |  C_hi |   |  C_lo |//       ---------   ---------////      ====================================================//       ---------   ---------   ---------   ---------//       |  S_0  |   |  S_1  |   |  S_2  |   |  S_3  |//       ---------   ---------   ---------   ---------//////// STEP 2. Get N and f// -------------------//// Conceptually, after the individual pieces S_0, S_1, ..., are obtained,// we have to sum them and obtain an integer part, N, and a fraction, f.// Here, |f| <= 1/2, and N is an integer. Note also that N need only to// be known to module 2^k, k >= 2. In the case when |f| is small enough,// we would need to add in the value x*P_4.////// STEP 3. Get reduced argument// ----------------------------//// The value f is not yet the reduced argument that we seek. The// equation////       x * 2/pi = 4K  + N  + f//// says that////         x   =  2*K*pi  + N * pi/2  +  f * (pi/2).//// Thus, the reduced argument is given by////       reduced argument =  f * pi/2.//// This multiplication must be performed to extra precision.//// IV. Implementation// ==================//// Step 0. Initialization// ----------------------//// Set sgn_x := sign(x); x := |x|; x_lo := 2 lsb of x.//// In memory, 2/pi is stored contigously as////  0x00000000 0x00000000 0xA2F....//                       ^//                       |__ implied binary bit//// Given x = 2^m * 1.xxxx...xxx; we calculate L := 62 - m. Thus// -1 <= L <= -16321. We fetch from memory 5 integer pieces of data.//// P_0 is the two bits corresponding to bit positions L+2 and L+1// P_1 is the 64-bit starting at bit position  L// P_2 is the 64-bit starting at bit position  L-64// P_3 is the 64-bit starting at bit position  L-128// P_4 is the 64-bit starting at bit position  L-192//// For example, if m = 63, P_0 would be 0 and P_1 would look like// 0xA2F...//// If m = 65, P_0 would be the two msb of 0xA, thus, P_0 is 10 in binary.// P_1 in binary would be  1 0 0 0 1 0 1 1 1 1 ....//// Step 1. Multiplication// ----------------------//// At this point, P_1, P_2, P_3, P_4 are integers. They are// supposed to be interpreted as////  2^(L-63)     * P_1;//  2^(L-63-64)  * P_2;//  2^(L-63-128) * P_3;// 2^(L-63-192) * P_4;//// Since each of them need to be multiplied to x, we would scale// both x and the P_j's by some convenient factors: scale each// of P_j's up by 2^(63-L), and scale x down by 2^(L-63).////   p_1 := fcvt.xf ( P_1 )//   p_2 := fcvt.xf ( P_2 ) * 2^(-64)//   p_3 := fcvt.xf ( P_3 ) * 2^(-128)//   p_4 := fcvt.xf ( P_4 ) * 2^(-192)//   x   := replace exponent of x by -1//          because 2^m    * 1.xxxx...xxx  * 2^(L-63)//          is      2^(-1) * 1.xxxx...xxx//// We are now faced with the task of computing the following////                     ---------   ---------   ---------//                    |  P_1  |   |  P_2  |   |  P_3  |//                    ---------   ---------   ---------////                                             ---------//             X                              |   X   |//                                            ---------//       ----------------------------------------------------////                                 ---------   ---------//                                |  A_hi |   |  A_lo |//                                ---------   ---------////                     ---------   ---------//                    |  B_hi |   |  B_lo |//                    ---------   ---------////         ---------   ---------//        |  C_hi |   |  C_lo |//        ---------   ---------////      ====================================================//       -----------   ---------   ---------   ---------//       |    S_0  |   |  S_1  |   |  S_2  |   |  S_3  |//       -----------   ---------   ---------   ---------//        ^          ^//        |          |___ binary point//        |//        |___ possibly one more bit//// Let FPSR3 be set to round towards zero with widest precision// and exponent range. Unless an explicit FPSR is given,// round-to-nearest with widest precision and exponent range is// used.//// Define sigma_C := 2^63; sigma_B := 2^(-1); sigma_C := 2^(-65).//// Tmp_C := fmpy.fpsr3( x, p_1 );// If Tmp_C >= sigma_C then//    C_hi := Tmp_C;//    C_lo := x*p_1 - C_hi ...fma, exact// Else//    C_hi := fadd.fpsr3(sigma_C, Tmp_C) - sigma_C//                   ...subtraction is exact, regardless//                   ...of rounding direction//    C_lo := x*p_1 - C_hi ...fma, exact// End If//// Tmp_B := fmpy.fpsr3( x, p_2 );// If Tmp_B >= sigma_B then//    B_hi := Tmp_B;//    B_lo := x*p_2 - B_hi ...fma, exact// Else//    B_hi := fadd.fpsr3(sigma_B, Tmp_B) - sigma_B//                   ...subtraction is exact, regardless//                   ...of rounding direction//    B_lo := x*p_2 - B_hi ...fma, exact// End If//// Tmp_A := fmpy.fpsr3( x, p_3 );// If Tmp_A >= sigma_A then//    A_hi := Tmp_A;//    A_lo := x*p_3 - A_hi ...fma, exact// Else//    A_hi := fadd.fpsr3(sigma_A, Tmp_A) - sigma_A//                   ...subtraction is exact, regardless//                   ...of rounding direction//    A_lo := x*p_3 - A_hi ...fma, exact// End If//// ...Note that C_hi is of integer value. We need only the// ...last few bits. Thus we can ensure C_hi is never a big// ...integer, freeing us from overflow worry.//// Tmp_C := fadd.fpsr3( C_hi, 2^(70) ) - 2^(70);// ...Tmp_C is the upper portion of C_hi// C_hi := C_hi - Tmp_C// ...0 <= C_hi < 2^7//// Step 2. Get N and f// -------------------//// At this point, we have all the components to obtain// S_0, S_1, S_2, S_3 and thus N and f. We start by adding// C_lo and B_hi. This sum together with C_hi gives a good// estimation of N and f.//// A := fadd.fpsr3( B_hi, C_lo )// B := max( B_hi, C_lo )// b := min( B_hi, C_lo )//// a := (B - A) + b      ...exact. Note that a is either 0//                   ...or 2^(-64).//// N := round_to_nearest_integer_value( A );// f := A - N;            ...exact because lsb(A) >= 2^(-64)//                   ...and |f| <= 1/2.//// f := f + a            ...exact because a is 0 or 2^(-64);//                   ...the msb of the sum is <= 1/2//                   ...lsb >= 2^(-64).//// N := convert to integer format( C_hi + N );// M := P_0 * x_lo;// N := N + M;//// If sgn_x == 1 (that is original x was negative)// N := 2^10 - N// ...this maintains N to be non-negative, but still// ...equivalent to the (negated N) mod 4.// End If//// If |f| >= 2^(-33)//// ...Case 1// CASE := 1// g := A_hi + B_lo;// s_hi := f + g;// s_lo := (f - s_hi) + g;//// Else//// ...Case 2// CASE := 2// A := fadd.fpsr3( A_hi, B_lo )// B := max( A_hi, B_lo )// b := min( A_hi, B_lo )//// a := (B - A) + b      ...exact. Note that a is either 0//                   ...or 2^(-128).//// f_hi := A + f;// f_lo := (f - f_hi) + A;// ...this is exact.// ...f-f_hi is exact because either |f| >= |A|, in which// ...case f-f_hi is clearly exact; or otherwise, 0<|f|<|A|// ...means msb(f) <= msb(A) = 2^(-64) => |f| = 2^(-64).// ...If f = 2^(-64), f-f_hi involves cancellation and is// ...exact. If f = -2^(-64), then A + f is exact. Hence// ...f-f_hi is -A exactly, giving f_lo = 0.//// f_lo := f_lo + a;//// If |f| >= 2^(-50) then//    s_hi := f_hi;//    s_lo := f_lo;// Else//    f_lo := (f_lo + A_lo) + x*p_4//    s_hi := f_hi + f_lo//    s_lo := (f_hi - s_hi) + f_lo// End If//// End If//// Step 3. Get reduced argument// ----------------------------//// If sgn_x == 0 (that is original x is positive)//// D_hi := Pi_by_2_hi// D_lo := Pi_by_2_lo// ...load from table//// Else//// D_hi := neg_Pi_by_2_hi// D_lo := neg_Pi_by_2_lo// ...load from table// End If//// r_hi :=  s_hi*D_hi// r_lo :=  s_hi*D_hi - r_hi         ...fma// r_lo := (s_hi*D_lo + r_lo) + s_lo*D_hi//// Return  N, r_hi, r_lo//FR_input_X = f8FR_r_hi    = f8FR_r_lo    = f9FR_X       = f32FR_N       = f33FR_p_1     = f34FR_TWOM33  = f35FR_TWOM50  = f36FR_g       = f37FR_p_2     = f38FR_f       = f39FR_s_lo    = f40FR_p_3     = f41FR_f_abs   = f42
12 3 下一頁
?? 文件大小 20450 K
?? 上傳用戶 hanyuangu
?? 所屬分類 Linux/Unix編程
??? 相關標簽

#glibc #庫函數 #函數
?? 快捷鍵說明

復制代碼 Ctrl + C
搜索代碼 Ctrl + F
全屏模式 F11
切換主題 Ctrl + Shift + D
顯示快捷鍵 ?
增大字號 Ctrl + =
減小字號 Ctrl + -
亚洲欧美第一页_禁久久精品乱码_粉嫩av一区二区三区免费野_久草精品视频

?? libm_reduce.s

?? 快捷鍵說明