?? s_cosl.s

?? glibc 庫(kù), 不僅可以學(xué)習(xí)使用庫(kù)函數(shù),還可以學(xué)習(xí)函數(shù)的具體實(shí)現(xiàn),是提高功力的好資料
?? S
?? 第 1 頁(yè) / 共 4 頁(yè)
字號(hào):
12 3 4 下一頁(yè)
.file "sincosl.s"// Copyright (c) 2000 - 2004, Intel Corporation// All rights reserved.//// Contributed 2000 by the Intel Numerics Group, Intel Corporation//// Redistribution and use in source and binary forms, with or without// modification, are permitted provided that the following conditions are// met://// * Redistributions of source code must retain the above copyright// notice, this list of conditions and the following disclaimer.//// * Redistributions in binary form must reproduce the above copyright// notice, this list of conditions and the following disclaimer in the// documentation and/or other materials provided with the distribution.//// * The name of Intel Corporation may not be used to endorse or promote// products derived from this software without specific prior written// permission.// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY// OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.//// Intel Corporation is the author of this code, and requests that all// problem reports or change requests be submitted to it directly at// http://www.intel.com/software/products/opensource/libraries/num.htm.////*********************************************************************//// History:// 02/02/00 (hand-optimized)// 04/04/00 Unwind support added// 07/30/01 Improved speed on all paths// 08/20/01 Fixed bundling typo// 05/13/02 Changed interface to __libm_pi_by_2_reduce// 02/10/03 Reordered header: .section, .global, .proc, .align;//          used data8 for long double table values// 10/13/03 Corrected final .endp name to match .proc// 10/26/04 Avoided using r14-31 as scratch so not clobbered by dynamic loader////*********************************************************************//// Function:   Combined sinl(x) and cosl(x), where////             sinl(x) = sine(x), for double-extended precision x values//             cosl(x) = cosine(x), for double-extended precision x values////*********************************************************************//// Resources Used:////    Floating-Point Registers: f8 (Input and Return Value)//                              f32-f99////    General Purpose Registers://      r32-r58////    Predicate Registers:      p6-p13////*********************************************************************////  IEEE Special Conditions:////    Denormal  fault raised on denormal inputs//    Overflow exceptions do not occur//    Underflow exceptions raised when appropriate for sin//    (No specialized error handling for this routine)//    Inexact raised when appropriate by algorithm////    sinl(SNaN) = QNaN//    sinl(QNaN) = QNaN//    sinl(inf) = QNaN//    sinl(+/-0) = +/-0//    cosl(inf) = QNaN//    cosl(SNaN) = QNaN//    cosl(QNaN) = QNaN//    cosl(0) = 1////*********************************************************************////  Mathematical Description//  ========================////  The computation of FSIN and FCOS is best handled in one piece of//  code. The main reason is that given any argument Arg, computation//  of trigonometric functions first calculate N and an approximation//  to alpha where////  Arg = N pi/2 + alpha, |alpha| <= pi/4.////  Since////  cosl( Arg ) = sinl( (N+1) pi/2 + alpha ),////  therefore, the code for computing sine will produce cosine as long//  as 1 is added to N immediately after the argument reduction//  process.////  Let M = N if sine//      N+1 if cosine.////  Now, given////  Arg = M pi/2  + alpha, |alpha| <= pi/4,////  let I = M mod 4, or I be the two lsb of M when M is represented//  as 2's complement. I = [i_0 i_1]. Then////  sinl( Arg ) = (-1)^i_0  sinl( alpha )        if i_1 = 0,//             = (-1)^i_0  cosl( alpha )     if i_1 = 1.////  For example://       if M = -1, I = 11//         sin ((-pi/2 + alpha) = (-1) cos (alpha)//       if M = 0, I = 00//         sin (alpha) = sin (alpha)//       if M = 1, I = 01//         sin (pi/2 + alpha) = cos (alpha)//       if M = 2, I = 10//         sin (pi + alpha) = (-1) sin (alpha)//       if M = 3, I = 11//         sin ((3/2)pi + alpha) = (-1) cos (alpha)////  The value of alpha is obtained by argument reduction and//  represented by two working precision numbers r and c where////  alpha =  r  +  c     accurately.////  The reduction method is described in a previous write up.//  The argument reduction scheme identifies 4 cases. For Cases 2//  and 4, because |alpha| is small, sinl(r+c) and cosl(r+c) can be//  computed very easily by 2 or 3 terms of the Taylor series//  expansion as follows:////  Case 2://  -------////  sinl(r + c) = r + c - r^3/6        accurately//  cosl(r + c) = 1 - 2^(-67)        accurately////  Case 4://  -------////  sinl(r + c) = r + c - r^3/6 + r^5/120        accurately//  cosl(r + c) = 1 - r^2/2 + r^4/24                accurately////  The only cases left are Cases 1 and 3 of the argument reduction//  procedure. These two cases will be merged since after the//  argument is reduced in either cases, we have the reduced argument//  represented as r + c and that the magnitude |r + c| is not small//  enough to allow the usage of a very short approximation.////  The required calculation is either////  sinl(r + c)  =  sinl(r)  +  correction,  or//  cosl(r + c)  =  cosl(r)  +  correction.////  Specifically,////        sinl(r + c) = sinl(r) + c sin'(r) + O(c^2)//                   = sinl(r) + c cos (r) + O(c^2)//                   = sinl(r) + c(1 - r^2/2)  accurately.//  Similarly,////        cosl(r + c) = cosl(r) - c sinl(r) + O(c^2)//                   = cosl(r) - c(r - r^3/6)  accurately.////  We therefore concentrate on accurately calculating sinl(r) and//  cosl(r) for a working-precision number r, |r| <= pi/4 to within//  0.1% or so.////  The greatest challenge of this task is that the second terms of//  the Taylor series////        r - r^3/3! + r^r/5! - ...////  and////        1 - r^2/2! + r^4/4! - ...////  are not very small when |r| is close to pi/4 and the rounding//  errors will be a concern if simple polynomial accumulation is//  used. When |r| < 2^-3, however, the second terms will be small//  enough (6 bits or so of right shift) that a normal Horner//  recurrence suffices. Hence there are two cases that we consider//  in the accurate computation of sinl(r) and cosl(r), |r| <= pi/4.////  Case small_r: |r| < 2^(-3)//  --------------------------////  Since Arg = M pi/4 + r + c accurately, and M mod 4 is [i_0 i_1],//  we have////        sinl(Arg) = (-1)^i_0 * sinl(r + c)        if i_1 = 0//                 = (-1)^i_0 * cosl(r + c)         if i_1 = 1////  can be accurately approximated by////  sinl(Arg) = (-1)^i_0 * [sinl(r) + c]        if i_1 = 0//           = (-1)^i_0 * [cosl(r) - c*r] if i_1 = 1////  because |r| is small and thus the second terms in the correction//  are unneccessary.////  Finally, sinl(r) and cosl(r) are approximated by polynomials of//  moderate lengths.////  sinl(r) =  r + S_1 r^3 + S_2 r^5 + ... + S_5 r^11//  cosl(r) =  1 + C_1 r^2 + C_2 r^4 + ... + C_5 r^10////  We can make use of predicates to selectively calculate//  sinl(r) or cosl(r) based on i_1.////  Case normal_r: 2^(-3) <= |r| <= pi/4//  ------------------------------------////  This case is more likely than the previous one if one considers//  r to be uniformly distributed in [-pi/4 pi/4]. Again,////  sinl(Arg) = (-1)^i_0 * sinl(r + c)        if i_1 = 0//           = (-1)^i_0 * cosl(r + c)         if i_1 = 1.////  Because |r| is now larger, we need one extra term in the//  correction. sinl(Arg) can be accurately approximated by////  sinl(Arg) = (-1)^i_0 * [sinl(r) + c(1-r^2/2)]      if i_1 = 0//           = (-1)^i_0 * [cosl(r) - c*r*(1 - r^2/6)]    i_1 = 1.////  Finally, sinl(r) and cosl(r) are approximated by polynomials of//  moderate lengths.////        sinl(r) =  r + PP_1_hi r^3 + PP_1_lo r^3 +//                      PP_2 r^5 + ... + PP_8 r^17////        cosl(r) =  1 + QQ_1 r^2 + QQ_2 r^4 + ... + QQ_8 r^16////  where PP_1_hi is only about 16 bits long and QQ_1 is -1/2.//  The crux in accurate computation is to calculate////  r + PP_1_hi r^3   or  1 + QQ_1 r^2////  accurately as two pieces: U_hi and U_lo. The way to achieve this//  is to obtain r_hi as a 10 sig. bit number that approximates r to//  roughly 8 bits or so of accuracy. (One convenient way is////  r_hi := frcpa( frcpa( r ) ).)////  This way,////        r + PP_1_hi r^3 =  r + PP_1_hi r_hi^3 +//                                PP_1_hi (r^3 - r_hi^3)//                        =  [r + PP_1_hi r_hi^3]  +//                           [PP_1_hi (r - r_hi)//                              (r^2 + r_hi r + r_hi^2) ]//                        =  U_hi  +  U_lo////  Since r_hi is only 10 bit long and PP_1_hi is only 16 bit long,//  PP_1_hi * r_hi^3 is only at most 46 bit long and thus computed//  exactly. Furthermore, r and PP_1_hi r_hi^3 are of opposite sign//  and that there is no more than 8 bit shift off between r and//  PP_1_hi * r_hi^3. Hence the sum, U_hi, is representable and thus//  calculated without any error. Finally, the fact that////        |U_lo| <= 2^(-8) |U_hi|////  says that U_hi + U_lo is approximating r + PP_1_hi r^3 to roughly//  8 extra bits of accuracy.////  Similarly,////        1 + QQ_1 r^2  =  [1 + QQ_1 r_hi^2]  +//                            [QQ_1 (r - r_hi)(r + r_hi)]//                      =  U_hi  +  U_lo.////  Summarizing, we calculate r_hi = frcpa( frcpa( r ) ).////  If i_1 = 0, then////    U_hi := r + PP_1_hi * r_hi^3//    U_lo := PP_1_hi * (r - r_hi) * (r^2 + r*r_hi + r_hi^2)//    poly := PP_1_lo r^3 + PP_2 r^5 + ... + PP_8 r^17//    correction := c * ( 1 + C_1 r^2 )////  Else ...i_1 = 1////    U_hi := 1 + QQ_1 * r_hi * r_hi//    U_lo := QQ_1 * (r - r_hi) * (r + r_hi)//    poly := QQ_2 * r^4 + QQ_3 * r^6 + ... + QQ_8 r^16//    correction := -c * r * (1 + S_1 * r^2)////  End////  Finally,////        V := poly + ( U_lo + correction )////                 /    U_hi  +  V         if i_0 = 0//        result := |//                 \  (-U_hi) -  V         if i_0 = 1////  It is important that in the last step, negation of U_hi is//  performed prior to the subtraction which is to be performed in//  the user-set rounding mode.//////  Algorithmic Description//  =======================////  The argument reduction algorithm is tightly integrated into FSIN//  and FCOS which share the same code. The following is complete and//  self-contained. The argument reduction description given//  previously is repeated below.//////  Step 0. Initialization.////   If FSIN is invoked, set N_inc := 0; else if FCOS is invoked,//   set N_inc := 1.////  Step 1. Check for exceptional and special cases.////   * If Arg is +-0, +-inf, NaN, NaT, go to Step 10 for special//     handling.//   * If |Arg| < 2^24, go to Step 2 for reduction of moderate//     arguments. This is the most likely case.//   * If |Arg| < 2^63, go to Step 8 for pre-reduction of large//     arguments.//   * If |Arg| >= 2^63, go to Step 10 for special handling.////  Step 2. Reduction of moderate arguments.////  If |Arg| < pi/4         ...quick branch//     N_fix := N_inc        (integer)//     r     := Arg//     c     := 0.0//     Branch to Step 4, Case_1_complete//  Else                 ...cf. argument reduction//     N     := Arg * two_by_PI        (fp)//     N_fix := fcvt.fx( N )        (int)//     N     := fcvt.xf( N_fix )//     N_fix := N_fix + N_inc//     s     := Arg - N * P_1        (first piece of pi/2)//     w     := -N * P_2        (second piece of pi/2)////     If |s| >= 2^(-33)//        go to Step 3, Case_1_reduce//     Else//        go to Step 7, Case_2_reduce//     Endif//  Endif////  Step 3. Case_1_reduce.////  r := s + w//  c := (s - r) + w        ...observe order////  Step 4. Case_1_complete////  ...At this point, the reduced argument alpha is//  ...accurately represented as r + c.//  If |r| < 2^(-3), go to Step 6, small_r.////  Step 5. Normal_r.////  Let [i_0 i_1] by the 2 lsb of N_fix.//  FR_rsq  := r * r//  r_hi := frcpa( frcpa( r ) )//  r_lo := r - r_hi////  If i_1 = 0, then//    poly := r*FR_rsq*(PP_1_lo + FR_rsq*(PP_2 + ... FR_rsq*PP_8))//    U_hi := r + PP_1_hi*r_hi*r_hi*r_hi        ...any order//    U_lo := PP_1_hi*r_lo*(r*r + r*r_hi + r_hi*r_hi)//    correction := c + c*C_1*FR_rsq                ...any order//  Else//    poly := FR_rsq*FR_rsq*(QQ_2 + FR_rsq*(QQ_3 + ... + FR_rsq*QQ_8))//    U_hi := 1 + QQ_1 * r_hi * r_hi                ...any order//    U_lo := QQ_1 * r_lo * (r + r_hi)//    correction := -c*(r + S_1*FR_rsq*r)        ...any order//  Endif////  V := poly + (U_lo + correction)        ...observe order////  result := (i_0 == 0?   1.0 : -1.0)////  Last instruction in user-set rounding mode////  result := (i_0 == 0?   result*U_hi + V ://                        result*U_hi - V)////  Return////  Step 6. Small_r.////  ...Use flush to zero mode without causing exception//    Let [i_0 i_1] be the two lsb of N_fix.////  FR_rsq := r * r////  If i_1 = 0 then//     z := FR_rsq*FR_rsq; z := FR_rsq*z *r//     poly_lo := S_3 + FR_rsq*(S_4 + FR_rsq*S_5)//     poly_hi := r*FR_rsq*(S_1 + FR_rsq*S_2)//     correction := c//     result := r//  Else//     z := FR_rsq*FR_rsq; z := FR_rsq*z//     poly_lo := C_3 + FR_rsq*(C_4 + FR_rsq*C_5)//     poly_hi := FR_rsq*(C_1 + FR_rsq*C_2)//     correction := -c*r//     result := 1//  Endif////  poly := poly_hi + (z * poly_lo + correction)////  If i_0 = 1, result := -result////  Last operation. Perform in user-set rounding mode////  result := (i_0 == 0?     result + poly ://                          result - poly )//  Return////  Step 7. Case_2_reduce.////  ...Refer to the write up for argument reduction for//  ...rationale. The reduction algorithm below is taken from//  ...argument reduction description and integrated this.////  w := N*P_3//  U_1 := N*P_2 + w                ...FMA//  U_2 := (N*P_2 - U_1) + w        ...2 FMA//  ...U_1 + U_2 is  N*(P_2+P_3) accurately////  r := s - U_1//  c := ( (s - r) - U_1 ) - U_2////  ...The mathematical sum r + c approximates the reduced//  ...argument accurately. Note that although compared to//  ...Case 1, this case requires much more work to reduce//  ...the argument, the subsequent calculation needed for//  ...any of the trigonometric function is very little because//  ...|alpha| < 1.01*2^(-33) and thus two terms of the//  ...Taylor series expansion suffices.////  If i_1 = 0 then//     poly := c + S_1 * r * r * r        ...any order//     result := r//  Else//     poly := -2^(-67)//     result := 1.0//  Endif////  If i_0 = 1, result := -result////  Last operation. Perform in user-set rounding mode////  result := (i_0 == 0?     result + poly ://                           result - poly )////  Return//////  Step 8. Pre-reduction of large arguments.////  ...Again, the following reduction procedure was described//  ...in the separate write up for argument reduction, which//  ...is tightly integrated here.//  N_0 := Arg * Inv_P_0//  N_0_fix := fcvt.fx( N_0 )//  N_0 := fcvt.xf( N_0_fix)//  Arg' := Arg - N_0 * P_0//  w := N_0 * d_1//  N := Arg' * two_by_PI//  N_fix := fcvt.fx( N )//  N := fcvt.xf( N_fix )//  N_fix := N_fix + N_inc////  s := Arg' - N * P_1//  w := w - N * P_2////  If |s| >= 2^(-14)//     go to Step 3//  Else//     go to Step 9//  Endif////  Step 9. Case_4_reduce.////    ...first obtain N_0*d_1 and -N*P_2 accurately//   U_hi := N_0 * d_1                V_hi := -N*P_2//   U_lo := N_0 * d_1 - U_hi        V_lo := -N*P_2 - U_hi        ...FMAs////   ...compute the contribution from N_0*d_1 and -N*P_3//   w := -N*P_3//   w := w + N_0*d_2//   t := U_lo + V_lo + w                ...any order////   ...at this point, the mathematical value//   ...s + U_hi + V_hi  + t approximates the true reduced argument//   ...accurately. Just need to compute this accurately.////   ...Calculate U_hi + V_hi accurately://   A := U_hi + V_hi//   if |U_hi| >= |V_hi| then//      a := (U_hi - A) + V_hi//   else//      a := (V_hi - A) + U_hi//   endif//   ...order in computing "a" must be observed. This branch is//   ...best implemented by predicates.//   ...A + a  is U_hi + V_hi accurately. Moreover, "a" is//   ...much smaller than A: |a| <= (1/2)ulp(A).////   ...Just need to calculate   s + A + a + t//   C_hi := s + A                t := t + a//   C_lo := (s - C_hi) + A//   C_lo := C_lo + t////   ...Final steps for reduction//   r := C_hi + C_lo//   c := (C_hi - r) + C_lo////   ...At this point, we have r and c//   ...And all we need is a couple of terms of the corresponding//   ...Taylor series.////   If i_1 = 0//      poly := c + r*FR_rsq*(S_1 + FR_rsq*S_2)//      result := r//   Else//      poly := FR_rsq*(C_1 + FR_rsq*C_2)//      result := 1//   Endif////   If i_0 = 1, result := -result////   Last operation. Perform in user-set rounding mode////   result := (i_0 == 0?     result + poly ://                            result - poly )//   Return////   Large Arguments: For arguments above 2**63, a Payne-Hanek//   style argument reduction is used and pi_by_2 reduce is called.//RODATA.align 16LOCAL_OBJECT_START(FSINCOSL_CONSTANTS)sincosl_table_p:data8 0xA2F9836E4E44152A, 0x00003FFE // Inv_pi_by_2data8 0xC84D32B0CE81B9F1, 0x00004016 // P_0data8 0xC90FDAA22168C235, 0x00003FFF // P_1data8 0xECE675D1FC8F8CBB, 0x0000BFBD // P_2data8 0xB7ED8FBBACC19C60, 0x0000BF7C // P_3data8 0x8D848E89DBD171A1, 0x0000BFBF // d_1data8 0xD5394C3618A66F8E, 0x0000BF7C // d_2LOCAL_OBJECT_END(FSINCOSL_CONSTANTS)LOCAL_OBJECT_START(sincosl_table_d)data8 0xC90FDAA22168C234, 0x00003FFE // pi_by_4data8 0xA397E5046EC6B45A, 0x00003FE7 // Inv_P_0data4 0x3E000000, 0xBE000000         // 2^-3 and -2^-3data4 0x2F000000, 0xAF000000         // 2^-33 and -2^-33data4 0x9E000000, 0x00000000         // -2^-67data4 0x00000000, 0x00000000         // padLOCAL_OBJECT_END(sincosl_table_d)LOCAL_OBJECT_START(sincosl_table_pp)data8 0xCC8ABEBCA21C0BC9, 0x00003FCE // PP_8data8 0xD7468A05720221DA, 0x0000BFD6 // PP_7data8 0xB092382F640AD517, 0x00003FDE // PP_6data8 0xD7322B47D1EB75A4, 0x0000BFE5 // PP_5data8 0xFFFFFFFFFFFFFFFE, 0x0000BFFD // C_1
12 3 4 下一頁(yè)
?? 文件大小 20450 K
?? 上傳用戶(hù) hanyuangu
?? 所屬分類(lèi) Linux/Unix編程
??? 相關(guān)標(biāo)簽

#glibc #庫(kù)函數(shù) #函數(shù)
?? 快捷鍵說(shuō)明

復(fù)制代碼 Ctrl + C
搜索代碼 Ctrl + F
全屏模式 F11
切換主題 Ctrl + Shift + D
顯示快捷鍵 ?
增大字號(hào) Ctrl + =
減小字號(hào) Ctrl + -
亚洲欧美第一页_禁久久精品乱码_粉嫩av一区二区三区免费野_久草精品视频

?? s_cosl.s

?? 快捷鍵說(shuō)明