?? s_tanl.s
字號:
.file "tancotl.s"// Copyright (c) 2000 - 2004, Intel Corporation// All rights reserved.//// Contributed 2000 by the Intel Numerics Group, Intel Corporation//// Redistribution and use in source and binary forms, with or without// modification, are permitted provided that the following conditions are// met://// * Redistributions of source code must retain the above copyright// notice, this list of conditions and the following disclaimer.//// * Redistributions in binary form must reproduce the above copyright// notice, this list of conditions and the following disclaimer in the// documentation and/or other materials provided with the distribution.//// * The name of Intel Corporation may not be used to endorse or promote// products derived from this software without specific prior written// permission.// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS // CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, // PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR // PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY // OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. // // Intel Corporation is the author of this code, and requests that all// problem reports or change requests be submitted to it directly at // http://www.intel.com/software/products/opensource/libraries/num.htm.////*********************************************************************//// History: //// 02/02/00 (hand-optimized)// 04/04/00 Unwind support added// 12/28/00 Fixed false invalid flags// 02/06/02 Improved speed// 05/07/02 Changed interface to __libm_pi_by_2_reduce// 05/30/02 Added cotl// 02/10/03 Reordered header: .section, .global, .proc, .align;// used data8 for long double table values// 05/15/03 Reformatted data tables// 10/26/04 Avoided using r14-31 as scratch so not clobbered by dynamic loader////*********************************************************************//// Functions: tanl(x) = tangent(x), for double-extended precision x values// cotl(x) = cotangent(x), for double-extended precision x values////*********************************************************************//// Resources Used://// Floating-Point Registers: f8 (Input and Return Value)// f9-f15// f32-f121//// General Purpose Registers:// r32-r70//// Predicate Registers: p6-p15////*********************************************************************//// IEEE Special Conditions for tanl://// Denormal fault raised on denormal inputs// Overflow exceptions do not occur// Underflow exceptions raised when appropriate for tan// (No specialized error handling for this routine)// Inexact raised when appropriate by algorithm//// tanl(SNaN) = QNaN// tanl(QNaN) = QNaN// tanl(inf) = QNaN// tanl(+/-0) = +/-0////*********************************************************************//// IEEE Special Conditions for cotl://// Denormal fault raised on denormal inputs// Overflow exceptions occur at zero and near zero// Underflow exceptions do not occur// Inexact raised when appropriate by algorithm//// cotl(SNaN) = QNaN// cotl(QNaN) = QNaN// cotl(inf) = QNaN// cotl(+/-0) = +/-Inf and error handling is called////*********************************************************************//// Below are mathematical and algorithmic descriptions for tanl.// For cotl we use next identity cot(x) = -tan(x + Pi/2).// So, to compute cot(x) we just need to increment N (N = N + 1)// and invert sign of the computed result.////*********************************************************************//// Mathematical Description//// We consider the computation of FPTANL of Arg. Now, given//// Arg = N pi/2 + alpha, |alpha| <= pi/4,//// basic mathematical relationship shows that//// tan( Arg ) = tan( alpha ) if N is even;// = -cot( alpha ) otherwise.//// The value of alpha is obtained by argument reduction and// represented by two working precision numbers r and c where//// alpha = r + c accurately.//// The reduction method is described in a previous write up.// The argument reduction scheme identifies 4 cases. For Cases 2// and 4, because |alpha| is small, tan(r+c) and -cot(r+c) can be// computed very easily by 2 or 3 terms of the Taylor series// expansion as follows://// Case 2:// -------//// tan(r + c) = r + c + r^3/3 ...accurately// -cot(r + c) = -1/(r+c) + r/3 ...accurately//// Case 4:// -------//// tan(r + c) = r + c + r^3/3 + 2r^5/15 ...accurately// -cot(r + c) = -1/(r+c) + r/3 + r^3/45 ...accurately////// The only cases left are Cases 1 and 3 of the argument reduction// procedure. These two cases will be merged since after the// argument is reduced in either cases, we have the reduced argument// represented as r + c and that the magnitude |r + c| is not small// enough to allow the usage of a very short approximation.//// The greatest challenge of this task is that the second terms of// the Taylor series for tan(r) and -cot(r)//// r + r^3/3 + 2 r^5/15 + ...//// and//// -1/r + r/3 + r^3/45 + ...//// are not very small when |r| is close to pi/4 and the rounding// errors will be a concern if simple polynomial accumulation is// used. When |r| < 2^(-2), however, the second terms will be small// enough (5 bits or so of right shift) that a normal Horner// recurrence suffices. Hence there are two cases that we consider// in the accurate computation of tan(r) and cot(r), |r| <= pi/4.//// Case small_r: |r| < 2^(-2)// --------------------------//// Since Arg = N pi/4 + r + c accurately, we have//// tan(Arg) = tan(r+c) for N even,// = -cot(r+c) otherwise.//// Here for this case, both tan(r) and -cot(r) can be approximated// by simple polynomials://// tan(r) = r + P1_1 r^3 + P1_2 r^5 + ... + P1_9 r^19// -cot(r) = -1/r + Q1_1 r + Q1_2 r^3 + ... + Q1_7 r^13//// accurately. Since |r| is relatively small, tan(r+c) and// -cot(r+c) can be accurately approximated by replacing r with// r+c only in the first two terms of the corresponding polynomials.//// Note that P1_1 (and Q1_1 for that matter) approximates 1/3 to// almost 64 sig. bits, thus//// P1_1 (r+c)^3 = P1_1 r^3 + c * r^2 accurately.//// Hence,//// tan(r+c) = r + P1_1 r^3 + P1_2 r^5 + ... + P1_9 r^19// + c*(1 + r^2)//// -cot(r+c) = -1/(r+c) + Q1_1 r + Q1_2 r^3 + ... + Q1_7 r^13// + Q1_1*c////// Case normal_r: 2^(-2) <= |r| <= pi/4// ------------------------------------//// This case is more likely than the previous one if one considers// r to be uniformly distributed in [-pi/4 pi/4].//// The required calculation is either//// tan(r + c) = tan(r) + correction, or// -cot(r + c) = -cot(r) + correction.//// Specifically,//// tan(r + c) = tan(r) + c tan'(r) + O(c^2)// = tan(r) + c sec^2(r) + O(c^2)// = tan(r) + c SEC_sq ...accurately// as long as SEC_sq approximates sec^2(r)// to, say, 5 bits or so.//// Similarly,//// -cot(r + c) = -cot(r) - c cot'(r) + O(c^2)// = -cot(r) + c csc^2(r) + O(c^2)// = -cot(r) + c CSC_sq ...accurately// as long as CSC_sq approximates csc^2(r)// to, say, 5 bits or so.//// We therefore concentrate on accurately calculating tan(r) and// cot(r) for a working-precision number r, |r| <= pi/4 to within// 0.1% or so.//// We will employ a table-driven approach. Let//// r = sgn_r * 2^k * 1.b_1 b_2 ... b_5 ... b_63// = sgn_r * ( B + x )//// where//// B = 2^k * 1.b_1 b_2 ... b_5 1// x = |r| - B//// Now,// tan(B) + tan(x)// tan( B + x ) = ------------------------// 1 - tan(B)*tan(x)//// / \// | tan(B) + tan(x) |// = tan(B) + | ------------------------ - tan(B) |// | 1 - tan(B)*tan(x) |// \ ///// sec^2(B) * tan(x)// = tan(B) + ------------------------// 1 - tan(B)*tan(x)//// (1/[sin(B)*cos(B)]) * tan(x)// = tan(B) + --------------------------------// cot(B) - tan(x)////// Clearly, the values of tan(B), cot(B) and 1/(sin(B)*cos(B)) are// calculated beforehand and stored in a table. Since//// |x| <= 2^k * 2^(-6) <= 2^(-7) (because k = -1, -2)//// a very short polynomial will be sufficient to approximate tan(x)// accurately. The details involved in computing the last expression// will be given in the next section on algorithm description.////// Now, we turn to the case where cot( B + x ) is needed.////// 1 - tan(B)*tan(x)// cot( B + x ) = ------------------------// tan(B) + tan(x)//// / \// | 1 - tan(B)*tan(x) |// = cot(B) + | ----------------------- - cot(B) |// | tan(B) + tan(x) |// \ ///// [tan(B) + cot(B)] * tan(x)// = cot(B) - ----------------------------// tan(B) + tan(x)//// (1/[sin(B)*cos(B)]) * tan(x)// = cot(B) - --------------------------------// tan(B) + tan(x)////// Note that the values of tan(B), cot(B) and 1/(sin(B)*cos(B)) that// are needed are the same set of values needed in the previous// case.//// Finally, we can put all the ingredients together as follows://// Arg = N * pi/2 + r + c ...accurately//// tan(Arg) = tan(r) + correction if N is even;// = -cot(r) + correction otherwise.//// For Cases 2 and 4,//// Case 2:// tan(Arg) = tan(r + c) = r + c + r^3/3 N even// = -cot(r + c) = -1/(r+c) + r/3 N odd// Case 4:// tan(Arg) = tan(r + c) = r + c + r^3/3 + 2r^5/15 N even// = -cot(r + c) = -1/(r+c) + r/3 + r^3/45 N odd////// For Cases 1 and 3,//// Case small_r: |r| < 2^(-2)//// tan(Arg) = r + P1_1 r^3 + P1_2 r^5 + ... + P1_9 r^19// + c*(1 + r^2) N even//// = -1/(r+c) + Q1_1 r + Q1_2 r^3 + ... + Q1_7 r^13// + Q1_1*c N odd//// Case normal_r: 2^(-2) <= |r| <= pi/4//// tan(Arg) = tan(r) + c * sec^2(r) N even// = -cot(r) + c * csc^2(r) otherwise//// For N even,//// tan(Arg) = tan(r) + c*sec^2(r)// = tan( sgn_r * (B+x) ) + c * sec^2(|r|)// = sgn_r * ( tan(B+x) + sgn_r*c*sec^2(|r|) )// = sgn_r * ( tan(B+x) + sgn_r*c*sec^2(B) )//// since B approximates |r| to 2^(-6) in relative accuracy.//// / (1/[sin(B)*cos(B)]) * tan(x)// tan(Arg) = sgn_r * | tan(B) + --------------------------------// \ cot(B) - tan(x)// \// + CORR |// /// where//// CORR = sgn_r*c*tan(B)*SC_inv(B); SC_inv(B) = 1/(sin(B)*cos(B)).//// For N odd,//// tan(Arg) = -cot(r) + c*csc^2(r)// = -cot( sgn_r * (B+x) ) + c * csc^2(|r|)// = sgn_r * ( -cot(B+x) + sgn_r*c*csc^2(|r|) )// = sgn_r * ( -cot(B+x) + sgn_r*c*csc^2(B) )//// since B approximates |r| to 2^(-6) in relative accuracy.//// / (1/[sin(B)*cos(B)]) * tan(x)// tan(Arg) = sgn_r * | -cot(B) + --------------------------------// \ tan(B) + tan(x)// \// + CORR |// /// where//// CORR = sgn_r*c*cot(B)*SC_inv(B); SC_inv(B) = 1/(sin(B)*cos(B)).////// The actual algorithm prescribes how all the mathematical formulas// are calculated.////// 2. Algorithmic Description// ==========================//// 2.1 Computation for Cases 2 and 4.// ----------------------------------//// For Case 2, we use two-term polynomials.//// For N even,//// rsq := r * r// Poly := c + r * rsq * P1_1// Result := r + Poly ...in user-defined rounding//// For N odd,// S_hi := -frcpa(r) ...8 bits// S_hi := S_hi + S_hi*(1 + S_hi*r) ...16 bits// S_hi := S_hi + S_hi*(1 + S_hi*r) ...32 bits// S_hi := S_hi + S_hi*(1 + S_hi*r) ...64 bits// S_lo := S_hi*( (1 + S_hi*r) + S_hi*c )// ...S_hi + S_lo is -1/(r+c) to extra precision// S_lo := S_lo + Q1_1*r//// Result := S_hi + S_lo ...in user-defined rounding//// For Case 4, we use three-term polynomials//// For N even,//// rsq := r * r// Poly := c + r * rsq * (P1_1 + rsq * P1_2)// Result := r + Poly ...in user-defined rounding//// For N odd,// S_hi := -frcpa(r) ...8 bits// S_hi := S_hi + S_hi*(1 + S_hi*r) ...16 bits// S_hi := S_hi + S_hi*(1 + S_hi*r) ...32 bits// S_hi := S_hi + S_hi*(1 + S_hi*r) ...64 bits// S_lo := S_hi*( (1 + S_hi*r) + S_hi*c )// ...S_hi + S_lo is -1/(r+c) to extra precision// rsq := r * r// P := Q1_1 + rsq*Q1_2// S_lo := S_lo + r*P//// Result := S_hi + S_lo ...in user-defined rounding////// Note that the coefficients P1_1, P1_2, Q1_1, and Q1_2 are// the same as those used in the small_r case of Cases 1 and 3// below.////// 2.2 Computation for Cases 1 and 3.// ----------------------------------// This is further divided into the case of small_r,// where |r| < 2^(-2), and the case of normal_r, where |r| lies between// 2^(-2) and pi/4.//// Algorithm for the case of small_r// ---------------------------------//// For N even,// rsq := r * r// Poly1 := rsq*(P1_1 + rsq*(P1_2 + rsq*P1_3))// r_to_the_8 := rsq * rsq// r_to_the_8 := r_to_the_8 * r_to_the_8// Poly2 := P1_4 + rsq*(P1_5 + rsq*(P1_6 + ... rsq*P1_9))// CORR := c * ( 1 + rsq )// Poly := Poly1 + r_to_the_8*Poly2// Poly := r*Poly + CORR// Result := r + Poly ...in user-defined rounding// ...note that Poly1 and r_to_the_8 can be computed in parallel// ...with Poly2 (Poly1 is intentionally set to be much// ...shorter than Poly2 so that r_to_the_8 and CORR can be hidden)//// For N odd,// S_hi := -frcpa(r) ...8 bits// S_hi := S_hi + S_hi*(1 + S_hi*r) ...16 bits// S_hi := S_hi + S_hi*(1 + S_hi*r) ...32 bits// S_hi := S_hi + S_hi*(1 + S_hi*r) ...64 bits// S_lo := S_hi*( (1 + S_hi*r) + S_hi*c )// ...S_hi + S_lo is -1/(r+c) to extra precision// S_lo := S_lo + Q1_1*c//// ...S_hi and S_lo are computed in parallel with// ...the following// rsq := r*r// P := Q1_1 + rsq*(Q1_2 + rsq*(Q1_3 + ... + rsq*Q1_7))//// Poly := r*P + S_lo// Result := S_hi + Poly ...in user-defined rounding////// Algorithm for the case of normal_r// ----------------------------------//// Here, we first consider the computation of tan( r + c ). As// presented in the previous section,//// tan( r + c ) = tan(r) + c * sec^2(r)// = sgn_r * [ tan(B+x) + CORR ]// CORR = sgn_r * c * tan(B) * 1/[sin(B)*cos(B)]//// because sec^2(r) = sec^(|r|), and B approximate |r| to 6.5 bits.//// tan( r + c ) =// / (1/[sin(B)*cos(B)]) * tan(x)// sgn_r * | tan(B) + -------------------------------- +// \ cot(B) - tan(x)// \// CORR |// ///// The values of tan(B), cot(B) and 1/(sin(B)*cos(B)) are// calculated beforehand and stored in a table. Specifically,// the table values are//
?? 快捷鍵說明
復制代碼
Ctrl + C
搜索代碼
Ctrl + F
全屏模式
F11
切換主題
Ctrl + Shift + D
顯示快捷鍵
?
增大字號
Ctrl + =
減小字號
Ctrl + -