?? s_atanl.s
字號:
.file "atanl.s"// Copyright (c) 2000 - 2005, Intel Corporation// All rights reserved.//// Contributed 2000 by the Intel Numerics Group, Intel Corporation//// Redistribution and use in source and binary forms, with or without// modification, are permitted provided that the following conditions are// met://// * Redistributions of source code must retain the above copyright// notice, this list of conditions and the following disclaimer.//// * Redistributions in binary form must reproduce the above copyright// notice, this list of conditions and the following disclaimer in the// documentation and/or other materials provided with the distribution.//// * The name of Intel Corporation may not be used to endorse or promote// products derived from this software without specific prior written// permission.// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS // CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, // PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR // PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY // OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. // // Intel Corporation is the author of this code, and requests that all// problem reports or change requests be submitted to it directly at // http://www.intel.com/software/products/opensource/libraries/num.htm.//////*********************************************************************//// History// 02/02/00 (hand-optimized)// 04/04/00 Unwind support added// 08/15/00 Bundle added after call to __libm_error_support to properly// set [the previously overwritten] GR_Parameter_RESULT.// 03/13/01 Fixed flags when denormal raised on intermediate result// 01/08/02 Improved speed.// 02/06/02 Corrected .section statement// 05/20/02 Cleaned up namespace and sf0 syntax// 02/10/03 Reordered header: .section, .global, .proc, .align;// used data8 for long double table values// 03/31/05 Reformatted delimiters between data tables////*********************************************************************//// Function: atanl(x) = inverse tangent(x), for double extended x values// Function: atan2l(y,x) = atan(y/x), for double extended y, x values//// API//// long double atanl (long double x)// long double atan2l (long double y, long double x)////*********************************************************************//// Resources Used://// Floating-Point Registers: f8 (Input and Return Value)// f9 (Input for atan2l)// f10-f15, f32-f83//// General Purpose Registers:// r32-r51// r49-r52 (Arguments to error support for 0,0 case)//// Predicate Registers: p6-p15////*********************************************************************//// IEEE Special Conditions://// Denormal fault raised on denormal inputs// Underflow exceptions may occur // Special error handling for the y=0 and x=0 case// Inexact raised when appropriate by algorithm//// atanl(SNaN) = QNaN// atanl(QNaN) = QNaN// atanl(+/-0) = +/- 0// atanl(+/-Inf) = +/-pi/2 //// atan2l(Any NaN for x or y) = QNaN// atan2l(+/-0,x) = +/-0 for x > 0 // atan2l(+/-0,x) = +/-pi for x < 0 // atan2l(+/-0,+0) = +/-0 // atan2l(+/-0,-0) = +/-pi // atan2l(y,+/-0) = pi/2 y > 0// atan2l(y,+/-0) = -pi/2 y < 0// atan2l(+/-y, Inf) = +/-0 for finite y > 0// atan2l(+/-Inf, x) = +/-pi/2 for finite x // atan2l(+/-y, -Inf) = +/-pi for finite y > 0 // atan2l(+/-Inf, Inf) = +/-pi/4// atan2l(+/-Inf, -Inf) = +/-3pi/4////*********************************************************************//// Mathematical Description// ---------------------------//// The function ATANL( Arg_Y, Arg_X ) returns the "argument"// or the "phase" of the complex number//// Arg_X + i Arg_Y//// or equivalently, the angle in radians from the positive// x-axis to the line joining the origin and the point// (Arg_X,Arg_Y)////// (Arg_X, Arg_Y) x// \// \// \// \// \ angle between is ATANL(Arg_Y,Arg_X)// \// ------------------> X-axis// Origin//// Moreover, this angle is reported in the range [-pi,pi] thus//// -pi <= ATANL( Arg_Y, Arg_X ) <= pi.//// From the geometry, it is easy to define ATANL when one of// Arg_X or Arg_Y is +-0 or +-inf:////// \ Y |// X \ | +0 | -0 | +inf | -inf | finite non-zero// \ | | | | |// ______________________________________________________// | | | |// +-0 | Invalid/ | pi/2 | -pi/2 | sign(Y)*pi/2// | qNaN | | |// --------------------------------------------------------// | | | | |// +inf | +0 | -0 | pi/4 | -pi/4 | sign(Y)*0// --------------------------------------------------------// | | | | |// -inf | +pi | -pi | 3pi/4 | -3pi/4 | sign(Y)*pi// --------------------------------------------------------// finite | X>0? | pi/2 | -pi/2 | normal case// non-zero| sign(Y)*0: | | |// | sign(Y)*pi | | |////// One must take note that ATANL is NOT the arctangent of the// value Arg_Y/Arg_X; but rather ATANL and arctan are related// in a slightly more complicated way as follows://// Let U := max(|Arg_X|, |Arg_Y|); V := min(|Arg_X|, |Arg_Y|);// sign_X be the sign bit of Arg_X, i.e., sign_X is 0 or 1;// s_X be the sign of Arg_X, i.e., s_X = (-1)^sign_X;//// sign_Y be the sign bit of Arg_Y, i.e., sign_Y is 0 or 1;// s_Y be the sign of Arg_Y, i.e., s_Y = (-1)^sign_Y;//// swap be 0 if |Arg_X| >= |Arg_Y| and 1 otherwise.//// Then, ATANL(Arg_Y, Arg_X) =//// / arctan(V/U) \ sign_X = 0 & swap = 0// | pi/2 - arctan(V/U) | sign_X = 0 & swap = 1// s_Y * | |// | pi - arctan(V/U) | sign_X = 1 & swap = 0// \ pi/2 + arctan(V/U) / sign_X = 1 & swap = 1////// This relationship also suggest that the algorithm's major// task is to calculate arctan(V/U) for 0 < V <= U; and the// final Result is given by//// s_Y * { (P_hi + P_lo) + sigma * arctan(V/U) }//// where//// (P_hi,P_lo) represents M(sign_X,swap)*(pi/2) accurately//// M(sign_X,swap) = 0 for sign_X = 0 and swap = 0// 1 for swap = 1// 2 for sign_X = 1 and swap = 0//// and//// sigma = { (sign_X XOR swap) : -1.0 : 1.0 }//// = (-1) ^ ( sign_X XOR swap )//// Both (P_hi,P_lo) and sigma can be stored in a table and fetched// using (sign_X,swap) as an index. (P_hi, P_lo) can be stored as a// double-precision, and single-precision pair; and sigma can// obviously be just a single-precision number.//// In the algorithm we propose, arctan(V/U) is calculated to high accuracy// as A_hi + A_lo. Consequently, the Result ATANL( Arg_Y, Arg_X ) is// given by//// s_Y*P_hi + s_Y*sigma*A_hi + s_Y*(sigma*A_lo + P_lo)//// We now discuss the calculation of arctan(V/U) for 0 < V <= U.//// For (V/U) < 2^(-3), we use a simple polynomial of the form//// z + z^3*(P_1 + z^2*(P_2 + z^2*(P_3 + ... + P_8)))//// where z = V/U.//// For the sake of accuracy, the first term "z" must approximate V/U to// extra precision. For z^3 and higher power, a working precision// approximation to V/U suffices. Thus, we obtain://// z_hi + z_lo = V/U to extra precision and// z = V/U to working precision//// The value arctan(V/U) is delivered as two pieces (A_hi, A_lo)//// (A_hi,A_lo) = (z_hi, z^3*(P_1 + ... + P_8) + z_lo).////// For 2^(-3) <= (V/U) <= 1, we use a table-driven approach.// Consider//// (V/U) = 2^k * 1.b_1 b_2 .... b_63 b_64 b_65 ....//// Define//// z_hi = 2^k * 1.b_1 b_2 b_3 b_4 1//// then// / \// | (V/U) - z_hi |// arctan(V/U) = arctan(z_hi) + acrtan| -------------- |// | 1 + (V/U)*z_hi |// \ ///// / \// | V - z_hi*U |// = arctan(z_hi) + acrtan| -------------- |// | U + V*z_hi |// \ ///// = arctan(z_hi) + acrtan( V' / U' )////// where//// V' = V - U*z_hi; U' = U + V*z_hi.//// Let//// w_hi + w_lo = V'/U' to extra precision and// w = V'/U' to working precision//// then we can approximate arctan(V'/U') by//// arctan(V'/U') = w_hi + w_lo// + w^3*(Q_1 + w^2*(Q_2 + w^2*(Q_3 + w^2*Q_4)))//// = w_hi + w_lo + poly//// Finally, arctan(z_hi) is calculated beforehand and stored in a table// as Tbl_hi, Tbl_lo. Thus,//// (A_hi, A_lo) = (Tbl_hi, w_hi+(poly+(w_lo+Tbl_lo)))//// This completes the mathematical description.////// Algorithm// -------------//// Step 0. Check for unsupported format.//// If// ( expo(Arg_X) not zero AND msb(Arg_X) = 0 ) OR// ( expo(Arg_Y) not zero AND msb(Arg_Y) = 0 )//// then one of the arguments is unsupported. Generate an// invalid and return qNaN.//// Step 1. Initialize//// Normalize Arg_X and Arg_Y and set the following//// sign_X := sign_bit(Arg_X)// s_Y := (sign_bit(Arg_Y)==0? 1.0 : -1.0)// swap := (|Arg_X| >= |Arg_Y|? 0 : 1 )// U := max( |Arg_X|, |Arg_Y| )// V := min( |Arg_X|, |Arg_Y| )//// execute: frcpa E, pred, V, U// If pred is 0, go to Step 5 for special cases handling.//// Step 2. Decide on branch.//// Q := E * V// If Q < 2^(-3) go to Step 4 for simple polynomial case.//// Step 3. Table-driven algorithm.//// Q is represented as//// 2^(-k) * 1.b_1 b_2 b_3 ... b_63; k = 0,-1,-2,-3//// and that if k = 0, b_1 = b_2 = b_3 = b_4 = 0.//// Define//// z_hi := 2^(-k) * 1.b_1 b_2 b_3 b_4 1//// (note that there are 49 possible values of z_hi).//// ...We now calculate V' and U'. While V' is representable// ...as a 64-bit number because of cancellation, U' is// ...not in general a 64-bit number. Obtaining U' accurately// ...requires two working precision numbers//// U_prime_hi := U + V * z_hi ...WP approx. to U'// U_prime_lo := ( U - U_prime_hi ) + V*z_hi ...observe order// V_prime := V - U * z_hi ...this is exact//// C_hi := frcpa (1.0, U_prime_hi) ...C_hi approx 1/U'_hi//// loop 3 times// C_hi := C_hi + C_hi*(1.0 - C_hi*U_prime_hi)//// ...at this point C_hi is (1/U_prime_hi) to roughly 64 bits//// w_hi := V_prime * C_hi ...w_hi is V_prime/U_prime to// ...roughly working precision//// ...note that we want w_hi + w_lo to approximate// ...V_prime/(U_prime_hi + U_prime_lo) to extra precision// ...but for now, w_hi is good enough for the polynomial// ...calculation.//// wsq := w_hi*w_hi// poly := w_hi*wsq*(Q_1 + wsq*(Q_2 + wsq*(Q_3 + wsq*Q_4)))//// Fetch// (Tbl_hi, Tbl_lo) = atan(z_hi) indexed by (k,b_1,b_2,b_3,b_4)// ...Tbl_hi is a double-precision number// ...Tbl_lo is a single-precision number//// (P_hi, P_lo) := M(sign_X,swap)*(Pi_by_2_hi, Pi_by_2_lo)// ...as discussed previous. Again; the implementation can// ...chose to fetch P_hi and P_lo from a table indexed by// ...(sign_X, swap).// ...P_hi is a double-precision number;// ...P_lo is a single-precision number.//// ...calculate w_lo so that w_hi + w_lo is V'/U' accurately// w_lo := ((V_prime - w_hi*U_prime_hi) -// w_hi*U_prime_lo) * C_hi ...observe order////// ...Ready to deliver arctan(V'/U') as A_hi, A_lo// A_hi := Tbl_hi// A_lo := w_hi + (poly + (Tbl_lo + w_lo)) ...observe order//// ...Deliver final Result// ...s_Y*P_hi + s_Y*sigma*A_hi + s_Y*(sigma*A_lo + P_lo)//// sigma := ( (sign_X XOR swap) ? -1.0 : 1.0 )// ...sigma can be obtained by a table lookup using// ...(sign_X,swap) as index and stored as single precision// ...sigma should be calculated earlier//// P_hi := s_Y*P_hi// A_hi := s_Y*A_hi//// Res_hi := P_hi + sigma*A_hi ...this is exact because// ...both P_hi and Tbl_hi// ...are double-precision// ...and |Tbl_hi| > 2^(-4)// ...P_hi is either 0 or// ...between (1,4)//// Res_lo := sigma*A_lo + P_lo//// Return Res_hi + s_Y*Res_lo in user-defined rounding control//// Step 4. Simple polynomial case.//// ...E and Q are inherited from Step 2.//// A_hi := Q ...Q is inherited from Step 2 Q approx V/U//// loop 3 times// E := E + E2(1.0 - E*U1// ...at this point E approximates 1/U to roughly working precision//// z := V * E ...z approximates V/U to roughly working precision// zsq := z * z// z4 := zsq * zsq; z8 := z4 * z4//// poly1 := P_4 + zsq*(P_5 + zsq*(P_6 + zsq*(P_7 + zsq*P_8)))// poly2 := zsq*(P_1 + zsq*(P_2 + zsq*P_3))//// poly := poly1 + z8*poly2//// z_lo := (V - A_hi*U)*E//// A_lo := z*poly + z_lo// ...A_hi, A_lo approximate arctan(V/U) accurately//// (P_hi, P_lo) := M(sign_X,swap)*(Pi_by_2_hi, Pi_by_2_lo)// ...one can store the M(sign_X,swap) as single precision// ...values//// ...Deliver final Result// ...s_Y*P_hi + s_Y*sigma*A_hi + s_Y*(sigma*A_lo + P_lo)//// sigma := ( (sign_X XOR swap) ? -1.0 : 1.0 )// ...sigma can be obtained by a table lookup using// ...(sign_X,swap) as index and stored as single precision// ...sigma should be calculated earlier//// P_hi := s_Y*P_hi// A_hi := s_Y*A_hi//// Res_hi := P_hi + sigma*A_hi ...need to compute// ...P_hi + sigma*A_hi// ...exactly//// tmp := (P_hi - Res_hi) + sigma*A_hi//// Res_lo := s_Y*(sigma*A_lo + P_lo) + tmp//// Return Res_hi + Res_lo in user-defined rounding control//// Step 5. Special Cases//// These are detected early in the function by fclass instructions.//// We are in one of those special cases when X or Y is 0,+-inf or NaN//// If one of X and Y is NaN, return X+Y (which will generate// invalid in case one is a signaling NaN). Otherwise,// return the Result as described in the table//////// \ Y |// X \ | +0 | -0 | +inf | -inf | finite non-zero// \ | | | | |// ______________________________________________________// | | | |// +-0 | Invalid/ | pi/2 | -pi/2 | sign(Y)*pi/2// | qNaN | | |// --------------------------------------------------------// | | | | |// +inf | +0 | -0 | pi/4 | -pi/4 | sign(Y)*0// --------------------------------------------------------// | | | | |// -inf | +pi | -pi | 3pi/4 | -3pi/4 | sign(Y)*pi// --------------------------------------------------------// finite | X>0? | pi/2 | -pi/2 |// non-zero| sign(Y)*0: | | | N/A// | sign(Y)*pi | | |////ArgY_orig = f8Result = f8FR_RESULT = f8ArgX_orig = f9ArgX = f10FR_X = f10ArgY = f11FR_Y = f11s_Y = f12U = f13V = f14E = f15Q = f32z_hi = f33U_prime_hi = f34U_prime_lo = f35V_prime = f36C_hi = f37w_hi = f38w_lo = f39
?? 快捷鍵說明
復制代碼
Ctrl + C
搜索代碼
Ctrl + F
全屏模式
F11
切換主題
Ctrl + Shift + D
顯示快捷鍵
?
增大字號
Ctrl + =
減小字號
Ctrl + -