e_pow.s
.file "pow.s"

// Copyright (c) 2000 - 2005, Intel Corporation
// All rights reserved.
//
// Contributed 2000 by the Intel Numerics Group, Intel Corporation
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// * Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// * The name of Intel Corporation may not be used to endorse or promote
// products derived from this software without specific prior written
// permission.

// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
// OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Intel Corporation is the author of this code, and requests that all
// problem reports or change requests be submitted to it directly at
// http://www.intel.com/software/products/opensource/libraries/num.htm.
//
// History
//==============================================================
// 02/02/00 Initial version
// 02/03/00 Added p12 to definite over/under path. With odd power we did not
//          maintain the sign of x in this path.
// 04/04/00 Unwind support added
// 04/19/00 pow(+-1,inf) now returns NaN
//          pow(+-val, +-inf) returns 0 or inf, but now does not call error
//          support
//          Added s1 to fcvt.fx because invalid flag was incorrectly set.
// 08/15/00 Bundle added after call to __libm_error_support to properly
//          set [the previously overwritten] GR_Parameter_RESULT.
// 09/07/00 Improved performance by eliminating bank conflicts and other stalls,
//          and tweaking the critical path
// 09/08/00 Per C99, pow(+-1,inf) now returns 1, and pow(+1,nan) returns 1
// 09/28/00 Updated NaN**0 path
// 01/20/01 Fixed denormal flag settings.
// 02/13/01 Improved speed.
// 03/19/01 Reordered exp polynomial to improve speed and eliminate monotonicity
//          problem in round up, down, and to zero modes. Also corrected
//          overflow result when x negative, y odd in round up, down, zero.
// 06/14/01 Added brace missing from bundle
// 12/10/01 Corrected case where x negative, 2^52 <= |y| < 2^53, y odd integer.
// 12/20/01 Fixed monotonicity problem in round to nearest.
// 02/08/02 Fixed overflow/underflow cases that were not calling error support.
// 05/20/02 Cleaned up namespace and sf0 syntax
// 08/29/02 Improved Itanium 2 performance
// 09/21/02 Added branch for |y*log(x)|<2^-11 to fix monotonicity problems.
// 02/10/03 Reordered header: .section, .global, .proc, .align
// 03/31/05 Reformatted delimiters between data tables
//
// API
//==============================================================
// double pow(double x, double y)
//
// Overview of operation
//==============================================================
//
// Three steps...
// 1. Log(x)
// 2. y Log(x)
// 3. exp(y log(x))
//
// This means we work with the absolute value of x and merge in the sign later.
// Log(x) = G + delta + r - rsq/2 + p
// G,delta depend on the exponent of x and table entries. The table entries are
// indexed by the exponent of x, called K.
//
// The G and delta come out of the reduction; r is the reduced x.
//
// B = frcpa(x)
// Because B is the approximate inverse of x, Bx-1 is small.
//
// Log(x) = Log( (1/B)(Bx) )
//        = Log(1/B) + Log(Bx)
//        = Log(1/B) + Log( 1 + (Bx-1))
//
// x = 2^K 1.x_1x_2.....x_52
// B = frcpa(x) = 2^-K Cm
// Log(1/B) = Log(1/(2^-K Cm))
// Log(1/B) = Log((2^K/ Cm))
// Log(1/B) = K Log(2) + Log(1/Cm)
//
// Log(x) = K Log(2) + Log(1/Cm) + Log( 1 + (Bx-1))
//
// If you take the significand of x, set the exponent to true 0, then Cm is
// the frcpa. We tabulate the Log(1/Cm) values. There are 256 of them.
// The frcpa table is indexed by 8 bits, the x_1 thru x_8.
// m = x_1x_2...x_8 is an 8-bit index.
//
// Log(1/Cm) = log(1/frcpa(1+m/256)) where m goes from 0 to 255.
//
// We tabulate as two doubles, T and t, where T + t is the value itself.
//
// Log(x) = (K Log(2)_hi + T) + (K Log(2)_lo + t) + Log( 1 + (Bx-1))
// Log(x) = G + delta + Log( 1 + (Bx-1))
//
// The Log( 1 + (Bx-1)) can be calculated as a series in r = Bx-1.
//
// Log( 1 + (Bx-1)) = r - rsq/2 + p
//
// Then,
//
// yLog(x) = yG + y delta + y(r-rsq/2) + yp
// yLog(x) = Z1 + e1 + Z2 + Z3 + (e2 + e3)
//
//
// exp(yLog(x)) = exp(Z1 + Z2 + Z3) exp(e1 + e2 + e3)
//
//
// exp(Z3) is another series.
// exp(e1 + e2 + e3) is approximated as f3 = 1 + (e1 + e2 + e3)
//
// Z1 (128/log2) = number of log2/128 in Z1 is N1
// Z2 (128/log2) = number of log2/128 in Z2 is N2
//
// s1 = Z1 - N1 log2/128
// s2 = Z2 - N2 log2/128
//
// s = s1 + s2
// N = N1 + N2
//
// exp(Z1 + Z2) = exp(Z)
// exp(Z) = exp(s) exp(N log2/128)
//
// exp(r) = exp(Z - N log2/128)
//
// r = s + d = (Z - N (log2/128)_hi) - N (log2/128)_lo
//           = Z - N (log2/128)
//
// Z = s + d + N (log2/128)
//
// exp(Z) = exp(s) (1+d) exp(N log2/128)
//
// N = M 128 + n
//
// N log2/128 = M log2 + n log2/128
//
// n is 7 binary digits = n_7n_6...n_1
//
// n log2/128 = n_7n_6n_5 16 log2/128 + n_4n_3n_2n_1 log2/128
// n log2/128 = n_7n_6n_5 log2/8 + n_4n_3n_2n_1 log2/128
// n log2/128 = I2 log2/8 + I1 log2/128
//
// N log2/128 = M log2 + I2 log2/8 + I1 log2/128
//
// exp(Z) = exp(s) (1+d) exp(log(2^M) + log(2^I2/8) + log(2^I1/128))
// exp(Z) = exp(s) (1+d1) (1+d2)(2^M) 2^I2/8 2^I1/128
// exp(Z) = exp(s) f1 f2 (2^M) 2^I2/8 2^I1/128
//
// I1, I2 are table indices. Use a series for exp(s).
// Then get exp(Z)
//
// exp(yLog(x)) = exp(Z1 + Z2 + Z3) exp(e1 + e2 + e3)
// exp(yLog(x)) = exp(Z) exp(Z3) f3
// exp(yLog(x)) = exp(Z)f3 exp(Z3)
// exp(yLog(x)) = A exp(Z3)
//
// We actually calculate exp(Z3) - 1.
// Then,
// exp(yLog(x)) = A + A( exp(Z3) - 1)
//
// Table Generation
//==============================================================
// The log values
// ==============
// The operation (K*log2_hi) must be exact. K is the true exponent of x.
// If we allow gradual underflow (denormals), K can be represented in 12 bits
// (as a two's complement number). We assume 13 bits as an engineering
// precaution.
//
//  +------------+----------------+-+
//  |  13 bits   | 50 bits        | |
//  +------------+----------------+-+
//  0            1                66
//               2                34
//
// So we want the lsb(log2_hi) to be 2^-50
// We get log2 as a quad-extended (15-bit exponent, 128-bit significand)
//
// 0 fffe b17217f7d1cf79ab c9e3b39803f2f6af (4...)
//
// Consider numbering the bits left to right, starting at 0 thru 127.
// Bit 0 is the 2^-1 bit; bit 49 is the 2^-50 bit.
//
// ...79ab
// 0111 1001 1010 1011
// 44
// 89
//
// So if we shift off the rightmost 14 bits, then (shift back only
// the top half) we get
//
// 0 fffe b17217f7d1cf4000 e6af278ece600fcb dabc000000000000
//
// Put the right 64-bit significand in an FR register, convert to double;
// it is exact.
// Put the next 128 bits into a quad register and round to double.
// The true exponent of the low part is -51.
//
// hi is 0 fffe b17217f7d1cf4000
// lo is 0 ffcc e6af278ece601000
//
// Convert to double memory format and get
//
// hi is 0x3fe62e42fefa39e8
// lo is 0x3cccd5e4f1d9cc02
//
// log2_hi + log2_lo is an accurate value for log2.
//
//
// The T and t values
// ==================
// A similar method is used to generate the T and t values.
//
// K * log2_hi + T must be exact.
//
// Smallest T,t
// ----------
// The smallest T,t is
//        T                   t
// 0x3f60040155d58800, 0x3c93bce0ce3ddd81  log(1/frcpa(1+0/256))= +1.95503e-003
//
// The exponent is 0x3f6 (biased) or -9 (true).
// For the smallest T value, what we want is to clip the significand such that
// when it is shifted right by 9, its lsb is in the bit for 2^-51. The 9 is
// specific to the first entry. In general, it is 0xffff - (biased 15-bit
// exponent).
// Independently, what we have calculated is the table value as a quad
// precision number.
// Table entry 1 is
// 0 fff6 80200aaeac44ef38 338f77605fdf8000
//
// We store this quad precision number in a data structure that is
// sign: 1
// exponent: 15
// significand_hi: 64 (includes explicit bit)
// significand_lo: 49
// Because the explicit bit is included, the significand is 113 bits.
//
// Consider significand_hi for table entry 1.
//
//
//  +-+--- ... -------+--------------------+
//  | |
//  +-+--- ... -------+--------------------+
//   0 1               4444444455555555556666
//                     2345678901234567890123
//
// Labeled as above, bit 0 is 2^0, bit 1 is 2^-1, etc.
// Bit 42 is 2^-42. If we shift to the right by 9, the bit in
// bit 42 goes in 51.
//
// So what we want to do is shift bits 43 thru 63 into significand_lo.
// This is shifting bit 42 into bit 63, taking care to retain shifted-off bits.
// Then shifting (just with significand_hi) back into bit 42.
//
// The shift_value is 63-42 = 21. In general, this is
// 63 - (51 -(0xffff - 0xfff6))
// For this example, it is
// 63 - (51 - 9) = 63 - 42 = 21
//
// This means we are shifting 21 bits into significand_lo. We must maintain more
// than a 128-bit significand not to lose bits. So before the shift we put the
// 128-bit significand into a 256-bit significand and then shift.
// The 256-bit significand has four parts: hh, hl, lh, and ll.
//
// Start off with
//       hh              hl              lh              ll
//      <64>         <49><15_0>        <64_0>          <64_0>
//
// After shift by 21 (then return for significand_hi),
//   <43><21_0>       <21><43>       <6><58_0>         <64_0>
//
// Take the hh part and convert to a double. There is no rounding here.
// The conversion is exact. The true exponent of the high part is the same as
// the true exponent of the input quad.
//
// We have some 64 plus significand bits for the low part. In this example, we
// have 70 bits. We want to round this to a double. Put them in a quad and then
// do a quad fnorm.
// For this example the true exponent of the low part is
// true_exponent_of_high - 43 = true_exponent_of_high - (64-21)
// In general, this is
// true_exponent_of_high - (64 - shift_value)
//
//
// Largest T,t
// ----------
// The largest T,t is
// 0x3fe62643fecf9742, 0x3c9e3147684bd37d  log(1/frcpa(1+255/256))=+6.92171e-001
//
// Table entry 256 is
// 0 fffe b1321ff67cba178c 51da12f4df5a0000
//
// The shift value is
// 63 - (51 -(0xffff - 0xfffe)) = 13
//
// The true exponent of the low part is
// true_exponent_of_high - (64 - shift_value)
// -1 - (64-13) = -52
// Biased as a double, this is 0x3cb
//
//
//
// So then lsb(T) must be >= 2^-51
// msb(Klog2_hi) <= 2^12
//
//              +--------+---------+
//              |      51 bits     | <== largest T
//              +--------+---------+
//              | 9 bits | 42 bits | <== smallest T
// +------------+----------------+-+
// |  13 bits   | 50 bits        | |
// +------------+----------------+-+

// Special Cases
//==============================================================
//                                              double   float
// overflow                               error 24       30
// underflow                              error 25       31
// X zero  Y zero
//  +0     +0                 +1          error 26       32
//  -0     +0                 +1          error 26       32
//  +0     -0                 +1          error 26       32
//  -0     -0                 +1          error 26       32
// X zero  Y negative
//  +0     -odd integer       +inf        error 27       33   divide-by-zero
//  -0     -odd integer       -inf        error 27       33   divide-by-zero
//  +0     !-odd integer      +inf        error 27       33   divide-by-zero