SoftFloat Release 2b General Documentation
John R. Hauser
2002 May 27
----------------------------------------------------------------------------
Introduction
SoftFloat is a software implementation of floating-point that conforms to
the IEC/IEEE Standard for Binary Floating-Point Arithmetic. As many as four
formats are supported: single precision, double precision, extended double
precision, and quadruple precision. All operations required by the standard
are implemented, except for conversions to and from decimal.
This document gives information about the types defined and the routines
implemented by SoftFloat. It does not attempt to define or explain the
IEC/IEEE Floating-Point Standard. Details about the standard are available
elsewhere.
----------------------------------------------------------------------------
Limitations
SoftFloat is written in C and is designed to work with other C code. The
SoftFloat header files assume an ISO/ANSI-style C compiler. No attempt
has been made to accommodate compilers that are not ISO-conformant. In
particular, the distributed header files will not be acceptable to any
compiler that does not recognize function prototypes.
Support for the extended double-precision and quadruple-precision formats
depends on a C compiler that implements 64-bit integer arithmetic. If the
largest integer format supported by the C compiler is 32 bits, SoftFloat
is limited to only single and double precisions. When that is the case,
all references in this document to extended double precision, quadruple
precision, and 64-bit integers should be ignored.
----------------------------------------------------------------------------
Contents
    Introduction
    Limitations
    Contents
    Legal Notice
    Types and Functions
    Rounding Modes
    Extended Double-Precision Rounding Precision
    Exceptions and Exception Flags
    Function Details
        Conversion Functions
        Standard Arithmetic Functions
        Remainder Functions
        Round-to-Integer Functions
        Comparison Functions
        Signaling NaN Test Functions
        Raise-Exception Function
    Contact Information
----------------------------------------------------------------------------
Legal Notice
SoftFloat was written by John R. Hauser. This work was made possible in
part by the International Computer Science Institute, located at Suite 600,
1947 Center Street, Berkeley, California 94704. Funding was partially
provided by the National Science Foundation under grant MIP-9311980. The
original version of this code was written as part of a project to build
a fixed-point vector processor in collaboration with the University of
California at Berkeley, overseen by Profs. Nelson Morgan and John Wawrzynek.
THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort
has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT
TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO
PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL
LOSSES, COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO
FURTHERMORE EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER
SCIENCE INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES,
COSTS, OR OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE
SOFTWARE.
----------------------------------------------------------------------------
Types and Functions
When 64-bit integers are supported by the compiler, the `softfloat.h'
header file defines four types: `float32' (single precision), `float64'
(double precision), `floatx80' (extended double precision), and `float128'
(quadruple precision). The `float32' and `float64' types are defined in
terms of 32-bit and 64-bit integer types, respectively, while the `float128'
type is defined as a structure of two 64-bit integers, taking into account
the byte order of the particular machine being used. The `floatx80' type
is defined as a structure containing one 16-bit and one 64-bit integer, with
the machine's byte order again determining the order within the structure.
When 64-bit integers are _not_ supported by the compiler, the `softfloat.h'
header file defines only two types: `float32' and `float64'. Because
ISO/ANSI C guarantees at least one built-in integer type of 32 bits,
the `float32' type is identified with an appropriate integer type. The
`float64' type is defined as a structure of two 32-bit integers, with the
machine's byte order determining the order of the fields.
In either case, the types in `softfloat.h' are defined such that if a system
implements the usual C `float' and `double' types according to the IEC/IEEE
Standard, then the `float32' and `float64' types should be indistinguishable
in memory from the native `float' and `double' types. (On the other hand,
when `float32' or `float64' values are placed in processor registers by
the compiler, the type of registers used may differ from those used for the
native `float' and `double' types.)
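The memory-layout claim above can be checked with a short sketch. It
assumes a machine whose native `float' is the 32-bit IEC/IEEE single-
precision format. The name `demo_float32' and both helper functions are
hypothetical stand-ins invented for this illustration; the real `float32'
type comes from `softfloat.h'.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for SoftFloat's `float32' (a 32-bit integer type). */
typedef uint32_t demo_float32;

/* Copy the bytes of a native float into the integer-based type. */
demo_float32 demo_float32_from_native(float f)
{
    demo_float32 bits;
    assert(sizeof bits == sizeof f);   /* assumption: float is 32 bits wide */
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Copy the bytes of the integer-based type back into a native float. */
float demo_float32_to_native(demo_float32 bits)
{
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Copying with `memcpy' avoids pointer casts between unrelated types, which
ISO C does not permit in general. On a conforming machine the native value
1.0f maps to the bit pattern 0x3F800000.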
SoftFloat implements the following arithmetic operations:
-- Conversions among all the floating-point formats, and also between
   integers (32-bit and 64-bit) and any of the floating-point formats.
-- The usual add, subtract, multiply, divide, and square root operations
   for all floating-point formats.
-- For each format, the floating-point remainder operation defined by the
   IEC/IEEE Standard.
-- For each floating-point format, a ``round to integer'' operation that
   rounds to the nearest integer value in the same format. (The
   floating-point formats can hold integer values, of course.)
-- Comparisons between two values in the same floating-point format.
The only functions required by the IEC/IEEE Standard that are not provided
are conversions to and from decimal.
----------------------------------------------------------------------------
Rounding Modes
All four rounding modes prescribed by the IEC/IEEE Standard are implemented
for all operations that require rounding. The rounding mode is selected
by the global variable `float_rounding_mode'. This variable may be set
to one of the values `float_round_nearest_even', `float_round_to_zero',
`float_round_down', or `float_round_up'. The rounding mode is initialized
to nearest/even.
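To make the difference between the four modes concrete, the following
sketch rounds an exact quotient to an integer under each mode. It is only
an illustration of the rounding rules, not SoftFloat code: the `demo_'
names are hypothetical, and the real mode values are the ones defined in
`softfloat.h' and assigned to `float_rounding_mode'.

```c
#include <assert.h>

/* Hypothetical mode names mirroring the four IEC/IEEE rounding modes. */
enum demo_rounding_mode {
    demo_round_nearest_even,
    demo_round_to_zero,
    demo_round_down,
    demo_round_up
};

/* Round the exact value num/den to an integer; den must be positive. */
long demo_round_quotient(long num, long den, enum demo_rounding_mode mode)
{
    long q, r, twice;
    assert(den > 0);
    q = num / den;             /* C99 division truncates toward zero */
    r = num % den;             /* remainder has the sign of num */
    if (r == 0) return q;      /* exact result: no rounding needed */
    switch (mode) {
    case demo_round_to_zero:
        return q;
    case demo_round_down:      /* toward minus infinity */
        return (num < 0) ? q - 1 : q;
    case demo_round_up:        /* toward plus infinity */
        return (num < 0) ? q : q + 1;
    case demo_round_nearest_even:
        twice = 2 * (r < 0 ? -r : r);
        /* Below the halfway point, or a tie with q already even: keep q. */
        if (twice < den || (twice == den && q % 2 == 0)) return q;
        return (num < 0) ? q - 1 : q + 1;
    }
    return q;
}
```

For the value 5/2, the modes give 2 (nearest/even), 2 (to zero), 2 (down),
and 3 (up). For -5/2 they give -2, -2, -3, and -2.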
----------------------------------------------------------------------------
Extended Double-Precision Rounding Precision
For extended double precision (`floatx80') only, the rounding precision
of the standard arithmetic operations is controlled by the global variable
`floatx80_rounding_precision'. The operations affected are:
    floatx80_add   floatx80_sub   floatx80_mul   floatx80_div   floatx80_sqrt
When `floatx80_rounding_precision' is set to its default value of 80, these
operations are rounded (as usual) to the full precision of the extended
double-precision format. Setting `floatx80_rounding_precision' to 32
or to 64 causes the operations listed to be rounded to reduced precision
equivalent to single precision (`float32') or to double precision
(`float64'), respectively. When rounding to reduced precision, additional
bits in the result significand beyond the rounding point are set to zero.
The consequences of setting `floatx80_rounding_precision' to a value other
than 32, 64, or 80 are not specified. Operations other than the ones listed
above are not affected by `floatx80_rounding_precision'.
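The effect of reduced rounding precision on the significand can be
sketched as follows. The function is a hypothetical illustration, not
SoftFloat's implementation, and it handles only the nearest/even mode.
It assumes a 64-bit significand, as in the extended double-precision
format, of which the top 24 bits are kept for single precision or the top
53 bits for double precision. The discarded low-order bits come back as
zero.

```c
#include <assert.h>
#include <stdint.h>

/* Round a 64-bit significand to its top `keep' bits (1 to 63) using
   nearest/even, zeroing the discarded bits. *carry is set to 1 when the
   rounding carries out of bit 63; the caller would then increment the
   exponent (not modeled here). */
uint64_t demo_round_significand(uint64_t sig, int keep, int *carry)
{
    int drop;
    uint64_t unit, half, mask, low, kept, inc;
    assert(keep >= 1 && keep <= 63);
    drop = 64 - keep;
    unit = (uint64_t)1 << drop;        /* weight of the last kept bit */
    half = unit >> 1;                  /* half of that weight */
    mask = unit - 1;                   /* selects the discarded bits */
    low = sig & mask;
    kept = sig & ~mask;
    *carry = 0;
    /* Round up above the halfway point, or on a tie when the last
       kept bit is odd. */
    if (low > half || (low == half && ((kept >> drop) & 1))) {
        inc = kept + unit;
        if (inc < kept) {              /* wrapped past bit 63 */
            *carry = 1;
            inc = (uint64_t)1 << 63;   /* significand becomes 1.000... */
        }
        kept = inc;
    }
    return kept;
}
```

With `keep' set to 24, the significand 0x8000018000000000 is an exact tie
whose last kept bit is odd, so it rounds up to 0x8000020000000000.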
----------------------------------------------------------------------------
Exceptions and Exception Flags
All five exception flags required by the IEC/IEEE Standard are
implemented. Each flag is stored as a unique bit in the global variable
`float_exception_flags'. The positions of the exception flag bits within
this variable are determined by the bit masks `float_flag_inexact',
`float_flag_underflow', `float_flag_overflow', `float_flag_divbyzero', and
`float_flag_invalid'. The exception flags variable is initialized to all 0,
meaning no exceptions.
An individual exception flag can be cleared with the statement
    float_exception_flags &= ~ float_flag_<exception>;
where `<exception>' is the appropriate name. To raise a floating-point
exception, the SoftFloat function `float_raise' should be used (see below).
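The flag handling described above follows the ordinary bit-mask pattern,
sketched here with stand-ins so that the fragment compiles without
SoftFloat. Everything prefixed `demo_' is hypothetical. In particular, the
mask values below are invented for the illustration; the real masks are
whatever `softfloat.h' defines for the target, and real code would operate
on `float_exception_flags' and call `float_raise'.

```c
#include <assert.h>

/* Invented mask values; the real ones are defined by `softfloat.h'. */
enum {
    demo_flag_inexact   = 1,
    demo_flag_underflow = 2,
    demo_flag_overflow  = 4,
    demo_flag_divbyzero = 8,
    demo_flag_invalid   = 16,
    demo_flag_all       = 31
};

/* Stand-in for `float_exception_flags': all 0 means no exceptions. */
static int demo_exception_flags = 0;

/* Stand-in for raising exceptions: flags accumulate until cleared. */
void demo_raise(int mask)
{
    assert((mask & ~demo_flag_all) == 0);
    demo_exception_flags |= mask;
}

/* Test whether any of the flags in `mask' is currently set. */
int demo_flag_is_set(int mask)
{
    return (demo_exception_flags & mask) != 0;
}

/* Clear the flags in `mask', leaving all other flags untouched. */
void demo_clear_flag(int mask)
{
    assert((mask & ~demo_flag_all) == 0);
    demo_exception_flags &= ~mask;
}
```

A typical use is to clear the flags of interest before a computation and
test them afterward, since a flag that has been set stays set until it is
explicitly cleared.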
In the terminology of the IEC/IEEE Standard, SoftFloat can detect tininess
for underflow either before or after rounding. The choice is made by
the global variable `float_detect_tininess', which can be set to either
`float_tininess_before_rounding' or `float_tininess_after_rounding'.
Detecting tininess after rounding is better because it results in fewer