The scope of this document is to provide information on how floating point arithmetic is implemented in the C language on different platforms and by different compilers. It will explain why it is problematic to simply use floating point math without considering the implications. It will also try to provide a set of preprocessor macros that allow the consistent use of floating point arithmetic across as many platforms and compilers as possible.
This document is far from complete. Please be encouraged to contact me if you have any further information on this topic whatsoever. I'm especially interested in floating point arithmetic on non-x86 platforms, for example SPARC, IA-64, ARM or Alpha. I'd also be interested in someone with knowledge of 64-bit Windows, or at least the possibility to test some things with different compilers there (especially Visual Studio 9, MinGW-gcc, Cygwin-gcc).
This document currently only deals with floating point precision. Two other topics will be discussed in future versions of this document: floating point exceptions and internal rounding control.
Please note that some understanding of assembly language and the inner workings of microprocessors is extremely helpful for understanding this document completely. Nevertheless, the conclusions should be understandable without it.
For a general explanation of floating point numbers, please refer to the Wikipedia article on IEEE 754. The following is only a small overview.
There are four main types of floating point numbers that are widely implemented today:

- Single precision (32 bits: 1 sign bit, 8 exponent bits, 23 fraction bits)
- Double precision (64 bits: 1 sign bit, 11 exponent bits, 52 fraction bits)
- Double-extended precision (80 bits: 1 sign bit, 15 exponent bits, 64 fraction bits)
- Quad precision (128 bits: 1 sign bit, 15 exponent bits, 112 fraction bits)
Some processors (most notably the x87 FPU) have a so-called internal precision, i.e. instead of having different registers and instructions for different floating point types (as is the case with integers), they have a single internal floating point register type (in the x87 case, double-extended precision) and a single set of operations for doing math.
Many floating point experts argue that using a higher internal precision for floating point calculations is better because it helps keep algorithms numerically stable. While I basically agree with this argument, the problem with x87 FPUs is that they hide the usage of this internal precision from the programmer. This causes two different kinds of headaches:

- Results depend on when the compiler stores intermediate values back to memory (truncating them), so the same source code can yield different results depending on compiler, optimization level and platform.
- Rounding twice (first to the internal precision, then to the target precision) can yield a different result than rounding directly to the target precision, so even after truncation the result may differ from a calculation carried out directly in the target precision.
Note that the second problem apparently does not occur for single precision floating point numbers: the number of fraction bits is so small that there appears to be no short decimal number that maps to a floating point number with these characteristics. For single precision, calculating in a higher precision and then truncating therefore appears to have the same effect as calculating directly in single precision.
This section will take a look at different hardware implementations and analyze them.
x86 platforms support two types of floating point instruction sets nowadays: The classical x87 floating point unit and the new MMX/SSE instructions that can even perform parallel calculations.
The characteristics of the x87 FPU are:

- It always calculates in its internal precision (double-extended by default).
- The internal precision can be changed via the precision control field of the FPU control word.
- Results are only truncated to the target precision when they are stored back to memory.
The characteristics of the MMX/SSE instructions are:

- They always operate with the same precision as the operands (single precision with SSE, double precision with SSE2).
- There is no double-extended precision support.
According to the documentation (see: Book I: PowerPC User Instruction Set Architecture) and some tests done by Scott MacVicar, the following holds for PPC:

- The processor uses different instructions for the different floating point types, so the precision of a calculation is determined by the datatypes of the operands.
- Double-extended precision is not supported.
The PPC processor also seems to support a non-IEEE-mode for the FPU. The documentation does not clearly state how this mode is defined and for portability reasons it clearly should be avoided.
Johannes Schlüter did some tests on an UltraSPARC IIIi platform which yielded the following results:

- Different instructions are generated for different floating point datatypes, so the operand types determine the precision.
- Double-extended precision is not supported; quad precision calculations are software-emulated by default.
I'd love to have some details on how other platforms implement floating point operations. The following is a non-exhaustive list of platforms I don't have any information on:
Some notes:
This section focusses on the implementation of floating point operations in compilers and system libraries, i.e. how C code is transformed into assembly instructions and how they interact with the internal processor state.
Calling conventions define how function parameters and return values are passed. This is platform-, operating-system- and compiler-dependent. Please note that quite a few systems pass floating point parameters and return values directly in FPU registers and not in memory. This can have interesting side effects: take, for example, a function that returns a float (single precision) whose return value is not stored but used directly within another function call. With certain calling conventions, the FP register may be passed straight through to the second function - without the value ever being stored in memory and thus without it ever being truncated from internal precision. Other systems may have calling conventions that use the stack for parameters, so the value is stored in memory and truncated. This also causes non-deterministic behaviour in the sense that different systems react differently to the same piece of code. I will discuss this issue separately for each compiler where applicable.
I did extensive tests with cl.exe from Microsoft Visual C++ 2005 Express Edition under Windows 2000 and with MinGW (gcc 3.4.5 mingw-special). I could only test on 32-bit Windows; I'd be happy to hear about 64-bit Windows.
Most notably, the Visual C++ compiler does not support double-extended precision; it only supports single precision and double precision. ([1], [2]) It does, however, know a long double type, which is treated the same way as double for arithmetic purposes but is a distinct type with regard to type coercion. The compiler also switches the processor's internal precision to double precision, so all calculations are executed in double precision.
TODO: It would be nice to know when exactly this happens. Is it the kernel or some part of the runtime initialization?
The Microsoft Visual C++ runtime also provides the functions _controlfp and _controlfp_s to switch the x87 FP mode. They are documented in MSDN: _controlfp, _controlfp_s. The latter is available since Visual C++ 2005. In order to set the current precision, use:
// Both are defined here
#include <float.h>

// old version
unsigned int fpu_oldcw;
fpu_oldcw = _controlfp(0, 0); // store old value
_controlfp(precision, _MCW_PC);
// calculation here
_controlfp(fpu_oldcw, _MCW_PC);

// new version
unsigned int fpu_oldcw, fpu_cw;
_controlfp_s(&fpu_cw, 0, 0);
fpu_oldcw = fpu_cw;
_controlfp_s(&fpu_cw, precision, _MCW_PC);
// calculation here
_controlfp_s(&fpu_cw, fpu_oldcw, _MCW_PC);
Since the new version only performs additional range checking on the values, it is OK to use the old version as long as the supplied values are valid.
The precision may be one of:

- _PC_24 (single precision)
- _PC_53 (double precision)
- _PC_64 (double-extended precision)
Please note that the double-extended precision setting is for internal precision only. Values will always be truncated to double precision with Visual C++.
In order to ensure that the optimizer does not make wrong assumptions, please add #pragma fenv_access(on) to the source code. This tells the compiler that the floating point mode may be changed by the source.
Calling conventions: Windows does use FPU registers for return values, but it uses a small trick to ensure that values behave deterministically. For single precision values, the compiler temporarily stores the value in a variable and loads it again - the value is thereby truncated to single precision. (Double precision values don't matter, since the compiler assumes the processor is always in double precision mode.) Consider the following source code:
float divf (float a, float b)
{
    return a / b;
}
This is transformed to:
_divf PROC              ; COMDAT
; Line 36
    fld   DWORD PTR _a$[esp-4]
    fdiv  DWORD PTR _b$[esp-4]
    fstp  DWORD PTR tv130[esp-4]
    fld   DWORD PTR tv130[esp-4]
; Line 37
    ret   0
_divf ENDP
Note: Is it possible to have cl.exe generate SSE code?
Conclusion:

- Single precision: Use the float datatype for all operands. Since single-precision values will always be truncated before use, the calculations will always appear to have been done in single precision. Also, it couldn't hurt to use _controlfp, just in case.
- Double precision: Use the double datatype for all operands. Since the internal precision set by the compiler is double precision, it will be used automatically. However, it cannot hurt to use _controlfp to force the precision, just in case.
- Double-extended precision: Use _controlfp to force the internal precision to double-extended. But be aware that whenever a value is stored back to memory (which may happen arbitrarily, depending on the optimizer), it will be truncated to double precision. Using double-extended precision with Microsoft Visual C++ is not feasible.

This compiler supports double-extended precision (the long double datatype). It sets the processor to (or leaves it in) double-extended internal precision.
MinGW offers access to _controlfp but not _controlfp_s. The semantics are the same as with Visual C++, except that MinGW can actually use double-extended precision.
Calling conventions: Please note that MinGW never truncates to either double or float before returning a value. You can, however, force it to do so by either using the compiler option -ffloat-store or by storing the result in an explicit variable declared volatile and then returning that variable. This will produce the desired code.
Conclusion:

- Single precision: Use the float datatype for all operands and use the compiler option -ffloat-store. If it is not feasible to specify a compiler option, use
  { volatile float result = /* actual result */; return result; }
  instead of
  return /* actual result */;
  Also, it couldn't hurt to use _controlfp, just in case.
- Double precision: Use the double datatype and use _controlfp to set and restore the precision at the beginning of the function (-ffloat-store will not suffice!).
- Double-extended precision: Use the long double datatype. If you haven't touched the internal precision, you don't need to set it explicitly, but it wouldn't hurt, just in case.

Not tested, feedback welcome.
Like the GCC that comes with MinGW, this compiler supports double-extended precision and also uses it as the internal processor precision.
glibc provides a header file fpu_control.h which defines several useful macros to set the precision of the FPU. These macros evaluate to inline assembler code that directly sets the FPU state. Example usage:
#include <fpu_control.h>

fpu_control_t fpu_oldcw, fpu_cw;
_FPU_GETCW(fpu_oldcw); // store old cw
fpu_cw = (fpu_oldcw & ~_FPU_EXTENDED & ~_FPU_DOUBLE & ~_FPU_SINGLE) | precision;
_FPU_SETCW(fpu_cw);
// calculations here
_FPU_SETCW(fpu_oldcw); // restore old cw
The precision may be one of:

- _FPU_SINGLE (single precision)
- _FPU_DOUBLE (double precision)
- _FPU_EXTENDED (double-extended precision)
Calling conventions: In order for GCC to truncate variables to the correct precision before returning them, use the -ffloat-store compiler option or declare an explicit return value variable volatile (as with MinGW). Please note that this alone will not solve the problem described in this article, since the above-mentioned example 0.002877 will result in different doubles even after truncation.
Caution: Since the FPU control macros only expand to inline assembler, the optimizer will sometimes reorder the actual calculation to after the restoring of the FPU control word. Example:
double divcw (double a, double b)
{
    double res;
    fpu_control_t _oldcw, _cw;
    _FPU_GETCW(_oldcw);
    _cw = (_oldcw & ~_FPU_EXTENDED & ~_FPU_SINGLE) | _FPU_DOUBLE;
    _FPU_SETCW(_cw);
    res = a / b;
    _FPU_SETCW(_oldcw);
    return res;
}
Without optimization, this will be expanded to:
divcw:
    # stuff left out for brevity
#APP
    fldcw   -12(%ebp)
#NO_APP
    fldl    -24(%ebp)
    fdivl   -32(%ebp)
    fstpl   -8(%ebp)
#APP
    fldcw   -10(%ebp)
#NO_APP
    fldl    -8(%ebp)
    leave
    ret
    .size   divcw, .-divcw
With optimization (-O2, but -O1 has the same problem) however, this will result in:
divcw:
    pushl   %ebp
    movl    %esp, %ebp
    subl    $16, %esp
    fldl    8(%ebp)
#APP
    fnstcw  -2(%ebp)
#NO_APP
    movzwl  -2(%ebp), %eax
    andl    $-769, %eax
    orl     $512, %eax
    movw    %ax, -4(%ebp)
#APP
    fldcw   -4(%ebp)
    fldcw   -2(%ebp)
#NO_APP
    fdivl   16(%ebp)
    leave
    ret
    .size   divcw, .-divcw
As one can see, the optimizer reordered the instructions. I consider this a bug in GCC. GCC 3.3.6 and 3.4.6 were not affected according to my tests (but they only apply -ffloat-store to explicit variables). I have not yet tested newer versions.
This reordering can be avoided by two means: either use the -ffloat-store compiler option or declare the return variable (here: res) volatile.
Using MMX/SSE: GCC also supports the compiler option -mfpmath=sse. For it to take effect, at least the additional options -msse and -march=XXX (where XXX is e.g. pentium4 or any other arch value that supports SSE) must be supplied. In that case, single and double precision floating point calculations will use the MMX/SSE extensions instead of the FPU, eliminating the need to change the FPU precision, because the MMX/SSE extensions always operate with the same precision as the operands.
Conclusion:

- Single precision: Use the float datatype for all operands and use the compiler option -ffloat-store. If it is not feasible to specify a compiler option, use
  { volatile float result = /* actual result */; return result; }
  instead of
  return /* actual result */;
  Also, it couldn't hurt to use _FPU_SETCW, just in case.
- Double precision: Use the double datatype and use _FPU_SETCW to set and restore the precision at the beginning of the function (-ffloat-store will not suffice!). Please note that in addition to _FPU_SETCW, -ffloat-store or the volatile trick (just with double instead of float) must be used, or else the compiler will apply the wrong optimization.
- Double-extended precision: Use the long double datatype. If you haven't touched the internal precision, you don't need to set it explicitly, but it wouldn't hurt, just in case.

Not tested, feedback welcome.
With FreeBSD, the x87 FPU is in double precision mode by default. It does support the double-extended datatype (long double), but calculations with that datatype will by default still be executed in double precision only.
FreeBSD also provides several macros to change the processor's floating point mode:
#include <machine/ieeefp.h>

fp_prec_t fpu_oldprec;
fpu_oldprec = fpgetprec();
fpsetprec(precision);
// calculations here
fpsetprec(fpu_oldprec);
Valid precision values are:

- FP_PS (single precision)
- FP_PD (double precision)
- FP_PE (double-extended precision)
Calling conventions: The compiler does not generally store the result of an operation to memory and back into the registers to truncate it. -ffloat-store itself is only interpreted for actual variables, not for expressions (i.e. double result = a / b; return result; and return a / b; will result in different code with -ffloat-store active - even with the optimizer enabled).
Conclusion:

- Single precision: Use the float datatype. Also use a temporary floating point variable to store the result:
  { volatile float result = /* actual result */; return result; }
  instead of
  return /* actual result */;
  Using -ffloat-store would require an additional variable anyway to have any effect, so simply declaring a new volatile variable suffices. Using fpsetprec couldn't hurt either.
- Double precision: Use the double datatype. Since the default internal precision is already double, nothing needs to be changed. Using fpsetprec and a volatile return variable couldn't hurt, though.
- Double-extended precision: Use the long double datatype. Also be sure to use fpsetprec to ensure the correct internal precision.

Not tested, feedback welcome.
It does not appear to be possible to switch to double precision on OpenSolaris on Intel platforms without resorting to inline assembler or relying on GCC and the SSE extensions (and OpenSolaris also runs on non-SSE x86 processors, so that is not portable).
The compiler supports double-extended precision and uses it as the default floating point precision on the platform. There seems to be no way to explicitly change the FP precision setting without resorting to inline assembler (there are some macros / functions defined in sys/ieeefp.h, but the interesting part is enclosed in an #ifdef _KERNEL). The only possibilities appear to be to use the MMX/SSE instructions (-mfpmath=sse -msse -march=pentium4 or similar) or to use inline assembly to set the internal precision.
Calling conventions: The compiler does not generally store the result of an operation to memory and back into the registers to truncate it. -ffloat-store itself is only interpreted for actual variables, not for expressions (i.e. double result = a / b; return result; and return a / b; will result in different code with -ffloat-store active - even with the optimizer enabled).
Conclusion:

- Single precision: Use the float datatype. Also use a temporary floating point variable to store the result:
  { volatile float result = /* actual result */; return result; }
  instead of
  return /* actual result */;
  Using -ffloat-store would require an additional variable anyway to have any effect, so simply declaring a new volatile variable suffices. Using inline assembly to set the precision manually couldn't hurt either.
- Double precision: Use the double datatype and use inline assembly to manually set the precision.
- Double-extended precision: Use the long double datatype. Using inline assembly to force the precision couldn't hurt either.

The compiler behaves more or less identically to GCC on OpenSolaris; there are only two differences:
Conclusion:

- Single precision: Use the float datatype for all operands. Since the compiler will always truncate results, the semantics will work correctly.

Not tested, feedback welcome.
This section is based on tests that other people did for me.
The FPU mode is double-extended by default and that datatype is also supported. But since Mac OS X runs only on x86 platforms that already support MMX/SSE, the compiler will always use MMX/SSE for single and double precision floating point calculations. Only double-extended precision floating point calculations will be done with the x87 FPU.
This is actually the most desirable behaviour, since the precision is then always determined by the largest operand type in C.
Because of this, Mac OS X does not provide any C wrapper macros to change the internal precision setting of the x87 FPU; it is simply not necessary. Should this really be wanted, inline assembler would probably work - I have not tested this, however.
Not tested: See what happens if -mfpmath=i387 is used as a compiler option.
Conclusion:
Simply use the correct datatype and the operations performed will have the correct semantics: float for single precision, double for double precision and long double for double-extended precision.
Not tested, feedback welcome.
GCC on 64-bit x86 Linux behaves the same as GCC on Mac OS X: the FPU mode is double-extended by default, but since SSE is supported by all x86_64 processors, SSE is always used for single and double precision floating point calculations, and thus the largest operand type in C determines the calculation precision.
However, it is possible to use the -mfpmath=i387 switch. In that case, glibc also provides fpu_control.h with the same macros as in the x86 case.
Conclusion:
Simply use the correct datatype and the operations performed will have the correct semantics: float for single precision, double for double precision and long double for double-extended precision.
Scott MacVicar did some tests on the Power4 platform with GCC 4.0.1, which yielded the expected result that the compiler generates different instructions for different datatypes, thus using the correct precision. An interesting detail: quad precision calculations are done via software emulation by default, perhaps for compatibility with older processor revisions.
Conclusion:
Simply use the correct datatype and the operations performed will have the correct semantics: float for single precision, double for double precision and long double for quad precision.
Johannes Schlüter did some tests on the UltraSPARC IIIi platform with both GCC 3.4.3 and Sun's CC, which yielded the expected result that both compilers generate different instructions for different datatypes, thus using the correct precision. An interesting detail: quad precision calculations are done via software emulation by default (by both (!) compilers), perhaps for compatibility with older processor revisions.
Conclusion:
Simply use the correct datatype and the operations performed will have the correct semantics: float for single precision, double for double precision and long double for quad precision.
A friend of mine ran the test suite below on a Nokia N810 with Linux installed. Since the N810 does not possess an FPU, all FPU operations are emulated via function calls.
The emulation uses the same precision as the largest operand type. long double is treated the same as double, i.e. there is no extended precision.
Conclusion:
Simply use the correct datatype and the operations performed will have the correct semantics: float for single precision and double for double precision.
The best solution would probably be to use the MPFR library (or something similar). That has its drawbacks, though: it is slower than native FPU arithmetic and it requires memory allocation on the heap. Also, having such a big library as a requirement for one's own software may not always be the optimal choice. Thus, a solution using the native FP unit will be built here.
Using single or double precision for floating point calculations seems to be the only portable alternative short of using an entire library: double-extended precision is not supported on PPC and SPARC, quad precision is not supported on x86, and double-extended precision is also not supported by Microsoft's Visual C++ compiler.
Since there is probably no good reason to restrict calculations to single precision (lower precision means less accuracy), this document will focus on using double precision portably.
In order to achieve this, the following steps should be taken:

- Always use the double datatype for all operands.
- Switch the FPU's internal precision to double precision where necessary (and possible).

The following macros provide platform-independent support for saving, switching and restoring the FPU precision. If a platform does not support this, or it is not necessary, they simply expand to no-ops. XPFPA stands for "cross platform floating point arithmetic".
Example usage:
#include "xpfpa.h"

double somefn (double a)
{
    XPFPA_DECLARE()

    double localvar1;

    XPFPA_SWITCH_DOUBLE()

    // do some calculation
    localvar1 = a * a;

    // restore state temporarily, call an external library function
    XPFPA_RESTORE()
    localvar1 = external_lib_solve_linear_equation(localvar1);
    XPFPA_SWITCH_DOUBLE()

    // do some more calculations
    localvar1 = localvar1 / 2.0;

    // return the value
    XPFPA_RETURN_DOUBLE(localvar1);
}
Please be sure to always use the correct datatype for the calculations, since several architectures do not possess an internal precision but rather determine the precision according to the datatypes of the operands!
The header file xpfpa.h defines the above macros on systems where the internal precision can be changed. On other systems, the macros expand to nothing and we assume that the compiler chooses the correct precision based on the operand types. The file can also be downloaded in an archive: xpfpa.tar.gz
In order to set the correct defines for the macros, I have written an m4 file that may be included in your configure.in. Its name is float_precision.m4 and it is available in the archive autoconf-macros.tar.gz. Usage:
dnl ... top of configure.in or acinclude.m4 or whatever:
sinclude(float_precision.m4)

dnl ... below:
CHECK_FLOAT_PRECISION
It will set one or more of the following variables; please make sure your config.h.in contains them:

- HAVE__CONTROLFP
- HAVE__CONTROLFP_S
- HAVE__FPU_SETCW
- HAVE_FPSETPREC
- HAVE_FPU_INLINE_ASM_X86
I have also provided a macro for CMake that may be included in your CMakeLists.txt. Its name is CheckFloatPrecision.cmake and it is available in the archive cmake-macros.tar.gz. Usage:
include(CheckFloatPrecision.cmake)
# or, if in module path
include(CheckFloatPrecision)

check_float_precision()
It will set one or more of the following variables; please make sure your config.h.in contains them:

- HAVE__CONTROLFP
- HAVE__CONTROLFP_S
- HAVE__FPU_SETCW
- HAVE_FPSETPREC
- HAVE_FPU_INLINE_ASM_X86
I have started to create a simple test suite that performs some checks on the default floating point behaviour. It will only run on POSIX-like platforms with a C compiler that supports GCC-style options (e.g. GCC itself, ICC, the Sun C compiler, ...).
It can be downloaded here: fp-testsuite.tar.bz2 / fp-testsuite.tar.gz.
If you have a new platform / new compiler that I haven't discussed yet, please run the following (in the directory where you unpacked the archive):
CC=/path/to/your/compiler ./fptest.sh | tee messages.txt
Then please send me messages.txt and results.tar.gz.
Author: Christian Seiler, Last change: 2008-10-28