Updated: 2024-08-23
Created: 2024-08
tolerance.
NaN propagation and comparisons.
For single precision...
If you want an accuracy of +/-0.5 (or 2^-1), the maximum size that the number can be is 2^23. Any X larger than this limit leads to the distance between floating point numbers being greater than 0.5.
If you want an accuracy of +/-0.0005 (about 2^-11), the maximum size that the number can be is 2^13. Any X larger than this limit leads to the distance between floating point numbers being greater than 0.0005.
For double precision...
If you want an accuracy of +/-0.5 (or 2^-1), the maximum size that the number can be is 2^52. Any X larger than this limit leads to the distance between floating point numbers being greater than 0.5.
If you want an accuracy of +/-0.0005 (about 2^-11), the maximum size that the number can be is 2^42. Any X larger than this limit leads to the distance between floating point numbers being greater than 0.0005.
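The limits above can be checked by measuring the gap between adjacent floating point numbers directly. The following is a minimal sketch, assuming Python 3.9 or later (for `math.ulp`); the helper `float32_spacing` is a hypothetical name introduced here, and it only handles positive finite values.

```python
import math
import struct

def float32_spacing(x: float) -> float:
    """Gap between x (a positive finite float32 value) and the next float32 above it."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    here = struct.unpack("<f", struct.pack("<I", bits))[0]
    above = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return above - here

# Single precision: the gap is 0.5 just below 2^23 and 1.0 from 2^23 onward.
print(float32_spacing(2.0**23 - 0.5), float32_spacing(2.0**23))
# Single precision: the gap is 2^-11 just below 2^13 and 2^-10 from 2^13 onward.
print(float32_spacing(2.0**13 - 2.0**-11), float32_spacing(2.0**13))

# Double precision: math.ulp reports the same kind of gap for Python floats.
print(math.ulp(2.0**52 - 0.5), math.ulp(2.0**52))
print(math.ulp(2.0**42 - 2.0**-11), math.ulp(2.0**42))
```

In each pair the first value is the gap just under the limit and the second is the gap at the limit, which is twice as large.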
For floating-point integers (I'll give my answer in terms of IEEE double-precision), every integer between 1 and 2^53 is exactly representable. Beyond 2^53, integers that are exactly representable are spaced apart by increasing powers of two.
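A short sketch of the 2^53 boundary, assuming Python floats are IEEE double precision (true on most platforms). The helper `is_exact` is a hypothetical name used only for this illustration.

```python
LIMIT = 2**53

def is_exact(n: int) -> bool:
    """True if the integer n survives a round trip through a double unchanged."""
    return int(float(n)) == n

# Every integer up to 2^53 is exact (checked here on a small sample and at the limit).
print(all(is_exact(n) for n in range(1, 1000)), is_exact(LIMIT))

# Just above 2^53 only multiples of 2 are representable.
print(is_exact(LIMIT + 1), is_exact(LIMIT + 2))

# Above 2^54 only multiples of 4 are representable.
print(is_exact(2**54 + 2), is_exact(2**54 + 4))
```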
In base 2, 1/10 is the infinitely repeating fraction
0.0001100110011001100110011001100110011001100110011...
Stop at any finite number of bits, and you get an approximation. On most machines today, floats are approximated using a binary fraction with the numerator using the first 53 bits starting with the most significant bit and with the denominator as a power of two. In the case of 1/10, the binary fraction is 3602879701896397 / 2 ** 55 which is close to but not exactly equal to the true value of 1/10.
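The fraction quoted above can be recovered from the stored value itself. This is a small sketch using only the standard library; the variable names are illustrative.

```python
from decimal import Decimal
from fractions import Fraction

# The exact numerator and denominator of the double stored for 0.1.
numerator, denominator = (0.1).as_integer_ratio()
print(numerator, denominator == 2**55)

# The exact decimal expansion of that stored value.
exact = Decimal(0.1)
print(exact)

# The stored value is slightly larger than the true 1/10.
error = Fraction(0.1) - Fraction(1, 10)
print(error)
```

The error is a positive rational number, which is why later sums involving 0.1 tend to drift upward rather than cancel.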
[...] On most machines, if Python were to print the true decimal value of the binary approximation stored for 0.1, it would have to display:
>>> 0.1
0.1000000000000000055511151231257827021181583404541015625
That is more digits than most people find useful, so Python keeps the number of digits manageable by displaying a rounded value instead:
>>> 1 / 10
0.1
Just remember, even though the printed result looks like the exact value of 1/10, the actual stored value is the nearest representable binary fraction.
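To see both the short display and the digits behind it, one can ask for more precision explicitly. A minimal sketch, assuming a Python 3 interpreter where `repr` produces the shortest string that round-trips:

```python
x = 1 / 10

short = repr(x)                 # shortest string that converts back to the same double
seventeen = format(x, ".17g")   # 17 significant digits always identify a double uniquely
longer = format(x, ".26f")      # 26 decimal places of the stored value

print(short, seventeen, longer)
print(float(short) == x)        # the short form loses no information
```

The short form is safe to store and read back; it is only misleading if it is taken for the exact stored value.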
1.23456789e-9 + 0.87654321e-2 => 0.00000012e-2 + 0.87654321e-2 => 1.2345678e-9 + 876543.21e-6
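The loss of digits when exponents are aligned can be imitated with decimal arithmetic. This sketch assumes an 8-digit decimal format, chosen here only to match the operands above, and it relies on the `decimal` module's default 28-digit context for the exact sum. Note that `decimal` rounds the exact sum rather than truncating the smaller operand, although the surviving digits are the same in this case.

```python
from decimal import Decimal, localcontext

small = Decimal("1.23456789e-9")
large = Decimal("0.87654321e-2")

# Default context (28 digits) is wide enough to hold this sum exactly.
exact_sum = small + large

with localcontext() as ctx:
    ctx.prec = 8                 # assumption: only 8 significant digits are kept
    rounded_sum = small + large

# How much of the small operand actually made it into the result.
surviving = rounded_sum - large

print(exact_sum, rounded_sum, surviving)
```

Only the leading two digits of the small operand survive; the rest fall below the last digit of the larger operand.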
$ perl -e 'printf ("%0.26f\n",0.1)'
0.10000000000000000555111512
$ perl -e 'printf ("%0.26f\n",0.3)'
0.29999999999999998889776975
$ perl -e 'printf ("%0.26f\n",0.3000000000000001)'
0.30000000000000009992007222
$ perl -e 'printf ("%0.26f\n",0.30000000000000001)'
0.29999999999999998889776975
$ perl -e 'printf ("%0.26f\n",0.1+0.2)'
0.30000000000000004440892099
$ perl -e 'printf ("%0.26F\n",0.00000000000000001+0.1)'
0.10000000000000001942890293
$ perl -e 'printf ("%d\n",(0.1+0.2) == 0.3)'
0
$ perl -e 'printf ("%d\n",(1.0+2.0) == 3.0)'
1
$ perl -e 'printf ("%0.26f\n",(0.1+0.2) - 0.3)'
0.00000000000000005551115123
$ perl -e 'printf ("%d\n",((0.1+0.2)*10.0) == 3.0)'
0
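The Perl results above carry over to any language that uses IEEE doubles. The sketch below, which assumes Python 3.9 or later for `math.nextafter` and `math.ulp`, shows that 0.1+0.2 is not far from 0.3 at all: it is the very next representable double above it.

```python
import math

total = 0.1 + 0.2

print(total == 0.3)                             # False, as in the Perl run
print(math.nextafter(0.3, math.inf) == total)   # total is the next double above 0.3
print(total - 0.3 == math.ulp(0.3))             # the difference is exactly one gap
print(total * 10.0 == 3.0)                      # scaling does not repair the error
print(1.0 + 2.0 == 3.0)                         # small integers are exact
```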
#include <stdio.h>

int main(void) {
    float inf = 1.0f/0.0f;
    float nan = 0.0f/0.0f;

    /* Infinity compares equal to itself and is ordered normally. */
    printf("1.0f/0.0f: %0.26f\n", inf);
    printf("inf == inf: %d\n", inf == inf);
    printf("inf != inf: %d\n", inf != inf);
    printf("inf > inf: %d\n", inf > inf);
    printf("inf < inf: %d\n", inf < inf);

    /* NaN is unordered: every comparison is false except !=. */
    printf("0.0f/0.0f: %0.26f\n", nan);
    printf("nan == nan: %d\n", nan == nan);
    printf("nan != nan: %d\n", nan != nan);
    printf("nan > nan: %d\n", nan > nan);
    printf("nan < nan: %d\n", nan < nan);
    return 0;
}

1.0f/0.0f: inf
inf == inf: 1
inf != inf: 0
inf > inf: 0
inf < inf: 0
0.0f/0.0f: -nan
nan == nan: 0
nan != nan: 1
nan > nan: 0
nan < nan: 0
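The same comparison rules can be checked in Python, along with the way NaN propagates through arithmetic. This is a small sketch; it takes the special values from `math.inf` and `math.nan` because Python raises `ZeroDivisionError` for `1.0/0.0` instead of returning infinity.

```python
import math

inf = math.inf
nan = math.nan

# Infinity behaves like an ordinary, very large ordered value.
print(inf == inf, inf != inf, inf > inf, inf < inf)

# NaN is unordered: only != is true, even against itself.
print(nan == nan, nan != nan, nan > nan, nan < nan)

# NaN propagates: operations with no meaningful result produce NaN,
# and any arithmetic on NaN yields NaN again.
results = [inf - inf, 0.0 * inf, nan + 1.0, nan * 0.0]
print([math.isnan(r) for r in results])
```

Because `nan == nan` is false, `math.isnan` is the reliable way to test for it.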
$ perl -e 'my ($a,$b,$c)=(0.1,0.00000003,0.09999997); printf ("%0.26f\n",($a-($b+$c)))'
0.00000000000000001387778781
$ perl -e 'my ($a,$b,$c)=(0.1,0.00000003,0.09999997); printf ("%d\n",($a-($b+$c)) < (0.000000000000001*($a+$b+$c)/2))'
1
$ perl -e 'my ($a,$b,$c)=(0.1,0.00000003,0.09999997); printf ("%d\n",($a-($b+$c)) < (0.0000000000000001*($a+$b+$c)/2))'
0
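The Perl lines above compare a difference against a tolerance scaled by the size of the operands. A rough Python equivalent is sketched below; `nearly_equal` is a hypothetical helper written for this note, and `math.isclose` is the standard library function that does a similar job (it scales by the larger magnitude rather than the mean).

```python
import math

def nearly_equal(x: float, y: float, rel_tol: float) -> bool:
    """Compare |x - y| against rel_tol scaled by the mean magnitude of x and y."""
    scale = (abs(x) + abs(y)) / 2
    return abs(x - y) <= rel_tol * scale

total = 0.1 + 0.2

print(nearly_equal(total, 0.3, 1e-15))         # tolerance wider than the rounding error
print(nearly_equal(total, 0.3, 1e-17))         # tolerance tighter than one gap between doubles
print(math.isclose(total, 0.3, rel_tol=1e-15))
```

A purely relative tolerance breaks down when the expected value is zero, so comparisons near zero usually need an absolute tolerance as well (`abs_tol` in `math.isclose`).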
parallax issues.
A few years ago I had some spherical geometry that needed to be very accurate, and still fast. 80-bit doubles on PCs were not cutting it, so I added some types to the program that sorted terms before performing commutative operations. Problem solved.
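Why sorting helps can be seen with a simple sum. The sketch below is not the implementation described above, only an illustration of the idea under the assumption that adding the smallest terms first is what the sorting achieved. The loop in `naive_sum` (a hypothetical helper) is written out by hand because the built-in `sum` in recent Python versions already compensates for rounding.

```python
import math

def naive_sum(values):
    """Plain left-to-right accumulation with no error compensation."""
    total = 0.0
    for v in values:
        total += v
    return total

# One huge term followed by ten small ones. Near 1e16 the gap between doubles is 2,
# so adding 1.0 to the running total is rounded away every time.
terms = [1e16] + [1.0] * 10

in_order = naive_sum(terms)
small_first = naive_sum(sorted(terms, key=abs))   # smallest magnitudes first
reference = math.fsum(terms)                      # correctly rounded sum

print(in_order, small_first, reference)
```

Summing in the given order loses all ten small terms, while summing smallest first lets them accumulate to 10.0 before meeting the large term, and that result matches `math.fsum`.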