How dangerous is it to compare floating point values?

Translated from: How dangerous is it to compare floating point values?

I know UIKit uses CGFloat because of the resolution-independent coordinate system.

But every time I want to check whether, for example, frame.origin.x is 0, it makes me feel sick:

if (theView.frame.origin.x == 0) {
    // do important operation
}

Isn't CGFloat vulnerable to false positives when comparing with ==, <=, >=, <, >? It is a floating point type, and floating point values have imprecision problems: 0.0000000000041, for example.

Does Objective-C handle this internally when comparing, or can it happen that an origin.x which reads as zero does not compare equal to 0?

Post #1

Reference: https://stackoom.com/question/hMWW/比较浮点值有多危险

Post #2

[The 'right answer' glosses over selecting K. Selecting K ends up being just as ad hoc as selecting VISIBLE_SHIFT, but selecting K is less obvious because, unlike VISIBLE_SHIFT, it is not grounded in any display property. Thus pick your poison: select K or select VISIBLE_SHIFT. This answer advocates selecting VISIBLE_SHIFT and then demonstrates the difficulty in selecting K.]

Precisely because of rounding errors, you should not use comparison of 'exact' values for logical operations. In your specific case of a position on a visual display, it can't possibly matter whether the position is 0.0 or 0.0000000003; the difference is invisible to the eye. So your logic should be something like:

#define VISIBLE_SHIFT    0.0001        // for example
if (fabs(theView.frame.origin.x) < VISIBLE_SHIFT) { /* ... */ }

However, in the end, 'invisible to the eye' will depend on your display properties. If you can put an upper bound on the display (you should be able to), then choose VISIBLE_SHIFT to be a fraction of that upper bound.

Now, the 'right answer' rests upon K, so let's explore picking K. The 'right answer' says:

"K is a constant you choose such that the accumulated error of your computations is definitely bounded by K units in the last place (and if you're not sure you got the error bound calculation right, make K a few times bigger than what your calculations say it should be)"

So we need K. If getting K is more difficult and less intuitive than selecting my VISIBLE_SHIFT, then you'll decide what works for you. To find K, we are going to write a test program that looks at a bunch of K values so we can see how it behaves. It ought to be obvious how to choose K if the 'right answer' is usable. No?

We are going to use the test that the 'right answer' details:

if (fabs(x-y) < K * DBL_EPSILON * fabs(x+y) || fabs(x-y) < DBL_MIN)

Let's just try all values of K:

#include <math.h>
#include <float.h>
#include <stdio.h>

int main (void)
{
    double x = 1e-13;
    double y = 0.0;
    double K = 1e22;
    int i = 0;

    /* Try K = 1e22 down to 1e-9, one decade per iteration. */
    for (; i < 32; i++, K = K/10.0)
    {
        printf ("K:%40.16lf -> ", K);
        if (fabs(x-y) < K * DBL_EPSILON * fabs(x+y) || fabs(x-y) < DBL_MIN)
            printf ("YES\n");
        else
            printf ("NO\n");
    }
    return 0;
}
ebg@ebg$ gcc -o test test.c
ebg@ebg$ ./test
K:10000000000000000000000.0000000000000000 -> YES
K: 1000000000000000000000.0000000000000000 -> YES
K:  100000000000000000000.0000000000000000 -> YES
K:   10000000000000000000.0000000000000000 -> YES
K:    1000000000000000000.0000000000000000 -> YES
K:     100000000000000000.0000000000000000 -> YES
K:      10000000000000000.0000000000000000 -> YES
K:       1000000000000000.0000000000000000 -> NO
K:        100000000000000.0000000000000000 -> NO
K:         10000000000000.0000000000000000 -> NO
K:          1000000000000.0000000000000000 -> NO
K:           100000000000.0000000000000000 -> NO
K:            10000000000.0000000000000000 -> NO
K:             1000000000.0000000000000000 -> NO
K:              100000000.0000000000000000 -> NO
K:               10000000.0000000000000000 -> NO
K:                1000000.0000000000000000 -> NO
K:                 100000.0000000000000000 -> NO
K:                  10000.0000000000000000 -> NO
K:                   1000.0000000000000000 -> NO
K:                    100.0000000000000000 -> NO
K:                     10.0000000000000000 -> NO
K:                      1.0000000000000000 -> NO
K:                      0.1000000000000000 -> NO
K:                      0.0100000000000000 -> NO
K:                      0.0010000000000000 -> NO
K:                      0.0001000000000000 -> NO
K:                      0.0000100000000000 -> NO
K:                      0.0000010000000000 -> NO
K:                      0.0000001000000000 -> NO
K:                      0.0000000100000000 -> NO
K:                      0.0000000010000000 -> NO

Ah, so K should be 1e16 or larger if I want 1e-13 to be 'zero'.

So, I'd say you have two options:

1. Do a simple epsilon computation using your engineering judgement for the value of 'epsilon', as I've suggested. If you are doing graphics and 'zero' is meant to mean 'no visible change', then examine your visual assets (images, etc.) and judge what epsilon can be.
2. Don't attempt any floating point computations until you've read the non-cargo-cult answer's reference (and gotten your Ph.D. in the process), and then use your non-intuitive judgement to select K.

Post #3

Since 0 is exactly representable as an IEEE 754 floating-point number (and in every other floating-point implementation I've ever worked with), comparison with 0 is probably safe. You might get bitten, however, if your program computes a value (such as theView.frame.origin.x) which you have reason to believe ought to be 0 but which your computation cannot guarantee to be 0.

To clarify a little, a computation such as:

areal = 0.0

will (unless your language or system is broken) create a value such that (areal==0.0) returns true, but another computation such as

areal = 1.386 - 2.1*(0.66)

may not.

If you can assure yourself that your computations produce values which are 0 (and not just that they produce values which ought to be 0), then you can go ahead and compare fp values with 0. If you can't assure yourself to the required degree, best stick to the usual approach of 'toleranced equality'.

In the worst cases the careless comparison of fp values can be extremely dangerous: think avionics, weapons guidance, power-plant operations, vehicle navigation, almost any application in which computation meets the real world.

For Angry Birds, not so dangerous.

Post #4

First of all, floating point values are not "random" in their behavior. Exact comparison can and does make sense in plenty of real-world usages. But if you're going to use floating point, you need to be aware of how it works. Erring on the side of assuming floating point works like real numbers will get you code that quickly breaks. Erring on the side of assuming floating point results have large random fuzz associated with them (as most of the answers here suggest) will get you code that appears to work at first but ends up having large-magnitude errors and broken corner cases.

First of all, if you want to program with floating point, you should read this:

What Every Computer Scientist Should Know About Floating-Point Arithmetic

Yes, read all of it. If that's too much of a burden, you should use integers/fixed point for your calculations until you have time to read it. :-)

Now, with that said, the biggest issues with exact floating point comparisons come down to:

1. The fact that lots of values you may write in the source, or read in with scanf or strtod, do not exist as floating point values and get silently converted to the nearest approximation. This is what demon9733's answer was talking about.
2. The fact that many results get rounded due to not having enough precision to represent the actual result. An easy example where you can see this is adding x = 0x1fffffe and y = 1 as floats. Here, x has 24 bits of precision in the mantissa (ok) and y has just 1 bit, but when you add them, their bits are not in overlapping places, and the result would need 25 bits of precision. Instead, it gets rounded (to 0x2000000 in the default rounding mode).
3. The fact that many results get rounded due to needing infinitely many places for the correct value. This includes both rational results like 1/3 (which you're familiar with from decimal, where it takes infinitely many places) and 1/10 (which also takes infinitely many places in binary, since 5 is not a power of 2), as well as irrational results like the square root of anything that's not a perfect square.
4. Double rounding. On some systems (particularly x86), floating point expressions are evaluated in higher precision than their nominal types. This means that when one of the above types of rounding happens, you'll get two rounding steps: first a rounding of the result to the higher-precision type, then a rounding to the final type. As an example, consider what happens in decimal if you round 1.49 to an integer (1), versus what happens if you first round it to one decimal place (1.5) and then round that result to an integer (2). This is actually one of the nastiest areas to deal with in floating point, since the behaviour of the compiler (especially for buggy, non-conforming compilers like GCC) is unpredictable.
5. Transcendental functions (trig, exp, log, etc.) are not specified to have correctly rounded results; the result is just specified to be correct within one unit in the last place of precision (usually referred to as 1ulp).

When you're writing floating point code, you need to keep in mind what you're doing with the numbers that could cause the results to be inexact, and make comparisons accordingly. Often it will make sense to compare with an "epsilon", but that epsilon should be based on the magnitude of the numbers you are comparing, not an absolute constant. (In cases where an absolute constant epsilon would work, that's strongly indicative that fixed point, not floating point, is the right tool for the job!)

Edit: In particular, a magnitude-relative epsilon check should look something like:

if (fabs(x-y) < K * FLT_EPSILON * fabs(x+y))

Where FLT_EPSILON is the constant from float.h (replace it with DBL_EPSILON for doubles or LDBL_EPSILON for long doubles) and K is a constant you choose such that the accumulated error of your computations is definitely bounded by K units in the last place (and if you're not sure you got the error bound calculation right, make K a few times bigger than what your calculations say it should be).

Finally, note that if you use this, some special care may be needed near zero, since FLT_EPSILON does not make sense for denormals. A quick fix would be to make it:

if (fabs(x-y) < K * FLT_EPSILON * fabs(x+y) || fabs(x-y) < FLT_MIN)

and likewise substitute DBL_MIN if using doubles.

Post #5

I'd say the right thing is to declare each number as an object, and then define three things in that object: 1) an equality operator, 2) a setAcceptableDifference method, and 3) the value itself. The equality operator returns true if the absolute difference of two values is less than the value set as acceptable.

You can subclass the object to suit the problem. For example, round bars of metal between 1 and 2 inches might be considered of equal diameter if their diameters differed by less than 0.0001 inches. So you'd call setAcceptableDifference with parameter 0.0001, and then use the equality operator with confidence.

Post #6

The last time I checked the C standard, there was no requirement for floating point operations on doubles (64 bits total, 53-bit mantissa) to be accurate to more than that precision. However, some hardware might do the operations in registers of greater precision, and the requirement was interpreted to mean no requirement to clear lower-order bits (beyond the precision of the numbers being loaded into the registers). So you could get unexpected results of comparisons like this depending on what was left over in the registers from whoever slept there last.

That said, and despite my efforts to expunge it whenever I see it, the outfit where I work has lots of C code that is compiled using gcc and run on Linux, and we have not noticed any of these unexpected results in a very long time. I have no idea whether this is because gcc is clearing the low-order bits for us, the 80-bit registers are not used for these operations on modern computers, the standard has been changed, or what. I'd like to know if anyone can quote chapter and verse.
