Translation: Closing over the loop variable considered harmful

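There are some parts I really could not translate; I hope readers will supply translations in the comments below. Let us improve together. Thank you all.
Original articles: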
https://blogs.msdn.microsoft.com/ericlippert/2009/11/12/closing-over-the-loop-variable-considered-harmful/

https://blogs.msdn.microsoft.com/ericlippert/2009/11/16/closing-over-the-loop-variable-part-two/

(This is part one of a two-part series on the loop-variable-closure problem. Part two is here.)


UPDATE: We are taking the breaking change. In C# 5, the loop variable of a foreach will be logically inside the loop, and therefore closures will close over a fresh copy of the variable each time. The “for” loop will not be changed. We return you now to our original article.
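
To make the update concrete, here is a minimal sketch of the difference it describes, assuming a C# 5 or later compiler. The class and method names (LoopCaptureDemo, CaptureInForeach, CaptureInFor) are hypothetical and chosen only for illustration. The foreach version gets a fresh variable per iteration, while the "for" version still shares one variable across the whole loop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LoopCaptureDemo
{
    // foreach: under C# 5 and later, each iteration has its own 'v',
    // so each lambda captures a different variable.
    public static List<int> CaptureInForeach()
    {
        var values = new List<int> { 100, 110, 120 };
        var funcs = new List<Func<int>>();
        foreach (var v in values) funcs.Add(() => v);
        return funcs.Select(f => f()).ToList();
    }

    // for: a single 'i' is shared by every iteration, so every lambda
    // sees the value 'i' had when the loop ended.
    public static List<int> CaptureInFor()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++) funcs.Add(() => i);
        return funcs.Select(f => f()).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" / ", CaptureInForeach())); // 100 / 110 / 120 on C# 5+
        Console.WriteLine(string.Join(" / ", CaptureInFor()));     // 3 / 3 / 3
    }
}
```

On a compiler older than C# 5, the first line would print 120 / 120 / 120, which is the behaviour the rest of this article discusses.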


I don’t know why I haven’t blogged about this one before; this is the single most common incorrect bug report we get. That is, someone thinks they have found a bug in the compiler, but in fact the compiler is correct and their code is wrong. That’s a terrible situation for everyone; we very much wish to design a language which does not have “gotcha” features like this.


But I’m getting ahead of myself. What’s the output of this fragment?


var values = new List<int>() { 100, 110, 120 };
var funcs = new List<Func<int>>();
foreach(var v in values) funcs.Add( ()=>v );
foreach(var f in funcs) Console.WriteLine(f());

Most people expect it to be 100 / 110 / 120. It is in fact 120 / 120 / 120. Why?
Translator's note: the compiler has since been changed; tested today, this fragment prints 100, 110, 120.

Because ()=>v means “return the current value of variable v“, not “return the value v was back when the delegate was created”. Closures close over variables, not over values. And when the methods run, clearly the last value that was assigned to v was 120, so it still has that value.
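
The same point can be seen without any loop at all. The following is a small sketch (CaptureVariableDemo and readX are hypothetical names, not from the original article) in which a delegate is created while x holds 1 and is invoked again after x has been reassigned:

```csharp
using System;

public static class CaptureVariableDemo
{
    public static int[] Run()
    {
        int x = 1;
        Func<int> readX = () => x;   // captures the variable x, not the value 1
        int before = readX();        // reads x while it holds 1
        x = 2;                       // the delegate observes this later assignment
        int after = readX();         // reads the same variable, which now holds 2
        return new[] { before, after };
    }

    public static void Main()
    {
        int[] r = Run();
        Console.WriteLine("{0} then {1}", r[0], r[1]); // 1 then 2
    }
}
```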


This is very confusing. The correct way to write the code is:


foreach(var v in values)
{
  var v2 = v;
  funcs.Add( ()=>v2 );
}

Now what happens? Every time we re-start the loop body, we logically create a fresh new variable v2. Each closure is closed over a different v2, which is only assigned to once, so it always keeps the correct value.
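
Since the "for" loop keeps a single shared variable in every version of C#, the same copy-into-a-local pattern is still needed there. A minimal sketch, with the hypothetical names CopyWorkaroundDemo and copy used purely for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CopyWorkaroundDemo
{
    public static List<int> Run()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            int copy = i;            // a fresh variable each time the body starts
            funcs.Add(() => copy);   // each lambda closes over its own 'copy'
        }
        return funcs.Select(f => f()).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" / ", Run())); // 0 / 1 / 2
    }
}
```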


Basically, the problem arises because we specify that the foreach loop is a syntactic sugar for


{
  IEnumerator<int> e = ((IEnumerable<int>)values).GetEnumerator();
  try
  {
    int m; // OUTSIDE THE ACTUAL LOOP
    while(e.MoveNext())
    {
      m = (int)(int)e.Current;
      funcs.Add(()=>m);
    }
  }
  finally
  {
    if (e != null) ((IDisposable)e).Dispose();
  }
}

If we specified that the expansion was

try
{
  while(e.MoveNext())
  {
    int m; // INSIDE
    m = (int)(int)e.Current;
    funcs.Add(()=>m);
  }
}

then the code would behave as expected.
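
Both expansions can be written out by hand and run side by side. The sketch below is an illustration only: ExpansionDemo, DeclaredOutside and DeclaredInside are hypothetical names, and the casts from the expansion above are dropped because the parameter is already typed as IEnumerable<int>. Because the loops are spelled out manually, the result does not depend on which compiler version is used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExpansionDemo
{
    // Loop variable declared once, outside the while loop.
    public static List<int> DeclaredOutside(IEnumerable<int> values)
    {
        var funcs = new List<Func<int>>();
        IEnumerator<int> e = values.GetEnumerator();
        try
        {
            int m; // one variable shared by every iteration
            while (e.MoveNext())
            {
                m = e.Current;
                funcs.Add(() => m);
            }
        }
        finally
        {
            e.Dispose();
        }
        return funcs.Select(f => f()).ToList();
    }

    // Loop variable declared inside the while loop body.
    public static List<int> DeclaredInside(IEnumerable<int> values)
    {
        var funcs = new List<Func<int>>();
        IEnumerator<int> e = values.GetEnumerator();
        try
        {
            while (e.MoveNext())
            {
                int m; // a fresh variable per iteration
                m = e.Current;
                funcs.Add(() => m);
            }
        }
        finally
        {
            e.Dispose();
        }
        return funcs.Select(f => f()).ToList();
    }

    public static void Main()
    {
        var values = new List<int> { 100, 110, 120 };
        Console.WriteLine(string.Join(" / ", DeclaredOutside(values))); // 120 / 120 / 120
        Console.WriteLine(string.Join(" / ", DeclaredInside(values)));  // 100 / 110 / 120
    }
}
```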


It’s compelling to consider fixing this for a hypothetical future version of C#, and I’d like to hear your feedback on whether we should do so or not. The reasons FOR making the change are clear; this is a big confusing “gotcha” that real people constantly run into, and LINQ, unfortunately, only makes it worse, because it is likely to increase the number of times a customer is going to use a closure in a loop. Also, it seems reasonable that the user of the foreach loop might think of there being a “fresh” loop variable every time, not just a fresh value in the same old variable. Since the foreach loop variable is not mutable by user code, this reinforces the idea that it is a succession of values, one per loop iteration, and not “really” the same variable over and over again. And finally, the change has no effect whatsoever on non-closure semantics. (In fact, in C# 1 the spec was not clear about whether the loop variable went inside or outside, since in a world without closures, it makes no difference.)

But that said, there are some very good reasons for not making this change.

The first reason is that obviously this would be a breaking change, and we hates them, my precious. Any developers who depend on this feature, who require the closed-over variable to contain the last value of the loop variable, would be broken. I can only hope that the number of such people is vanishingly small; this is a strange thing to depend on. Most of the time, people do not expect or depend on this behaviour.

Second, it makes the foreach syntax lexically inconsistent. Consider foreach(int x in M()) The header of the loop has two parts, a declaration int x and a collection expression, M(). The int x is to the left of the M(). Clearly the M() is not inside the body of the loop; that thing only executes once, before the loop starts. So why should something to the collection expression’s left be inside the loop? This seems inconsistent with our general rule that stuff to the left logically “happens before” stuff to the right. The declaration is lexically NOT in the body of the loop, so why should we treat it as though it were?

Third, it would make the “foreach” semantics inconsistent with “for” semantics. We have this same problem in “for” blocks, but “for” blocks are much looser about what “the loop variable” is; there can be more than one variable declared in the for loop header, it can be incremented in odd ways, and it seems implausible that people would consider each iteration of the “for” loop to contain a fresh crop of variables. When you say for(int i = 0; i < 10; i += 1) it seems dead obvious that the “i += 1” means “increment the loop variable” and that there is one loop variable for the whole loop, not a new fresh variable “i” every time through! We certainly would not make this proposed change apply to “for” loops.

And fourth, though this is a nasty gotcha, there is an easy workaround, and tools like ReSharper detect this pattern and suggest how to fix it. We could take a page from that playbook and simply issue a compiler warning on this pattern. (Though adding new warnings brings up a whole raft of issues of its own, which I might get into in another post.) Though this is vexing, it really doesn’t bite that many people that hard, and it’s not a big deal to fix, so why go to the trouble and expense of taking a breaking change for something with an easy fix?

Design is, of course, the art of compromise in the face of many competing principles. “Eliminate gotchas” in this case directly opposes other principles like “no breaking changes”, and “be consistent with other language features”. Any thoughts you have on pros or cons of us taking this breaking change in a hypothetical future version of C# would be greatly appreciated.


Part two follows:

(This is part two of a two-part series on the loop-variable-closure problem. Part one is here.)


Thanks to everyone who left thoughtful and insightful comments on last week’s post.

More countries really ought to implement Instant Runoff Voting; it would certainly appeal to the geek crowd. Many people left complex opinions of the form “I’d prefer to make the change, but if you can’t do that then make it a warning”. Or “don’t make the change, do make it a warning”, and so on. But what I can deduce from reading the comments is that there is a general lack of consensus on what the right thing to do here is. In fact, I just did a quick tally:

Commenters who expressed support for a warning: 26
Commenters who expressed the sentiment “it’s better to not make the change”: 24
Commenters who expressed the sentiment “it’s better to make the change”: 25

Wow. I guess we’ll flip a coin.

Four people suggested to actually make it an error to do this. That’s a pretty big breaking change, particularly since we would be breaking not just “already broken” code, but plenty of code that works perfectly well today — see below. That’s not likely to happen.

People also left a number of interesting suggestions. I thought I’d discuss some of those a little bit.

First off, I want to emphasize that what we’re attempting to address here is the problem that the language encourages people to write code that has different semantics than they think it has. The problem is NOT that the language has no way to express the desired semantics; clearly it does. Just introduce a new variable explicitly into the loop.

A number of suggestions were for ways that the language could more elegantly express that notion. Some of the suggestions:

foreach(var x in c) inner
foreachnew(var x in c)
foreach(new var x in c)
foreach(var x from c)
foreach(var x inside c)

Though we could do any of those, none of them by themselves solve the problem at hand. Today, you have to know to use a particular pattern with foreach to get the semantics you want: declare a variable inside the loop. With one of these changes, you still have to know to use a particular keyword to get the semantics you want, and it is still easy to accidentally do the wrong thing.

Furthermore, a change so small and so targeted at such a narrow scenario probably does not provide enough benefit to justify the large cost of creating a new syntax, particularly one which is still easily confused with an existing syntax.

C++ luminary Herb Sutter happened to be in town and was kind enough to stop by my office to describe to me how they are solving a related problem in C++. Apparently the next version of the C++ standard will include lambdas, and they’re doing this:

[q, &r] (int x) -> int { return M(x, q, r); }

This means that the lambda captures outer variable q by value, captures r by reference, takes an int and returns an int. Whether the lambda captures values or references is controllable! An interesting approach but one that doesn’t immediately solve our problem here; we cannot make lambdas capture by value by default without a huge breaking change. Capturing by value would have to require new syntax, and then we’re in the same boat again: the user has to know to use the new syntax when in a foreach loop.

A number of people also asked what the down sides of adding a warning are. The down side is that a warning which warns about correct behaviour is a very bad warning; it makes people change working code, and frequently they break working code in order to eliminate a warning that shouldn’t have been present in the first place. Consider:

foreach(var insect in insects)
{
  var query = frogs.Where(frog=>frog.Eats(insect));
  Console.WriteLine("{0} is eaten by {1} frogs.", insect, query.Count());
}

This makes a lambda closed over insect; the lambda never escapes the loop, so there’s no problem here. But the compiler doesn’t know that. The compiler sees that the lambda is being passed to a method called Where, and Where is allowed to do anything with that delegate, including storing it away to be called later. Which is exactly what Where does! Where stores away the lambda into a monad that represents the execution of the query. The fact that the query object doesn’t survive the loop is what keeps this safe. But how is the compiler supposed to suss out that tortuous chain of reasoning? We’d have to give a warning for this case, even though it is perfectly safe.
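
The difference between a query that stays inside the loop and one that escapes it can be sketched as follows. This is an illustration under assumptions, not code from the original article: DeferredQueryDemo, Numbers and the two method names are hypothetical, and a "for" loop is used so that the shared-variable behaviour shows up on any compiler version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredQueryDemo
{
    static readonly int[] Numbers = { 1, 2, 3, 4, 5, 6 };

    // Safe: each query is enumerated (by Count) before 'd' changes.
    public static List<int> CountInsideLoop()
    {
        var counts = new List<int>();
        for (int d = 2; d <= 3; d++)
        {
            var query = Numbers.Where(n => n % d == 0);
            counts.Add(query.Count());
        }
        return counts;
    }

    // Unsafe: the queries escape the loop and are only enumerated afterwards,
    // when the shared variable 'd' already holds its final value, 4.
    public static List<int> CountAfterLoop()
    {
        var queries = new List<IEnumerable<int>>();
        for (int d = 2; d <= 3; d++)
        {
            queries.Add(Numbers.Where(n => n % d == 0));
        }
        return queries.Select(q => q.Count()).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" / ", CountInsideLoop())); // 3 / 2
        Console.WriteLine(string.Join(" / ", CountAfterLoop()));  // 1 / 1
    }
}
```

The lambda is identical in both methods; only the moment at which the query is enumerated differs, which is exactly the property a compiler warning would have to reason about.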

It gets worse. A lot of people are required by their organizations to compile with “warnings are errors” turned on. Therefore, any time we introduce a new warning for a pattern that is often actually safe and frequently used, we are effectively causing an enormous breaking change. A vaccine which kills more healthy people than the disease would have is probably not a good bet. (**)

This is not to say that a warning is a bad idea, but that it is not the obvious slam dunk good idea that it initially appears to be.

A number of people suggested that the problem was in the training of the developers, not in the design of the language. I disagree. Obviously modern languages are complex tools that require training to use, but we are working hard to make a language where people’s natural intuitions about how things work lead them to write correct code. I have myself made this error a number of times, usually in the form of writing code like the code above, and then refactoring it in such a manner that suddenly some part of it escapes the loop and the bug is introduced. It is very easy to make this mistake, even for experienced developers who thoroughly understand closure semantics. That’s a flaw in the design of the language.

And finally, a number of people made suggestions of the form “make it a warning in C# 4, and an error in C# 5”, or some such thing. FYI, C# 4 is DONE. We are only making a few last-minute “user is electrocuted”-grade bug fixes, mostly based on your excellent feedback from the betas. (If you have bug reports from the beta, please keep sending them, but odds are good they won’t get fixed for the initial release.) We are certainly not capable of introducing any sort of major design change or new feature at this point. And we try to not introduce semantic changes or new features in service packs. We’re going to have to live with this problem for at least another cycle, unfortunately.

(*) Mr. Smiley Face indicates that Eric is indulging in humorous japery.

(**) I wish to emphasize that I am 100% in favour of vaccinations for deadly infectious diseases, even vaccines that are potentially dangerous. The number of people made ill or killed by the smallpox vaccine was tiny compared to the number of people who did not contract this deadly, contagious (and now effectively extinct) disease as a result of mass vaccination. I am a strong supporter of vaccine research. I’m just making an analogy here, people.

