If you have slow loops in Python, you can fix it… until you can't

by Maxim Mamaev

Let’s take a computational problem as an example, write some code, and see how we can improve the running time. Here we go.

Setting the scene: the knapsack problem

This is the computational problem we’ll use as the example:

The knapsack problem is a well-known problem in combinatorial optimization. In this section, we will review its most common flavor, the 0–1 knapsack problem, and its solution by means of dynamic programming. If you are familiar with the subject, you can skip this part.

You are given a knapsack of capacity C and a collection of N items. Each item has weight w[i] and value v[i]. Your task is to pack the knapsack with the most valuable items. In other words, you are to maximize the total value of the items that you put into the knapsack, subject to a constraint: the total weight of the taken items cannot exceed the capacity of the knapsack.

Once you’ve got a solution, the total weight of the items in the knapsack is called “solution weight,” and their total value is the “solution value”.

The problem has many practical applications. For example, you’ve decided to invest $1600 into the famed FAANG stock (the collective name for the shares of Facebook, Amazon, Apple, Netflix, and Google aka Alphabet). Each share has a current market price and the one-year price estimate. As of one day in 2018, they are as follows:

========= ======= ======= =========
Company   Ticker  Price   Estimate
========= ======= ======= =========
Alphabet  GOOG    1030    1330
Amazon    AMZN    1573    1675
Apple     AAPL    162     193
Facebook  FB      174     216
Netflix   NFLX    312     327
========= ======= ======= =========

For the simplicity of the example, we’ll assume that you’d never put all your eggs in one basket. You are willing to buy no more than one share of each stock. What shares do you buy to maximize your profit?

This is a knapsack problem. Your budget ($1600) is the sack’s capacity (C). The shares are the items to be packed. The current prices are the weights (w). The price estimates are the values (v). The problem looks trivial. However, the solution is not evident at first glance: should you buy one share of Amazon, or one share of Google plus one each of some combination of Apple, Facebook, and Netflix?

Of course, in this case, you may do quick calculations by hand and arrive at the solution: you should buy Google, Netflix, and Facebook. This way you spend $1516 and expect to gain $1873.
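
With only five stocks, the hand calculation can be double-checked by brute force. The following is a minimal sketch, not the author's code: the names stocks, budget and best_pick are made up for illustration, and the prices are copied from the table above.

```python
from itertools import combinations

# (ticker, price, one-year estimate), copied from the table above
stocks = [
    ("GOOG", 1030, 1330),
    ("AMZN", 1573, 1675),
    ("AAPL", 162, 193),
    ("FB", 174, 216),
    ("NFLX", 312, 327),
]
budget = 1600

best_value, best_weight, best_pick = 0, 0, ()
# Try every subset of every size and keep the most valuable one that fits.
for size in range(len(stocks) + 1):
    for pick in combinations(stocks, size):
        weight = sum(price for _, price, _ in pick)
        value = sum(estimate for _, _, estimate in pick)
        if weight <= budget and value > best_value:
            best_value, best_weight, best_pick = value, weight, pick

print([ticker for ticker, _, _ in best_pick], best_weight, best_value)
# ['GOOG', 'FB', 'NFLX'] 1516 1873
```

Five items mean only 2⁵ = 32 subsets, so exhaustive search is instant here.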

Now you believe that you’ve discovered a Klondike. You shatter your piggy bank and collect $10,000. Despite your excitement, you stay adamant with the rule “one stock — one buy”. Therefore, with that larger budget, you have to broaden your options. You decide to consider all stocks from the NASDAQ 100 list as candidates for buying.

The future has never been brighter, but suddenly you realize that, in order to identify your ideal investment portfolio, you will have to check around 2¹⁰⁰ combinations. Even if you are super optimistic about the imminence and the ubiquity of the digital economy, any economy requires, at the least, a universe where it runs. Unfortunately, in a few trillion years when your computation ends, our universe probably won’t exist.
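
To get a feeling for the scale, here is a back-of-the-envelope estimate. The checking speed of one billion combinations per second is purely an assumption for illustration; a different speed shifts the result but not the conclusion.

```python
n_combinations = 2 ** 100
checks_per_second = 10 ** 9          # assumed speed, for illustration only
seconds_per_year = 365 * 24 * 3600

years = n_combinations / checks_per_second / seconds_per_year
print(f"{n_combinations:.3e} combinations, about {years:.1e} years")
```

At that assumed speed the search would take on the order of tens of trillions of years.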

Dynamic programming algorithm

We have to drop the brute force approach and program some clever solution. Small knapsack problems (and ours is a small one, believe it or not) are solved by dynamic programming. The basic idea is to start from a trivial problem whose solution we know and then add complexity step-by-step.

If you find the following explanations too abstract, here is an annotated illustration of the solution to a very small knapsack problem. This will help you visualize what is happening.

Assume that, given the first i items of the collection, we know the solution values s(i, k) for all knapsack capacities k in the range from 0 to C.

In other words, we sewed C+1 “auxiliary” knapsacks of all sizes from 0 to C. Then we sorted our collection, took the first i items and temporarily put aside all the rest. And now we assume that, by some magic, we know how to optimally pack each of the sacks from this working set of i items. The items that we pick from the working set may be different for different sacks, but at the moment we are not interested in which items we take or skip. It is only the solution value s(i, k) that we record for each of our newly sewn sacks.

Now we fetch the next, (i+1)th, item from the collection and add it to the working set. Let’s find solution values for all auxiliary knapsacks with this new working set. In other words, we find s(i+1, k) for all k=0..C given s(i, k).

If k is less than the weight of the new item w[i+1], we cannot take this item. Indeed, even if we took only this item, it alone would not fit into the knapsack. Therefore, s(i+1, k) = s(i, k) for all k < w[i+1].

For the values k >= w[i+1] we have to make a choice: either we take the new item into the knapsack of capacity k or we skip it. We need to evaluate these two options to determine which one gives us more value packed into the sack.

If we take the (i+1)th item, we acquire the value v[i+1] and consume the part of the knapsack’s capacity to accommodate the weight w[i+1]. That leaves us with the capacity k–w[i+1] which we have to optimally fill using (some of) the first i items. This optimal filling has the solution value s(i, k–w[i+1]). This number is already known to us because, by assumption, we know all solution values for the working set of i items. Hence, the candidate solution value for the knapsack k with the item i+1 taken would be s(i+1, k | i+1 taken) = v[i+1] + s(i, k–w[i+1]).

The other option is to skip the item i+1. In this case, nothing changes in our knapsack, and the candidate solution value would be the same as s(i, k).

To decide on the best choice we compare the two candidates for the solution values:

s(i+1, k | i+1 taken) = v[i+1] + s(i, k–w[i+1])
s(i+1, k | i+1 skipped) = s(i, k)

The maximum of these becomes the solution s(i+1, k).

In summary:

if k < w[i+1]:
    s(i+1, k) = s(i, k)
else:
    s(i+1, k) = max( v[i+1] + s(i, k-w[i+1]), s(i, k) )
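
The recurrence translates almost literally into Python. The sketch below is illustrative only: next_row is a hypothetical helper name, and a row is assumed to be a plain list holding s(i, k) for k = 0..C.

```python
def next_row(prev_row, weight, value):
    """Given s(i, k) for k = 0..C, return s(i+1, k) for an item (weight, value)."""
    row = []
    for k, skipped in enumerate(prev_row):
        if k < weight:
            row.append(skipped)               # the item does not fit
        else:
            taken = value + prev_row[k - weight]
            row.append(max(taken, skipped))   # better of "taken" and "skipped"
    return row

row0 = [0] * 6                                # empty working set, capacities 0..5
row1 = next_row(row0, weight=2, value=3)
row2 = next_row(row1, weight=3, value=4)
print(row1)   # [0, 0, 3, 3, 3, 3]
print(row2)   # [0, 0, 3, 4, 4, 7]
```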

Now we can solve the knapsack problem step-by-step. We start with the empty working set (i=0). Obviously, s(0, k) = 0 for any k. Then we take steps by adding items to the working set and finding solution values s(i, k) until we arrive at s(i+1=N, k=C) which is the solution value of the original problem.

Note that, by the way of doing this, we have built the grid of NxC solution values.

Yet, despite having learned the solution value, we do not know exactly what items have been taken into the knapsack. To find this out, we backtrack the grid. Starting from s(i=N, k=C), we compare s(i, k) with s(i–1, k).

If s(i, k) = s(i–1, k), the ith item has not been taken. We repeat the comparison with i=i–1, keeping the value of k unchanged. Otherwise, the ith item has been taken, and for the next examination step we shrink the knapsack by w[i]: we set k=k–w[i] and then i=i–1.

This way we examine all items from the Nth to the first, and determine which of them have been put into the knapsack. This gives us the solution to the knapsack problem.
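
The backtracking step can be sketched on a tiny grid worked out by hand for two made-up items (weight 2, value 3 and weight 3, value 4). The function name backtrack and the 1-based item numbering in its result are assumptions chosen to match the notation of the text.

```python
def backtrack(grid, weights, capacity):
    """Return the 1-based indices of the items packed into the knapsack."""
    taken = []
    k = capacity
    for i in range(len(weights), 0, -1):
        if grid[i][k] != grid[i - 1][k]:   # the value changed: item i was taken
            taken.append(i)
            k -= weights[i - 1]            # weights is 0-based, items are 1-based
    return sorted(taken)

weights = [2, 3]
grid = [
    [0, 0, 0, 0, 0, 0],   # s(0, k): empty working set
    [0, 0, 3, 3, 3, 3],   # s(1, k): item 1 has weight 2, value 3
    [0, 0, 3, 4, 4, 7],   # s(2, k): item 2 has weight 3, value 4
]
print(backtrack(grid, weights, capacity=5))   # [1, 2]
print(backtrack(grid, weights, capacity=4))   # [2]
```

With capacity 4 only one of the two items fits, and the comparison of adjacent rows correctly identifies the more valuable one.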

Code and analysis

Now, as we have the algorithm, we will compare several implementations, starting from a straightforward one. The code is available on GitHub.

The data is the Nasdaq 100 list, containing current prices and price estimates for one hundred stock equities (as of one day in 2018). Our investment budget is $10,000.

Recall that share prices are not round dollar numbers, but come with cents. Therefore, to get an accurate solution, we have to count everything in cents; we definitely want to avoid floating-point numbers. Hence the capacity of our knapsack is $10,000 x 100 = 1,000,000 cents, and the total size of our problem is N x C = 100 x 1,000,000 = 100,000,000.

With an integer taking 4 bytes of memory, we expect that the algorithm will consume roughly 400 MB of RAM. So, the memory is not going to be a limitation. It is the execution time we should care about.
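
The memory estimate is simple arithmetic. The sketch below assumes, as the text does, 4 bytes per stored value, and counts the extra row and column for i = 0 and k = 0.

```python
n_items = 100
capacity = 10_000 * 100                  # $10,000 expressed in cents
bytes_per_int = 4                        # as assumed in the text

cells = (n_items + 1) * (capacity + 1)   # rows 0..N, columns 0..C
print(cells, "cells,", cells * bytes_per_int / 1e6, "MB")
```

Note that 4 bytes per value holds for a compact integer array; a plain Python list of int objects generally needs more memory per element.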

Of course, all our implementations will yield the same solution. For your reference, the investment (the solution weight) is 999930 ($9999.30) and the expected return (the solution value) is 1219475 ($12194.75). The list of stocks to buy is rather long (80 of 100 items). You can obtain it by running the code.

And, please, remember that this is a programming exercise, not investment advice. By the time you read this article, the prices and the estimates will have changed from what is used here as an example.

Plain old “for” loops

The straightforward implementation of the algorithm is given below.
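
A minimal sketch of such a solver follows. It is a reconstruction from the description in this section, not the author's listing, so the line numbers cited in the text do not apply to it. The function name solve, its return value, and the (name, weight, value) tuple layout are assumptions; the parameter names items and capacity are taken from the text.

```python
def solve(items, capacity):
    """items: list of (name, weight, value). Returns (value, weight, names)."""
    n = len(items)

    # Part 1: build the grid of solution values with two nested loops.
    grid = [[0] * (capacity + 1)]
    for i in range(n):
        _, weight, value = items[i]
        grid.append(grid[i].copy())          # start from the previous row
        for k in range(weight, capacity + 1):
            grid[i + 1][k] = max(grid[i][k], grid[i][k - weight] + value)

    # Part 2: backtrack the grid to find which items were taken.
    taken = []
    k = capacity
    for i in range(n, 0, -1):
        if grid[i][k] != grid[i - 1][k]:
            name, weight, _ = items[i - 1]
            taken.append(name)
            k -= weight
    return grid[n][capacity], capacity - k, taken

faang = [("GOOG", 1030, 1330), ("AMZN", 1573, 1675), ("AAPL", 162, 193),
         ("FB", 174, 216), ("NFLX", 312, 327)]
print(solve(faang, 1600))   # (1873, 1516, ['NFLX', 'FB', 'GOOG'])
```

On the five-stock example it reproduces the hand-computed answer: spend 1516 and expect 1873.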

There are two parts.

In the first part (lines 3–7 above), two nested for loops are used to build the solution grid.

The outer loop adds items to the working set until we reach N (the value of N is passed in the parameter items). The row of solution values for each new working set is initialized with the values computed for the previous working set.

The inner loop for each working set iterates the values of k from the weight of the newly added item to C (the value of C is passed in the parameter capacity).

Note that we do not need to start the loop from k=0. When k is less than the weight of item, the solution values are always the same as those computed for the previous working set, and these numbers have been already copied to the current row by initialisation.

When the loops are completed, we have the solution grid and the solution value.

The second part (lines 9–17) is a single for loop of N iterations. It backtracks the grid to find what items have been taken into the knapsack.

Further on, we will focus exclusively on the first part of the algorithm as it has O(N*C) time and space complexity. The backtracking part requires just O(N) time and does not spend any additional memory — its resource consumption is relatively negligible.

It takes 180 seconds for the straightforward implementation to solve the Nasdaq 100 knapsack problem on my computer.

How bad is it? On the one hand, with the speeds of the modern age, we are not used to spending three minutes waiting for a computer to do stuff. On the other hand, the size of the problem — a hundred million — looks indeed intimidating, so, maybe, three minutes are ok?

To obtain some benchmark, let’s program the same algorithm in another language. We need a statically-typed compiled language to ensure the speed of computation. No, not C. It is not fancy. We’ll stick to fashion and write in Go:

As you can see, the Go code is quite similar to that in Python. I even copy-pasted one line, the longest, as is.

What is the running time? 400 milliseconds! In other words, Python came out about 450 times slower than Go (180 sec versus 0.4 sec). The gap would probably be even bigger if we tried it in C. This is definitely a disaster for Python.

To find out what slows down the Python code, let’s run it with a line profiler. You can find the profiler’s output for this and subsequent implementations of the algorithm on GitHub.

In the straightforward solver, 99.7% of the running time is spent in two lines. These two lines comprise the inner loop, which is executed 98 million times:

I apologize for the excessively long lines, but the line profiler cannot properly handle line breaks within the same statement.

I’ve heard that Python’s for statement is slow but, interestingly, most of the time is spent not in the for line but in the loop’s body.

We can break down the loop’s body into individual operations to see if any particular operation is too slow:

It appears that no particular operation stands out. The running times of individual operations within the inner loop are pretty much the same as the running times of analogous operations elsewhere in the code.

Note how breaking the code down increased the total running time. The inner loop now takes 99.9% of the running time. The dumber your Python code, the slower it gets. Interesting, isn’t it?

Built-in map function

Let’s make the code more optimised and replace the inner for loop with a built-in map() function:
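
As an illustration of the idea (not the author's exact code), one row of the grid could be built with map() roughly like this. The helper name next_row_map is hypothetical, and rows are assumed to be plain lists.

```python
def next_row_map(prev_row, weight, value):
    """Build row i+1 from row i, with map() in place of the inner for loop."""
    head = prev_row[:weight]                   # capacities where the item does not fit
    tail = map(
        lambda k: max(prev_row[k], prev_row[k - weight] + value),
        range(weight, len(prev_row)),
    )
    return head + list(tail)

print(next_row_map([0, 0, 3, 3, 3, 3], weight=3, value=4))   # [0, 0, 3, 4, 4, 7]
```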

The execution time of this code is 102 seconds, being 78 seconds off the straightforward implementation’s score. Indeed, map() runs noticeably, but not overwhelmingly, faster.

List comprehension

You may have noticed that each run of the inner loop produces a list (which is added to the solution grid as a new row). The Pythonic way of creating lists is, of course, list comprehension. Let’s try it instead of map().
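
A sketch of how one row might be produced with a list comprehension is shown below. The helper name next_row_comprehension is made up for illustration, and the code is not taken from the author's repository.

```python
def next_row_comprehension(prev_row, weight, value):
    """Build row i+1 from row i with a list comprehension."""
    return prev_row[:weight] + [
        max(prev_row[k], prev_row[k - weight] + value)
        for k in range(weight, len(prev_row))
    ]

print(next_row_comprehension([0, 0, 3, 3, 3, 3], weight=3, value=4))
# [0, 0, 3, 4, 4, 7]
```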

This finished in 81 seconds. We’ve achieved another improvement and cut the running time by more than half in comparison to the straightforward implementation (180 sec). Out of context, this would be praised as significant progress. Alas, we are still light years away from our benchmark of 0.4 sec.

NumPy arrays

At last, we have exhausted built-in Python tools. Yes, I can hear the roar of the audience chanting “NumPy! NumPy!” But to appreciate NumPy’s efficiency, we should have put it into context by trying for, map() and list comprehension beforehand.

Ok, now it is NumPy time. So, we abandon lists and put our data into numpy arrays:

Suddenly, the result is discouraging. This code runs 1.5 times slower than the vanilla list comprehension solver (123 sec versus 81 sec). How can that be?

Let’s examine the line profiles for both solvers.

Initialization of grid[0] as a numpy array (line 274) is three times faster than when it is a Python list (line 245). Inside the outer loop, initialization of grid[item+1] is 4.5 times faster for a NumPy array (line 276) than for a list (line 248). So far, so good.

However, the execution of line 279 is 1.5 times slower than its numpy-less analog in line 252. The problem is that list comprehension creates a list of values, but we store these values in a NumPy array which is found on the left side of the expression. Hence, this line implicitly adds an overhead of converting a list into a NumPy array. With line 279 accounting for 99.9% of the running time, all the previously noted advantages of numpy become negligible.

But we still need a means to iterate through arrays in order to do the calculations. We have already learned that list comprehension is the fastest iteration tool. (By the way, if you try to build NumPy arrays within a plain old for loop, avoiding list-to-NumPy-array conversion, you’ll get a whopping 295 sec running time.) So, are we stuck and is NumPy of no use? Of course not.

Proper use of NumPy

Just storing data in NumPy arrays does not do the trick. The real power of NumPy comes with the functions that run calculations over NumPy arrays. They take arrays as parameters and return arrays as results.

For example, there is the function where(), which takes three arrays as parameters: condition, x, and y, and returns an array built by picking elements either from x or from y. The first parameter, condition, is an array of booleans. It tells where to pick from: if an element of condition evaluates to True, the corresponding element of x is sent to the output; otherwise the element from y is taken.

Note that the NumPy function does all this in a single call. Looping through the arrays is put away under the hood.
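
A small standalone example of where(), with made-up arrays x and y chosen only for illustration:

```python
import numpy as np

x = np.array([10, 20, 30, 40])
y = np.array([1, 2, 3, 4])
condition = x > 25                  # element-wise: [False, False, True, True]

picked = np.where(condition, x, y)  # x where condition is True, otherwise y
print(picked.tolist())              # [1, 2, 30, 40]
```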

This is how we use where() as a substitute for the inner for loop in the first solver or, respectively, the list comprehension in the latest one:

There are three pieces of code that are interesting: line 8, line 9 and lines 10–13 as numbered above. Together, they substitute for the inner loop which would iterate through all possible sizes of knapsacks to find the solution values.

Until the knapsack’s capacity reaches the weight of the item newly added to the working set (this_weight), we have to ignore this item and set solution values to those of the previous working set. This is pretty straightforward (line 8):

grid[item+1, :this_weight] = grid[item, :this_weight]

Then we build an auxiliary array temp (line 9):

temp = grid[item, :-this_weight] + this_value

This code is analogous to, but much faster than:

[grid[item, k - this_weight] + this_value for k in range(this_weight, capacity+1)]

It calculates would-be solution values if the new item were taken into each of the knapsacks that can accommodate this item.

Note how the temp array is built by adding a scalar to an array. This is another powerful feature of NumPy called “broadcasting”. When NumPy sees operands with different dimensions, it tries to expand (that is, to “broadcast”) the low-dimensional operand to match the dimensions of the other. In our case, the scalar is expanded to an array of the same size as grid[item, :-this_weight] and these two arrays are added together. As a result, the value of this_value is added to each element of grid[item, :-this_weight], so no loop is needed.
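
The equivalence between the broadcast addition and the explicit list comprehension can be checked on a small made-up row (the values in row are invented for illustration; this_weight is assumed to be a positive integer):

```python
import numpy as np

row = np.array([0, 0, 3, 3, 3, 3])        # a made-up row of solution values
this_weight, this_value = 3, 4

temp = row[:-this_weight] + this_value    # the scalar is broadcast over the slice
looped = [row[k - this_weight] + this_value
          for k in range(this_weight, len(row))]

print(temp.tolist())                      # [4, 4, 7]
print(temp.tolist() == [int(v) for v in looped])
```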

In the next piece (lines 10–13) we use the function where() which does exactly what is required by the algorithm: it compares two would-be solution values for each size of knapsack and selects the one which is larger.

grid[item + 1, this_weight:] = np.where(temp > grid[item, this_weight:],
                                        temp,
                                        grid[item, this_weight:])

The comparison is done by the condition parameter, which is calculated as temp > grid[item, this_weight:]. This is an element-wise operation that produces an array of boolean values, one for each size of an auxiliary knapsack. A True value means that the new item is to be packed into the corresponding knapsack. In that case, the solution value is taken from the second argument of the function, temp. Otherwise, the item is skipped, and the solution value is copied from the previous row of the grid, the third argument of the where() function.
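
Putting the three pieces together, a grid-building solver might look like the sketch below. It is assembled from the fragments quoted in this section rather than copied from the author's listing, so the line numbers cited in the text do not apply to it; the function name solve_numpy, the separate weights and values lists, and the int64 dtype are assumptions. Weights are assumed to be positive integers.

```python
import numpy as np

def solve_numpy(weights, values, capacity):
    """Return the (N+1) x (C+1) array of solution values s(i, k)."""
    n = len(weights)
    grid = np.zeros((n + 1, capacity + 1), dtype=np.int64)
    for item in range(n):                 # the outer loop stays a plain for loop
        this_weight, this_value = weights[item], values[item]
        # Knapsacks too small for the item keep the previous solution values.
        grid[item + 1, :this_weight] = grid[item, :this_weight]
        # Would-be values if the item were taken into each larger knapsack.
        temp = grid[item, :-this_weight] + this_value
        # Element-wise choice between "taken" and "skipped".
        grid[item + 1, this_weight:] = np.where(
            temp > grid[item, this_weight:], temp, grid[item, this_weight:]
        )
    return grid

weights = [1030, 1573, 162, 174, 312]     # FAANG prices from the table
values = [1330, 1675, 193, 216, 327]      # FAANG estimates from the table
grid = solve_numpy(weights, values, 1600)
print(int(grid[-1, -1]))                  # 1873
```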

At last, the warp drive engaged! This solver executes in 0.55 sec. This is 145 times faster than the list comprehension-based solver and 329 times faster than the code using the for loop. Although we did not outrun the solver written in Go (0.4 sec), we came quite close to it.

Some loops are to stay

Wait, but what about the outer for loop?

In our example, the outer loop code, which is not part of the inner loop, is run only 100 times, so we can get away without tinkering with it. However, other times the outer loop can turn out to be as long as the inner.

Can we rewrite the outer loop using a NumPy function in a similar manner to what we did to the inner loop? The answer is no.

Despite both being for loops, the outer and inner loops are quite different in what they do.

The inner loop produces a 1D-array based on another 1D-array whose elements are all known when the loop starts. It is this prior availability of the input data that allowed us to substitute the inner loop with either map(), list comprehension, or a NumPy function.

The outer loop produces a 2D-array from 1D-arrays whose elements are not known when the loop starts. Moreover, these component arrays are computed by a recursive algorithm: we can find the elements of the (i+1)th array only after we have found the ith.

Suppose the outer loop could be presented as a function:

grid = g(row0, row1, … rowN)

All function parameters must be evaluated before the function is called, yet only row0 is known beforehand. Since the computation of the (i+1)th row depends on the availability of the ith, we need a loop going from 1 to N to compute all the row parameters. Therefore, to substitute the outer loop with a function, we need another loop which evaluates the parameters of this function. This other loop is exactly the loop we are trying to replace.

The other way to avoid the outer for loop is to use the recursion. One can easily write the recursive function calculate(i) that produces the ith row of the grid. In order to do the job, the function needs to know the (i-1)th row, thus it calls itself as calculate(i-1) and then computes the ith row using the NumPy functions as we did before. The entire outer loop can then be replaced with calculate(N). To make the picture complete, a recursive knapsack solver can be found in the source code accompanying this article on GitHub.

However, the recursive approach is clearly not scalable. Python does not optimize tail calls. The depth of the recursion stack is, by default, limited to the order of one thousand. This limit is surely conservative but, when we require a depth of millions, stack overflow is highly likely. Moreover, the experiment shows that recursion does not even provide a performance advantage over a NumPy-based solver with the outer for loop.
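
The recursion limit is easy to observe. In the sketch below, depth is a hypothetical stand-in for a function like calculate(i) that calls itself once per row; it does no real work.

```python
import sys

def depth(n):
    """Recurse n levels deep and return n."""
    return 0 if n == 0 else 1 + depth(n - 1)

print(sys.getrecursionlimit())     # typically 1000 in CPython

try:
    depth(1_000_000)               # a million nested calls
except RecursionError as err:
    print("RecursionError:", err)
```

The limit can be raised with sys.setrecursionlimit(), but a very deep recursion may then exhaust the interpreter's stack instead.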

This is where we run out of the tools provided by Python and its libraries (to the best of my knowledge). If you absolutely need to speed up the loop that implements a recursive algorithm, you will have to resort to Cython, or to a JIT-compiled version of Python, or to another language.

Takeaways

Do numerical calculations with NumPy functions. They are two orders of magnitude faster than Python’s built-in tools.

Of Python’s built-in tools, list comprehension is faster than map(), which is significantly faster than for.

For deeply recursive algorithms, loops are more efficient than recursive function calls.

You cannot replace recursive loops with map(), list comprehension, or a NumPy function.

“Dumb” code (broken down into elementary operations) is the slowest. Use built-in functions and tools.

Translated from: https://www.freecodecamp.org/news/if-you-have-slow-loops-in-python-you-can-fix-it-until-you-cant-3a39e03b6f35/