How to build a neural net in three lines of math

A code-free guide to Artificial Intelligence

So about a year ago, I read this fantastic article by Trask.

If you didn’t click the link, do it now.

Did you? Ok, good.

Now here’s the thing: the article requires you to know a little bit of Python, which you probably do.

On the off-chance that you’re interested in Neural Networks (if that phrase sounds utterly foreign to you, watch this YouTube playlist) and haven’t learned Python yet, congratulations, you’re in the right spot.

But regardless of where you are in the vast landscape of deep learning, I think that once in a while, it’s nice to go back to the basics and revisit the fundamental mathematical ideas that brought us Siri, Alexa, and endless hours of Netflix binge-watching.

So without any further ado, I present to you the three equations that make up what I’ll call the “fundamental theorem of deep learning.”

1. Linear Regression

$$z = Wx + b$$

The first equation is pretty basic. I guess the others are as well, but we’ll get to them in due time.

For now, all we’re doing is computing the vector z = Wx + b, where W is a matrix that is initially just filled with a bunch of random numbers, b is a vector that is initially just filled with a bunch of random numbers, and x is a vector that is not initially filled with a bunch of random numbers.

x is a training example from our dataset. For example, if you’re training a neural net to predict someone’s age given their gender and height, you’d first need a few (or preferably a lot, the more, the merrier) examples of data in the form [[height, gender], age]. The vector [height, gender] is what we’re calling x.

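If you do happen to know a little Python, here is a minimal sketch of that first step. The helper name `linear`, the 3-row shape of W, and the numeric encoding of `[height, gender]` are all assumptions made for illustration, not something the equation prescribes.

```python
import random

def linear(W, x, b):
    """Compute z = Wx + b, with W a list of rows, x an input vector, b a bias vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

random.seed(0)  # fixed seed so the "random" starting numbers are reproducible

# Hypothetical training example in the form [height, gender].
# Height in cm and gender as 0/1 are assumed encodings.
x = [170.0, 1.0]

# W and b start out as a bunch of random numbers (the 3x2 shape is arbitrary).
W = [[random.uniform(-1, 1) for _ in x] for _ in range(3)]
b = [random.uniform(-1, 1) for _ in range(3)]

z = linear(W, x, b)
print(z)  # three real numbers, one per row of W
```

Each entry of z is just a weighted sum of the inputs plus a bias, which is why this step gets filed under linear regression.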

2. Activation Functions

$$\hat{y} = \sigma(z)$$

On the left-hand side, we have our predicted values of y, which is the variable that I’m using to denote the labels on our data.

The hat on top means that this value of y is a predicted value, as opposed to the ground truth labels from our dataset.

The z in this equation is the same one that we computed above. The sigma represents the sigmoid activation function, which looks like this:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

So in plain English, we’re taking z, a vector of real numbers that can be arbitrarily large or small, and squishing its components to be between 0 and 1.

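For the curious, the squishing can be sketched in a few lines of Python. The two-branch form below is one common way to avoid overflow for very negative inputs; that detail is my addition, and the sample values in `z` are made up.

```python
import math

def sigmoid(v):
    """Squish a single real number into the range between 0 and 1."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    # For negative v, rewrite the formula so exp() never sees a large positive number.
    e = math.exp(v)
    return e / (1.0 + e)

z = [-4.0, 0.0, 4.0]            # made-up components of z
y_hat = [sigmoid(v) for v in z]  # apply the sigmoid to each component
print(y_hat)  # very negative values land near 0, very positive ones near 1
```

Note that an input of exactly 0 comes out as 0.5, right in the middle.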

Having a number between 0 and 1 is useful because if we’re trying to build a classifier, say one that predicts whether an image is a cat or a dog, we can let 1 represent dogs and 0 represent cats. Or the other way around if you like cats more.

But suppose we’re not doing dogs and cats (yeah right, like there’s any other better use case for machine learning). Let’s go back to our age predictor. Over there, we can’t merely predict 1’s and 0’s.

In general, you could use whatever function you like, not necessarily just a sigmoid. But a bunch of smart people noticed that the sigmoid worked pretty well. So we’re stuck with it.

However, it’s a different story when we’re dealing with labels that are actual numbers, and not classes. For our age predictor, we need to use a different activation function.

Enter ReLU.

Let me say upfront that I think that this is the most boring part of deep learning. I mean, seriously, just a boring ol’ straightforward-looking function? Where’s the fun in that?

Looks can be deceiving though. While it’s pretty dull — ReLU(x) is just max(0,x) — the ReLU function works really well in practice. So hey, live with it.

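To show just how dull it is, here is ReLU as a Python sketch. The function name `relu` and the sample values are only illustrative.

```python
def relu(v):
    """ReLU(x) = max(0, x): negatives become 0, everything else passes through."""
    return max(0.0, v)

z = [-2.5, 0.0, 3.0]         # made-up components of z
print([relu(v) for v in z])  # [0.0, 0.0, 3.0]
```

Unlike the sigmoid, nothing caps the output from above, so the result can be any non-negative number, such as an age.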

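3. Back-propagation And Gradient Descent

$$L = -\sum_{i} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]$$

$$W' = W - \alpha \frac{\partial L}{\partial W}$$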

Ok, you got me. I cheated. It’s technically four lines of math. But hey, you could condense steps 1 and 2 into a single step, so I guess I come out victorious.

Now to digest all of that (literal) Greek stuff.

In the first equation, we’re doing that fancy stuff to y and y-hat to compute a single number called the loss, denoted by L.

As the name suggests, the loss measures how badly we’ve lost in our vicious battle to conquer the machine learning grimoire.

In particular, our L here is measuring something called the binary cross entropy loss, which is a shortcut to sounding like you have a math Ph.D. when you’re actually just measuring how far y is from y-hat. Nevertheless, there’s a lot more under the surface of the equation, so check out Daniel Godoy’s article on the topic.

All you need to know to get the intuition behind this stuff is that L gets big if our predicted values are far away from the ground truth values, and L gets tiny when our predictions and reality match.

The sum is there so that we can add up all the messed-up-ness for each of the training examples, so that our neural net understands how messed-up it is overall.

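Here is a rough Python sketch of the binary cross-entropy loss, summed over a handful of examples. The labels and predictions are invented for illustration, and the small `eps` clamp is my own safeguard against taking the log of zero, not part of the equation itself.

```python
import math

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Sum the binary cross-entropy over all training examples."""
    total = 0.0
    for y_i, p in zip(y, y_hat):
        p = min(max(p, eps), 1.0 - eps)  # assumed safeguard: keep p away from 0 and 1
        total += -(y_i * math.log(p) + (1 - y_i) * math.log(1 - p))
    return total

y = [1, 0, 1]                   # made-up ground truth labels
close_guess = [0.9, 0.1, 0.8]   # predictions near the labels
far_guess = [0.1, 0.9, 0.2]     # predictions far from the labels

print(binary_cross_entropy(y, close_guess))  # small loss
print(binary_cross_entropy(y, far_guess))    # much larger loss
```

The second number comes out larger than the first, which is the whole intuition: the further y-hat strays from y, the bigger L gets.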

Now, the actual “learning” part of deep learning begins.

The final step in our stack is to update the matrix W and the vector b so that our loss goes down. By doing this, we are effectively minimizing how far our predictions are from the ground truth values, and thus, our model is getting more accurate.

Here’s the equation again:

$$W' = W - \alpha \frac{\partial L}{\partial W}$$

W’ is the matrix with updated numbers that gets us closer to the ground truth. Alpha is a constant that we get to choose. That last term you’re looking at is the gradient of the loss with respect to a parameter. Put simply, it’s a measure of how much our loss changes for a small tweak in the numbers in the W matrix.

Again, I’m not going to go too in-depth into gradient descent (the process of updating our numbers in the matrix) since there are already a lot of great resources on the topic. I’d highly recommend this article by Sebastian Ruder.

By the way, we can do the same thing for the initially random values in the b vector. Just tweak them by the right amount in the right direction, and BOOM! We just got closer to an all-time low loss.

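Putting the pieces together, a toy version of the whole loop might look like the Python sketch below. It assumes a single sigmoid output (so W shrinks to one row `w` and b to one number), a made-up four-example dataset, and an arbitrarily chosen alpha of 0.5. It also leans on a standard result that isn't derived here: for a sigmoid output with binary cross-entropy loss, the gradient of the loss with respect to z is simply y-hat minus y.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def predict(w, b, x):
    """y_hat = sigmoid(w . x + b) for a single output."""
    return sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)

def loss(w, b, data):
    """Binary cross-entropy summed over all (x, y) pairs."""
    total = 0.0
    for x, y in data:
        p = predict(w, b, x)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total

def gradient_step(w, b, data, alpha):
    """One update: W' = W - alpha * dL/dW, and the same for b."""
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, y in data:
        err = predict(w, b, x) - y  # dL/dz for sigmoid + binary cross-entropy
        for j, x_j in enumerate(x):
            grad_w[j] += err * x_j  # dL/dw_j = err * x_j
        grad_b += err               # dL/db = err
    new_w = [w_j - alpha * g for w_j, g in zip(w, grad_w)]
    new_b = b - alpha * grad_b
    return new_w, new_b

# Made-up toy dataset: the label is 1 only when both inputs are 1.
data = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 0), ([1.0, 1.0], 1)]

w, b = [0.5, -0.5], 0.0  # arbitrary starting values
alpha = 0.5              # assumed learning rate

before = loss(w, b, data)
for _ in range(100):
    w, b = gradient_step(w, b, data, alpha)
after = loss(w, b, data)

print(before, after)  # the loss after training is lower than before
```

A real network stacks several of these layers and needs back-propagation to pass gradients through them, but the update rule at each layer has this same shape.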

Conclusion

And there you have it. The three great equations that make up the foundations of the neural networks that we use today.

Pause and ponder for a second. What you just saw is a compilation of humanity’s understanding of the intricacies of intelligence.

Sure, this is a pretty basic vanilla neural net that we just looked at, and there have been countless improvements in learning algorithms over the years that have resulted in significant breakthroughs. When coupled with the unprecedented explosion of data and computing power over the years, it seems, to a degree, almost inevitable that well-thought-out mathematics is able to grasp the subtle art of distinguishing cats and dogs.

But still. This is where it all began.

In a way, the heart and soul of this decade’s (arguably) most significant technological advancement lie right before your eyes. So take a second. Pause and ponder.

Translated from: https://www.freecodecamp.org/news/how-to-build-a-neural-net-in-three-lines-of-math-a0c42f45c40e/
