The Math Behind Deepfakes

An Introductory Guide

Although many are familiar with the incredible results produced by deepfakes, most people find it hard to understand how deepfakes actually work. Hopefully, this article will demystify some of the math that goes into creating a deepfake.

Deepfake generally refers to videos in which the face and/or voice of a person, usually a public figure, has been manipulated using artificial intelligence software in a way that makes the altered video look authentic. — Dictionary.com

It turns out that deepfake is something of an umbrella term, with no single definitive way to create one. However, most deepfakes are created with a deep learning framework known as generative adversarial nets, or GANs, so that will be the main focus of this article.

What is a GAN?

Generative adversarial nets — or GANs for short — are a deep learning model that was first proposed in a 2014 paper by Ian Goodfellow and his colleagues. The model operates by simultaneously training two neural networks in an adversarial game.

Abstractly, we have a generative model G that tries to learn a distribution p_g which replicates p_data, the distribution of the data set, while a discriminative model D tries to determine whether a piece of data came from the data set or from the generator. Although seeing this for the first time may be intimidating, the math becomes relatively straightforward when looking at an example.

Classically, GANs are explained using the analogy of producing counterfeit money. To set up the situation, there is an organization of counterfeiters who try to produce counterfeit money, while the police try to detect whether or not money is counterfeit. Here, our counterfeiters can be treated as the generative model G that produces fake money with the distribution p_g. A distribution is essentially a “map” of characteristics that describes the features of money; in other words, the counterfeiters are producing money with some set of characteristics described by the distribution p_g. Furthermore, the role of the police is to discriminate between real and counterfeit money, so they play the part of the discriminative model D. In practice, these models are often multi-layer perceptrons, but there is no need to specify the type of neural network when only discussing theory.

this is a setup for our example scenario

Initially, the money produced by the counterfeiters might have many flaws, so the police can easily detect that the money was produced by the counterfeiters; in other words, the police know when money comes from the distribution p_g. As time progresses, both the police and the counterfeiters become more proficient at their work. For the counterfeiters, this means that the money they produce will better resemble real money; mathematically, this is shown when the distribution of counterfeit money, p_g, approaches the distribution of real money, p_data. On the other hand, the police become more accurate at detecting whether money comes from p_data or p_g. However, the counterfeiters will eventually reach a point where the counterfeit money can pass for real money and fool the police. This occurs when the distributions p_g and p_data are the same; simply put, the features of the counterfeit money match those of real money. It turns out that this notion of “distance” between two distributions can be measured in many ways, each behaving slightly differently. With this knowledge in hand, we can set a goal for the counterfeiters: learn the distribution p_g such that it equals the distribution of the data, p_data. Similarly, we set a goal for the police: maximize the accuracy of detecting counterfeit money.

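To make this notion of distance concrete, here is a small NumPy sketch, my own illustration rather than anything from the original article, that computes the Jensen-Shannon divergence, the measure the original GAN paper shows the generator implicitly minimizes. The toy histograms and function names are invented for this example.

```python
import numpy as np

def kl_divergence(p, q):
    # Kullback-Leibler divergence between discrete distributions,
    # using the convention 0 * log(0/q) = 0
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def js_divergence(p, q):
    # Jensen-Shannon divergence: symmetric, bounded, and equal to 0
    # exactly when the two distributions match
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# toy "feature histograms" describing money
p_data    = np.array([0.10, 0.40, 0.40, 0.10])  # real money
p_g_early = np.array([0.70, 0.10, 0.10, 0.10])  # crude counterfeits
p_g_late  = np.array([0.12, 0.38, 0.38, 0.12])  # nearly indistinguishable

print(js_divergence(p_data, p_g_early))  # relatively large
print(js_divergence(p_data, p_g_late))   # close to 0
```

As the counterfeiters improve, p_g drifts toward p_data and the divergence shrinks toward 0, which is exactly the goal stated above.
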
this is a slightly more formal setup for the example

Up until now, we have largely neglected the specifics of how these models actually operate, so we will begin by describing the generator G. Going back to the previous example with counterfeit money, our generator needs to take in some input that specifies what kind of money is being created. This means that the input corresponding to creating a one dollar bill will differ from the input corresponding to creating a ten dollar bill. For consistency, we will define this input using the variable z that comes from the distribution p_z. The distribution p_z gives a rough idea of what kinds of money can be counterfeited. Furthermore, the outputs of the generator, expressed as G(z), can be described with the distribution p_g. Shifting our focus to the discriminator, we begin by examining the role it plays. Namely, our discriminator should tell us whether some piece of data is from our data set or from the generator. It turns out that probabilities are perfectly suited for this! Specifically, when our discriminator takes in some input x, D(x) should return a number between 0 and 1 representing the probability that x is from the data set. To see why our discriminator is allowed to return values between 0 and 1, we will examine the case where an input somewhat resembles something from the data set. Revisiting our previous example, say we had a US dollar with small scuff marks in the corner and another US dollar with a portrait of Putin printed on it. Without a doubt, the second bill is much more suspicious than the first, so it is easily classified as fake (the discriminator returns 0). However, our first bill still has a chance of being genuine, and classifying it with a 0 would mean it looks just as bad as bill number two. Obviously, we would be losing some information about bill one, and it might be best to classify it with a number like 0.5, where our discriminator has some doubts that the bill is genuine but is not certain that it is a fake. Simply put, our discriminator returns a number that represents its confidence level that an input comes from the data set.

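As a concrete, purely illustrative sketch of this setup, here is a minimal PyTorch version of the two players: a generator that maps noise z drawn from p_z to a fake sample, and a discriminator that maps any sample to a probability between 0 and 1. The layer sizes are arbitrary choices, not anything prescribed by GANs themselves.

```python
import torch
import torch.nn as nn

Z_DIM, DATA_DIM = 16, 64  # arbitrary sizes chosen for illustration

# the generator G: consumes z ~ p_z and produces a fake sample;
# taken together, its outputs define the distribution p_g
G = nn.Sequential(
    nn.Linear(Z_DIM, 128), nn.ReLU(),
    nn.Linear(128, DATA_DIM),
)

# the discriminator D: consumes a sample and returns the probability
# that it came from the data set (1) rather than the generator (0)
D = nn.Sequential(
    nn.Linear(DATA_DIM, 128), nn.ReLU(),
    nn.Linear(128, 1), nn.Sigmoid(),
)

z = torch.randn(8, Z_DIM)  # a batch of inputs drawn from p_z
fake = G(z)                # G(z), distributed according to p_g
print(D(fake))             # confidence levels strictly between 0 and 1
```
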
Deriving an Error Function

Now that we have a rough understanding of what our models G and D should be doing, we still need a way to evaluate their performance; this is where error functions come into play. Basically, an error function, E, tells us how poorly our model is performing given its current set of parameters. For example, say we had a model that was being trained to recognize various objects. If we showed the model a bicycle and the model saw a tricycle, the error function would return a relatively small error, since the two are so similar. However, if the model saw the bicycle as a truck or a school building, the error function would return a much larger number, as there is little to no similarity between these. In other words, error is low if the predictions of our model closely match the actual data, and error is large when the predictions do not match the actual data at all.

Armed with this knowledge, we begin laying out some desired characteristics that our error function should have. First of all, the error function should return a large number when our discriminator misclassifies data, and a small number when data is classified correctly. In order to understand what this means, we begin by defining classifications. Essentially, a classification is a label for some piece of data. For example, a red robin would be put under the classification of birds, while tuna would be put under the classification of fish. In our case, an input to our discriminator can come from two places: the data set or the generator. For convenience, as we will see later on, we classify data that comes from the generator by giving it a label of 0, while data that comes from the data set will be given the label 1. Using this, we can further elaborate on our error function. For example, say we have some piece of data, x, with the label 1. If our discriminator predicts that x is from the data set (D(x) returns a number close to 1), then our discriminator will have correctly predicted the classification of x and the error will be low. However, if our discriminator predicts that x is from the generator (D(x) returns a number close to 0), then our discriminator will have incorrectly classified our data and the error will be high.

this represents how our error function should behave

As we look for an ideal function, we notice that the graph of y = log(x) on the interval (0, 1] matches our specification after some manipulation.

the graph of y = log(x)

In particular, flipping the graph across the x-axis results in the error function for when our label is 1. Reflecting this new graph across the vertical line x = 0.5 then reveals the error function for when our label is 0. The equations for these are y = -log(x) and y = -log(1-x) respectively, and can be seen below.

the error functions for label = 0 on the left and label = 1 on the right

Putting these two functions together, we can create the following “piece-wise” function.

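The figure that originally appeared here is missing; reconstructed in LaTeX from the substitutions described below, the piece-wise function is:

```latex
\mathrm{error}(x) =
\begin{cases}
  -\log(x)     & \text{if label} = 1 \\
  -\log(1 - x) & \text{if label} = 0
\end{cases}
```
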
We can substitute x = D(G(z)) when label = 0 and x = D(x) when label = 1. When label = 0, we are evaluating the error of our discriminator when it takes an image from the generator as input. When label = 1, we are finding the error of our discriminator when it takes something from our data set as an input.

Unfortunately, this formula is a little cumbersome to write out, so we want to find a way to reduce it down to one line. We begin by giving our error function a proper name, like E. Additionally, we will also want to create a variable to represent our label, since writing out label is inefficient; we will call this new variable y. Here is where a little bit of genius comes into play. When we treat y not only as a label, but also as a number, we can actually reduce this formula into the following:

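The formula shown in the missing image is binary cross entropy; reconstructing it in LaTeX from the description that follows:

```latex
E(y, x) = -(1 - y)\,\log\bigl(1 - D(x)\bigr) - y\,\log\bigl(D(x)\bigr)
```
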
Notice that when y = 0 (the label is 0), the (1 - y) coefficient turns into 1, while the term y·log(D(x)) turns into 0. When y = 1 (the label is 1), something similar occurs: the first term reduces to 0, leaving us with -log(D(x)). It turns out that these results exactly equal our “piece-wise” function. As an aside, this error function is also known as binary cross entropy.

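A quick numeric sanity check (my own illustration, not from the article) confirms that this single formula reproduces the piece-wise behavior:

```python
import math

def bce(y, d_x):
    # binary cross entropy for one example: y is the label (0 or 1),
    # d_x is the discriminator's output D(x)
    return -(1 - y) * math.log(1 - d_x) - y * math.log(d_x)

print(bce(1, 0.95))  # real sample, classified as real  -> ~0.05 (low error)
print(bce(1, 0.05))  # real sample, classified as fake  -> ~3.00 (high error)
print(bce(0, 0.05))  # fake sample, classified as fake  -> ~0.05 (low error)
print(bce(0, 0.95))  # fake sample, classified as real  -> ~3.00 (high error)
```
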
One quick thing to note is that the paper which introduced GANs uses the error function -E instead. Therefore, in order to stay consistent with the original paper, we will redefine our error function as -E.

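Negating the formula above gives the version used in the paper (my reconstruction of the missing figure):

```latex
-E(y, x) = (1 - y)\,\log\bigl(1 - D(x)\bigr) + y\,\log\bigl(D(x)\bigr)
```
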
this is the error function after minor adjustments to better represent what was originally presented in Ian Goodfellow’s paper

This change in the formula means an incorrect prediction (i.e., y = 0 but D outputs 1) will result in an error of -∞ as opposed to ∞.

Applying the Error Function

After deriving a suitable error function for our GAN, the next reasonable step is to apply it to the current setup.

The first step in this process is to set some goals for our models. Essentially, our discriminator, D, should aim to classify all of its inputs correctly, while the generator, G, should try to trick the discriminator into misclassifying as much data as possible. With these two goals in mind, we now begin to analyze the behavior of our error function. Right away, it is easy to see that the error function attains a maximum value of 0, which only occurs when the discriminator perfectly classifies everything with 100% confidence (this is especially easy to see using the definition of our error function). Additionally, our error function attains a minimum of -∞, which only occurs when the discriminator is 100% confident in its predictions but is always wrong (this may occur if D(x) is 0 but y = 1).

Combining these two insights, we are able to mathematically formulate a competition between the two models G and D. Namely, G is attempting to minimize our error function (G wants the error to be -∞), while D is trying to maximize it (D wants the error to be 0). This sort of adversarial competition is also known as a minimax game, where the models G and D compete against each other like players. As a result, we find it more intuitive to call E a value function, V(G, D), where G's goal is to minimize the value of V(G, D), while D's goal is to maximize it. This can be described with the following expression:

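The expression in the missing image is presumably the single-sample version of the minimax objective, reconstructed here in LaTeX:

```latex
\min_G \max_D V(D, G) = \log\bigl(D(x)\bigr) + \log\bigl(1 - D(G(z))\bigr)
```
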
unfortunately, this expression is not yet complete; do you see how to improve it?

However, the above formula has a critical flaw: it only takes in a single input at a time. In order to improve the utility of this function, it would be best for it to calculate the error over all of our data (this includes both the data set and everything generated by the generator). This is where it becomes more useful to find the aggregate, or total, error that the models have over the entire data set. In fact, we can find this total error by simply summing up the error for each individual input. To see where this will lead us, we must now examine the cases where an input to our discriminator comes from the data set and the cases where an input comes from the generator.

When an input to the discriminator comes from the data set, y will be equal to 1. This means that the value function for that single instance of data becomes log(D(x)). Consequently, if we were to find the error for every piece of data from our data set, the total error for these data entries would be the number of entries in the data set multiplied by the error for a single entry. Of course, this assumes that the error is roughly the same for each entry in the data set. Additionally, we can mathematically describe the number of data entries in our data set using…

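The article breaks off at this point, but the construction it is working toward is the full value function from Goodfellow et al. (2014), in which expectations over p_data and p_z replace the single samples above:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

In practice, this tug-of-war is implemented by alternating gradient steps on the two networks. Here is a minimal PyTorch sketch of one round, reusing the toy G, D, Z_DIM, and DATA_DIM from the earlier sketch; the learning rates, batch size, and placeholder data are assumptions for illustration only.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)

real = torch.randn(8, DATA_DIM)  # placeholder for a batch from p_data
ones, zeros = torch.ones(8, 1), torch.zeros(8, 1)

# discriminator step: maximize V, i.e. label real data 1 and fakes 0
fake = G(torch.randn(8, Z_DIM)).detach()  # detach so only D updates
loss_d = bce(D(real), ones) + bce(D(fake), zeros)
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# generator step: minimize V by pushing D(G(z)) toward 1
# (the widely used "non-saturating" variant of the generator loss)
fake = G(torch.randn(8, Z_DIM))
loss_g = bce(D(fake), ones)
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```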