Two Probabilistic Models in Machine Learning

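The discriminative model and the generative model are two kinds of probabilistic model used in machine learning to model the probability distribution of the training samples. Because the two are easy to confuse in practice, I looked up some references and summarize the differences between them here.
A description from Stack Overflow is a good way to open the topic: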

Let’s say you have input data x and you want to classify the data into labels y. A generative model learns the joint probability distribution p(x,y) and a discriminative model learns the conditional probability distribution p(y|x) - which you should read as “the probability of y given x”. Here’s a really simple example. Suppose you have the following data in the form (x,y):(1,0),(1,0),(2,0),(2,1)

p(x,y) is

p(x,y)	y=0	y=1
x=1	1/2	0
x=2	1/4	1/4

p(y|x) is

p(y|x)	y=0	y=1
x=1	1	0
x=2	1/2	1/2

If you take a few minutes to stare at those two matrices, you will understand the difference between the two probability distributions. The distribution p(y|x) is the natural distribution for classifying a given example x into a class y, which is why algorithms that model this directly are called discriminative algorithms. Generative algorithms model p(x,y), which can be transformed into p(y|x) by applying Bayes' rule and then used for classification. However, the distribution p(x,y) can also be used for other purposes. For example, you could use p(x,y) to generate likely (x,y) pairs. From the description above you might be thinking that generative models are more generally useful and therefore better, but it's not as simple as that. The overall gist is that discriminative models generally outperform generative models in classification tasks.
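The two tables above can be reproduced by counting. The following is a minimal sketch, not part of the quoted answer: the function names `joint_distribution` and `conditional_distribution` are made up for illustration, and exact fractions are used so the numbers match the tables.

```python
from collections import Counter
from fractions import Fraction

# The four (x, y) pairs from the example above.
data = [(1, 0), (1, 0), (2, 0), (2, 1)]

def joint_distribution(pairs):
    """Empirical joint p(x, y): relative frequency of each (x, y) pair."""
    counts = Counter(pairs)
    n = len(pairs)
    return {xy: Fraction(c, n) for xy, c in counts.items()}

def conditional_distribution(joint):
    """p(y | x) = p(x, y) / p(x), where p(x) is the sum of p(x, y) over y."""
    marginal_x = {}
    for (x, _), p in joint.items():
        marginal_x[x] = marginal_x.get(x, 0) + p
    return {(x, y): p / marginal_x[x] for (x, y), p in joint.items()}

joint = joint_distribution(data)
cond = conditional_distribution(joint)

for key in sorted(joint):
    print(key, "p(x,y) =", joint[key], " p(y|x) =", cond[key])
```

Pairs that never occur, such as (1,1), are simply absent from both dictionaries, which corresponds to the 0 entries in the tables.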

Generative models are used in machine learning for either modeling data directly (i.e., modeling observations drawn from a probability density function), or as an intermediate step to forming a conditional probability density function. A conditional distribution can be formed from a generative model through Bayes’ rule.

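A generative model models the joint probability p(x,y) of the sample data. Because the learned joint probability p(x,y) can be used to generate data pairs (x,y), it is called a generative model. A discriminative model instead models the conditional probability p(y|x), that is, the probability of y given x. A generative model can be turned into a discriminative one through Bayes' rule, but a generative model cannot be recovered from a discriminative one. Since p(x,y)=p(x|y)p(y), training a generative model (taking binary classification as the example) means modeling the feature distribution of the samples with y=0 and the feature distribution of the samples with y=1 separately, and also modeling the prior probability p(y) of the labels in the training set. When a new unlabeled sample arrives at test time, it is enough to compute p(x,y=1)=p(x|y=1)p(y=1) and p(x,y=0)=p(x|y=0)p(y=0), use them in place of p(y=1|x) and p(y=0|x) (they differ from the posteriors only by the common factor 1/p(x)), and compare the two to decide the class.

A discriminative model is simpler: it models the posterior probability p(y|x) directly, as in logistic regression and linear regression. At test time an unlabeled sample is fed straight into the probability model to obtain the corresponding y. For binary classification, the class is decided by whether the output p(y=1|x) is greater than 0.5.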

Although this topic is quite old, I think it's worth adding this important distinction. In practice the models are used as follows.

In discriminative models to predict the label y from the training example x, you must evaluate:

$$f(x) = \arg\max_y \, p(y \mid x)$$

Which merely chooses what is the most likely class considering x. It’s like we were trying to model the decision boundary between the classes. This behavior is very clear in neural networks, where the computed weights can be seen as a complex shaped curve isolating the elements of a class in the space.

Now using Bayes' rule, let's replace the p(y|x) in the equation by p(x|y)p(y)/p(x). Since you are just interested in the arg max, you can wipe out the denominator, which is the same for every y. So you are left with

$$f(x) = \arg\max_y \, p(x \mid y)\, p(y)$$

Which is the equation you use in generative models. While in the first case you had the conditional probability distribution p(y|x), which modeled the boundary between classes, in the second you had the joint probability distribution p(x,y), since p(x,y)=p(x|y)p(y), which explicitly models the actual distribution of each class.
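Written out as one chain, the step from the discriminative rule to the generative rule looks like this. It is only a restatement of the argument above, assuming p(x) > 0 so that the division is defined:

```latex
\begin{align*}
f(x) &= \arg\max_y \, p(y \mid x) \\
     &= \arg\max_y \, \frac{p(x \mid y)\, p(y)}{p(x)}
        && \text{Bayes' rule} \\
     &= \arg\max_y \, p(x \mid y)\, p(y)
        && p(x) \text{ does not depend on } y \\
     &= \arg\max_y \, p(x, y)
\end{align*}
```

With the four-point data set from the start of this post, x=1 gives p(1,0)=1/2 and p(1,1)=0, so f(1)=0 under either form of the rule.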

With the joint probability distribution function, given a y, you can calculate ("generate") its respective x. For this reason they are called generative models.

Imagine your task is to classify a piece of speech by its language.

You can do it either by:

1) Learning each language and then classifying it using the knowledge you just gained

2) Determining the difference in the linguistic models without learning the languages and then classifying the speech.

The first one is the generative approach and the second one is the discriminative approach.

Examples of discriminative models used in machine learning include:

Logistic regression
Support vector machines
Boosting (meta-algorithm)
Conditional random fields
Linear regression
Neural networks

Examples of generative models include:

Gaussian mixture model and other types of mixture model
Hidden Markov model
Probabilistic context-free grammar
Naive Bayes
Averaged one-dependence estimators
Latent Dirichlet allocation
Restricted Boltzmann machine

2015-8-31 艺少

Addendum: 2015-9-1

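Modeling p(w|x) directly with a discriminative model:
(Note: w here is the y used above.)
1. Choose a suitable form of probability distribution for p(w)
For example, let w follow a normal distribution, as shown in the figure:

2. Make the parameters of the p(w) distribution functions of x
The mean μ of the normal distribution is expressed as a linear function of x, and the variance is a constant.

$$p(w \mid x, \theta) = \mathrm{Norm}_w[\phi_0 + \phi_1 x, \sigma^2]$$

3. The parameters θ then define the shape of p(w|x)
The parameters are ϕ0, ϕ1, σ². Note: this is a linear regression model.
The parameters can be estimated by maximum a posteriori (MAP) estimation, maximum likelihood estimation (MLE), and so on.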
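Modeling p(x|w) or p(x,w) with a generative model:
1. Choose a suitable form of probability distribution for p(x)
For example, let x follow a normal distribution, as shown in the figure:

2. Make the parameters of the p(x) distribution functions of w
The mean μ of the normal distribution is expressed as a linear function of w, and the variance is a constant.

$$p(x \mid w, \theta) = \mathrm{Norm}_x[\phi_0 + \phi_1 w, \sigma^2]$$

3. The parameters θ then define the shape of p(x|w)
The parameters are ϕ0, ϕ1, σ².
The parameters can be estimated by maximum a posteriori (MAP) estimation, maximum likelihood estimation (MLE), and so on.
The joint probability density is then computed as p(x|w)×p(w)=p(x,w), and Bayes' rule takes it to p(w|x). This is illustrated below: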

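In this example, if maximum likelihood estimation is used, the two models yield the same normal distribution. The main reason is that x and w are both continuous, are related through a linear model, and both use normal distributions to represent uncertainty. With MAP (maximum a posteriori) estimation, the two models would give different results.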
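The claim that the two routes agree under maximum likelihood can be checked numerically. The sketch below is my own illustration under stated assumptions: the data are synthetic, the prior p(w) is assumed to be a normal distribution fitted by maximum likelihood (the text above does not spell out its form), and all function names are hypothetical. Both functions return the intercept, slope and variance of the normal distribution p(w|x).

```python
import random

def mean(values):
    return sum(values) / len(values)

def fit_line_mle(inputs, targets):
    """ML fit of targets ~ Norm[phi0 + phi1 * input, sigma2] (least squares)."""
    mi, mt = mean(inputs), mean(targets)
    var_i = mean([(a - mi) ** 2 for a in inputs])
    cov = mean([(a - mi) * (b - mt) for a, b in zip(inputs, targets)])
    phi1 = cov / var_i
    phi0 = mt - phi1 * mi
    sigma2 = mean([(b - phi0 - phi1 * a) ** 2 for a, b in zip(inputs, targets)])
    return phi0, phi1, sigma2

def discriminative_posterior(x, w):
    """Fit p(w|x) = Norm_w[phi0 + phi1*x, sigma2] directly."""
    return fit_line_mle(x, w)

def generative_posterior(x, w):
    """Fit p(x|w) and an assumed Gaussian prior p(w), then apply Bayes' rule."""
    a0, a1, s2 = fit_line_mle(w, x)      # p(x|w) = Norm_x[a0 + a1*w, s2]
    mu_w = mean(w)                       # ML mean of the assumed prior p(w)
    var_w = mean([(b - mu_w) ** 2 for b in w])
    # Product of two Gaussians in w: precisions add, then read off the mean.
    post_var = 1.0 / (1.0 / var_w + a1 ** 2 / s2)
    slope = post_var * a1 / s2
    intercept = post_var * (mu_w / var_w - a0 * a1 / s2)
    return intercept, slope, post_var

# Synthetic data, chosen only for illustration.
random.seed(0)
xs = [random.uniform(-3.0, 3.0) for _ in range(200)]
ws = [2.0 + 0.5 * x + random.gauss(0.0, 0.4) for x in xs]

disc = discriminative_posterior(xs, ws)
gen = generative_posterior(xs, ws)
print("discriminative:", disc)
print("generative:    ", gen)
```

The two printed triples should agree up to floating-point rounding, because in both cases p(w|x) is the conditional of the same bivariate normal whose moments are the sample moments.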

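The comparison above used continuous regression. The comparison below uses discrete classification, where the difference between the two approaches is more visible.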

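Modeling p(w|x) directly with a discriminative model:
(Note: w here is the y used above.)
1. Choose a suitable form of probability distribution for p(w)
For example, let w follow a Bernoulli distribution, as shown in the figure:

2. Make the parameters of the p(w) distribution functions of x
The parameter λ of the Bernoulli distribution is modeled as a function of x:

$$p(w \mid x, \theta) = \mathrm{Bern}_w[\mathrm{sig}[\phi_0 + \phi_1 x]] = \mathrm{Bern}_w\left[\frac{1}{1 + \exp[-\phi_0 - \phi_1 x]}\right]$$

3. The parameters θ then define the shape of p(w|x)
The parameters are ϕ0, ϕ1. Note: this is a logistic regression model.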
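As a rough illustration of the three steps above, here is a small maximum-likelihood fit of ϕ0 and ϕ1. It is a sketch under my own assumptions: plain gradient ascent is used as the optimizer (the text does not prescribe one), the learning rate and step count are arbitrary, and the toy data are invented.

```python
import math

def sig(a):
    """Logistic sigmoid, sig[a] = 1 / (1 + exp[-a])."""
    return 1.0 / (1.0 + math.exp(-a))

def fit_logistic(xs, ws, lr=0.1, steps=5000):
    """ML fit of p(w=1|x) = sig[phi0 + phi1*x] by gradient ascent."""
    phi0, phi1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean Bernoulli log-likelihood w.r.t. phi0 and phi1.
        g0 = sum(w - sig(phi0 + phi1 * x) for x, w in zip(xs, ws)) / n
        g1 = sum((w - sig(phi0 + phi1 * x)) * x for x, w in zip(xs, ws)) / n
        phi0 += lr * g0
        phi1 += lr * g1
    return phi0, phi1

# Invented toy data: small x tends to have w = 0, large x tends to have w = 1.
# The classes overlap, so the ML estimate stays finite.
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ws = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

phi0, phi1 = fit_logistic(xs, ws)

def predict(x):
    """p(w=1|x) under the fitted model."""
    return sig(phi0 + phi1 * x)

print("phi0, phi1 =", phi0, phi1)
print("p(w=1|x=2.5) =", predict(2.5))
```

Note that nothing here models how x itself is distributed. Only the mapping from x to the Bernoulli parameter is learned.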

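Modeling p(x|w) or p(x,w) with a generative model:
1. Choose a suitable form of probability distribution for p(x)
For example, let x follow a normal distribution, as shown in the figure:

2. Make the parameters of the p(x) distribution functions of the discrete binary value w
The mean μ and the variance σ² of the normal distribution each take a separate value for each value of w.

$$p(x \mid w, \theta) = \mathrm{Norm}_x[\mu_w, \sigma_w^2]$$

3. The parameters θ then define the shape of p(x|w)
The parameters are μ0, μ1, σ0², σ1².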

The two are compared in the figure below:

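For a generative model, the learning algorithm estimates the model p(x|y), while the inference algorithm combines it with the prior probability p(y) to obtain the joint probability density and then uses Bayes' rule to compute the posterior probability p(y|x).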
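The split between learning and inference can be made concrete with a short sketch. The assumptions are mine: the class-conditional densities are fitted by maximum likelihood, the prior over the label is taken to be a Bernoulli distribution estimated from label frequencies, the toy data are invented, and the names `learn` and `infer` are hypothetical.

```python
import math

def norm_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def learn(xs, ws):
    """Learning: ML estimates of p(x|w) = Norm_x[mu_w, var_w] and of p(w=1)."""
    params = {}
    for k in (0, 1):
        xk = [x for x, w in zip(xs, ws) if w == k]
        mu = sum(xk) / len(xk)
        var = sum((x - mu) ** 2 for x in xk) / len(xk)
        params[k] = (mu, var)
    prior1 = sum(ws) / len(ws)   # assumed Bernoulli prior p(w=1)
    return params, prior1

def infer(x, params, prior1):
    """Inference: Bayes' rule turns p(x|w) and p(w) into p(w=1|x)."""
    joint1 = norm_pdf(x, *params[1]) * prior1
    joint0 = norm_pdf(x, *params[0]) * (1.0 - prior1)
    return joint1 / (joint0 + joint1)

# Invented toy data with overlapping classes.
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ws = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

params, prior1 = learn(xs, ws)
print("class 0 (mean, variance):", params[0])
print("class 1 (mean, variance):", params[1])
print("p(w=1|x=2.5) =", infer(2.5, params, prior1))
```

Unlike the discriminative route, the fitted p(x|w) could also be sampled to produce new x values for a chosen class, which is where the name "generative" comes from.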

2015-9-1 艺少

Reposted from: https://www.cnblogs.com/huty/p/8519195.html
