Suppose that an experiment consists of n = 5 independent Bernoulli trials, each having probability of success p. Let X be the total number of successes in the trials, so that $X \sim \text{Bin}(5, p)$. If the outcome is X = 3, the likelihood is

\[
L(p; x) = \frac{n!}{x!(n-x)!}\, p^x (1-p)^{n-x} = \frac{5!}{3!(5-3)!}\, p^3 (1-p)^{5-3} \propto p^3 (1-p)^2
\]

where the constant at the beginning is ignored. A graph of $L(p; x) = p^3(1-p)^2$ over the unit interval p ∈ (0, 1) shows a single peak.

Notice that this function reaches its maximum value at p = .6. Intuitively, if we observe 3 successes in 5 trials, a reasonable estimate of the long-run proportion of successes p would be 3/5 = .6.
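
A minimal sketch of that graph (assuming NumPy and Matplotlib are available; the choice of tools is mine, not the course's) evaluates the likelihood on a grid and confirms that the peak sits at p = .6:

    import numpy as np
    import matplotlib.pyplot as plt

    # Likelihood for X = 3 successes in n = 5 trials, constant factor dropped.
    def likelihood(p):
        return p**3 * (1 - p)**2

    grid = np.linspace(0, 1, 1001)        # fine grid over the unit interval
    values = likelihood(grid)

    print("maximizer on the grid:", grid[np.argmax(values)])   # 0.6

    plt.plot(grid, values)
    plt.xlabel("p")
    plt.ylabel("L(p; x)")
    plt.title("Likelihood of 3 successes in 5 Bernoulli trials")
    plt.show()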

This example suggests that it may be reasonable to estimate an unknown parameter θ by the value for which the likelihood function L(θ ; x) is largest. This approach is called maximum-likelihood (ML) estimation. We will denote the value of θ that maximizes the likelihood function by $\hat{\theta}$, read “theta hat.” $\hat{\theta}$ is called the maximum-likelihood estimate (MLE) of θ.

Finding MLE’s usually involves techniques of differential calculus. To maximize L(θ ; x) with respect to θ (a worked example follows the list below):

  • first calculate the derivative of L(θ ; x) with respect to θ,
  • set the derivative equal to zero, and
  • solve the resulting equation for θ.
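
For instance, applying these three steps to the likelihood from the opening example, $L(p; x) = p^3(1-p)^2$, gives

\[
\frac{d}{dp}\, p^3(1-p)^2 = 3p^2(1-p)^2 - 2p^3(1-p) = p^2(1-p)(3 - 5p),
\]

which equals zero at p = 0, p = 1, and p = 3/5. The likelihood vanishes at the two endpoints, so the maximum is attained at $\hat{p} = 3/5 = .6$, in agreement with the graph.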

These computations can often be simplified by maximizing the loglikelihood function,

\[
l(\theta; x) = \log L(\theta; x),
\]

where “log” means natural log (logarithm to the base e). Because the natural log is an increasing function, maximizing the loglikelihood is the same as maximizing the likelihood. The loglikelihood often has a much simpler form than the likelihood and is usually easier to differentiate.
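
For the opening example, a quick numerical check (again a sketch; SciPy is my choice of optimizer, not something the notes prescribe) shows that maximizing the loglikelihood returns the same $\hat{p}$ as maximizing the likelihood itself:

    import numpy as np
    from scipy.optimize import minimize_scalar

    x, n = 3, 5   # observed successes and number of trials

    def likelihood(p):
        return p**x * (1 - p)**(n - x)

    def log_likelihood(p):
        # Loglikelihood up to an additive constant.
        return x * np.log(p) + (n - x) * np.log(1 - p)

    # Maximize each objective by minimizing its negative over (0, 1).
    mle_lik = minimize_scalar(lambda p: -likelihood(p),
                              bounds=(0, 1), method="bounded").x
    mle_loglik = minimize_scalar(lambda p: -log_likelihood(p),
                                 bounds=(1e-9, 1 - 1e-9), method="bounded").x

    print(mle_lik, mle_loglik)   # both approximately 0.6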

In Stat 504 you will not be asked to derive MLE’s by yourself. In most of the probability models that we will use later in the course (logistic regression, loglinear models, etc.) no explicit formulas for MLE’s are available, and we will have to rely on computer packages to calculate the MLE’s for us. For the simple probability models we have seen thus far, however, explicit formulas for MLE’s are available and are given next.

ML for Bernoulli trials

If our experiment is a single Bernoulli trial and we observe X = 1 (success), then the likelihood function is L(p ; x) = p. This function reaches its maximum at $\hat{p} = 1$. If we observe X = 0 (failure), then the likelihood is L(p ; x) = 1 − p, which reaches its maximum at $\hat{p} = 0$. Of course, it is somewhat silly for us to try to make formal inferences about p on the basis of a single Bernoulli trial; usually multiple trials are available.

Suppose that X = (X1, X2, . . ., Xn) represents the outcomes of n independent Bernoulli trials, each with success probability p . The likelihood for p based on X is defined as the joint probability distribution of X1, X2, . . . , Xn. Since X1, X2, . . . , Xn are iid random variables, the joint distribution is

\[
L(p; x) = f(x; p) = \prod_{i=1}^n f(x_i; p) = \prod_{i=1}^n p^{x_i} (1-p)^{1-x_i}
\]

Differentiating the log of L(p ; x) with respect to p and setting the derivative to zero shows that this function achieves a maximum at $\hat{p} = \sum_{i=1}^n x_i / n$. Since $\sum_{i=1}^n x_i$ is the total number of successes observed in the n trials, $\hat{p}$ is the observed proportion of successes in the n trials. We often call $\hat{p}$ the sample proportion to distinguish it from p, the “true” or “population” proportion. Note that some textbooks use π instead of p. For repeated Bernoulli trials, the MLE $\hat{p}$ is the sample proportion of successes.
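
A small simulation sketch (the true p, the number of trials, and the seed below are arbitrary illustrative choices) shows the MLE being computed as the sample proportion:

    import numpy as np

    rng = np.random.default_rng(seed=504)   # seed fixed only for reproducibility

    p_true = 0.3    # hypothetical "population" proportion
    n = 200         # number of Bernoulli trials

    # Simulate n independent Bernoulli(p_true) outcomes: 1 = success, 0 = failure.
    x = rng.binomial(n=1, p=p_true, size=n)

    p_hat = x.sum() / n      # MLE: observed proportion of successes
    print(p_hat)             # close to 0.3 for a sample of this size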

ML for Binomial

Suppose that X is an observation from a binomial distribution, X ∼ Bin(n, p ), where n is known and p is to be estimated. The likelihood function is

\[
L(p; x) = \frac{n!}{x!(n-x)!}\, p^x (1-p)^{n-x}
\]

which, except for the factor $\frac{n!}{x!(n-x)!}$, is identical to the likelihood from n independent Bernoulli trials with $x = \sum_{i=1}^n x_i$. But since the likelihood function is regarded as a function only of the parameter p, the factor $\frac{n!}{x!(n-x)!}$ is a fixed constant and does not affect the MLE. Thus the MLE is again $\hat{p} = x/n$, the sample proportion of successes.

You get the same value by maximizing the binomial loglikelihood function

\[
l(p; x) = k + x \log p + (n - x) \log(1 - p)
\]

where k is a constant that does not involve the parameter p. In the future we will omit the constant, because it's statistically irrelevant.
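
To see this explicitly, differentiate the loglikelihood with respect to p and set the derivative to zero:

\[
\frac{\partial l}{\partial p} = \frac{x}{p} - \frac{n - x}{1 - p} = 0
\quad\Longrightarrow\quad
x(1 - p) = (n - x)\,p
\quad\Longrightarrow\quad
\hat{p} = \frac{x}{n}.
\]

The second derivative, $-x/p^2 - (n - x)/(1 - p)^2$, is negative, so this stationary point is indeed a maximum.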

The fact that the MLE based on n independent Bernoulli random variables and the MLE based on a single binomial random variable are the same is not surprising, since the binomial is the result of n independent Bernoulli trials anyway. In general, whenever we have repeated, independent Bernoulli trials with the same probability of success p for each trial, the MLE will always be the sample proportion of successes. This is true regardless of whether we know the outcomes of the individual trials X1, X2, . . . , Xn, or just the total number of successes for all trials $X = \sum_{i=1}^n X_i$.

Suppose now that we have a sample of iid binomial random variables. For example, suppose that X1, X2, . . . , X10 are an iid sample from a binomial distribution with n = 5 and p unknown. Since each Xi is actually the total number of successes in 5 independent Bernoulli trials, and since the Xi’s are independent of one another, their sum $X = \sum_{i=1}^{10} X_i$ is the total number of successes in 50 independent Bernoulli trials. Thus $X \sim \text{Bin}(50, p)$ and the MLE is $\hat{p} = x/50$, the observed proportion of successes across all 50 trials. Whenever we have independent binomial random variables with a common p, we can always add them together to get a single binomial random variable.
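
A sketch of this pooling (the true p and the seed are illustrative choices of mine, not part of the example) computes $\hat{p}$ from the pooled total:

    import numpy as np

    rng = np.random.default_rng(seed=28)    # seed fixed only for reproducibility
    p_true = 0.4                            # hypothetical common success probability

    xs = rng.binomial(n=5, p=p_true, size=10)   # X1, ..., X10, each Bin(5, p)

    total_successes = xs.sum()              # X = X1 + ... + X10  ~  Bin(50, p)
    total_trials = 5 * 10

    p_hat = total_successes / total_trials  # MLE based on all 50 Bernoulli trials
    print(xs, p_hat)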

Adding the binomial random variables together produces no loss of information about p if the model is true. But collapsing the data in this way may limit our ability to diagnose model failure, i.e. to check whether the binomial model is really appropriate.

ML for Poisson

Suppose that X = (X1, X2, . . . , Xn) are iid observations from a Poisson distribution with unknown parameter λ. The likelihood function is:

\[
L(\lambda; x) = \prod_{i=1}^n f(x_i; \lambda) = \prod_{i=1}^n \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} = \frac{\lambda^{\sum_{i=1}^n x_i}\, e^{-n\lambda}}{x_1!\, x_2! \cdots x_n!}
\]

By differentiating the log of this function with respect to λ, that is by differentiating the Poisson loglikelihood function

\[
l(\lambda; x) = \sum_{i=1}^n x_i \log \lambda - n\lambda,
\]

ignoring the constant terms that do not depend on λ, one can show that the maximum is achieved at $\hat{\lambda} = \sum_{i=1}^n x_i / n$. Thus, for a Poisson sample, the MLE for λ is just the sample mean.
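
Explicitly, setting the derivative of the loglikelihood to zero gives

\[
\frac{\partial l}{\partial \lambda} = \frac{\sum_{i=1}^n x_i}{\lambda} - n = 0
\quad\Longrightarrow\quad
\hat{\lambda} = \frac{1}{n}\sum_{i=1}^n x_i = \bar{x}.
\]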

Next: Likelihood-based confidence intervals and tests.

from: https://onlinecourses.science.psu.edu/stat504/node/28
