Note: (1) Originally published at: http://blog.csdn.net/xinhanggebuguake/article/details/8765536

(2) English original: http://www.temida.si/~bojan/probability_estimation.PHP


English original:

Probability estimation

1 Introduction

Let us assume that in an experiment we have conducted n independent trials, of which r are successes. The rest of the trials (n-r) are failures. Examples of such experiments are tossing a coin (e.g. success is a head), throwing a die (e.g. success is a six on top), shooting a free throw in basketball (success is a scored point), etc.

In this report we will discuss the following question: how to estimate the probability of success in the next, (n+1)-th, trial. In addition, we will devote special attention to cases where the sample size is effectively small. In this context we will discuss and compare three methods: relative frequency, Laplace's law of succession, and the m-estimate.

What exactly do we mean by an effectively small sample size? Good (1965) states that when there are more than three successes and more than three failures in our example set, there is little difference between any of the described methods for most practical purposes. So we will basically be dealing with cases where either the number of successes or the number of failures is small (e.g. 0, 1, 2). Note that such situations can also occur when the actual sample size is large, but we divide the trial set into subsets that fulfill certain conditions for the purpose of estimating conditional probabilities in those subsets.

2 Estimates for success in the next trial

2.1 Relative frequency

Relative frequency is sometimes also called the maximum likelihood estimate. The probability of success in the next trial is computed according to the following formula:

P=r/n

The estimation of probabilities can be regarded as a relatively simple task when the sample size is large enough. In such a case we would hardly require any theory. Bernoulli's theorem states that when n is large enough, the probability of success in the next trial can be reliably estimated by the relative frequency P = r/n. More formally, for arbitrarily small ε and δ, an n0 can be found such that for every n ≥ n0 the following inequality holds:

P(|r/n - p| < ε) > 1 - δ

where p denotes the true probability of success.
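This convergence can be seen in a quick Monte Carlo sketch (the value of p_true and the trial counts are arbitrary choices of mine, not from the original text):

```python
import random

random.seed(0)
p_true = 0.3  # assumed "true" success probability for the simulation

# The relative frequency r/n concentrates around p_true as n grows.
for n in (10, 100, 100000):
    r = sum(random.random() < p_true for _ in range(n))
    print(n, r / n)
```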

However, after completing just one trial, which was a failure, the relative frequency probability estimate of a success in the next trial would be 0.
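As a minimal sketch (the function name is my own), the estimator and its zero-probability failure mode:

```python
def relative_frequency(r, n):
    """Maximum likelihood estimate of success probability: r successes in n trials."""
    return r / n

print(relative_frequency(7, 10))  # 0.7
# After a single trial that failed, the estimate for the next trial is 0:
print(relative_frequency(0, 1))  # 0.0
```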

2.2 Laplace's law of succession

In order to alleviate such zero probability estimations, a modified formula was proposed:

P = (r+1)/(n+2)

In this formula a uniform prior probability is assumed (Good, 1965). In fact, the rationale behind adding 1 in the numerator and 2 in the denominator is the following: before performing the actual experiment, we assume that two trials have already taken place, one a success and one a failure.
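A sketch of the corrected estimator (function name mine):

```python
def laplace(r, n):
    """Laplace's law of succession: two virtual trials, one success and one failure."""
    return (r + 1) / (n + 2)

# One observed failure no longer yields a zero estimate:
print(laplace(0, 1))  # 0.3333333333333333
```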

2.3 Bayesian m-estimate

A more general Bayesian probability estimate is described in (Cestnik, 1990, 1991). To calculate a Bayesian probability estimate on the basis of evidence, a prior probability distribution has to be assumed first. Then, given the evidence, the prior distribution is updated to a posterior one, from which the expectation can be taken as a point estimate of p. The task of determining a prior probability distribution has been identified as an intrinsic difficulty of the Bayesian approach. With prior probability Pa and parameter m, the m-estimate is:

P = (r + Pa·m)/(n + m)
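A sketch of the estimator (function name mine); as a sanity check, with a uniform prior Pa = 0.5 and m = 2 it reduces to Laplace's law of succession:

```python
def m_estimate(r, n, pa, m):
    """Bayesian m-estimate: prior probability pa, carrying the weight of m virtual trials."""
    return (r + pa * m) / (n + m)

# With pa = 0.5 and m = 2 this equals Laplace's (r + 1)/(n + 2):
print(m_estimate(0, 1, 0.5, 2))  # 0.3333333333333333
```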

3 Estimation of conditional probabilities

The basic idea behind the proposed m-estimate (Cestnik, 1990, 1991) for estimating conditional probabilities is that the prior probability Pa can be estimated from an unconditional sample. The remaining parameter m, which is related to the variance, also has to be determined. The prior variance is computed by the following formula:

Var(p) = Pa(1-Pa)/(m+1)
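Solving this formula for m gives a simple way to choose the parameter from an assumed prior variance; a sketch (the helper name is mine):

```python
def m_from_prior_variance(pa, var):
    """Invert Var(p) = pa*(1 - pa)/(m + 1) to obtain the parameter m."""
    return pa * (1 - pa) / var - 1

# e.g. pa = 0.5 with a prior variance of 0.0625 gives m = 3:
print(m_from_prior_variance(0.5, 0.0625))  # 3.0
```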

The parameter m is inversely proportional to the variance of the prior distribution. It also governs the tradeoff between the relative frequency and the prior probability, as can be observed from the following form of the m-estimate:

P=n/(n+m) * r/n + m/(n+m) * Pa
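The two forms are algebraically identical for n > 0; a quick self-check (function names are mine):

```python
def m_estimate(r, n, pa, m):
    """Closed form: (r + pa*m)/(n + m)."""
    return (r + pa * m) / (n + m)

def m_estimate_tradeoff(r, n, pa, m):
    # Convex combination of the relative frequency r/n and the prior pa.
    return n / (n + m) * (r / n) + m / (n + m) * pa

for r, n in [(0, 1), (3, 7), (50, 60)]:
    assert abs(m_estimate(r, n, 0.4, 2) - m_estimate_tradeoff(r, n, 0.4, 2)) < 1e-12
print("forms agree")
```

As n grows the weight n/(n+m) on the relative frequency approaches 1, so the estimate smoothly shifts from the prior toward the observed frequency.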

4 Literature

B. Cestnik: Estimating probabilities in machine learning, Ph.D. thesis, University of Ljubljana, Faculty of Computer and Information Science, 1991.

B. Cestnik: Estimating probabilities: A crucial task in machine learning. In: Carlucci Aiello, Luigia (ed.): ECAI 90. London: Pitman, 1990, pp. 147-149.

B. Cestnik, I. Bratko: On estimating probabilities in tree pruning. In: Kodratoff, Yves (ed.): Machine Learning - EWSL-91: European Working Session on Learning, Porto, Portugal, March 6-8, 1991: Proceedings (Lecture Notes in Computer Science / Lecture Notes in Artificial Intelligence, 482). Berlin: Springer-Verlag, 1991, pp. 138-150.

S. Džeroski, B. Cestnik, I. Petrovski: Using the m-estimate in rule induction. CIT, Journal of Computing and Information Technology, 1993, vol. 1, pp. 37-46.

I. J. Good: The Estimation of Probabilities: An Essay on Modern Bayesian Methods, Cambridge, Mass., M.I.T. Press, 1965.
