[Regression Analysis] Goodness-of-fit tests in logistic regression

Reference: [Regression Analysis] course by Prof. Guan-Hua Huang (黄冠华), National Chiao Tung University, Taiwan

Goal: to test how well the fitted model fits the observed data.
In linear regression, the coefficient of determination $R^2$, the fraction of the total variation in the data explained by the model, can be used as a goodness-of-fit measure.
In logistic regression, the coefficient of determination is not a valid goodness-of-fit measure, so a different quantity is needed for testing goodness of fit.
Model assumptions in logistic regression
Independence: this assumption enters through the likelihood function. If the observations are not independent, the standard errors and significance tests derived from that likelihood are all invalid.
Linearity: the log odds $\ln(p/(1-p))$ is linear in the parameters.
Here linearity is on the logit scale, meaning that $\ln(p/(1-p))$ increases by the same amount for every unit increase in $x$. This is the same as saying that the odds ratio between $x+1$ and $x$ is the same no matter what $x$ is.
No interaction: assuming no interaction effects means assuming that the odds ratio for one variable is constant across the levels of the other.
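The two logit-scale assumptions can be checked numerically. In this minimal sketch the coefficients b0, b1, b2, b3 are hypothetical values, not estimates from any data. With logit(p) = b0 + b1·x the odds ratio between x + 1 and x comes out as exp(b1) at every x; with logit(p) = b0 + b1·x1 + b2·x2 the odds ratio for x1 is exp(b1) at every level of x2, while adding an interaction term b3·x1·x2 makes it exp(b1 + b3·x2).

```python
import math

def prob(eta):
    """Inverse logit: p = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def odds(p):
    return p / (1.0 - p)

# Hypothetical coefficients, chosen only for illustration.
b0, b1, b2, b3 = -1.0, 0.7, 0.4, 0.5

# Linearity on the logit scale: OR(x + 1 vs x) is the same for every x.
or_by_x = [odds(prob(b0 + b1 * (x + 1))) / odds(prob(b0 + b1 * x)) for x in (-2, 0, 3.5)]

def or_x1(x2, b12):
    """Odds ratio for x1 = 1 vs x1 = 0 at a fixed x2; b12 is the interaction coefficient."""
    eta1 = b0 + b1 + b2 * x2 + b12 * x2
    eta0 = b0 + b2 * x2
    return odds(prob(eta1)) / odds(prob(eta0))

# No interaction: exp(b1) at every level of x2. With interaction: exp(b1 + b3 * x2).
no_int = [or_x1(x2, 0.0) for x2 in (0, 1, 2)]
with_int = [or_x1(x2, b3) for x2 in (0, 1, 2)]

print([round(v, 4) for v in or_by_x])
print([round(v, 4) for v in no_int], [round(v, 4) for v in with_int])
```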
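Saturated model: the most complex model possible, a complete description of the observed data. It needs no further assumptions, and predicting from it is the same as predicting from the raw data (each group's fitted proportion equals its observed proportion).
How can you tell whether a model is saturated?
If the number of groups in the data (distinct covariate patterns) equals the number of unknown parameters in the model, the model is saturated.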
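Both points can be illustrated on hypothetical grouped counts. This sketch counts distinct covariate patterns and compares that number with the number of parameters, then checks that the saturated fit ($\hat p_i = y_i/n_i$ in every group) gives a binomial log-likelihood at least as large as another candidate. The data and the helper names n_groups, is_saturated and binom_loglik are illustrative assumptions.

```python
import math

def n_groups(covariate_rows):
    """Number of distinct covariate patterns (groups) in the data."""
    return len({tuple(r) for r in covariate_rows})

def is_saturated(covariate_rows, n_params):
    return n_groups(covariate_rows) == n_params

def binom_loglik(y, n, p):
    """Grouped binomial log-likelihood, dropping the constant log C(n, y) terms."""
    total = 0.0
    for yi, ni, pi in zip(y, n, p):
        if yi > 0:
            total += yi * math.log(pi)
        if ni - yi > 0:
            total += (ni - yi) * math.log(1 - pi)
    return total

# Hypothetical data: binary x1 crossed with a 3-level factor -> 6 groups.
patterns = [(x1, level) for x1 in (0, 1) for level in (1, 2, 3)]
y = [8, 12, 20, 15, 18, 30]   # number of y = 1 in each group
n = [40, 40, 40, 40, 40, 40]  # group sizes

p_sat = [yi / ni for yi, ni in zip(y, n)]  # saturated model reproduces observed proportions
p_null = [sum(y) / sum(n)] * len(y)        # intercept-only model, for comparison

print(is_saturated(patterns, 6), is_saturated(patterns, 4))
print(binom_loglik(y, n, p_sat), binom_loglik(y, n, p_null))
```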
Recall that a goodness-of-fit test asks whether the fitted model agrees closely with the data. The saturated model agrees with the data exactly, so the question becomes: is the fitted model close to the saturated model? If it is, the fitted model is good; otherwise it is not.
For grouped data, goodness of fit amounts to comparing the model we have with the saturated model (since the data can be exactly reproduced by the saturated model).
This is then equivalent to testing whether enough interaction effects have been included in the model (since the saturated model is the model with all possible interactions).
How do we measure the gap between two models?
Use $-2\ln(\text{likelihood function})$. Here,
two forms of goodness-of-fit test are commonly used with logistic regression, where the sums are taken over risk factor-confounder combinations:
Form 1: $G_1 = 2\ln L(\text{saturated model}) - 2\ln L(\text{fitted model}) = 2\sum_i O_i \ln(O_i/E_i)$
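where $O_i$ is the observed number of observations in cell $i$ and $E_i$ is the number predicted by the fitted model. The cells are all combinations of covariate pattern and outcome ($y=1$ and $y=0$), and a cell with $O_i = 0$ contributes 0. For grouped data with reasonably large counts, $G_1$ is compared with a $\chi^2$ distribution whose degrees of freedom equal the number of groups minus the number of parameters in the fitted model.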
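A sketch of form 1 for grouped binomial data. It computes $G_1$ once from the cell counts (the $y=1$ and $y=0$ cell of each group) and once as twice the log-likelihood difference, to show that the two expressions agree. The counts and the fitted probabilities p_fit are made-up numbers standing in for the output of a real fit; the identity itself holds for any p_fit.

```python
import math

def g_statistic(y, n, p_fit):
    """G_1 = 2 * sum O * ln(O / E) over the y=1 and y=0 cell of every group."""
    g = 0.0
    for yi, ni, pi in zip(y, n, p_fit):
        for obs, expected in ((yi, ni * pi), (ni - yi, ni * (1 - pi))):
            if obs > 0:  # a cell with O = 0 contributes 0 (limit of O ln O)
                g += 2 * obs * math.log(obs / expected)
    return g

def loglik(y, n, p):
    """Binomial log-likelihood without the log C(n, y) constants (they cancel in G_1)."""
    total = 0.0
    for yi, ni, pi in zip(y, n, p):
        if yi > 0:
            total += yi * math.log(pi)
        if ni - yi > 0:
            total += (ni - yi) * math.log(1 - pi)
    return total

y = [8, 12, 20, 15, 18, 30]
n = [40] * 6
p_fit = [0.22, 0.28, 0.47, 0.36, 0.44, 0.73]  # hypothetical fitted probabilities
p_sat = [yi / ni for yi, ni in zip(y, n)]     # saturated model: observed proportions

g1 = g_statistic(y, n, p_fit)
g2 = 2 * (loglik(y, n, p_sat) - loglik(y, n, p_fit))
print(round(g1, 6), round(g2, 6))
```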
Form 2 (Pearson chi-square): $X^2 = \sum_i (O_i - E_i)^2/E_i$, summed over the same cells.
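A sketch of form 2, assuming it is the Pearson chi-square statistic $X^2 = \sum_i (O_i - E_i)^2/E_i$ over the same cells. To stay within the standard library, the p-value uses the chi-square upper tail built from the recurrence $Q(x; k+2) = Q(x; k) + (x/2)^{k/2} e^{-x/2}/\Gamma(k/2+1)$, starting from $Q(x;1) = \mathrm{erfc}(\sqrt{x/2})$ and $Q(x;2) = e^{-x/2}$. The counts, p_fit and the choice df = 2 (6 groups, 4 parameters) are hypothetical.

```python
import math

def pearson_x2(y, n, p_fit):
    """X^2 = sum (O - E)^2 / E over the y=1 and y=0 cell of every group."""
    x2 = 0.0
    for yi, ni, pi in zip(y, n, p_fit):
        for obs, expected in ((yi, ni * pi), (ni - yi, ni * (1 - pi))):
            x2 += (obs - expected) ** 2 / expected
    return x2

def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, k = math.exp(-x / 2), 2
    while k < df:
        # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q

y = [8, 12, 20, 15, 18, 30]
n = [40] * 6
p_fit = [0.22, 0.28, 0.47, 0.36, 0.44, 0.73]  # hypothetical fitted probabilities
x2 = pearson_x2(y, n, p_fit)
p_value = chi2_sf(x2, df=2)
print(round(x2, 4), round(p_value, 4))
```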

A small example: below, x1 is a binary random variable, D21 and D31 are dummy variables, and y is the binary response. Write out the saturated model and the new model to be tested.

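From the fitted model, compute the probability of $y=1$ for each group; with group size $n_i$, the expected counts are $E_i = n_i \hat p_i$ for the $y=1$ cell and $n_i(1-\hat p_i)$ for the $y=0$ cell.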
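A self-contained sketch of this example, under assumptions the text does not state: the 3-level variable is coded with level 1 as the reference (D21 = 1 for level 2, D31 = 1 for level 3), the model being tested is the main-effects model logit(p) = β0 + β1·x1 + β2·D21 + β3·D31, and the saturated model adds the interactions x1·D21 and x1·D31 (6 parameters for 6 groups). The counts are made up. Both models are fitted by Newton–Raphson in plain Python; the helpers solve, fitted_probs and fit_logit are illustrative, not a library API.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, size):
            f = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * size
    for r in reversed(range(size)):
        x[r] = (m[r][size] - sum(m[r][c] * x[c] for c in range(r + 1, size))) / m[r][r]
    return x

def fitted_probs(X, beta):
    return [1 / (1 + math.exp(-sum(xj * bj for xj, bj in zip(row, beta)))) for row in X]

def fit_logit(X, y, n, iters=100, tol=1e-12):
    """Newton-Raphson MLE for grouped data: y_i ~ Binomial(n_i, p_i), logit(p_i) = X_i . beta."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = fitted_probs(X, beta)
        w = [ni * pi * (1 - pi) for ni, pi in zip(n, p)]
        score = [sum(row[j] * (yi - ni * pi) for row, yi, ni, pi in zip(X, y, n, p)) for j in range(k)]
        info = [[sum(row[j] * row[c] * wi for row, wi in zip(X, w)) for c in range(k)] for j in range(k)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def g_statistic(y, n, p):
    """G_1 = 2 * sum O * ln(O / E) over the y=1 and y=0 cell of every group."""
    g = 0.0
    for yi, ni, pi in zip(y, n, p):
        for obs, expected in ((yi, ni * pi), (ni - yi, ni * (1 - pi))):
            if obs > 0:
                g += 2 * obs * math.log(obs / expected)
    return g

# Hypothetical 2 x 3 table: x1 in {0, 1}, a 3-level factor coded by D21 and D31.
cells = [(x1, lvl) for x1 in (0, 1) for lvl in (1, 2, 3)]
y = [8, 12, 20, 15, 18, 30]
n = [40] * 6
X_main = [[1, x1, int(lvl == 2), int(lvl == 3)] for x1, lvl in cells]
X_sat = [row + [row[1] * row[2], row[1] * row[3]] for row in X_main]

p_main = fitted_probs(X_main, fit_logit(X_main, y, n))
p_sat = fitted_probs(X_sat, fit_logit(X_sat, y, n))
E_main = [ni * pi for ni, pi in zip(n, p_main)]  # expected y = 1 counts, E_i = n_i * p_i
G = g_statistic(y, n, p_main)
df = len(cells) - len(X_main[0])  # 6 groups - 4 parameters = 2
print([round(e, 2) for e in E_main], round(G, 3), df)
```

The saturated fit should reproduce the observed proportions exactly, so its $G_1$ is 0; the main-effects fit matches the observed totals on each margin it contains, and its $G_1$ measures how much the two omitted interactions matter.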
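The above applies to grouped data. For individual (ungrouped) data, goodness of fit is handled as follows.
The distinguishing feature is that the covariates are continuous, so there are no natural groups: nearly every covariate pattern occurs only once, and the comparison with the saturated model no longer has a usable $\chi^2$ reference distribution.
Approach: form groups with the Hosmer and Lemeshow method.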
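Outline of the Hosmer and Lemeshow grouping:
Using the fitted model above, compute for each case the probability $\Pr(y=1)$ from its known covariates.
Then sort the cases by this probability, from smallest to largest.
For 10 groups, the cases ranked in the [0%, 10%] range form group 1, those in [10%, 20%] form group 2, …, and those in [90%, 100%] form group 10; that is, the groups are deciles of the sorted cases, each holding about one tenth of the sample.
After grouping, how do you compute $E_i$ for group 1? Average the $\Pr(y=1)$ values in group 1 and multiply by that group's sample size (equivalently, sum the predicted probabilities in the group).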
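The grouping steps above can be sketched as follows, combined with the usual Hosmer–Lemeshow statistic $\hat C = \sum_g (O_g - E_g)^2 / (n_g \bar p_g (1-\bar p_g))$ compared with $\chi^2$ on $g - 2$ degrees of freedom. Assumptions: p_hat stands in for fitted probabilities (here generated from a made-up model logit(p) = -0.5 + 1.2x, whereas a real analysis takes them from the fitted model), ties in p_hat are ignored, and groups are split by rank with integer boundaries. The chi-square tail uses the same standard-library recurrence as in the Pearson sketch.

```python
import math
import random

def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, k = math.exp(-x / 2), 2
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q

def hosmer_lemeshow(p_hat, y, groups=10):
    """Sort cases by predicted probability, split into equal-size groups, compare O and E."""
    order = sorted(range(len(p_hat)), key=lambda i: p_hat[i])
    total = len(order)
    stat, table = 0.0, []
    for g in range(groups):
        idx = order[g * total // groups:(g + 1) * total // groups]
        size = len(idx)
        p_bar = sum(p_hat[i] for i in idx) / size
        expected = p_bar * size  # E = mean probability x group size (= sum of p_hat)
        observed = sum(y[i] for i in idx)
        stat += (observed - expected) ** 2 / (size * p_bar * (1 - p_bar))
        table.append((size, observed, expected))
    df = groups - 2
    return stat, df, chi2_sf(stat, df), table

# Hypothetical individual-level data with a continuous covariate.
rng = random.Random(1)
x = [rng.uniform(-3, 3) for _ in range(200)]
p_hat = [1 / (1 + math.exp(-(-0.5 + 1.2 * xi))) for xi in x]
y = [1 if rng.random() < pi else 0 for pi in p_hat]

stat, df, p_value, table = hosmer_lemeshow(p_hat, y)
for size, obs, exp_ in table:
    print(size, obs, round(exp_, 2))
print(round(stat, 3), df, round(p_value, 3))
```

Each term $(O_g - E_g)^2/(n_g \bar p_g(1-\bar p_g))$ equals the Pearson contributions of that group's $y=1$ and $y=0$ cells added together, so $\hat C$ is a Pearson statistic computed on the constructed groups.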

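The goodness-of-fit test above is an overall test: if the null hypothesis is not rejected, there is no evidence that the model fits poorly; if it is rejected, the model fits poorly, but the test does not say where, or which assumption has failed. For that we turn to residual analysis.
The residual plots used in ordinary linear regression no longer work for logistic regression: the outcome takes only the values 0 and 1, so the plotted residuals fall into separate bands instead of scattering evenly, and they cannot be read the way linear-regression residual plots are. Pearson residuals are used instead.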
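A sketch of Pearson residuals for grouped data, $r_i = (y_i - n_i\hat p_i)/\sqrt{n_i\hat p_i(1-\hat p_i)}$. The sum of their squares equals the Pearson $X^2$ statistic, so groups with large $|r_i|$ are the ones driving any lack of fit. The counts and p_fit are hypothetical, and the $|r| > 2$ flag is only a common rough screening rule, not a formal test.

```python
import math

def pearson_residuals(y, n, p_fit):
    """r_i = (y_i - n_i p_i) / sqrt(n_i p_i (1 - p_i)) for each covariate pattern."""
    return [(yi - ni * pi) / math.sqrt(ni * pi * (1 - pi)) for yi, ni, pi in zip(y, n, p_fit)]

y = [8, 12, 20, 15, 18, 30]
n = [40] * 6
p_fit = [0.22, 0.28, 0.47, 0.36, 0.44, 0.73]  # hypothetical fitted probabilities
r = pearson_residuals(y, n, p_fit)
for i, ri in enumerate(r, start=1):
    flag = "  <- |r| > 2" if abs(ri) > 2 else ""
    print(f"group {i}: r = {ri:+.3f}{flag}")
```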
