Logistic Regression Models and Applied Modeling (Part 2)
II. Logistic Regression with a Multi-category Outcome

1. Overview: A multinomial logistic regression model is used when the outcome variable can take more than two categories. For each non-baseline category, the log-odds of that category relative to the baseline are modeled as a linear combination of the predictors.
Required packages:
library(foreign)
library(nnet)
library(ggplot2)
library(reshape2)
2. A worked example: Entering high school students choose among a general program, a vocational program, and an academic program. Their choice can be modeled using their writing score and their socioeconomic status.
1) Read the data:

ml <- read.dta("http://www.ats.ucla.edu/stat/data/hsbdemo.dta")

The data set contains records on 200 students. The outcome variable is prog (program type), with three levels; the predictors include ses (socioeconomic status, a three-level factor), the categorical variables id, female, and schtyp, and several continuous variables.
2) A first look at the data:

with(ml, table(ses, prog))

        prog
ses      general academic vocation
  low         16       19       12
  middle      20       44       31
  high         9       42        7
Mean and standard deviation of the writing score within each prog group:

with(ml, do.call(rbind, tapply(write, prog, function(x) c(M = mean(x), SD = sd(x)))))

                M       SD
general  51.33333 9.397775
academic 56.25714 7.943343
vocation 46.76000 9.318754
3) Fit the model:

We fit the multinomial logistic regression model with the multinom function from the nnet package. Before running the model, we choose the level of the outcome to use as the baseline and set it with the relevel function; here academic is the reference level. multinom does not report p-values for the regression coefficients, so we compute them afterwards with Wald tests (z-tests).

ml$prog2 <- relevel(ml$prog, ref = "academic")
test <- multinom(prog2 ~ ses + write, data = ml)
summary(test)
Output:

Call:
multinom(formula = prog2 ~ ses + write, data = ml)

Coefficients:
         (Intercept)  sesmiddle    seshigh      write
general     2.852198 -0.5332810 -1.1628226 -0.0579287
vocation    5.218260  0.2913859 -0.9826649 -0.1136037

Std. Errors:
         (Intercept) sesmiddle   seshigh      write
general     1.166441 0.4437323 0.5142196 0.02141097
vocation    1.163552 0.4763739 0.5955665 0.02221996

Residual Deviance: 359.9635
AIC: 375.9635
z <- summary(test)$coefficients / summary(test)$standard.errors
z

         (Intercept)  sesmiddle   seshigh     write
general     2.445214 -1.2018081 -2.261334 -2.705562
vocation    4.484769  0.6116747 -1.649967 -5.112689

## 2-tailed z test
p <- (1 - pnorm(abs(z), 0, 1)) * 2
p

          (Intercept) sesmiddle    seshigh        write
general  0.0144766100 0.2294379 0.02373856 6.818902e-03
vocation 0.0000072993 0.5407530 0.09894976 3.176045e-07
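The two-tailed Wald p-values above can be checked by hand from the z matrix. A minimal Python sketch, using only the standard library (the z value is copied from the output above, so this is purely a numerical cross-check):

```python
import math

# z statistic for the general-program intercept, copied from the z matrix above
z = 2.445214

# two-tailed p-value: P(|Z| > z) under a standard normal.
# The upper-tail probability of N(0, 1) is erfc(z / sqrt(2)) / 2.
p = 2 * 0.5 * math.erfc(abs(z) / math.sqrt(2))
# p is close to the 0.0144766 reported for general:(Intercept) above
```

This is exactly what `(1 - pnorm(abs(z), 0, 1)) * 2` computes in the R code.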
The model summary output has a block of coefficients and a block of standard errors. Each block has one row of values corresponding to one model equation. Focusing on the block of coefficients, the first row compares prog = "general" to the baseline prog = "academic", and the second row compares prog = "vocation" to the same baseline. Writing the coefficients of the first row as b_1 and those of the second row as b_2, the model equations are:
\[ln\left(\frac{P(prog=general)}{P(prog=academic)}\right) = b_{10} + b_{11}(ses=2) + b_{12}(ses=3) + b_{13}write\]

(model equation 1)

\[ln\left(\frac{P(prog=vocation)}{P(prog=academic)}\right) = b_{20} + b_{21}(ses=2) + b_{22}(ses=3) + b_{23}write\]

(model equation 2)
## extract the coefficients from the model and exponentiate
exp(coef(test))

         (Intercept) sesmiddle   seshigh     write
general     17.32582 0.5866769 0.3126026 0.9437172
vocation   184.61262 1.3382809 0.3743123 0.8926116
Interpretation: the exponentiated coefficients are relative risk ratios. The relative risk ratio for a one-unit increase in write is 0.9437 for being in the general program vs. the academic program. The relative risk ratio of moving from ses = 1 (low) to ses = 3 (high) is 0.3126 for being in the general program vs. the academic program.
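These ratios are simply the exponentiated coefficients of the general-vs-academic equation. A small Python sketch, with the coefficient values copied by hand from summary(test) above (illustrative only):

```python
import math

# coefficients of the "general vs. academic" equation, from summary(test)
coef_general = {
    "(Intercept)": 2.852198,
    "sesmiddle": -0.5332810,
    "seshigh": -1.1628226,
    "write": -0.0579287,
}

# relative risk ratio = exp(coefficient)
rrr = {name: math.exp(b) for name, b in coef_general.items()}
# rrr["write"]   ~ 0.944: each extra writing point multiplies the relative
#                 risk of general vs. academic by about 0.944
# rrr["seshigh"] ~ 0.313: moving from ses = low to ses = high multiplies
#                 it by about 0.313
```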
Predicted probabilities are another way to understand the model. The fitted function returns the predicted probability of each outcome level for every observation; we can start by generating these for the observations in our data set and viewing the first few rows:

pp <- fitted(test)
head(pp)

   academic   general  vocation
1 0.1482764 0.3382454 0.5134781
2 0.1202017 0.1806283 0.6991700
3 0.4186747 0.2368082 0.3445171
4 0.1726885 0.3508384 0.4764731
5 0.1001231 0.1689374 0.7309395
6 0.3533566 0.2377976 0.4088458
Next, to examine how the predicted probability changes with one of our two predictors, we can build small data sets that vary one variable while holding the other constant. First we hold write at its mean and look at the predicted probabilities for each level of ses:

dses <- data.frame(ses = c("low", "middle", "high"), write = mean(ml$write))
dses

     ses  write
1    low 52.775
2 middle 52.775
3   high 52.775

predict(test, newdata = dses, "probs")

   academic   general  vocation
1 0.4396845 0.3581917 0.2021238
2 0.4777488 0.2283353 0.2939159
3 0.7009007 0.1784939 0.1206054
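These fitted probabilities follow directly from the two model equations: exponentiate each linear predictor and normalize (a softmax over the three outcomes, with the academic equation fixed at zero). A Python sketch using coefficients copied by hand from summary(test) above, so this is only a numerical illustration:

```python
import math

# coefficients from summary(test): (Intercept, sesmiddle, seshigh, write)
b_general  = (2.852198, -0.5332810, -1.1628226, -0.0579287)
b_vocation = (5.218260,  0.2913859, -0.9826649, -0.1136037)

def prog_probs(ses, write):
    """Probabilities of (academic, general, vocation) for one student."""
    x = (1.0,
         1.0 if ses == "middle" else 0.0,
         1.0 if ses == "high" else 0.0,
         write)
    eta_g = sum(b * xi for b, xi in zip(b_general, x))
    eta_v = sum(b * xi for b, xi in zip(b_vocation, x))
    # academic is the baseline, so its linear predictor is 0
    denom = 1.0 + math.exp(eta_g) + math.exp(eta_v)
    return (1.0 / denom, math.exp(eta_g) / denom, math.exp(eta_v) / denom)

# ses = "low" with write at its mean (52.775) reproduces the first row above
probs_low = prog_probs("low", 52.775)
```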
Another way to use the predicted probabilities is to look at the averaged predicted probabilities across values of the continuous predictor write within each level of ses:

dwrite <- data.frame(ses = rep(c("low", "middle", "high"), each = 41),
                     write = rep(c(30:70), 3))

## store the predicted probabilities for each value of ses and write
pp.write <- cbind(dwrite, predict(test, newdata = dwrite, type = "probs", se = TRUE))

## calculate the mean probabilities within each level of ses
by(pp.write[, 3:5], pp.write$ses, colMeans)

pp.write$ses: high
 academic   general  vocation
0.6164315 0.1808037 0.2027648
------------------------------------------------------------------------
pp.write$ses: low
 academic   general  vocation
0.3972977 0.3278174 0.2748849
------------------------------------------------------------------------
pp.write$ses: middle
 academic   general  vocation
0.4256198 0.2010864 0.3732938
Sometimes a couple of plots convey a good deal of information. Using the predictions stored in pp.write above, we can plot the predicted probabilities against the writing score, coloured by the level of ses and facetted by the level of the outcome variable:

## melt the data set to long format for ggplot2
lpp <- melt(pp.write, id.vars = c("ses", "write"), value.name = "probability")
head(lpp)  # view first few rows

  ses write variable probability
1 low    30 academic  0.09843588
2 low    31 academic  0.10716868
3 low    32 academic  0.11650390
4 low    33 academic  0.12645834
5 low    34 academic  0.13704576
6 low    35 academic  0.14827643

## plot predicted probabilities across write values for each level of ses,
## facetted by program type
ggplot(lpp, aes(x = write, y = probability, colour = ses)) +
  geom_line() +
  facet_grid(variable ~ ., scales = "free")
4. Other packages and functions

1) The mlogit package provides a fitting function for multinomial logit models:

mlogit(formula, data, subset, weights, na.action, start = NULL,
       alt.subset = NULL, reflevel = NULL,
       nests = NULL, un.nest.el = FALSE, unscaled = FALSE,
       heterosc = FALSE, rpar = NULL, probit = FALSE,
       R = 40, correlation = FALSE, halton = NULL,
       random.nb = NULL, panel = FALSE, estimate = TRUE,
       seed = 10, ...)

mlogit.data(data, choice, shape = c("wide", "long"), varying = NULL,
            sep = ".", alt.var = NULL, chid.var = NULL, alt.levels = NULL,
            id.var = NULL, opposite = NULL, drop.index = FALSE,
            ranked = FALSE, ...)

Argument notes:

formula: mlogit can fit conditional logit, multinomial logit, and mixed logit models. For a multinomial logit the formula is written as outcome ~ 0 | covariates, e.g. mode ~ 0 | income.

data: the data should first be reshaped with mlogit.data into the structure mlogit expects.

choice: the variable recording which alternative was chosen.

shape: "wide" if each row is one observation; "long" if each row represents one alternative.

alt.var: for long-shaped data, the variable giving the names of the alternatives.
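The "wide" vs. "long" distinction can be illustrated outside R. A pure-Python sketch with a tiny made-up two-person data set (the field names and values here are hypothetical, chosen only to mirror the Fishing example below):

```python
# wide shape: one row per observation, with the chosen alternative in "mode"
wide = [
    {"id": 1, "mode": "boat", "income": 4000.0},
    {"id": 2, "mode": "pier", "income": 2500.0},
]

alternatives = ["beach", "boat", "charter", "pier"]

# long shape: one row per (observation, alternative) pair, with a boolean
# "choice" column marking the alternative actually chosen -- this is roughly
# the reshaping that mlogit.data(..., shape = "wide") performs
long = [
    {"id": row["id"], "alt": alt, "income": row["income"],
     "choice": row["mode"] == alt}
    for row in wide
    for alt in alternatives
]
```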
Example:

install.packages("mlogit")
library(Formula)
library(maxLik)
library(miscTools)
library(mlogit)
data("Fishing", package = "mlogit")
Fish <- mlogit.data(Fishing, shape = "wide", choice = "mode")
summary(mlogit(mode ~ 0 | income, data = Fish))

Output:
Call:
mlogit(formula = mode ~ 0 | income, data = Fish, method = "nr",
    print.level = 0)

Frequencies of alternatives:
  beach    boat charter    pier
0.11337 0.35364 0.38240 0.15059

nr method
4 iterations, 0h:0m:0s
g'(-H)^-1g = 8.32E-07
gradient close to zero

Coefficients :
                      Estimate  Std. Error t-value    Pr(>t)
boat:(intercept)    7.3892e-01  1.9673e-01  3.7560 0.0001727 ***
charter:(intercept) 1.3413e+00  1.9452e-01  6.8955 5.367e-12 ***
pier:(intercept)    8.1415e-01  2.2863e-01  3.5610 0.0003695 ***
boat:income         9.1906e-05  4.0664e-05  2.2602 0.0238116 *
charter:income     -3.1640e-05  4.1846e-05 -0.7561 0.4495908
pier:income        -1.4340e-04  5.3288e-05 -2.6911 0.0071223 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Log-Likelihood: -1477.2
McFadden R^2: 0.013736
Likelihood ratio test : chisq = 41.145 (p.value = 6.0931e-09)
2) The MASS package provides the polr() function for ordered logit or probit regression:

polr(formula, data, weights, start, ..., subset, na.action,
     contrasts = NULL, Hess = FALSE, model = TRUE,
     method = c("logistic", "probit", "cloglog", "cauchit"))

Argument notes:

formula: the model formula, used the same way as the formula argument of lm().

data: the data set.

method: defaults to "logistic" (ordered logit); choosing "probit" gives an ordered probit model.

Example:

library(MASS)
house.plr <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
summary(house.plr, digits = 3)
Output:

Re-fitting to get Hessian

Call:
polr(formula = Sat ~ Infl + Type + Cont, data = housing, weights = Freq)

Coefficients:
               Value Std. Error t value
InflMedium     0.566     0.1047    5.41
InflHigh       1.289     0.1272   10.14
TypeApartment -0.572     0.1192   -4.80
TypeAtrium    -0.366     0.1552   -2.36
TypeTerrace   -1.091     0.1515   -7.20
ContHigh       0.360     0.0955    3.77

Intercepts:
            Value  Std. Error t value
Low|Medium  -0.496 0.125      -3.974
Medium|High  0.691 0.125       5.505

Residual Deviance: 3479.149
AIC: 3495.149
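In the proportional-odds model that polr fits, P(Y <= k) = logistic(zeta_k - eta), where the zeta_k are the intercepts ("Low|Medium", "Medium|High") and eta is the linear predictor. A Python sketch using the rounded intercepts from the summary above, evaluated for a household in the baseline cell (InflLow, TypeTower, ContLow, so all dummies are zero and eta = 0); values are copied by hand, so this is only a numerical illustration:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

# intercepts (cutpoints) from summary(house.plr), rounded to 3 digits
zeta_low_med  = -0.496   # Low|Medium
zeta_med_high =  0.691   # Medium|High

eta = 0.0  # baseline cell: every dummy variable is zero

# cumulative probabilities; successive differences give category probabilities
p_low    = logistic(zeta_low_med - eta)
p_medium = logistic(zeta_med_high - eta) - p_low
p_high   = 1.0 - logistic(zeta_med_high - eta)
```

The three category probabilities sum to one by construction, which is a quick sanity check on any hand computation of this kind.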