二、因变量多分类 logistic

回归

​1、概述:多元Logistic回归模型被用来建立有多个输出变量的模型,且这些预测变量通过一个线性组合变成为一个最终的预测变量。Multinomial

Logistic 回归模型中因变量可以取多个值.

所需的应用包:​

​library(foreign)

library(nnet)

library(ggplot2)

library(reshape2)​

2、建模例子:​

Example . Entering high school students make program

choices among general program, vocational program and academic

program. Their choice might be modeled using their writing score

and their social economic

status.​​

例子:进入高中的学生在做计划选择时,面临着一般计划、职业计划及学术计划三种选择。他们的选择往往是通过对他们的写作成绩及社会经济地位进行模型化决定。

1)读取数据:

ml

read.dta("http://www.ats.ucla.edu/stat/data/hsbdemo.dta")

这个数据集包含了有200个学生的记录,结果变量为prog(program

type)三种项目类型,预测变量为ses(social economic

status)社会经济地位,3个分类变量(id,female,schtyp),及其他连续变量。​

2)数据初步探索​:

with(ml,

table(ses, prog))

prog

ses

general

academic  vocation

low

16

19

12

middle

20

44

31

high

9

42

7​

按照prog分组统计写作得分的均值及标准方差

with(ml, do.call(rbind, tapply(write, prog, function(x) c(M =

mean(x), SD = sd(x)))))

​M

SD

general  51.33333 9.397775

academic 56.25714 7.943343

vocation 46.76000 9.318754

3)建立模型:

利用nnet包中的函数multinom,建立多元logistic回归模型:

Before

running our model. We then choose the level of our outcome that we

wish to use as our baseline and specify this in

therelevelfunction.

Then, we run our model usingmultinom.

Themultinompackage

does not include p-value calculation for the regression

coefficients, so we calculate p-values using Wald tests (here

z-tests).

在建立模型前,我们选择输出变量的水平,并利用函数​relevel设置prog为哑变量(虚拟变量),以academic作为参考水平:

ml$prog2

"academic")

​test

~ ses + write, data = ml)

summary(test)​

结果显示:

Call:

multinom(formula = prog2 ~ ses + write, data =

ml)

Coefficients:

(Intercept)  sesmiddle

seshigh

write

general     2.852198

-0.5332810 -1.1628226 -0.0579287

vocation    5.218260

0.2913859 -0.9826649 -0.1136037

Std. Errors:

(Intercept) sesmiddle   seshigh

write

general     1.166441

0.4437323 0.5142196 0.02141097

vocation    1.163552

0.4763739 0.5955665 0.02221996

Residual Deviance: 359.9635

AIC: 375.9635

​z

summary(test)$coefficients/summary(test)$standard.errors

(Intercept)  sesmiddle

seshigh

write

general     2.445214

-1.2018081 -2.261334 -2.705562

vocation    4.484769

0.6116747 -1.649967 -5.112689

​2-tailed z

test

​p

-

pnorm(abs(z),0, 1)) * 2

​(Intercept)

sesmiddle

seshigh

write

general  0.0144766100 0.2294379

0.02373856 6.818902e-03

vocation 0.0000072993 0.5407530 0.09894976

3.176045e-07

​The model summary output

has a block of coefficients and a block of standard errors.

模型的结果输出为一系列的系数及相应的标准差,每一行参数都反应了一个模型的等式。Each of

these blocks has one row of values corresponding to a model

equation. Focusing on the block of coefficients, we can

look at the first row

comparing ​prog =

"general"to our

baselineprog =

"academic"and the second row

comparingprog =

"vocation"to our

baselineprog =

"academic". If we consider our coefficients from the

first row to be b_1 and our coefficients from the second row to be

b_2, we can write our model equations:

\[ln\left(\frac{P(prog=general)}{P(prog=academic)}\right)

= b_{10} + b_{11}(ses=2) + b_{12}(ses=3) +

b_{13}write\] ​

模型公式一

\[ln\left(\frac{P(prog=vocation)}{P(prog=academic)}\right)

= b_{20} + b_{21}(ses=2) + b_{22}(ses=3) + b_{23}write\]

模型公式二

​## extract the

coefficients from the model and exponentiate

exp(coef(test))

(Intercept) sesmiddle

seshigh

write

general     17.32582

0.5866769 0.3126026 0.9437172

vocation   184.61262 1.3382809 0.3743123

0.8926116

解释:

The relative risk ratio for a one-unit

increase in the

variable write is

.9437 for being in general program vs. academic

program.

The relative risk ratio switching

from ses= 1 to 3 is .3126 for being

in general program vs. academic program.

​You can also use predicted

probabilities to help you understand the model.

You can calculate predicted probabilities for

each of our outcome levels using

thefittedfunction.

We can start by generating the predicted probabilities for the

observations in our dataset and viewing the first few

rows

head(pp

​academic

general  vocation

1 0.1482764 0.3382454 0.5134781

2 0.1202017 0.1806283 0.6991700

3 0.4186747 0.2368082 0.3445171

4 0.1726885 0.3508384 0.4764731

5 0.1001231 0.1689374 0.7309395

6 0.3533566 0.2377976 0.4088458

Next, if we want to examine the changes in predicted

probability associated with one of our two variables, we can create

small datasets varying one variable while holding the other

constant. We will first do this

holdingwriteat

its mean and examining the predicted probabilities for each level

ofses.​

dses

data.frame(ses

=

c("low",

"middle",

"high"), write

=

mean(ml$write))​

ses

write

1

low

52.775

2 middle   52.775

3

high

52.775

predict(test, newdata = dses,

"probs")

​academic

general  vocation

1 0.4396845 0.3581917 0.2021238

2 0.4777488 0.2283353 0.2939159

3 0.7009007 0.1784939 0.1206054

​Another way to understand

the model using the predicted probabilities is to look at the

averaged predicted probabilities for different values of the

continuous predictor

variablewritewithin

each level

ofses.

​dwrite

data.frame(ses = rep(c("low", "middle", "high"), each = 41), write

= rep(c(30:70),3))

​##

store the predicted probabilities for each value of ses and

write

pp.write

cbind(dwrite, predict(test, newdata = dwrite, type = "probs", se =

TRUE))

## calculate the mean probabilities within each level of

ses

​by(pp.write[,

3:5],

pp.write$ses,

colMeans)​

​pp.write$ses: high

academic   general

vocation

0.6164315 0.1808037 0.2027648

------------------------------------------------------------------------

pp.write$ses: low

academic   general

vocation

0.3972977 0.3278174 0.2748849

------------------------------------------------------------------------

pp.write$ses: middle

academic   general

vocation

0.4256198 0.2010864 0.3732938

​Sometimes, a couple of

plots can convey a good deal amount of information. Using the

predictions we generated for the pp.write object above, we can plot

the predicted probabilities against the writing score by the level

ofsesfor

different levels of the outcome variable.

​## melt data set to long

for ggplot2

​lpp

melt(pp.write, id.vars

=

c("ses",

"write"), value.name

=

"probability")

​head(lpp)

​ # view first few

rows

​ses     write

variable

probability

1 low    30

academic  0.09843588

2 low    31

academic  0.10716868

3 low    32

academic  0.11650390

4 low    33

academic  0.12645834

5 low    34

academic  0.13704576

6 low    35

academic  0.14827643

​## plot predicted

probabilities across write values for each level of ses

​## facetted by program

type

ggplot(lpp, aes(x = write, y = probability, colour =

ses)) + geom_line() + facet_grid(variable ~ ., scales =

"free")​

4、其他应用包及函数​

1)程序包mlogit提供了多项logit的模型拟合函数:

mlogit(formula, data, subset, weights,na.action, start =

NULL,

alt.subset = NULL, reflevel = NULL,

nests = NULL, un.nest.el = FALSE, unscaled = FALSE,

heterosc = FALSE, rpar = NULL, probit = FALSE,

R = 40, correlation = FALSE, halton = NULL,

random.nb = NULL, panel = FALSE, estimate = TRUE,

seed = 10, ...)​​

mlogit.data(data, choice, shape = c("wide","long"), varying =

NULL,

sep=".",alt.var = NULL, chid.var = NULL, alt.levels =

NULL,id.var = NULL, opposite = NULL, drop.index = FALSE,

ranked = FALSE, ...)

参数说明:

formula:mlogit提供了条件logit,多项logit,混合logit多种模型,对于多项logit的估计模型应写为:因变量~0|自变量,如:mode

~ 0|income

data:使用mlogit.data函数使得数据结构符合mlogit函数要求。

Choice:确定分类变量是什么

Shape:如果每一行是一个观测,我们选择wide,如果每一行是表示一个选择,那么就应该选择long。

alt.var:对于shape为long的数据,需要标明所有的选择名称​

例子:​

install.packages("mlogit")

library(Formula)

library(maxLik)

library(miscTools)

library(mlogit)​

data("Fishing", package = "mlogit")​

Fish

= "mode")​

summary(mlogit(mode ~ 0|income, data =

Fish))​

结果显示:​

Call:

mlogit(formula = mode ~ 0|income, data = Fish, method =

"nr",

print.level = 0)

Frequencies of alternatives:

beach

boat

charter

pier

0.11337

0.35364

0.38240

0.15059

nr method

4 iterations, 0h:0m:0s

g'(-H)^-1g = 8.32E-07

gradient close to zero

Coefficients :

Estimate Std. Error t-value Pr(>t)

boat:(intercept) 7.3892e-01 1.9673e-01 3.7560 0.0001727 ***

charter:(intercept) 1.3413e+00 1.9452e-01 6.8955 5.367e-12

***

pier:(intercept) 8.1415e-01 2.2863e-01 3.5610 0.0003695 ***

boat:income 9.1906e-05 4.0664e-05 2.2602 0.0238116 *

charter:income -3.1640e-05 4.1846e-05 -0.7561 0.4495908

pier:income -1.4340e-04 5.3288e-05 -2.6911 0.0071223 **

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’

1

Log-Likelihood: -1477.2

McFadden R^2: 0.013736

Likelihood ratio test : chisq = 41.145 (p.value =

6.0931e-09)

2)程序包MASS提供polr()函数可以进行ordered

logit或probit回归

polr(formula, data, weights, start, ..., subset, na.action,

contrasts = NULL, Hess = FALSE, model = TRUE,

method = c("logistic", "probit", "cloglog",

"cauchit"))​

参数说明:

Formula:回归形式,与lm()函数的formula参数用法一致

Data:数据集

Method:默认为order logit,选择probit时变为order

probit模型。​​

例子:​

library(MASS)

house.plr

weights = Freq, data = housing)

summary(house.plr, digits = 3)

结果显示:

Re-fitting to get Hessian

Call:

polr(formula = Sat ~ Infl + Type + Cont, data = housing, weights

= Freq)

Coefficients:

Value Std. Error t value

InflMedium 0.566 0.1047 5.41

InflHigh

1.289 0.1272 10.14

TypeApartment -0.572 0.1192 -4.80

TypeAtrium -0.366 0.1552 -2.36

TypeTerrace -1.091 0.1515 -7.20

ContHigh 0.360 0.0955 3.77

Intercepts:

Value Std. Error t value

Low|Medium -0.496 0.125 -3.974

Medium|High 0.691 0.125 5.505

Residual Deviance: 3479.149

AIC: 3495.149

logit回归模型假设_Logistic回归模型及应用建模(二)相关推荐

  1. [DataAnalysis]多元线性回归深入浅出-案例+模型假设+参数估计方法+模型评判方法+变量选择+多重共线性问题

    一.案例介绍 1.目的:利用上市公司当年的公开财务指标预测来年盈利情况最重要的投资人决策依据. 2.数据来源:随机抽取深市和沪市2002和2003年的500个上市公司样本预测来年的净资产收益率. 3. ...

  2. logit回归模型假设_一文读懂条件Logistic回归

    在医学研究中,为了控制一些重要的混杂因素,经常会把病例和对照按年龄,性别等条件进行配对,形成多个匹配组.各匹配组的病例数和对照人数是任意的,比如一个病例和若干个对照匹配即1:1,在医学上称作" ...

  3. logit回归模型假设_一文让你搞懂Logistic回归模型

    注:本文是我和夏文俊同学共同撰写的 现考虑二值响应变量 ,比如是否购车,是否点击,是否患病等等,而 是相应的自变量或者称特征.现希望构建一个模型用于描述 和 的关系,并对 进行预测. 线性模型可以吗? ...

  4. logit回归模型_常见机器学习模型的假设

    > Photo by Thought Catalog on Unsplash 暂时忘记深度学习和神经网络. 随着越来越多的人开始进入数据科学领域,我认为重要的是不要忘记这一切的基础. 统计. 如 ...

  5. R语言VaR市场风险计算方法与回测、用LOGIT逻辑回归、PROBIT模型信用风险与分类模型...

    全文链接:http://tecdat.cn/?p=27530  市场风险指的是由金融市场中资产的价格下跌或价格波动增加所导致的可能损失. 相关视频 市场风险包含两种类型:相对风险和绝对风险.绝对风险关 ...

  6. R语言回归模型构建、回归模型基本假设(正态性、线性、独立性、方差齐性)、回归模型诊断、car包诊断回归模型、特殊观察样本分析、数据变换、模型比较、特征筛选、交叉验证、预测变量相对重要度

    R语言回归模型构建.回归模型基本假设(正态性.线性.独立性.方差齐性).回归模型诊断.car包诊断回归模型.特殊观察样本分析.数据变换.模型比较.特征筛选.交叉验证.预测变量相对重要度 目录

  7. r语言解释回归模型的假设_模型假设-解释

    r语言解释回归模型的假设 Ever heard of model assumptions? What are they? And why are they important? A model is ...

  8. 【skLearn 回归模型】岭回归 <linear_model.Ridge>

    文章目录 一.岭回归概念 • 定义 • 岭回归处理多重共线性原理 二.linear_model.Ridge 类 案例:加利福尼亚房价 ① 读取数据集 ② 划分训练集.测试集 ③ 岭回归训练模型 ④ 使 ...

  9. r ridge回归_R语言逻辑回归和泊松回归模型对发生交通事故概率建模

    原文链接 http://tecdat.cn/?p=14139 我们考虑风险敞口,计算包含风险敞口的多个数量(经验均值和经验方差)的非参数估计量.如果要对二项式变量建模. 这里的模型如下: 未观察到该期 ...

最新文章

  1. Python3 与 C# 并发编程之~ Net篇
  2. apache源码安装
  3. pandas.to_csv()中文编码问题
  4. 一些让人受益匪浅的话--转
  5. 国内数据中心制冷系统设计与发展
  6. Linux Shell——函数的使用
  7. javascript编写_如何在JavaScript中使用解构来编写更简洁,功能更强大的代码
  8. 开发Flex for Android第一个ANE(ActionScript Native Extensions)本地扩展
  9. 在线正则表达式可视化测试工具
  10. git rebase和 merge的区别
  11. Response.Redirect在新窗口打开
  12. UEFI开发探索50 – UEFI与网络2
  13. java 坑爹的黑店,大土地神系统
  14. 支付系统 java_PaySystem
  15. 工具类-随即获取姓名-ZH
  16. 影视账号涨粉10w,反套路营销获赞百万,小红内容趋势是什么?
  17. String类型转Long类型
  18. 全面认识二极管,一篇文章就够了
  19. Motion Graphics Loops: 2 After Effects Techniques 运动图形循环2: After Effects技术 Lynda课程中文字幕
  20. python传递指针_python值传递和指针传递

热门文章

  1. 2D Application
  2. 方正无盘服务器,方正科技改革大学图书馆电子阅览室
  3. C语言通讯录管理系统(含完整代码)
  4. 觉得算法难的看这里-算法动画图解的App算法宝开发历程分享
  5. AVOD论文和代码解析
  6. 126邮箱设置html发送邮件,126邮箱设置pop3
  7. 美颜SDK如何进行Android和iOS双端开发?
  8. 使用C#开发Word VSTO外接程序示例
  9. 山东办理高新技术企业可以享受的税收优惠
  10. 【Matlab】开发环境介绍及学习方法