logit回归模型假设_Logistic回归模型及应用建模（二）

二、因变量多分类 logistic

回归

1、概述：多元Logistic回归模型被用来建立有多个输出变量的模型，且这些预测变量通过一个线性组合变成为一个最终的预测变量。Multinomial

Logistic 回归模型中因变量可以取多个值.

所需的应用包：

library(foreign)

library(nnet)

library(ggplot2)

library(reshape2)

2、建模例子：

Example . Entering high school students make program

choices among general program, vocational program and academic

program. Their choice might be modeled using their writing score

and their social economic

status.

例子：进入高中的学生在做计划选择时，面临着一般计划、职业计划及学术计划三种选择。他们的选择往往是通过对他们的写作成绩及社会经济地位进行模型化决定。

1)读取数据：

read.dta("http://www.ats.ucla.edu/stat/data/hsbdemo.dta")

这个数据集包含了有200个学生的记录，结果变量为prog(program

type)三种项目类型，预测变量为ses(social economic

status)社会经济地位，3个分类变量(id,female,schtyp),及其他连续变量。

2)数据初步探索：

with(ml,

table(ses, prog))

prog

ses

general

academic vocation

low

middle

high

按照prog分组统计写作得分的均值及标准方差

with(ml, do.call(rbind, tapply(write, prog, function(x) c(M =

mean(x), SD = sd(x)))))

general 51.33333 9.397775

academic 56.25714 7.943343

vocation 46.76000 9.318754

3)建立模型：

利用nnet包中的函数multinom，建立多元logistic回归模型：

Before

running our model. We then choose the level of our outcome that we

wish to use as our baseline and specify this in

therelevelfunction.

Then, we run our model usingmultinom.

Themultinompackage

does not include p-value calculation for the regression

coefficients, so we calculate p-values using Wald tests (here

z-tests).

在建立模型前，我们选择输出变量的水平，并利用函数relevel设置prog为哑变量(虚拟变量)，以academic作为参考水平：

ml$prog2

"academic")

test

~ ses + write, data = ml)

summary(test)

结果显示：

Call:

multinom(formula = prog2 ~ ses + write, data =

ml)

Coefficients:

(Intercept) sesmiddle

seshigh

write

general 2.852198

-0.5332810 -1.1628226 -0.0579287

vocation 5.218260

0.2913859 -0.9826649 -0.1136037

Std. Errors:

(Intercept) sesmiddle seshigh

write

general 1.166441

0.4437323 0.5142196 0.02141097

vocation 1.163552

0.4763739 0.5955665 0.02221996

Residual Deviance: 359.9635

AIC: 375.9635

summary(test)$coefficients/summary(test)$standard.errors

(Intercept) sesmiddle

seshigh

write

general 2.445214

-1.2018081 -2.261334 -2.705562

vocation 4.484769

0.6116747 -1.649967 -5.112689

2-tailed z

test

pnorm(abs(z),0, 1)) * 2

(Intercept)

sesmiddle

seshigh

write

general 0.0144766100 0.2294379

0.02373856 6.818902e-03

vocation 0.0000072993 0.5407530 0.09894976

3.176045e-07

The model summary output

has a block of coefficients and a block of standard errors.

模型的结果输出为一系列的系数及相应的标准差，每一行参数都反应了一个模型的等式。Each of

these blocks has one row of values corresponding to a model

equation. Focusing on the block of coefficients, we can

look at the first row

comparing prog =

"general"to our

baselineprog =

"academic"and the second row

comparingprog =

"vocation"to our

baselineprog =

"academic". If we consider our coefficients from the

first row to be b_1 and our coefficients from the second row to be

b_2, we can write our model equations:

\[ln\left(\frac{P(prog=general)}{P(prog=academic)}\right)

= b_{10} + b_{11}(ses=2) + b_{12}(ses=3) +

b_{13}write\]

模型公式一

\[ln\left(\frac{P(prog=vocation)}{P(prog=academic)}\right)

= b_{20} + b_{21}(ses=2) + b_{22}(ses=3) + b_{23}write\]

模型公式二

## extract the

coefficients from the model and exponentiate

exp(coef(test))

(Intercept) sesmiddle

seshigh

write

general 17.32582

0.5866769 0.3126026 0.9437172

vocation 184.61262 1.3382809 0.3743123

0.8926116

解释：

The relative risk ratio for a one-unit

increase in the

variable write is

.9437 for being in general program vs. academic

program.

The relative risk ratio switching

from ses= 1 to 3 is .3126 for being

in general program vs. academic program.

You can also use predicted

probabilities to help you understand the model.

You can calculate predicted probabilities for

each of our outcome levels using

thefittedfunction.

We can start by generating the predicted probabilities for the

observations in our dataset and viewing the first few

rows

head(pp

academic

general vocation

1 0.1482764 0.3382454 0.5134781

2 0.1202017 0.1806283 0.6991700

3 0.4186747 0.2368082 0.3445171

4 0.1726885 0.3508384 0.4764731

5 0.1001231 0.1689374 0.7309395

6 0.3533566 0.2377976 0.4088458

Next, if we want to examine the changes in predicted

probability associated with one of our two variables, we can create

small datasets varying one variable while holding the other

constant. We will first do this

holdingwriteat

its mean and examining the predicted probabilities for each level

ofses.

dses

data.frame(ses

c("low",

"middle",

"high"), write

mean(ml$write))

ses

write

low

52.775

2 middle 52.775

high

52.775

predict(test, newdata = dses,

"probs")

academic

general vocation

1 0.4396845 0.3581917 0.2021238

2 0.4777488 0.2283353 0.2939159

3 0.7009007 0.1784939 0.1206054

Another way to understand

the model using the predicted probabilities is to look at the

averaged predicted probabilities for different values of the

continuous predictor

variablewritewithin

each level

ofses.

dwrite

data.frame(ses = rep(c("low", "middle", "high"), each = 41), write

= rep(c(30:70),3))

store the predicted probabilities for each value of ses and

write

pp.write

cbind(dwrite, predict(test, newdata = dwrite, type = "probs", se =

TRUE))

## calculate the mean probabilities within each level of

ses

by(pp.write[,

3:5],

pp.write$ses,

colMeans)

pp.write$ses: high

academic general

vocation

0.6164315 0.1808037 0.2027648

------------------------------------------------------------------------

pp.write$ses: low

academic general

vocation

0.3972977 0.3278174 0.2748849

------------------------------------------------------------------------

pp.write$ses: middle

academic general

vocation

0.4256198 0.2010864 0.3732938

Sometimes, a couple of

plots can convey a good deal amount of information. Using the

predictions we generated for the pp.write object above, we can plot

the predicted probabilities against the writing score by the level

ofsesfor

different levels of the outcome variable.

## melt data set to long

for ggplot2

lpp

melt(pp.write, id.vars

c("ses",

"write"), value.name

"probability")

head(lpp)

# view first few

rows

ses write

variable

probability

1 low 30

academic 0.09843588

2 low 31

academic 0.10716868

3 low 32

academic 0.11650390

4 low 33

academic 0.12645834

5 low 34

academic 0.13704576

6 low 35

academic 0.14827643

## plot predicted

probabilities across write values for each level of ses

## facetted by program

type

ggplot(lpp, aes(x = write, y = probability, colour =

ses)) + geom_line() + facet_grid(variable ~ ., scales =

"free")

4、其他应用包及函数

1)程序包mlogit提供了多项logit的模型拟合函数：

mlogit(formula, data, subset, weights,na.action, start =

NULL,

alt.subset = NULL, reflevel = NULL,

nests = NULL, un.nest.el = FALSE, unscaled = FALSE,

heterosc = FALSE, rpar = NULL, probit = FALSE,

R = 40, correlation = FALSE, halton = NULL,

random.nb = NULL, panel = FALSE, estimate = TRUE,

seed = 10, ...)

mlogit.data(data, choice, shape = c("wide","long"), varying =

NULL,

sep=".",alt.var = NULL, chid.var = NULL, alt.levels =

NULL,id.var = NULL, opposite = NULL, drop.index = FALSE,

ranked = FALSE, ...)

参数说明：

formula：mlogit提供了条件logit，多项logit，混合logit多种模型，对于多项logit的估计模型应写为：因变量~0|自变量,如：mode

~ 0|income

data：使用mlogit.data函数使得数据结构符合mlogit函数要求。

Choice：确定分类变量是什么

Shape：如果每一行是一个观测，我们选择wide，如果每一行是表示一个选择，那么就应该选择long。

alt.var：对于shape为long的数据，需要标明所有的选择名称

例子：

install.packages("mlogit")

library(Formula)

library(maxLik)

library(miscTools)

library(mlogit)

data("Fishing", package = "mlogit")

Fish

= "mode")

summary(mlogit(mode ~ 0|income, data =

Fish))

结果显示：

Call:

mlogit(formula = mode ~ 0|income, data = Fish, method =

"nr",

print.level = 0)

Frequencies of alternatives:

beach

boat

charter

pier

0.11337

0.35364

0.38240

0.15059

nr method

4 iterations, 0h:0m:0s

g'(-H)^-1g = 8.32E-07

gradient close to zero

Coefficients :

Estimate Std. Error t-value Pr(>t)

boat:(intercept) 7.3892e-01 1.9673e-01 3.7560 0.0001727 ***

charter:(intercept) 1.3413e+00 1.9452e-01 6.8955 5.367e-12

***

pier:(intercept) 8.1415e-01 2.2863e-01 3.5610 0.0003695 ***

boat:income 9.1906e-05 4.0664e-05 2.2602 0.0238116 *

charter:income -3.1640e-05 4.1846e-05 -0.7561 0.4495908

pier:income -1.4340e-04 5.3288e-05 -2.6911 0.0071223 **

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’

Log-Likelihood: -1477.2

McFadden R^2: 0.013736

Likelihood ratio test : chisq = 41.145 (p.value =

6.0931e-09)

2)程序包MASS提供polr()函数可以进行ordered

logit或probit回归

polr(formula, data, weights, start, ..., subset, na.action,

contrasts = NULL, Hess = FALSE, model = TRUE,

method = c("logistic", "probit", "cloglog",

"cauchit"))

参数说明：

Formula：回归形式，与lm()函数的formula参数用法一致

Data：数据集

Method：默认为order logit，选择probit时变为order

probit模型。

例子：

library(MASS)

house.plr

weights = Freq, data = housing)

summary(house.plr, digits = 3)

结果显示：

Re-fitting to get Hessian

Call:

polr(formula = Sat ~ Infl + Type + Cont, data = housing, weights

= Freq)

Coefficients:

Value Std. Error t value

InflMedium 0.566 0.1047 5.41

InflHigh

1.289 0.1272 10.14

TypeApartment -0.572 0.1192 -4.80

TypeAtrium -0.366 0.1552 -2.36

TypeTerrace -1.091 0.1515 -7.20

ContHigh 0.360 0.0955 3.77

Intercepts:

Value Std. Error t value

Low|Medium -0.496 0.125 -3.974

Medium|High 0.691 0.125 5.505

Residual Deviance: 3479.149

AIC: 3495.149

logit回归模型假设_Logistic回归模型及应用建模（二）相关推荐

[DataAnalysis]多元线性回归深入浅出-案例+模型假设+参数估计方法+模型评判方法+变量选择+多重共线性问题
一.案例介绍 1.目的:利用上市公司当年的公开财务指标预测来年盈利情况最重要的投资人决策依据. 2.数据来源:随机抽取深市和沪市2002和2003年的500个上市公司样本预测来年的净资产收益率. 3. ...
logit回归模型假设_一文读懂条件Logistic回归
在医学研究中,为了控制一些重要的混杂因素,经常会把病例和对照按年龄,性别等条件进行配对,形成多个匹配组.各匹配组的病例数和对照人数是任意的,比如一个病例和若干个对照匹配即1:1,在医学上称作" ...
logit回归模型假设_一文让你搞懂Logistic回归模型
注:本文是我和夏文俊同学共同撰写的现考虑二值响应变量 ,比如是否购车,是否点击,是否患病等等,而是相应的自变量或者称特征.现希望构建一个模型用于描述和的关系,并对进行预测. 线性模型可以吗? ...
logit回归模型_常见机器学习模型的假设
> Photo by Thought Catalog on Unsplash 暂时忘记深度学习和神经网络. 随着越来越多的人开始进入数据科学领域,我认为重要的是不要忘记这一切的基础. 统计. 如 ...
R语言VaR市场风险计算方法与回测、用LOGIT逻辑回归、PROBIT模型信用风险与分类模型...
全文链接:http://tecdat.cn/?p=27530 市场风险指的是由金融市场中资产的价格下跌或价格波动增加所导致的可能损失. 相关视频市场风险包含两种类型:相对风险和绝对风险.绝对风险关 ...
R语言回归模型构建、回归模型基本假设（正态性、线性、独立性、方差齐性）、回归模型诊断、car包诊断回归模型、特殊观察样本分析、数据变换、模型比较、特征筛选、交叉验证、预测变量相对重要度
R语言回归模型构建.回归模型基本假设(正态性.线性.独立性.方差齐性).回归模型诊断.car包诊断回归模型.特殊观察样本分析.数据变换.模型比较.特征筛选.交叉验证.预测变量相对重要度目录
r语言解释回归模型的假设_模型假设-解释
r语言解释回归模型的假设 Ever heard of model assumptions? What are they? And why are they important? A model is ...
【skLearn 回归模型】岭回归＜linear_model.Ridge＞
文章目录一.岭回归概念 • 定义 • 岭回归处理多重共线性原理二.linear_model.Ridge 类案例:加利福尼亚房价 ① 读取数据集 ② 划分训练集.测试集 ③ 岭回归训练模型 ④ 使 ...
r ridge回归_R语言逻辑回归和泊松回归模型对发生交通事故概率建模
原文链接 http://tecdat.cn/?p=14139 我们考虑风险敞口,计算包含风险敞口的多个数量(经验均值和经验方差)的非参数估计量.如果要对二项式变量建模. 这里的模型如下: 未观察到该期 ...

logit回归模型假设_Logistic回归模型及应用建模（二）

logit回归模型假设_Logistic回归模型及应用建模（二）相关推荐

最新文章

热门文章