Contents

  • Introduction
  • Using rpart()
    • Arguments
    • Examples
  • The rattle package for nicer display of rpart objects
    • Using fancyRpartPlot()
    • Arguments
    • Examples

Introduction

The rpart() function in the rpart package fits classification trees and regression trees.

Using rpart()

rpart(formula, data, weights, subset, na.action = na.rpart, method, model = FALSE, x = FALSE, y = TRUE, parms, control, cost, ...)

Arguments

  • formula
    a formula, with a response but no interaction terms. If this is a data frame, it is taken as the model frame (see model.frame).

  • data
    an optional data frame in which to interpret the variables named in the formula.

  • weights
    optional case weights.

  • subset
    optional expression saying that only a subset of the rows of the data should be used in the fit.

  • na.action
    the default action deletes all observations for which y is missing, but keeps those in which one or more predictors are missing.

  • method
    one of "anova", "poisson", "class" or "exp". If method is missing, the routine tries to make an intelligent guess: if y is a survival object, method = "exp" is assumed; if y has 2 columns, method = "poisson" is assumed; if y is a factor, method = "class" is assumed; otherwise method = "anova" is assumed. It is wisest to specify the method directly, especially as more criteria may be added to the function in future.
    Alternatively, method can be a list of functions named init, split and eval. Examples are given in the file ‘tests/usersplits.R’ in the sources, and in the vignettes ‘User Written Split Functions’.

  • model
    if logical: keep a copy of the model frame in the result? If the input value for model is a model frame (likely from an earlier call to the rpart function), then this frame is used rather than constructing new data.

  • x
    keep a copy of the x matrix in the result.

  • y
    keep a copy of the dependent variable in the result. If missing and model is supplied this defaults to FALSE.

  • parms
    optional parameters for the splitting function.
    Anova splitting has no parameters.
    Poisson splitting has a single parameter, the coefficient of variation of the prior distribution on the rates. The default value is 1.
    Exponential splitting has the same parameter as Poisson.
    For classification splitting, the list can contain any of: the vector of prior probabilities (component prior), the loss matrix (component loss) or the splitting index (component split). The priors must be positive and sum to 1. The loss matrix must have zeros on the diagonal and positive off-diagonal elements. The splitting index can be gini or information. The default priors are proportional to the data counts, the losses default to 1, and the split defaults to gini. For example: parms = list(prior = c(0.65, 0.35), split = "information")

  • control
    a list of options that control details of the rpart algorithm. See rpart.control.

  • cost
    a vector of non-negative costs, one for each variable in the model. Defaults to one for all variables. These are scalings to be applied when considering splits, so the improvement on splitting on a variable is divided by its cost in deciding which split to choose.


  • ...
    arguments to rpart.control may also be specified in the call to rpart. They are checked against the list of valid arguments.
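To make the parms, control and ... arguments more concrete, here is a minimal sketch using the kyphosis data that ships with rpart. The loss values and the cp and minsplit settings are arbitrary illustrations, not recommendations, and the object names (loss, fit_loss, fit_dots, fit_ctrl) are made up for this example.

```r
library(rpart)

# Loss matrix: rows are the true class, columns the predicted class,
# in the order of the factor levels ("absent", "present").
# Here, calling a true "present" case "absent" costs twice as much
# as the opposite mistake (illustrative values only).
loss <- matrix(c(0, 1,
                 2, 0), nrow = 2, byrow = TRUE)

# State the method explicitly instead of relying on the guess.
fit_loss <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  method = "class",
                  parms = list(loss = loss, split = "gini"))

# Control options given directly in the call through "..."
fit_dots <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  method = "class", cp = 0.05, minsplit = 10)

# The same options given through the control argument.
fit_ctrl <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  method = "class",
                  control = rpart.control(cp = 0.05, minsplit = 10))

# Both calls should describe the same tree structure.
print(all.equal(fit_dots$frame, fit_ctrl$frame))
```

Passing options through ... is convenient for one or two settings; an explicit rpart.control() call is easier to reuse across several fits.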

Examples


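library(rpart)  # provides rpart() and the kyphosis data set

# Default fit: the method is guessed as "class" because Kyphosis is a factor.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
# Explicit classification tree with custom priors and the information split.
fit2 <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class",
              parms = list(prior = c(.65, .35), split = "information"))
# cp = 0.05 instead of the default 0.01, so fewer splits are kept.
fit3 <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
              control = rpart.control(cp = 0.05))

# Draw each tree, then label its nodes with the observation counts.
plot(fit)
text(fit, use.n = TRUE)
plot(fit2)
text(fit2, use.n = TRUE)
plot(fit3)
text(fit3, use.n = TRUE)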
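Since fit3 above changes cp, it can help to see where a reasonable cp value might come from. The sketch below reads the complexity table of a fitted tree and prunes at the cp with the lowest cross-validated error. This is one common heuristic, not the only one, and the names best_row, best_cp and pruned are made up for this example.

```r
library(rpart)

# The cross-validated errors in the CP table depend on random folds,
# so fix the seed to make the run repeatable.
set.seed(1)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class")

# Show the complexity table: one row per candidate subtree.
printcp(fit)

# Pick the cp value whose cross-validated error (xerror) is smallest.
best_row <- which.min(fit$cptable[, "xerror"])
best_cp  <- fit$cptable[best_row, "CP"]

# Cut the tree back to that complexity.
pruned <- prune(fit, cp = best_cp)
print(pruned)
```

plotcp(fit) draws the same table as a chart, which some people find easier to read than the printed numbers.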


The rattle package for nicer display of rpart objects

The fancyRpartPlot() function in the rattle package draws rpart objects in a more readable form.

Using fancyRpartPlot()

fancyRpartPlot(model, main="", sub, caption, palettes, type=2, ...)

Arguments

  • model
    an rpart object.

  • main
    title for the plot.

  • sub
    sub title for the plot. The default is a Rattle string with date, time and username.

  • caption
    caption for bottom right of plot.

  • palettes
    a list of sequential palettes names. As supported by RColorBrewer::brewer.pal the available names are Blues BuGn BuPu GnBu Greens Greys Oranges OrRd PuBu PuBuGn PuRd Purples RdPu Reds YlGn YlGnBu YlOrBr YlOrRd.

  • type
    the type of plot to generate (default: 2).


  • ...
    additional arguments passed on to prp.

Examples

## Set up the data for modelling.
library(rattle)
library(rpart)
set.seed(42)
ds     <- weather
target <- "RainTomorrow"
risk   <- "RISK_MM"
ignore <- c("Date", "Location", risk)
vars   <- setdiff(names(ds), ignore)
nobs   <- nrow(ds)
form   <- formula(paste(target, "~ ."))
train  <- sample(nobs, 0.7*nobs)
test   <- setdiff(seq_len(nobs), train)
actual <- ds[test, target]
risks  <- ds[test, risk]

## Fit the model.
fit <- rpart(form, data=ds[train, vars])

## Plot the model.
fancyRpartPlot(fit)

## Choose different colours.
fancyRpartPlot(fit, main='test', sub='test1', caption='Let me think', palettes=c("Greys", "Oranges"), type=1)
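
The example above sets aside test rows and their actual labels but stops before using them. The sketch below shows the same hold-out pattern carried through to a confusion table. It is an assumption-laden illustration: it uses the kyphosis data from rpart rather than weather so that it depends only on rpart, and the names predicted, confusion and accuracy are made up for this example.

```r
library(rpart)

set.seed(42)
ds     <- kyphosis
target <- "Kyphosis"
nobs   <- nrow(ds)

# 70/30 split into training and test row indices.
train  <- sample(nobs, floor(0.7 * nobs))
test   <- setdiff(seq_len(nobs), train)
actual <- ds[test, target]

# Fit on the training rows only.
fit <- rpart(Kyphosis ~ ., data = ds[train, ], method = "class")

# Predict class labels for the held-out rows.
predicted <- predict(fit, newdata = ds[test, ], type = "class")

# Cross-tabulate true against predicted labels and compute accuracy.
confusion <- table(actual = actual, predicted = predicted)
accuracy  <- sum(diag(confusion)) / sum(confusion)

print(confusion)
print(accuracy)
```

With a data set this small the accuracy changes noticeably from one split to another, so a single number should be read with caution.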
