@drsimonj here to introduce pipelearner – a package I’m developing to make it easy to create machine learning pipelines in R – and to spread the word in the hope that some readers may be interested in contributing or testing it.

This post demonstrates some examples of what pipelearner can currently do. For example, the figure below plots the results of a model fitted to 10% to 100% (in 10% increments) of the training data in each of 50 cross-validation pairs. Fitting all of these models takes about four lines of code in pipelearner.

Head to the pipelearner GitHub page to learn more, and contact me if you have a chance to test it yourself or are interested in contributing (my contact details are at the end of this post).

Examples

Some setup

library(pipelearner)
library(tidyverse)
library(nycflights13)

# Helper functions
r_square <- function(model, data) {
  actual    <- eval(formula(model)[[2]], as.data.frame(data))
  residuals <- predict(model, data) - actual
  1 - (var(residuals, na.rm = TRUE) / var(actual, na.rm = TRUE))
}

add_rsquare <- function(result_tbl) {
  result_tbl %>%
    mutate(rsquare_train = map2_dbl(fit, train, r_square),
           rsquare_test  = map2_dbl(fit, test,  r_square))
}

# Data set
d <- weather %>%
  select(visib, humid, precip, wind_dir) %>%
  drop_na() %>%
  sample_n(2000)

# Set theme for plots
theme_set(theme_minimal())
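To see what the r_square() helper computes, here is a minimal sketch in base R. It uses the built-in mtcars data as a stand-in for the weather data (my assumption, purely for illustration) and checks the helper against the R squared that summary() reports for a linear model.

```r
# Same helper as in the setup: R squared of `model` evaluated on `data`.
r_square <- function(model, data) {
  actual    <- eval(formula(model)[[2]], as.data.frame(data))
  residuals <- predict(model, data) - actual
  1 - (var(residuals, na.rm = TRUE) / var(actual, na.rm = TRUE))
}

# mtcars is a stand-in data set, not the one used in this post.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# On the data the model was fitted to, the helper agrees with summary().
rsq_helper  <- r_square(fit, mtcars)
rsq_summary <- summary(fit)$r.squared
all.equal(rsq_helper, rsq_summary)
```

The two values match here because the residuals of a least-squares fit with an intercept average to zero on the training data. On held-out data that no longer has to hold, so the variance-based value can differ from 1 - SSE/SST.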

k-fold cross validation

results <- d %>%
  pipelearner(lm, visib ~ .) %>%
  learn_cvpairs(k = 10) %>%
  learn()

results %>%
  add_rsquare() %>%
  select(cv_pairs.id, contains("rsquare")) %>%
  gather(source, rsquare, contains("rsquare")) %>%
  mutate(source = gsub("rsquare_", "", source)) %>%
  ggplot(aes(cv_pairs.id, rsquare, color = source)) +
    geom_point() +
    labs(x = "Fold", y = "R Squared")
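For readers new to k-fold cross validation, the idea can be sketched by hand in base R. This is only an illustration of the technique, not how pipelearner implements it. The mtcars data, the formula, and k = 4 are my own choices to keep the example small.

```r
set.seed(1)  # make the random fold assignment reproducible

k <- 4
n <- nrow(mtcars)  # 32 rows

# Shuffle the fold labels 1..k so each row lands in exactly one fold.
fold_id <- sample(rep(seq_len(k), length.out = n))

# R squared of `model` on `data`, as in the r_square() helper from the setup.
r_square <- function(model, data) {
  actual    <- eval(formula(model)[[2]], as.data.frame(data))
  residuals <- predict(model, data) - actual
  1 - (var(residuals, na.rm = TRUE) / var(actual, na.rm = TRUE))
}

# For each fold: hold it out as test data, fit on the rest, score both parts.
cv_results <- do.call(rbind, lapply(seq_len(k), function(i) {
  train <- mtcars[fold_id != i, ]
  test  <- mtcars[fold_id == i, ]
  fit   <- lm(mpg ~ wt + hp, data = train)
  data.frame(fold          = i,
             rsquare_train = r_square(fit, train),
             rsquare_test  = r_square(fit, test))
}))

cv_results
```

Each row of cv_results corresponds to one train/test pair, which is the same shape of information the plot in this section is drawn from.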

Learning curves

results <- d %>%
  pipelearner(lm, visib ~ .) %>%
  learn_curves(seq(.1, 1, .1)) %>%
  learn()

results %>%
  add_rsquare() %>%
  select(train_p, contains("rsquare")) %>%
  gather(source, rsquare, contains("rsquare")) %>%
  mutate(source = gsub("rsquare_", "", source)) %>%
  ggplot(aes(train_p, rsquare, color = source)) +
    geom_line() +
    geom_point(size = 2) +
    labs(x = "Proportion of training data used", y = "R Squared")
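A learning curve refits the same model on growing proportions of the training data and scores each fit on a fixed test set. The sketch below does this by hand in base R. It is an illustration only: the mtcars data, the 8-row test set, and taking the first rows of a shuffled training set are my assumptions, and pipelearner may subset the data differently.

```r
set.seed(1)

# Shuffle once, then hold out 8 rows as a fixed test set.
shuffled <- mtcars[sample(nrow(mtcars)), ]
test     <- shuffled[1:8, ]
train    <- shuffled[9:nrow(shuffled), ]  # 24 rows

# Variance-based R squared for models that predict mpg.
rsq <- function(fit, data) {
  1 - var(predict(fit, data) - data$mpg) / var(data$mpg)
}

# Proportions of the training data to use (starting at 50% because
# this stand-in data set is small).
props <- seq(.5, 1, .1)

curve_tbl <- do.call(rbind, lapply(props, function(p) {
  n_used       <- round(p * nrow(train))
  train_subset <- train[seq_len(n_used), ]
  fit          <- lm(mpg ~ wt + hp, data = train_subset)
  data.frame(train_p       = p,
             n_used        = n_used,
             rsquare_train = rsq(fit, train_subset),
             rsquare_test  = rsq(fit, test))
}))

curve_tbl
```

Plotting rsquare_train and rsquare_test against train_p gives the same kind of figure as the pipelearner example in this section.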

Grid Search

results <- d %>%
  pipelearner(rpart::rpart, visib ~ .,
              minsplit = c(2, 50, 100),
              cp = c(.005, .01, .1)) %>%
  learn()

results %>%
  mutate(minsplit = map_dbl(params, ~ .$minsplit),
         cp       = map_dbl(params, ~ .$cp)) %>%
  add_rsquare() %>%
  select(minsplit, cp, contains("rsquare")) %>%
  gather(source, rsquare, contains("rsquare")) %>%
  mutate(source   = gsub("rsquare_", "", source),
         minsplit = paste("minsplit", minsplit, sep = "\n"),
         cp       = paste("cp", cp, sep = "\n")) %>%
  ggplot(aes(source, rsquare, fill = source)) +
    geom_col() +
    facet_grid(minsplit ~ cp) +
    guides(fill = "none") +
    labs(x = NULL, y = "R Squared")
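A grid search fits one model for every combination of the supplied hyperparameter values, so three values of minsplit and three of cp give nine models. Here is a minimal sketch of that expansion using base R's expand.grid(). The fitting step on the built-in iris data is my own stand-in, runs only if rpart is installed, and is not pipelearner's implementation.

```r
# Every combination of the two hyperparameters.
grid <- expand.grid(minsplit = c(2, 50, 100),
                    cp       = c(.005, .01, .1))
nrow(grid)  # 9 combinations (3 x 3)

# Fit one regression tree per row of the grid and record its training
# R squared. iris is a stand-in data set for illustration.
if (requireNamespace("rpart", quietly = TRUE)) {
  grid$rsquare_train <- mapply(function(minsplit, cp) {
    fit <- rpart::rpart(
      Sepal.Length ~ ., data = iris,
      control = rpart::rpart.control(minsplit = minsplit, cp = cp)
    )
    1 - var(predict(fit, iris) - iris$Sepal.Length) / var(iris$Sepal.Length)
  }, grid$minsplit, grid$cp)
}

grid
```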

Model comparisons

results <- d %>%
  pipelearner() %>%
  learn_models(c(lm, rpart::rpart, randomForest::randomForest),
               visib ~ .) %>%
  learn()

results %>%
  add_rsquare() %>%
  select(model, contains("rsquare")) %>%
  gather(source, rsquare, contains("rsquare")) %>%
  mutate(source = gsub("rsquare_", "", source)) %>%
  ggplot(aes(model, rsquare, fill = source)) +
    geom_col(position = "dodge", size = .5) +
    labs(x = NULL, y = "R Squared") +
    coord_flip()

Sign off

Thanks for reading and I hope this was useful for you.

For updates of recent blog posts, follow @drsimonj on Twitter, or email me at drsimonjackson@gmail.com to get in touch.

If you’d like the code that produced this blog, check out the blogR GitHub repository.

Original post: https://drsimonj.svbtle.com/easy-machine-learning-pipelines-with-pipelearner-intro-and-call-for-contributors

Reposted at: https://www.cnblogs.com/payton/p/6251953.html
