@drsimonj here to introduce pipelearner – a package I’m developing to make it easy to create machine learning pipelines in R – and to spread the word in the hope that some readers may be interested in contributing to it or testing it.

This post demonstrates some of what pipelearner can currently do. For example, the figure below plots the results of a model fitted to 10% through 100% (in 10% increments) of the training data in each of 50 cross-validation pairs. Fitting all of these models takes about four lines of code in pipelearner.

Head to the pipelearner GitHub page to learn more, and contact me if you get a chance to test it yourself or are interested in contributing (my contact details are at the end of this post).

Examples

Some setup

```r
library(pipelearner)
library(tidyverse)
library(nycflights13)

# Helper functions
# R squared of a fitted model on a data set (data may be a modelr resample object)
r_square <- function(model, data) {
  data <- as.data.frame(data)
  actual <- eval(formula(model)[[2]], data)  # outcome variable named in the formula
  residuals <- predict(model, data) - actual
  1 - (var(residuals, na.rm = TRUE) / var(actual, na.rm = TRUE))
}

# Add training and test R squared columns to a pipelearner results tibble
add_rsquare <- function(result_tbl) {
  result_tbl %>%
    mutate(rsquare_train = map2_dbl(fit, train, r_square),
           rsquare_test  = map2_dbl(fit, test, r_square))
}

# Data set: a random sample of 2,000 complete weather observations
d <- weather %>%
  select(visib, humid, precip, wind_dir) %>%
  drop_na() %>%
  sample_n(2000)

# Set theme for plots
theme_set(theme_minimal())
```
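The `r_square()` helper in the setup code measures fit by comparing variances rather than sums of squares. Writing $e_i = \hat{y}_i - y_i$ for the residuals (prediction minus actual, as in the code) and $\bar{e}$ for their mean, it computes:

```latex
R^2 = 1 - \frac{\operatorname{Var}(e)}{\operatorname{Var}(y)}
    = 1 - \frac{\sum_{i=1}^{n} (e_i - \bar{e})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\qquad\text{and, when } \bar{e} = 0,\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} e_i^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

The second form is the familiar $1 - \mathrm{SSE}/\mathrm{SST}$, which the helper reproduces whenever the residuals average to zero (as they do for `lm` with an intercept on its own training data). On test data the residual mean is usually not zero, and because variance is taken about that mean, a constant bias in the predictions is not penalised by this measure.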

k-fold cross-validation

```r
results <- d %>%
  pipelearner(lm, visib ~ .) %>%
  learn_cvpairs(modelr::crossv_kfold, k = 10) %>%  # split the data into 10 folds
  learn()

results %>%
  add_rsquare() %>%
  select(cv_pairs.id, contains("rsquare")) %>%
  gather(source, rsquare, contains("rsquare")) %>%
  mutate(source = gsub("rsquare_", "", source)) %>%
  ggplot(aes(cv_pairs.id, rsquare, color = source)) +
  geom_point() +
  labs(x = "Fold", y = "R Squared")
```
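To make the k-fold step concrete without pipelearner, here is a minimal base-R sketch of 10-fold cross-validation. It is an illustration under assumptions, not pipelearner's implementation: it relies only on the `stats` package, uses the built-in `mtcars` data instead of the `weather` sample, and `kfold_rsquare()` is a hypothetical helper. Every row is held out in exactly one fold, and the model is refitted on the remaining nine folds each time.

```r
# Minimal base-R sketch of k-fold cross-validation (illustrative only; not
# pipelearner's internal code). kfold_rsquare() is a hypothetical helper.
r_square <- function(model, data) {
  data <- as.data.frame(data)
  actual <- eval(formula(model)[[2]], data)
  residuals <- predict(model, data) - actual
  1 - var(residuals, na.rm = TRUE) / var(actual, na.rm = TRUE)
}

kfold_rsquare <- function(data, fml, k = 10) {
  # Randomly assign each row to one of k folds of (nearly) equal size
  fold <- sample(rep_len(seq_len(k), nrow(data)))
  rows <- lapply(seq_len(k), function(i) {
    train <- data[fold != i, , drop = FALSE]  # k - 1 folds used for fitting
    test  <- data[fold == i, , drop = FALSE]  # held-out fold used for scoring
    fit <- lm(fml, data = train)
    data.frame(fold = i,
               n_test = nrow(test),
               rsquare_train = r_square(fit, train),
               rsquare_test  = r_square(fit, test))
  })
  do.call(rbind, rows)
}

set.seed(1)
cv <- kfold_rsquare(mtcars, mpg ~ wt + hp, k = 10)
print(cv)
```

Each row of `cv` plays the role of one fold in the plot above. With only 32 rows in `mtcars`, each test fold holds three or four rows, so the test R squared values are likely to be much noisier than with the 2,000-row `weather` sample.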

Learning curves

```r
results <- d %>%
  pipelearner(lm, visib ~ .) %>%
  learn_curves(seq(.1, 1, .1)) %>%
  learn()

results %>%
  add_rsquare() %>%
  select(train_p, contains("rsquare")) %>%
  gather(source, rsquare, contains("rsquare")) %>%
  mutate(source = gsub("rsquare_", "", source)) %>%
  ggplot(aes(train_p, rsquare, color = source)) +
  geom_line() +
  geom_point(size = 2) +
  labs(x = "Proportion of training data used", y = "R Squared")
```
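A learning curve refits the same model on growing shares of the training data while always scoring against the same test data. The sketch below shows that idea in base R for a single train/test split. It assumes the built-in `iris` data, a roughly 75/25 split, and a hypothetical `learning_curve()` helper; pipelearner's own implementation may choose its training subsets differently.

```r
# Minimal base-R sketch of a learning curve (illustrative only; not
# pipelearner's implementation). learning_curve() is a hypothetical helper.
r_square <- function(model, data) {
  data <- as.data.frame(data)
  actual <- eval(formula(model)[[2]], data)
  residuals <- predict(model, data) - actual
  1 - var(residuals, na.rm = TRUE) / var(actual, na.rm = TRUE)
}

learning_curve <- function(data, fml, props = seq(0.1, 1, 0.1), test_frac = 0.25) {
  idx <- sample(nrow(data))                      # shuffle row indices once
  n_test <- floor(test_frac * nrow(data))
  test <- data[idx[seq_len(n_test)], , drop = FALSE]
  train_all <- data[idx[-seq_len(n_test)], , drop = FALSE]
  rows <- lapply(props, function(p) {
    # Fit on the first p share of the shuffled training rows (at least 2 rows)
    n_use <- min(nrow(train_all), max(2, ceiling(p * nrow(train_all))))
    train <- train_all[seq_len(n_use), , drop = FALSE]
    fit <- lm(fml, data = train)
    data.frame(train_p = p,
               n_train = n_use,
               rsquare_train = r_square(fit, train),
               rsquare_test  = r_square(fit, test))
  })
  do.call(rbind, rows)
}

set.seed(1)
lc <- learning_curve(iris, Sepal.Length ~ Petal.Length)
print(lc)
```

Keeping the test set fixed across proportions means any change in `rsquare_test` comes from the amount of training data rather than from a different evaluation sample.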

Grid search

```r
# A model is fitted for every combination of the minsplit and cp values
results <- d %>%
  pipelearner(rpart::rpart, visib ~ .,
              minsplit = c(2, 50, 100),
              cp = c(.005, .01, .1)) %>%
  learn()

results %>%
  # Unpack the hyperparameters stored in the params list-column
  mutate(minsplit = map_dbl(params, ~ .$minsplit),
         cp = map_dbl(params, ~ .$cp)) %>%
  add_rsquare() %>%
  select(minsplit, cp, contains("rsquare")) %>%
  gather(source, rsquare, contains("rsquare")) %>%
  mutate(source = gsub("rsquare_", "", source),
         minsplit = paste("minsplit", minsplit, sep = "\n"),
         cp = paste("cp", cp, sep = "\n")) %>%
  ggplot(aes(source, rsquare, fill = source)) +
  geom_col() +
  facet_grid(minsplit ~ cp) +
  guides(fill = "none") +
  labs(x = NULL, y = "R Squared")
```
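A full grid search fits one model per combination of the supplied hyperparameter values: three values each for `minsplit` and `cp` give 3 x 3 = 9 combinations, which is why the plot above is a 3 x 3 facet grid. The base-R sketch below only enumerates those combinations with `expand.grid()` and stores each one as a parameter list, loosely mirroring the `params` list-column that the example unpacks with `map_dbl()`. It illustrates the idea and is not how pipelearner builds its grid internally.

```r
# Illustrative sketch only: enumerate every combination of the two
# hyperparameter vectors from the grid-search example.
grid <- expand.grid(minsplit = c(2, 50, 100),
                    cp = c(0.005, 0.01, 0.1))

# One list of hyperparameters per model to be fitted
params <- lapply(seq_len(nrow(grid)), function(i) as.list(grid[i, ]))

# Base-R analogue of map_dbl(params, ~ .$minsplit) and map_dbl(params, ~ .$cp)
minsplit <- vapply(params, function(p) p$minsplit, numeric(1))
cp <- vapply(params, function(p) p$cp, numeric(1))

print(grid)
```

`expand.grid()` varies its first argument fastest, so `minsplit` cycles through 2, 50, 100 while `cp` stays fixed, before `cp` moves to its next value.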

Model comparisons

```r
results <- d %>%
  pipelearner() %>%
  learn_models(c(lm, rpart::rpart, randomForest::randomForest),
               visib ~ .) %>%
  learn()

results %>%
  add_rsquare() %>%
  select(model, contains("rsquare")) %>%
  gather(source, rsquare, contains("rsquare")) %>%
  mutate(source = gsub("rsquare_", "", source)) %>%
  ggplot(aes(model, rsquare, fill = source)) +
  geom_col(position = "dodge", size = .5) +
  labs(x = NULL, y = "R Squared") +
  coord_flip()
```
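Passing `c(lm, rpart::rpart, randomForest::randomForest)` works because `c()` applied to functions returns a list of functions, so each learner can be called the same way with the same formula and data. The base-R sketch below shows that pattern on its own, assuming only the `stats` package: it scores `lm()` against a hypothetical intercept-only baseline (`intercept_only()`, not part of pipelearner) on one random train/test split of the built-in `mtcars` data.

```r
# Illustrative base-R sketch: apply several learner functions to the same
# formula and train/test split, then score each one.
r_square <- function(model, data) {
  data <- as.data.frame(data)
  actual <- eval(formula(model)[[2]], data)
  residuals <- predict(model, data) - actual
  1 - var(residuals, na.rm = TRUE) / var(actual, na.rm = TRUE)
}

# Hypothetical baseline learner: ignores the predictors, fits only an intercept
intercept_only <- function(formula, data) lm(update(formula, . ~ 1), data = data)

learners <- list(lm = lm, intercept_only = intercept_only)

set.seed(42)
test_rows <- sample(nrow(mtcars), size = 8)
train <- mtcars[-test_rows, ]
test  <- mtcars[test_rows, ]

scores <- do.call(rbind, lapply(names(learners), function(name) {
  fit <- learners[[name]](mpg ~ wt + hp, data = train)
  data.frame(model = name,
             rsquare_train = r_square(fit, train),
             rsquare_test  = r_square(fit, test))
}))
print(scores)
```

Because `r_square()` is variance-based, a constant prediction scores 0 (up to floating-point error) on both training and test data, which makes the baseline a cheap reference point: a model worth keeping should clearly beat it.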

Sign off

Thanks for reading and I hope this was useful for you.

For updates on recent blog posts, follow @drsimonj on Twitter, or email me at drsimonjackson@gmail.com to get in touch.

If you’d like the code that produced this blog, check out the blogR GitHub repository.

Reposted from: https://drsimonj.svbtle.com/easy-machine-learning-pipelines-with-pipelearner-intro-and-call-for-contributors

Reprinted from: https://www.cnblogs.com/payton/p/6251953.html
