End-to-End Machine Learning

Prerequisites:

- Docker

- Jupyter Notebook

- Python and Pip

- Java

- Maven

- Lombok

Resource: https://github.com/adrian3ka/shared-article/tree/master/h2o-auto-ml

Demand for data scientists and analysts has outpaced supply, despite the surge of people entering the field. To close this gap, we need friendly machine learning frameworks that non-expert users can work with. Frameworks such as TensorFlow and H2O have made it easy for non-experts to experiment with machine learning, but a fair bit of data science knowledge and background is still required to produce high-performing models. To narrow this gap further, I would like to introduce a concept called AutoML.

AutoML is the idea of automating the machine learning workflow, including the automatic training and tuning of many models within a user-specified time limit. It also automatically trains ensembles on collections of the individual models. These ensembles are highly predictive and, in most cases, will be the top-performing models in the AutoML Leaderboard.

One of the frameworks available for this purpose is H2O. It is usable by people without an AI background, and it is also friendly to developers who have no previous experience analyzing data or developing a model. Before we go further, I want to define the goal. We will train the model in the most common way, in Python using Jupyter Notebook (which also makes it easier to collaborate with other data scientists in the future). Then, for the software engineering part, we will deliver the model in Java, a language widely used for large-scale applications in production. I chose Java over other languages because it is one of the most popular languages in the industry, which keeps this guide relevant to real-world problems. I have prepared a ready-to-use example (see the Resource link above) so that you can follow this guide conveniently.

First, we need to run H2O in the Docker image that I have prepared for you. The first command below makes sure that no Docker container named h2o already exists on your machine (if none exists, it prints an error that you can ignore). The second command starts the container and publishes ports 54321 and 8888 to your local machine.

    docker container rm h2o
    docker run -ti --name=h2o -p 54321:54321 -p 8888:8888 adrian3ka/h2o:0.0.1 /bin/bash

Don’t expect to see anything yet: we have only started the container and entered its shell, and nothing is running inside it. The commands below launch the H2O application by running the jar directly:

    cd /opt
    java -Xmx1g -jar h2o.jar

Note that we could also develop the model in the H2O-Flow notebook, H2O's web interface, but this tutorial uses Jupyter Notebook because it is one of the most widely used tools today. You can access H2O-Flow at http://localhost:54321, since the -p 54321:54321 option in the docker run command above already publishes that port.
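
Before opening H2O-Flow in the browser, it can help to confirm that port 54321 is actually reachable from the host. The following is a minimal sketch using only the Python standard library. The helper name is_port_open is made up for illustration and is not part of H2O or Docker.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 54321 is the port published with -p 54321:54321 in the docker run command.
    print("H2O reachable:", is_port_open("localhost", 54321))
```

A successful connection only shows that something is listening on the port. It does not prove that H2O has finished starting.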

Now we will launch Jupyter Notebook. First, open another terminal and attach a second shell to the running container:

    export H2O_CONTAINER_ID=$(docker ps -aqf "name=h2o")
    docker exec -it $H2O_CONTAINER_ID bash

Inside the container, we start Jupyter Notebook, but to do so we need the container's IP address. The commands below, explained in their comments, look up the IP and start the notebook:

    cd ~/h2o  # move to the h2o folder already prepared for this example
    virtualenv h2o_venv
    source h2o_venv/bin/activate
    # open a new terminal on the host to check the container's local IP
    export H2O_CONTAINER_ID=$(docker ps -aqf "name=h2o")
    docker inspect $H2O_CONTAINER_ID | grep "\"IPAddress\"" -m1
    # use the IP from the output; mine is 172.17.0.3
    jupyter notebook --ip=172.17.0.3 --port=8888 --allow-root
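
The grep above depends on the text layout of the docker inspect output. Since docker inspect prints a JSON array, the same lookup can be sketched in Python. The sample JSON here is a trimmed, illustrative stand-in (real output has many more fields), and the helper name container_ip is hypothetical.

```python
import json

# Trimmed, illustrative stand-in for `docker inspect <container>` output.
SAMPLE_INSPECT_OUTPUT = """
[
  {
    "Name": "/h2o",
    "NetworkSettings": {
      "IPAddress": "172.17.0.3"
    }
  }
]
"""


def container_ip(inspect_json: str) -> str:
    """Extract NetworkSettings.IPAddress of the first container in the output."""
    containers = json.loads(inspect_json)
    return containers[0]["NetworkSettings"]["IPAddress"]


print(container_ip(SAMPLE_INSPECT_OUTPUT))  # prints 172.17.0.3
```

On a real machine, the input string could come from subprocess.run(["docker", "inspect", "h2o"], capture_output=True, text=True).stdout instead of the sample.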

You should now see a link and a token printed in the terminal. Open that link in your web browser. Jupyter Notebook is now running, but it has no code to run yet. So before we move on to the main part, training the model from Jupyter Notebook, open a new terminal and copy the notebook and the training data into the /root/h2o/h2o_venv folder inside the container:

    export H2O_CONTAINER_ID=$(docker ps -aqf "name=h2o")
    docker cp automl_binary_classification.ipynb $H2O_CONTAINER_ID:/root/h2o/h2o_venv/automl_binary_classification.ipynb
    docker cp product_backorders.csv $H2O_CONTAINER_ID:/root/h2o/h2o_venv/product_backorders.csv

Now open Jupyter Notebook in your browser using the link printed earlier in the Docker terminal. Click the folder called h2o_venv, then open automl_binary_classification.ipynb. From there you can see the code. Run the cells one by one by pressing the “>Run” button.

Python Code Explanation

First, I would like to highlight some important parts of the notebook.

    import h2o
    import pandas
    from h2o.automl import H2OAutoML
    h2o.init()

h2o.init() checks whether an H2O instance is already running and, if not, tries to start one. It then prints information about the instance, such as its uptime, timezone, and version.

    aml = H2OAutoML(max_models = 10, seed = 1)
    aml.train(x = x, y = y, training_frame = df)

The next important part is training on the data with the specified parameters. The max_models argument specifies the maximum number of individual models to train. A larger value lets AutoML explore a wider range of models in search of the best one, while a smaller value shortens the training time, so choose it with that trade-off in mind. The train method trains models using the algorithms that the H2O framework provides. By default, the system chooses which ones to run.

    %matplotlib inline
    metalearner.std_coef_plot()

This plots the standardized coefficients of the ensemble's metalearner, which show how much each base model contributes to the ensemble, and it is easy to read. Separately, AutoML ranks all trained models on its leaderboard by a performance score. The model with the best score is elected as the leader model, and that is the one we will use.
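
Electing a leader amounts to sorting models by a score and taking the top one. Here is a toy sketch in plain Python that does not use the H2O API. The model names and AUC values are invented for illustration and are not results from this tutorial.

```python
# Invented (model_id, auc) pairs standing in for an AutoML leaderboard.
leaderboard = [
    ("GBM_1", 0.91),
    ("StackedEnsemble_AllModels", 0.95),
    ("GLM_1", 0.78),
    ("DRF_1", 0.89),
]


def pick_leader(rows):
    """Return the row with the highest score (for AUC, higher is better)."""
    return max(rows, key=lambda row: row[1])


# Print the toy leaderboard from best to worst.
ranked = sorted(leaderboard, key=lambda row: row[1], reverse=True)
for model_id, auc in ranked:
    print(f"{model_id:28s} {auc:.2f}")

print("leader:", pick_leader(leaderboard)[0])
```

In the notebook itself, the leader is available as aml.leader, as the save commands in the next step show.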

    h2o.save_model(aml.leader, path = "./product_backorders_model_bin")
    aml.leader.download_mojo(path = "./top_model.zip")

There are two ways to save the leader model: the binary format and the MOJO format. If you are taking your leader model to production, I suggest the MOJO format, since it is optimized for production use. Whichever method you use, pass the output location in its path parameter.

    validation_df[validation_df["went_on_backorder"] == "Yes"].head()
    validation_df[validation_df["went_on_backorder"] == "No"].head()
    preds = aml.predict(test_frame)
    preds.head()

Before we deploy the model, we should check whether it was trained properly. The test_frame variable contains rows taken from the data set so that we can check whether the model predicts correctly. If you follow the notebook carefully, you will see that we split the raw data into two parts: the first is the validation set, and the second is the training set. We pick the 1st and 10th rows where went_on_backorder is Yes, and the first row where it is No. As the output shows, we get the expected result: Yes for the first two rows and No for the last. This is only a small spot check, but it suggests that the model was trained correctly. The output also shows a probability for each class, a number between 0 and 1 that indicates how "sure" the model is. The higher the number, the more confident the prediction.

    predict     No            Yes
    Yes         0.424039      0.575961
    Yes         0.0469849     0.953015
    No          0.983258      0.0167421
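
To make the output concrete, the sketch below re-types the three rows shown above and checks two things. The two class probabilities in each row sum to 1, and the larger of the two can be read as how "sure" the model is. It deliberately does not recompute the predicted label, because the threshold H2O applies is not stated in this article.

```python
import math

# (predict, p_no, p_yes) re-typed from the notebook output above.
rows = [
    ("Yes", 0.424039, 0.575961),
    ("Yes", 0.0469849, 0.953015),
    ("No", 0.983258, 0.0167421),
]

for label, p_no, p_yes in rows:
    # The two class probabilities should sum to 1 (up to display rounding).
    assert math.isclose(p_no + p_yes, 1.0, abs_tol=1e-5)
    # The larger probability is a simple measure of the model's confidence.
    confidence = max(p_no, p_yes)
    print(f"predicted {label:3s}  confidence {confidence:.3f}")
```

Note that the first row is predicted Yes with a confidence of only about 0.58, so it is a much less certain prediction than the other two.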

Deploying Data Model

Recall that we already exported the MOJO model for use in a production system. As discussed earlier, we will use Java for the predictor. The model-predictor folder contains Java code that uses Maven to manage its dependencies. We get the h2o container ID and copy the folder into the running container by executing the commands below:

    cd model-predictor
    export H2O_CONTAINER_ID=$(docker ps -aqf "name=h2o")
    docker cp . $H2O_CONTAINER_ID:/root/h2o/h2o_venv/model-predictor

Java Code Explanation

First, take a look at the Java file called Main.java. It loads the model we exported earlier. The model file sits one level above the Java project, so we load it using a ../ path. It then builds a list of the same rows that we predicted on in the Python code earlier.

    EasyPredictModelWrapper model = new EasyPredictModelWrapper(MojoModel.load("../top_model.zip"));
    List<ReorderDataModel> reorderDataModelList = Arrays.asList(
      reorderDataModel1, reorderDataModel10, notReorderDataModel1
    );

After that, we iterate over all the rows with forEach and predict each one. The result is a BinomialModelPrediction, which holds the Yes/No label predicted by the trained model together with the class probabilities.

    reorderDataModelList.forEach(reorderDataModel -> {
      RowData row = new RowData();
      row.put("sku", reorderDataModel.sku);
      ... // some code is not shown, for readability
      BinomialModelPrediction p = null;
      try {
        p = model.predictBinomial(row);
      } catch (PredictException e) {
        e.printStackTrace();
      }
      System.out.println("User will reorder (1=yes; 0=no): " + p.label);
      System.out.print("Class probabilities: ");
      ...
    });

Now we build and run the Java application that we copied into the Docker container, using the commands below:

    export H2O_CONTAINER_ID=$(docker ps -aqf "name=h2o")
    docker exec -it $H2O_CONTAINER_ID bash
    cd ~/h2o/h2o_venv/model-predictor
    mvn package
    mvn exec:java -Dexec.mainClass="com.example.Main"

After you execute the program, you will see the output below:

    User will reorder (1=yes; 0=no): Yes
    Class probabilities: 0.4240393668884196,0.5759606331115804
    User will reorder (1=yes; 0=no): Yes
    Class probabilities: 0.046984898740402126,0.9530151012595979
    User will reorder (1=yes; 0=no): No
    Class probabilities: 0.9832578692993112,0.016742130700688782

In the end, we trained the model in Python using Jupyter Notebook and deployed it in Java. The predicted labels and the probabilities are the same whether the model runs in Python or in Java (the notebook simply displays fewer digits). So we have achieved our objective.
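
The agreement between the two outputs can also be checked mechanically rather than by eye. The sketch below re-types the probabilities printed in the notebook and by the Java program, then compares them within a small tolerance. The helper name same_predictions and the tolerance of 1e-6 are choices made for this illustration.

```python
import math

# Probabilities (p_no, p_yes) as displayed by the notebook (rounded for display).
python_preds = [
    (0.424039, 0.575961),
    (0.0469849, 0.953015),
    (0.983258, 0.0167421),
]

# Probabilities as printed by the Java program (full precision).
java_preds = [
    (0.4240393668884196, 0.5759606331115804),
    (0.046984898740402126, 0.9530151012595979),
    (0.9832578692993112, 0.016742130700688782),
]


def same_predictions(a, b, tol=1e-6):
    """True if every pair of corresponding probabilities agrees within tol."""
    return all(
        math.isclose(x, y, abs_tol=tol)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


print(same_predictions(python_preds, java_preds))  # prints True
```

The tolerance is needed only because the notebook rounds what it displays. A much larger mismatch would suggest that the Java program loaded a different model file.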

If you have any problems running this tutorial, or would like an online class, a collaboration, or anything else, you can contact me at eekkaaadrian@gmail.com.

Reference:

  • https://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html, accessed 6 August 2020.

  • https://appdoc.app/artifact/ai.h2o/h2o-genmodel/3.2.0.9/hex/genmodel/easy/prediction/BinomialModelPrediction.html, accessed 30 August 2020.

Translated from: https://medium.com/analytics-vidhya/end-to-end-automated-machine-learning-process-using-automl-504dcdb2415b
