H2O - AutoML

To use AutoML, start a new Jupyter notebook and follow the steps shown below.

Importing AutoML

First, import the h2o package and the H2OAutoML class into the project using the following two statements:

import h2o
from h2o.automl import H2OAutoML

Initialize H2O

Initialize H2O using the following statement, which connects to a local H2O cluster if one is already running and starts one otherwise:

h2o.init()

You should see the cluster information printed on the screen.

Loading Data

We will use the same iris.csv dataset that you used earlier in this tutorial. Load the data using the following statement:

data = h2o.import_file('iris.csv')

Preparing Dataset

We need to decide on the feature columns and the prediction column. We use the same features and the same prediction column as in our earlier case. Set the features and the output column using the following two statements:

features = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
output = 'class'

Split the data in an 80:20 ratio for training and testing:

train, test = data.split_frame(ratios=[0.8])
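
Note that split_frame assigns rows to the splits at random, so the 80:20 ratio is approximate rather than exact, and the result changes between runs unless a seed is passed, for example data.split_frame(ratios=[0.8], seed=1). The pure-Python sketch below only illustrates that idea. It is not H2O's implementation, the function name probabilistic_split is made up for this example, and the row count of 150 assumes the standard iris dataset.

```python
import random


def probabilistic_split(rows, ratio, seed=None):
    """Assign each row to train with probability `ratio`, else to test.

    Illustration only: a stand-in for the idea behind a random split,
    not the code that H2O runs.
    """
    rng = random.Random(seed)
    train_part, test_part = [], []
    for row in rows:
        (train_part if rng.random() < ratio else test_part).append(row)
    return train_part, test_part


rows = list(range(150))  # assumed row count of the standard iris dataset
train_rows, test_rows = probabilistic_split(rows, 0.8, seed=1)

# The sizes are close to 120 and 30, but they are not guaranteed to be exact.
print(len(train_rows), len(test_rows))
```

Fixing the seed makes the split repeatable, which helps when you want to compare results across runs.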

Applying AutoML

Now we are all set to apply AutoML to our dataset. AutoML runs within the limits that we set, on the number of models and on the running time, and gives us the best model it finds. We set up AutoML using the following statement:

aml = H2OAutoML(max_models=30, max_runtime_secs=300, seed=1)

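The first parameter, max_models, specifies the maximum number of models that we want to build and compare. Stacked ensemble models are not counted against this limit.

The second parameter, max_runtime_secs, specifies the maximum time, in seconds, for which AutoML runs. When both limits are given, AutoML stops as soon as either one is reached.

The third parameter, seed, fixes the random seed.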
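
The interplay of the two limits can be pictured with a small loop that stops at whichever budget runs out first. This is a simplified sketch in plain Python, not how H2O schedules its work: the function run_with_budget and the stand-in training jobs are hypothetical, and the sketch only checks the clock between models.

```python
import time


def run_with_budget(candidates, max_models, max_runtime_secs):
    """Run training jobs in order until either budget is exhausted."""
    start = time.monotonic()
    trained = []
    for train_fn in candidates:
        if len(trained) >= max_models:
            break  # model budget reached
        if time.monotonic() - start >= max_runtime_secs:
            break  # time budget reached
        trained.append(train_fn())
    return trained


# Hypothetical stand-ins for real training jobs; each returns instantly.
candidates = [lambda i=i: f"model_{i}" for i in range(50)]

# The model budget is hit first: 30 models are built.
print(len(run_with_budget(candidates, max_models=30, max_runtime_secs=300)))

# A zero time budget is hit first: no models are built.
print(len(run_with_budget(candidates, max_models=30, max_runtime_secs=0)))
```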

We now call the train method on the AutoML object as shown here:

aml.train(x=features, y=output, training_frame=train)

We specify x as the list of feature columns that we created earlier, y as the name of the output column to predict, and training_frame as the training dataset.

Run the code. Because we set max_runtime_secs to 300, you may have to wait up to 5 minutes for training to finish. It finishes sooner if the max_models limit is reached first.

Printing the Leaderboard

When the AutoML processing completes, it creates a leaderboard ranking all the models that it has trained (at most the 30 allowed by max_models, plus any stacked ensembles). To see the first 10 records of the leaderboard, use the following code:

lb = aml.leaderboard
lb.head()

Upon execution, the above code displays the leaderboard as a table with one row per model, listing the model ID and its metrics.

In the original run of this tutorial, a DeepLearning model ranked first.
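
For a multiclass problem such as iris, the leaderboard is sorted by mean_per_class_error unless a different sort_metric is chosen, so the best model is the one with the lowest value rather than the highest. The sketch below shows that ordering in plain Python. The model IDs and error values are invented for illustration and are not results from any run.

```python
# Hypothetical leaderboard rows: the IDs and the numbers are invented.
leaderboard_rows = [
    {"model_id": "model_a", "mean_per_class_error": 0.060},
    {"model_id": "model_b", "mean_per_class_error": 0.035},
    {"model_id": "model_c", "mean_per_class_error": 0.048},
]

# An error metric ranks in ascending order: lower is better.
ranked = sorted(leaderboard_rows, key=lambda row: row["mean_per_class_error"])
leader = ranked[0]

print(leader["model_id"])  # model_b
```

On the real AutoML object, the top-ranked model is available directly as aml.leader.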

Predicting on Test Data

Now that the models are ranked, you can apply the top-ranked model to your test data. Calling predict on the AutoML object uses the leader model. To do so, run the following statement:

preds = aml.predict(test)

The processing continues for a while and reports its progress until it completes.

Printing Result

Print the predicted result using the following statement:

print(preds)

Upon execution of the above statement, you will see the predicted class for each test row, along with the predicted probability of each class.
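
Predictions alone do not tell you how good the model is. For that, they have to be compared with the true labels in the test data. The sketch below computes plain accuracy from two lists in pure Python. The helper name accuracy and the sample labels are hypothetical and are not output from this tutorial.

```python
def accuracy(predicted, actual):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one label is required")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Hypothetical labels, invented for illustration.
predicted_labels = ["Iris-setosa", "Iris-virginica", "Iris-versicolor", "Iris-setosa"]
actual_labels = ["Iris-setosa", "Iris-virginica", "Iris-virginica", "Iris-setosa"]

print(accuracy(predicted_labels, actual_labels))  # 0.75
```

In H2O itself, aml.leader.model_performance(test) computes the evaluation metrics of the leader model on the test frame, so you do not need to do this by hand.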

Printing the Ranking for All

If you want to see the ranks of all the tested models, run the following statement:

lb.head(rows=lb.nrows)

Upon execution of the above statement, the full leaderboard is displayed, one row per model.

Conclusion

H2O provides an easy-to-use open source platform for applying different ML algorithms to a given dataset. It offers several statistical and ML algorithms, including deep learning. During testing, you can fine-tune the parameters of these algorithms, either from the command line or through the provided web-based interface called Flow. H2O also supports AutoML, which ranks several algorithms based on their performance, and it performs well on big data. This makes it a real help for data scientists who want to apply different machine learning models to their dataset and pick the one that best meets their needs.

Translated from: https://www.tutorialspoint.com/h2o/h2o_automl.htm
