5x Faster Scikit-Learn Parameter Tuning in 5 Lines of Code

By Michael Chau, Anthony Yu, Richard Liaw

Everyone knows about Scikit-Learn — it’s a staple for data scientists, offering dozens of easy-to-use machine learning algorithms. It also provides two out-of-the-box techniques to address hyperparameter tuning: Grid Search (GridSearchCV) and Random Search (RandomizedSearchCV).

Though effective, both techniques are brute-force approaches to finding the right hyperparameter configurations, which is an expensive and time-consuming process!
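To make that cost concrete, here is a small stdlib-only sketch that counts how many model fits an exhaustive grid search performs. It assumes 5-fold cross-validation (the default cv in recent scikit-learn releases) and ignores the final refit; the grid itself is a made-up example.

```python
from itertools import product
from math import prod


def enumerate_grid(param_grid: dict):
    """Yield every configuration in the grid (one dict per combination)."""
    keys = list(param_grid)
    for combo in product(*(param_grid[k] for k in keys)):
        yield dict(zip(keys, combo))


def count_grid_fits(param_grid: dict, n_folds: int = 5) -> int:
    """Fits performed by exhaustive search: one per configuration per fold."""
    n_configs = prod(len(values) for values in param_grid.values())
    return n_configs * n_folds


# Hypothetical grid: 3 x 2 = 6 configurations.
grid = {"alpha": [1e-4, 1e-1, 1], "epsilon": [0.01, 0.1]}
print(len(list(enumerate_grid(grid))))  # 6
print(count_grid_fits(grid))            # 30

# One more 5-valued parameter multiplies the work by 5.
grid["l1_ratio"] = [0.0, 0.25, 0.5, 0.75, 1.0]
print(count_grid_fits(grid))            # 150
```

Every added parameter multiplies the number of fits rather than adding to it, which is why brute-force search gets expensive so quickly.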

What if you wanted to speed up this process?

In this blog post, we introduce tune-sklearn, which makes it easier to leverage modern hyperparameter tuning algorithms while staying within the Scikit-Learn API. Tune-sklearn is a drop-in replacement for Scikit-Learn’s model selection module that brings cutting-edge hyperparameter tuning techniques (Bayesian optimization, early stopping, distributed execution), and these techniques can provide significant speedups over grid search and random search!

Here’s what tune-sklearn has to offer:

Consistency with Scikit-Learn API: tune-sklearn is a drop-in replacement for GridSearchCV and RandomizedSearchCV, so you only need to change fewer than 5 lines in a standard Scikit-Learn script to use the API.

Modern hyperparameter tuning techniques: tune-sklearn allows you to easily leverage Bayesian Optimization, HyperBand, and other optimization techniques by simply toggling a few parameters.

Framework support: tune-sklearn is used primarily for tuning Scikit-Learn models, but it also supports and provides examples for many other frameworks with Scikit-Learn wrappers, such as Skorch (PyTorch), KerasClassifier (Keras), and XGBClassifier (XGBoost).

Scale up: Tune-sklearn leverages Ray Tune, a library for distributed hyperparameter tuning, to efficiently and transparently parallelize cross validation on multiple cores and even multiple machines.

A sample of the frameworks supported by tune-sklearn.

Tune-sklearn is also fast. To see this, we benchmark tune-sklearn (with early stopping enabled) against native Scikit-Learn on a standard hyperparameter sweep. In our benchmarks we can see significant performance differences on both an average laptop and a large workstation of 48 CPU cores.

For the larger benchmark on the 48-core computer, Scikit-Learn took 20 minutes to search over 75 hyperparameter sets on a 40,000-sample dataset. Tune-sklearn took a mere 3 and a half minutes, while sacrificing minimal accuracy.*

On left: a personal dual-core i5 laptop with 8 GB RAM, using a parameter grid of 6 configurations. On right: a large 48-core computer with 250 GB RAM, using a parameter grid of 75 configurations.

* Note: For smaller datasets (10,000 or fewer data points), fitting with early stopping may sacrifice some accuracy. We don’t expect this to matter for most users, as the library is intended to speed up large training tasks on large datasets.
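As a quick back-of-the-envelope check of the headline figure, using only the 48-core timings reported above:

```latex
\text{speedup} = \frac{t_{\text{Scikit-Learn}}}{t_{\text{tune-sklearn}}}
              = \frac{20\ \text{min}}{3.5\ \text{min}} \approx 5.7\times
```

Much of this gain comes from early stopping, which cuts off unpromising configurations before they use their full iteration budget.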

Simple 60-Second Walkthrough

Let’s take a look at how it all works.

Run pip install tune-sklearn ray[tune] (or pip install tune-sklearn "ray[tune]" in shells such as zsh, which treat square brackets specially) to get started with our example code below.

Hyperparameter set 2 is an unpromising set of hyperparameters that Tune’s early stopping mechanisms would detect and stop early, avoiding wasted training time and resources.

TuneGridSearchCV Example

To start out, it’s as easy as changing our import statement to get Tune’s grid search cross validation interface:

From there, we proceed just as we would with Scikit-Learn’s interface! Let’s use a “dummy” custom classification dataset and an SGDClassifier to classify the data.
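A sketch of that setup using plain scikit-learn. The dataset sizes and grid values are illustrative assumptions, not necessarily the ones used in the original benchmark.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# A synthetic ("dummy") 10-class dataset; sizes are illustrative choices.
X, y = make_classification(
    n_samples=11000, n_features=100, n_informative=20, n_redundant=0,
    n_classes=10, class_sep=2.5, random_state=0,
)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=1000, random_state=0
)

# Grid over two SGDClassifier hyperparameters: 3 x 2 = 6 configurations.
# Note: epsilon only affects the "huber", "epsilon_insensitive" and
# "squared_epsilon_insensitive" losses; with the default hinge loss it is unused.
parameters = {
    "alpha": [1e-4, 1e-1, 1],
    "epsilon": [0.01, 0.1],
}

clf = SGDClassifier()
print(x_train.shape, x_test.shape)  # (10000, 100) (1000, 100)
```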

We choose the SGDClassifier because it has a partial_fit API, which lets it be trained incrementally; this makes it possible to stop fitting the data partway through for a given hyperparameter configuration. If the estimator does not support early stopping, we fall back to a parallel grid search.
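To illustrate why partial_fit matters, here is a minimal sketch of incremental training with scikit-learn alone: each partial_fit call is one pass over the data, and the model is scored between passes, which is the hook a scheduler needs to stop a configuration early. The simple patience rule and the supports_incremental_fit helper are hypothetical stand-ins for illustration, not tune-sklearn's actual logic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split


def supports_incremental_fit(estimator) -> bool:
    """Rough heuristic: does the estimator expose a partial_fit method?"""
    return callable(getattr(estimator, "partial_fit", None))


X, y = make_classification(n_samples=3000, n_features=50, n_informative=10,
                           n_classes=3, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
classes = np.unique(y)  # partial_fit needs the full label set up front


def train_incrementally(alpha, max_iters=10, patience=2):
    """Fit one configuration one pass at a time, scoring after each pass.

    Training stops once the last `patience` scores fail to beat the best
    score seen before them (a stand-in for a real early-stopping scheduler).
    """
    clf = SGDClassifier(alpha=alpha, random_state=0)
    history = []
    for _ in range(max_iters):
        clf.partial_fit(X_tr, y_tr, classes=classes)  # one pass over the data
        history.append(clf.score(X_val, y_val))
        if len(history) > patience:
            best_before = max(history[:-patience])
            if max(history[-patience:]) <= best_before:
                break  # no recent improvement: stop this configuration early
    return history


print(supports_incremental_fit(SGDClassifier()))  # True
for alpha in (1e-4, 1.0):
    scores = train_incrementally(alpha)
    print(f"alpha={alpha}: {len(scores)} passes, final accuracy {scores[-1]:.3f}")
```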

As you can see, the setup here is exactly how you would do it for Scikit-Learn! Now, let’s try fitting a model.

Note the slight differences we introduced above:

a new early_stopping parameter, and
a max_iters parameter

The early_stopping parameter determines when to stop a hyperparameter set early. MedianStoppingRule is a great default, but see Ray Tune’s documentation on trial schedulers for the full list to choose from. max_iters is the maximum number of iterations a given hyperparameter set can run for; a set that is stopped early runs for fewer iterations.
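To give a feel for what a median stopping rule decides, here is a simplified, stdlib-only sketch. It is not Ray Tune's implementation: Ray's MedianStoppingRule compares against completed trials and adds options such as grace_period and min_samples_required, while this version compares against all other trials. Scores are assumed to be "higher is better", and the histories are made up.

```python
from statistics import mean, median


def should_stop(trial_scores, other_trials, grace_period=1):
    """Simplified median stopping rule (higher score = better).

    Stop the trial at step t if its best score so far is strictly worse than
    the median of the other trials' running-average scores up to step t.
    """
    t = len(trial_scores)
    if t < grace_period:
        return False  # too early to judge
    running_avgs = [mean(h[:t]) for h in other_trials if len(h) >= t]
    if not running_avgs:
        return False  # nothing to compare against yet
    return max(trial_scores) < median(running_avgs)


# Made-up validation accuracies per hyperparameter set.
histories = {
    "set 1": [0.60, 0.70, 0.78, 0.82],
    "set 2": [0.20, 0.22, 0.23, 0.23],  # unpromising
    "set 3": [0.55, 0.66, 0.74, 0.80],
}
for name, scores in histories.items():
    others = [h for n, h in histories.items() if n != name]
    print(name, should_stop(scores[:2], others))  # judged after 2 steps
# set 1 False
# set 2 True
# set 3 False
```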

Try running this and comparing it with the GridSearchCV equivalent.
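For that comparison, a plain scikit-learn GridSearchCV version might look like the sketch below. Setting max_iter=10 with tol=None makes each fit run exactly 10 epochs, which only roughly approximates the max_iters=10 budget given to tune-sklearn; that equivalence is our assumption, not a guarantee.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=11000, n_features=100, n_informative=20,
                           n_redundant=0, n_classes=10, class_sep=2.5, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=1000, random_state=0)
parameters = {"alpha": [1e-4, 1e-1, 1], "epsilon": [0.01, 0.1]}

# Every configuration runs its full budget on every fold: no early stopping.
sk_search = GridSearchCV(SGDClassifier(max_iter=10, tol=None), parameters, n_jobs=-1)

start = time.time()
sk_search.fit(x_train, y_train)
print("Scikit-Learn fit time (s):", time.time() - start)

accuracy = np.mean(sk_search.predict(x_test) == y_test)
print("Scikit-Learn accuracy:", accuracy)
```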

TuneSearchCV Bayesian Optimization Example

Other than the grid search interface, tune-sklearn also provides an interface, TuneSearchCV, for sampling from distributions of hyperparameters.
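To show what sampling from distributions means compared with a fixed grid, here is a stdlib-only sketch of random search. It is only the idea; TuneSearchCV does this internally, and the ranges and the log-uniform choice for alpha are illustrative assumptions.

```python
import random


def sample_config(rng: random.Random) -> dict:
    """Draw one SGDClassifier configuration from continuous distributions.

    alpha is sampled log-uniformly (uniform in the exponent), a common choice
    for regularization strengths; epsilon is sampled uniformly.
    """
    return {
        "alpha": 10 ** rng.uniform(-4, -1),  # between 1e-4 and 1e-1
        "epsilon": rng.uniform(0.01, 0.1),   # between 0.01 and 0.1
    }


rng = random.Random(0)  # seeded for reproducibility
trials = [sample_config(rng) for _ in range(10)]

# Each trial draws fresh values, so 10 trials can probe 10 different alphas,
# whereas a 10-point budget spent on a 5 x 2 grid tests only 5 alpha values.
print(len({t["alpha"] for t in trials}))
```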

In addition, you can easily enable Bayesian optimization over the distributions in TuneSearchCV with only a few lines of code changed.

Run pip install scikit-optimize to try out this example:

Lines 17, 18, and 26 are the only lines of code changed to enable Bayesian optimization

As you can see, it’s very simple to integrate tune-sklearn into existing code. Check out the more detailed examples, get started with tune-sklearn, and let us know what you think! Also take a look at Ray’s replacement for joblib, which allows users to parallelize training over multiple nodes, not just one, further speeding up training.

Documentation and Examples

Documentation*

Example: Skorch with tune-sklearn

Example: Scikit-Learn Pipelines with tune-sklearn

Example: XGBoost with tune-sklearn

Example: KerasClassifier with tune-sklearn

Example: LightGBM with tune-sklearn

*Note: importing from ray.tune, as shown in the linked documentation, is available only on the nightly Ray wheels and will be available on pip soon.

Originally published at: https://medium.com/@michaelchau_99485/5x-faster-scikit-learn-parameter-tuning-in-5-lines-of-code-be6bdd21833c
