scikit-learn example comparing randomized search and grid search for hyperparameter estimation: https://scikit-learn.org/stable/auto_examples/model_selection/plot_randomized_search.html#sphx-glr-auto-examples-model-selection-plot-randomized-search-py

Methods of hyperparameter optimization (quoted from Wikipedia)

Hyperparameter optimization

From Wikipedia, the free encyclopedia


In machine learning, hyperparameter optimization or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process. By contrast, the values of other parameters (typically node weights) are learned.

The same kind of machine learning model can require different constraints, weights or learning rates to generalize different data patterns. These measures are called hyperparameters, and have to be tuned so that the model can optimally solve the machine learning problem. Hyperparameter optimization finds a tuple of hyperparameters that yields an optimal model which minimizes a predefined loss function on given independent data.[1] The objective function takes a tuple of hyperparameters and returns the associated loss.[1] Cross-validation is often used to estimate this generalization performance.[2]

Contents

  • 1 Approaches
    • 1.1 Grid search
    • 1.2 Random search
    • 1.3 Bayesian optimization
    • 1.4 Gradient-based optimization
    • 1.5 Evolutionary optimization
    • 1.6 Population-based
    • 1.7 Others
  • 2 Open-source software
    • 2.1 Grid search
    • 2.2 Random search
    • 2.3 Bayesian
    • 2.4 Gradient-based optimization
    • 2.5 Evolutionary
    • 2.6 Other
  • 3 Commercial services
  • 4 See also
  • 5 References

Approaches

Grid search

The traditional way of performing hyperparameter optimization has been grid search, or a parameter sweep, which is simply an exhaustive searching through a manually specified subset of the hyperparameter space of a learning algorithm. A grid search algorithm must be guided by some performance metric, typically measured by cross-validation on the training set[3] or evaluation on a held-out validation set.[4]

Since the parameter space of a machine learner may include real-valued or unbounded value spaces for certain parameters, manually set bounds and discretization may be necessary before applying grid search.

For example, a typical soft-margin SVM classifier equipped with an RBF kernel has at least two hyperparameters that need to be tuned for good performance on unseen data: a regularization constant C and a kernel hyperparameter γ. Both parameters are continuous, so to perform grid search, one selects a finite set of "reasonable" values for each, say

C ∈ {10, 100, 1000}

γ ∈ {0.1, 0.2, 0.5, 1.0}

Grid search then trains an SVM with each pair (C, γ) in the Cartesian product of these two sets and evaluates their performance on a held-out validation set (or by internal cross-validation on the training set, in which case multiple SVMs are trained per pair). Finally, the grid search algorithm outputs the settings that achieved the highest score in the validation procedure.
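
Concretely, this procedure maps onto scikit-learn's GridSearchCV (compare the scikit-learn example linked at the top of this page). The following is a minimal sketch; the iris dataset and 5-fold cross-validation are illustrative assumptions, while the C and γ grids are the ones from the example above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Finite, manually chosen grids for the two continuous hyperparameters.
param_grid = {
    "C": [10, 100, 1000],
    "gamma": [0.1, 0.2, 0.5, 1.0],
}

# Trains one SVM per (C, gamma) pair per CV fold: 3 * 4 * 5 = 60 fits,
# then refits the best configuration on the full training set.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```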

Grid search suffers from the curse of dimensionality, but is often embarrassingly parallel because the hyperparameter settings it evaluates are typically independent of each other.[2]

Random search

Random search replaces the exhaustive enumeration of all combinations by selecting them randomly. This can be simply applied to the discrete setting described above, but also generalizes to continuous and mixed spaces. It can outperform grid search, especially when only a small number of hyperparameters affects the final performance of the machine learning algorithm.[2] In this case, the optimization problem is said to have a low intrinsic dimensionality.[5] Random search is also embarrassingly parallel, and additionally allows the inclusion of prior knowledge by specifying the distribution from which to sample.
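
As a minimal sketch, the finite grid above can be replaced by sampling from continuous distributions with scikit-learn's RandomizedSearchCV. The log-uniform ranges, dataset, and 20-trial budget below are illustrative assumptions; the choice of sampling distribution is where prior knowledge enters.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Continuous distributions instead of a finite grid.
param_distributions = {
    "C": loguniform(1e0, 1e3),
    "gamma": loguniform(1e-2, 1e1),
}

search = RandomizedSearchCV(
    SVC(kernel="rbf"), param_distributions, n_iter=20, cv=5, random_state=0
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Note that n_iter fixes the evaluation budget directly, independent of how many hyperparameters are being searched.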

Bayesian optimization

Main article: Bayesian optimization

Bayesian optimization is a global optimization method for noisy black-box functions. Applied to hyperparameter optimization, Bayesian optimization builds a probabilistic model of the function mapping from hyperparameter values to the objective evaluated on a validation set. By iteratively evaluating a promising hyperparameter configuration based on the current model, and then updating the model, Bayesian optimization aims to gather observations revealing as much information as possible about this function and, in particular, the location of the optimum. It tries to balance exploration (hyperparameters for which the outcome is most uncertain) and exploitation (hyperparameters expected to be close to the optimum). In practice, Bayesian optimization has been shown[6][7][8][9] to obtain better results in fewer evaluations compared to grid search and random search, due to the ability to reason about the quality of experiments before they are run.
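
A minimal sketch of this loop, using scikit-optimize (listed under open-source software below): gp_minimize fits a Gaussian-process surrogate to past (hyperparameter, loss) observations and picks each next configuration by optimizing an acquisition function that trades off exploration and exploitation. The dataset, search ranges, and 20-call budget are illustrative assumptions.

```python
import numpy as np
from skopt import gp_minimize
from skopt.space import Real
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

def objective(params):
    C, gamma = params
    # gp_minimize minimizes, so return the negated CV accuracy.
    return -np.mean(cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5))

space = [Real(1e0, 1e3, prior="log-uniform", name="C"),
         Real(1e-2, 1e1, prior="log-uniform", name="gamma")]

result = gp_minimize(objective, space, n_calls=20, random_state=0)
print(result.x, -result.fun)
```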

Gradient-based optimization

For specific learning algorithms, it is possible to compute the gradient with respect to hyperparameters and then optimize the hyperparameters using gradient descent. The first usage of these techniques was focused on neural networks.[10] Since then, these methods have been extended to other models such as support vector machines[11] or logistic regression.[12]

A different approach to obtaining a gradient with respect to hyperparameters is to differentiate the steps of an iterative optimization algorithm using automatic differentiation.[13][14][15]
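
As a minimal illustration of the idea (not of any of the cited methods): ridge regression has a closed-form inner solution w(λ) = (XᵀX + λI)⁻¹Xᵀy, so the exact hypergradient of the validation loss L(λ) = ½‖X_val w − y_val‖² follows from dw/dλ = −(XᵀX + λI)⁻¹w, and λ can be tuned by gradient descent. The synthetic data and step size below are arbitrary, illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X_tr, X_val = rng.normal(size=(80, 5)), rng.normal(size=(40, 5))
w_true = rng.normal(size=5)
y_tr = X_tr @ w_true + 0.5 * rng.normal(size=80)
y_val = X_val @ w_true + 0.5 * rng.normal(size=40)

lam = 1.0
for _ in range(50):
    A = X_tr.T @ X_tr + lam * np.eye(5)
    w = np.linalg.solve(A, X_tr.T @ y_tr)   # inner problem, closed form
    r = X_val @ w - y_val                   # validation residual
    dw_dlam = -np.linalg.solve(A, w)        # d w(lam) / d lam
    grad = r @ X_val @ dw_dlam              # d L_val / d lam (chain rule)
    lam = max(lam - 0.1 * grad, 1e-6)       # gradient step, keep lam > 0

print("tuned regularization strength:", lam)
```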

Evolutionary optimization

Main article: Evolutionary algorithm

Evolutionary optimization is a methodology for the global optimization of noisy black-box functions. In hyperparameter optimization, evolutionary optimization uses evolutionary algorithms to search the space of hyperparameters for a given algorithm.[7] Evolutionary hyperparameter optimization follows a process inspired by the biological concept of evolution:

  1. Create an initial population of random solutions (i.e., randomly generate tuples of hyperparameters, typically 100+)
  2. Evaluate the hyperparameter tuples and compute their fitness (e.g., the 10-fold cross-validation accuracy of the machine learning algorithm trained with those hyperparameters)
  3. Rank the hyperparameter tuples by their relative fitness
  4. Replace the worst-performing hyperparameter tuples with new hyperparameter tuples generated through crossover and mutation
  5. Repeat steps 2-4 until satisfactory algorithm performance is reached or algorithm performance is no longer improving

Evolutionary optimization has been used in hyperparameter optimization for statistical machine learning algorithms,[7] automated machine learning, deep neural network architecture search,[16][17] as well as training of the weights in deep neural networks.[18]
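
A minimal sketch of steps 1–5 above, using truncation selection and Gaussian mutation in log-space over the SVM hyperparameters (C, γ). The population size, mutation scale, generation count, and dataset are illustrative assumptions, and crossover is omitted for brevity (libraries such as deap, listed below, implement it).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

def fitness(ind):
    C, gamma = np.exp(ind)
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()

# 1. Initial population of random (log C, log gamma) tuples.
pop = [rng.uniform([-2, -4], [7, 2]) for _ in range(20)]
for generation in range(10):
    # 2-3. Evaluate and rank by fitness.
    pop.sort(key=fitness, reverse=True)
    # 4. Replace the worst half with mutated copies of the best half.
    pop[10:] = [pop[i] + rng.normal(scale=0.5, size=2) for i in range(10)]

# 5. Stop after a fixed number of generations and report the best tuple.
best = max(pop, key=fitness)
print("C=%.3g gamma=%.3g acc=%.3f" % (*np.exp(best), fitness(best)))
```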

Population-based

Population Based Training (PBT) learns both hyperparameter values and network weights. Multiple learning processes operate independently, using different hyperparameters. Poorly performing models are iteratively replaced with models that adopt modified hyperparameter values from a better performer. The modification allows the hyperparameters to evolve and eliminates the need for manual hypertuning. The process makes no assumptions regarding model architecture, loss functions or training procedures.[19]
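
A toy sketch of the exploit/explore cycle, not the cited paper's system: each worker trains a scalar weight θ by gradient descent on a quadratic loss while carrying its own learning rate as a hyperparameter; periodically the worst workers copy the weights and learning rate of the best workers and perturb the learning rate. All sizes and schedules here are illustrative assumptions.

```python
import random

random.seed(0)
loss = lambda theta: (theta - 3.0) ** 2
grad = lambda theta: 2.0 * (theta - 3.0)

# Population of workers, each with weights (theta) and a hyperparameter (lr).
workers = [{"theta": random.uniform(-5, 5), "lr": random.uniform(1e-3, 0.5)}
           for _ in range(8)]

for step in range(100):
    for w in workers:
        w["theta"] -= w["lr"] * grad(w["theta"])  # one training step
    if step % 10 == 9:                            # periodic exploit/explore
        workers.sort(key=lambda w: loss(w["theta"]))
        for bad, good in zip(workers[-2:], workers[:2]):
            bad["theta"] = good["theta"]          # exploit: copy weights
            bad["lr"] = good["lr"] * random.choice([0.8, 1.25])  # explore

best = min(workers, key=lambda w: loss(w["theta"]))
print("best lr:", best["lr"], "theta:", best["theta"])
```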

Others

RBF[20] and spectral[21] approaches have also been developed.

Open-source software

Grid search

  • Katib is a Kubernetes-native system which includes grid search.
  • scikit-learn is a Python package which includes grid search.
  • Tune is a Python library for distributed hyperparameter tuning and supports grid search.
  • Talos includes grid search for Keras.
  • H2O AutoML provides grid search over algorithms in the H2O open source machine learning library.

Random search

  • hyperopt, also available via hyperas and hyperopt-sklearn, is a Python package which includes random search.
  • Katib is a Kubernetes-native system which includes random search.
  • scikit-learn is a Python package which includes random search.
  • Tune is a Python library for distributed hyperparameter tuning and supports random search over arbitrary parameter distributions.
  • Talos includes a customizable random search for Keras.

Bayesian

  • Auto-sklearn[22] is a Bayesian hyperparameter optimization layer on top of scikit-learn.
  • Ax[23] is a Python-based experimentation platform that supports Bayesian optimization and bandit optimization as exploration strategies.
  • BOCS is a Matlab package which uses semidefinite programming for minimizing a black-box function over discrete inputs.[24] A Python 3 implementation is also included.
  • HpBandSter is a Python package which combines Bayesian optimization with bandit-based methods.[25]
  • Katib is a Kubernetes-native system which includes Bayesian optimization.
  • mlrMBO, also with mlr, is an R package for model-based/Bayesian optimization of black-box functions.
  • scikit-optimize is a Python package for sequential model-based optimization with a scipy.optimize interface.[26]
  • SMAC is a Python/Java library implementing Bayesian optimization.[27]
  • tuneRanger is an R package for tuning random forests using model-based optimization.
  • optuna is a Python package for black-box optimization, compatible with arbitrary functions that need to be optimized.

Gradient-based optimization

  • FAR-HO is a Python package containing TensorFlow implementations and wrappers for gradient-based hyperparameter optimization with forward and reverse mode algorithmic differentiation.
  • XGBoost is an open-source software library which provides a gradient boosting framework for C++, Java, Python, R, and Julia.

Evolutionary

  • deap is a Python framework for general evolutionary computation which is flexible and integrates with parallelization packages like scoop and pyspark, and other Python frameworks like sklearn via sklearn-deap.
  • devol is a Python package that performs Deep Neural Network architecture search using genetic programming.
  • nevergrad[28] is a Python package which includes population control methods and particle swarm optimization.[29]
  • Tune is a Python library for distributed hyperparameter tuning and leverages nevergrad for evolutionary algorithm support.

Other

  • dlib[30] is a C++ package with a Python API which has a parameter-free optimizer based on LIPO and trust region optimizers working in tandem.[31]
  • Tune is a Python library for executing hyperparameter tuning and integrates with and scales many existing hyperparameter optimization libraries such as hyperopt, nevergrad, and scikit-optimize.
  • Harmonica is a Python package for spectral hyperparameter optimization.[21]
  • hyperopt, also available via hyperas and hyperopt-sklearn, is a Python package which includes distributed hyperparameter optimization based on Tree-structured Parzen Estimators.
  • Katib is a Kubernetes-native system which includes grid search, random search, Bayesian optimization, Hyperband, and NAS based on reinforcement learning.
  • nevergrad[28] is a Python package for gradient-free optimization using techniques such as differential evolution, sequential quadratic programming, fastGA, covariance matrix adaptation, population control methods, and particle swarm optimization.[29]
  • nni is a Python package which includes hyperparameter tuning for neural networks in local and distributed environments. Its techniques include TPE, random, anneal, evolution, SMAC, batch, grid, and hyperband.
  • parameter-sherpa is a similar Python package which includes several techniques: grid search, Bayesian, and genetic optimization.
  • pycma is a Python implementation of Covariance Matrix Adaptation Evolution Strategy.
  • rbfopt is a Python package that uses a radial basis function model.[20]

Commercial services

  • Amazon SageMaker uses Gaussian processes to tune hyperparameters.
  • BigML OptiML supports mixed search domains.
  • Google HyperTune supports mixed search domains.
  • Indie Solver supports multiobjective, multifidelity and constraint optimization.
  • Mind Foundry OPTaaS supports mixed search domains, multiobjective, constraints, parallel optimization and surrogate models.
  • SigOpt supports mixed search domains, multiobjective, multisolution, multifidelity, constraint (linear and black-box), and parallel optimization.

See also

  • Automated machine learning
  • Neural architecture search
  • Meta-optimization
  • Model selection
  • Self-tuning
  • XGBoost

References

  1. Claesen, Marc; De Moor, Bart (2015). "Hyperparameter Search in Machine Learning". arXiv:1502.02127 [cs.LG].
  2. Bergstra, James; Bengio, Yoshua (2012). "Random Search for Hyper-Parameter Optimization" (PDF). Journal of Machine Learning Research 13: 281–305.
  3. Hsu, Chih-Wei; Chang, Chih-Chung; Lin, Chih-Jen (2010). A Practical Guide to Support Vector Classification. Technical Report, National Taiwan University.
  4. Chicco, D (December 2017). "Ten quick tips for machine learning in computational biology". BioData Mining 10 (35): 35. doi:10.1186/s13040-017-0155-3. PMC 5721660. PMID 29234465.
  5. Wang, Ziyu; Hutter, Frank; Zoghi, Masrour; Matheson, David; de Freitas, Nando (2016). "Bayesian Optimization in a Billion Dimensions via Random Embeddings". Journal of Artificial Intelligence Research 55: 361–387. arXiv:1301.1942. doi:10.1613/jair.4806.
  6. Hutter, Frank; Hoos, Holger; Leyton-Brown, Kevin (2011). "Sequential Model-Based Optimization for General Algorithm Configuration" (PDF). Learning and Intelligent Optimization. Lecture Notes in Computer Science 6683: 507–523. CiteSeerX 10.1.1.307.8813. doi:10.1007/978-3-642-25566-3_40. ISBN 978-3-642-25565-6.
  7. Bergstra, James; Bardenet, Rémi; Bengio, Yoshua; Kégl, Balázs (2011). "Algorithms for Hyper-Parameter Optimization" (PDF). Advances in Neural Information Processing Systems.
  8. Snoek, Jasper; Larochelle, Hugo; Adams, Ryan (2012). "Practical Bayesian Optimization of Machine Learning Algorithms" (PDF). Advances in Neural Information Processing Systems. arXiv:1206.2944. Bibcode:2012arXiv1206.2944S.
  9. Thornton, Chris; Hutter, Frank; Hoos, Holger; Leyton-Brown, Kevin (2013). "Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms" (PDF). Knowledge Discovery and Data Mining. arXiv:1208.3719. Bibcode:2012arXiv1208.3719T.
  10. Larsen, Jan; Hansen, Lars Kai; Svarer, Claus; Ohlsson, M (1996). "Design and regularization of neural networks: the optimal use of a validation set" (PDF). Proceedings of the 1996 IEEE Signal Processing Society Workshop: 62–71. CiteSeerX 10.1.1.415.3266. doi:10.1109/NNSP.1996.548336. ISBN 0-7803-3550-3.
  11. Chapelle, Olivier; Vapnik, Vladimir; Bousquet, Olivier; Mukherjee, Sayan (2002). "Choosing multiple parameters for support vector machines" (PDF). Machine Learning 46: 131–159. doi:10.1023/a:1012450327387.
  12. Do, Chuong B.; Foo, Chuan-Sheng; Ng, Andrew Y. (2008). "Efficient multiple hyperparameter learning for log-linear models" (PDF). Advances in Neural Information Processing Systems 20.
  13. Domke, Justin (2012). "Generic Methods for Optimization-Based Modeling" (PDF). AISTATS 22.
  14. Maclaurin, Douglas; Duvenaud, David; Adams, Ryan P. (2015). "Gradient-based Hyperparameter Optimization through Reversible Learning". arXiv:1502.03492 [stat.ML].
  15. Franceschi, Luca; Donini, Michele; Frasconi, Paolo; Pontil, Massimiliano (2017). "Forward and Reverse Gradient-Based Hyperparameter Optimization" (PDF). Proceedings of the 34th International Conference on Machine Learning. arXiv:1703.01785. Bibcode:2017arXiv170301785F.
  16. Miikkulainen R, Liang J, Meyerson E, Rawal A, Fink D, Francon O, Raju B, Shahrzad H, Navruzyan A, Duffy N, Hodjat B (2017). "Evolving Deep Neural Networks". arXiv:1703.00548 [cs.NE].
  17. Jaderberg M, Dalibard V, Osindero S, Czarnecki WM, Donahue J, Razavi A, Vinyals O, Green T, Dunning I, Simonyan K, Fernando C, Kavukcuoglu K (2017). "Population Based Training of Neural Networks". arXiv:1711.09846 [cs.LG].
  18. Such FP, Madhavan V, Conti E, Lehman J, Stanley KO, Clune J (2017). "Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning". arXiv:1712.06567 [cs.NE].
  19. Li, Ang; Spyra, Ola; Perel, Sagi; Dalibard, Valentin; Jaderberg, Max; Gu, Chenjie; Budden, David; Harley, Tim; Gupta, Pramod (2019). "A Generalized Framework for Population Based Training". arXiv:1902.01894 [cs.AI].
  20. Diaz, Gonzalo; Fokoue, Achille; Nannicini, Giacomo; Samulowitz, Horst (2017). "An effective algorithm for hyperparameter optimization of neural networks". arXiv:1705.08520 [cs.AI].
  21. Hazan, Elad; Klivans, Adam; Yuan, Yang (2017). "Hyperparameter Optimization: A Spectral Approach". arXiv:1706.00764 [cs.LG].
  22. Feurer M, Klein A, Eggensperger K, Springenberg J, Blum M, Hutter F (2015). "Efficient and Robust Automated Machine Learning". Advances in Neural Information Processing Systems 28 (NIPS 2015): 2962–2970.
  23. "Open-sourcing Ax and BoTorch: New AI tools for adaptive experimentation". 2019.
  24. Baptista, Ricardo; Poloczek, Matthias (2018). "Bayesian Optimization of Combinatorial Structures". arXiv:1806.08838 [stat.ML].
  25. Falkner, Stefan; Klein, Aaron; Hutter, Frank (2018). "BOHB: Robust and Efficient Hyperparameter Optimization at Scale". arXiv:1807.01774 [stat.ML].
  26. "skopt API documentation". scikit-optimize.github.io.
  27. Hutter F, Hoos HH, Leyton-Brown K. "Sequential Model-Based Optimization for General Algorithm Configuration" (PDF). Proceedings of the Conference on Learning and Intelligent OptimizatioN (LION 5).
  28. "[QUESTION] How to use to optimize NN hyperparameters · Issue #1 · facebookresearch/nevergrad". GitHub.
  29. "Nevergrad: An open source tool for derivative-free optimization". December 20, 2018.
  30. "A toolkit for making real world machine learning and data analysis applications in C++: davisking/dlib". February 25, 2019 – via GitHub.
  31. King, Davis. "A Global Optimization Algorithm Worth Using". blog.dlib.net.
