Optuna, a Powerful Hyperparameter Tuning Tool: Study Notes

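Introduction

As a hyperparameter tuning tool, Optuna works with the vast majority of machine learning frameworks, including sklearn, XGBoost, LightGBM, and PyTorch.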

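The main tuning mechanisms are as follows:
1 Sampling algorithms
Using the record of suggested parameter values and their evaluated objective values, a sampler generally keeps narrowing the search space until it reaches a region whose parameters yield better objective values.

optuna.samplers.TPESampler: the Tree-structured Parzen Estimator algorithm
optuna.samplers.CmaEsSampler: the CMA-ES algorithm
optuna.samplers.GridSampler: grid search
optuna.samplers.RandomSampler: random search
The default sampler is TPE.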

import optuna

# With no arguments, create_study() uses the default sampler (TPE).
study = optuna.create_study()
print(f"Sampler is {study.sampler.__class__.__name__}")

# Pass a sampler explicitly to override the default.
study = optuna.create_study(sampler=optuna.samplers.RandomSampler())
print(f"Sampler is {study.sampler.__class__.__name__}")

study = optuna.create_study(sampler=optuna.samplers.CmaEsSampler())
print(f"Sampler is {study.sampler.__class__.__name__}")
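
To get a feel for what "narrowing the search space from the history" means, here is a toy sketch in plain Python. It is not TPE or any other Optuna sampler. The names toy_objective and narrowing_search, and the rule of shrinking the range to the span of the best quarter of trials, are assumptions made up purely for illustration.

```python
import random


def toy_objective(x):
    # Hypothetical objective with its minimum at x = 2.
    return (x - 2.0) ** 2


def narrowing_search(low, high, n_rounds=5, n_samples=20, seed=0):
    """Toy history-based search: sample, then shrink the range around the best trials."""
    rng = random.Random(seed)
    history = []  # (objective value, parameter value) pairs
    for _ in range(n_rounds):
        samples = [rng.uniform(low, high) for _ in range(n_samples)]
        history.extend((toy_objective(x), x) for x in samples)
        history.sort()  # best (lowest) objective values first
        top = [x for _, x in history[: max(2, len(history) // 4)]]
        # Shrink the search range to the span of the best quarter of all trials.
        low, high = min(top), max(top)
    best_value, best_x = history[0]
    return best_x, best_value, (low, high)


best_x, best_value, final_range = narrowing_search(-10.0, 10.0)
print(best_x, best_value, final_range)
```

Real samplers such as TPE model the history probabilistically rather than cutting the range hard like this, so they can still explore outside the current best region.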

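2 Pruning algorithms
Pruners automatically stop unpromising trials at an early stage of training (in other words, automated early stopping).

optuna.pruners.SuccessiveHalvingPruner: the Asynchronous Successive Halving algorithm
optuna.pruners.HyperbandPruner: the Hyperband algorithm
optuna.pruners.MedianPruner: the median pruning algorithm
optuna.pruners.ThresholdPruner: the threshold pruning algorithm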
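
As a rough idea of how successive halving works, here is a minimal sketch in plain Python. It is the simple synchronous variant, not the asynchronous version that SuccessiveHalvingPruner implements. The function name and the example loss curves are made up, and the parameter names reduction_factor and min_resource are borrowed only for illustration.

```python
def successive_halving(loss_curves, reduction_factor=2, min_resource=1):
    """Keep the best 1/reduction_factor of candidates at each rung (lower loss is better)."""
    survivors = list(loss_curves)
    resource = min_resource  # number of training steps seen at the current rung
    max_steps = min(len(curve) for curve in loss_curves.values())
    while len(survivors) > 1 and resource <= max_steps:
        # Rank the surviving candidates by their loss after `resource` steps.
        ranked = sorted(survivors, key=lambda name: loss_curves[name][resource - 1])
        survivors = ranked[: max(1, len(ranked) // reduction_factor)]
        resource *= reduction_factor  # survivors get a larger budget at the next rung
    return survivors


# Hypothetical per-step validation losses for four candidate configurations.
curves = {
    "a": [0.90, 0.80, 0.70, 0.60],
    "b": [0.50, 0.40, 0.30, 0.20],
    "c": [0.70, 0.30, 0.20, 0.10],
    "d": [0.95, 0.90, 0.90, 0.90],
}
winner = successive_halving(curves)
print(winner)
```

Candidates "a" and "d" are dropped after a single step, so most of the budget goes to the two configurations that looked promising early on.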

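Activating a pruner
To turn on pruning, call report() and should_prune() after every step of the iterative training. report() records the intermediate objective values as training proceeds, and should_prune() decides whether the trial fails the pruner's predefined condition and should be terminated.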

import logging
import sys

import optuna
import sklearn.datasets
import sklearn.linear_model
import sklearn.model_selection


def objective(trial):
    iris = sklearn.datasets.load_iris()
    classes = list(set(iris.target))
    train_x, valid_x, train_y, valid_y = sklearn.model_selection.train_test_split(
        iris.data, iris.target, test_size=0.25, random_state=0
    )

    alpha = trial.suggest_float("alpha", 1e-5, 1e-1, log=True)
    clf = sklearn.linear_model.SGDClassifier(alpha=alpha)

    for step in range(100):
        clf.partial_fit(train_x, train_y, classes=classes)

        # Report intermediate objective value.
        intermediate_value = 1.0 - clf.score(valid_x, valid_y)
        trial.report(intermediate_value, step)

        # Handle pruning based on the intermediate value.
        if trial.should_prune():
            raise optuna.TrialPruned()

    return 1.0 - clf.score(valid_x, valid_y)


# Add stream handler of stdout to show the messages
optuna.logging.get_logger("optuna").addHandler(logging.StreamHandler(sys.stdout))
study = optuna.create_study(pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=20)
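
The decision behind a median-style pruning rule can be sketched in a few lines of plain Python. This is a simplified illustration rather than Optuna's actual MedianPruner code: the function name is made up, minimization is assumed, and only the current intermediate value is compared (not the trial's best value so far).

```python
from statistics import median


def should_prune_median(current_value, previous_values_at_step, n_startup_trials=5):
    """Simplified median rule for a minimization problem."""
    # Never prune until enough earlier trials have reported a value at this step.
    if len(previous_values_at_step) < n_startup_trials:
        return False
    # Prune when the current trial is worse than the median of earlier trials.
    return current_value > median(previous_values_at_step)


earlier = [0.10, 0.20, 0.30, 0.40, 0.50]  # hypothetical errors at the same step
print(should_prune_median(0.90, earlier))  # worse than the median 0.30
print(should_prune_median(0.20, earlier))  # better than the median 0.30
```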

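Based on the benchmarks cited in the Optuna documentation, at least for tasks other than deep learning:
For optuna.samplers.RandomSampler, optuna.pruners.MedianPruner is the best choice.
For optuna.samplers.TPESampler, optuna.pruners.HyperbandPruner is the best choice.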

When Optuna is used for machine learning, the objective function usually returns the model's loss or accuracy.

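1. The Study object

Trial: a single call of the objective function.
Study: an optimization session, made up of a series of trials.
Parameter: a variable whose value is to be optimized.
In Optuna, the optimization process is managed through a study object. create_study() returns a study object, which exposes several useful attributes for analyzing the optimization results.
Get the dictionary of parameter names and values of the best trial:
study.best_params
Get the best objective value:
study.best_value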

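2. Sampling hyperparameters

optuna.trial.Trial.suggest_categorical() for categorical parameters
optuna.trial.Trial.suggest_int() for integer parameters
optuna.trial.Trial.suggest_float() for floating point parameters

With the optional step and log arguments, integer and floating point parameters can be discretized or sampled on a log scale.
step is the easier one to understand: for an integer it is the stride between candidate values, and for a float it is the discretization interval (like binning).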
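
To make step concrete, this small plain-Python sketch lists the candidate values implied by a range and a step. The helper names int_grid and float_grid are made up for illustration and are not part of Optuna. It assumes the range is an exact multiple of the step.

```python
def int_grid(low, high, step):
    # Candidate values for an integer parameter: low, low + step, ..., high.
    return list(range(low, high + 1, step))


def float_grid(low, high, step):
    # Candidate values for a float parameter: low + k * step for k = 0, 1, ...
    n = int(round((high - low) / step))
    # Round to hide floating point noise such as 0.30000000000000004.
    return [round(low + k * step, 10) for k in range(n + 1)]


num_units_candidates = int_grid(10, 100, 5)
drop_path_candidates = float_grid(0.0, 1.0, 0.1)
print(len(num_units_candidates), num_units_candidates[:3], num_units_candidates[-1])
print(drop_path_candidates)
```

So a range of 10 to 100 with step=5 leaves 19 possible integer values, and 0.0 to 1.0 with step=0.1 leaves 11 possible float values.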

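I did not really understand log at first, so I looked at the Optuna source code:
For float: If log is true, the value is sampled from the range in the log domain.
Otherwise, the value is sampled from the range in the linear domain.
That still left me confused, so let's see how NumPy handles the same idea. NumPy has three ways to generate spaced values: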
logspace
Similar to geomspace, but with endpoints specified using log and base.
linspace
Similar to geomspace, but with arithmetic instead of geometric progression.
geomspace
Similar to logspace, but with endpoints specified directly.
An example makes this more intuitive:

np.linspace(0.02, 2.0, num=20)
np.geomspace(0.02, 2.0, num=20)
np.logspace(0.02, 2.0, num=20)

linspace produces an arithmetic sequence:

[0.02       0.12421053 0.22842105 0.33263158 0.43684211 0.54105263
 0.64526316 0.74947368 0.85368421 0.95789474 1.06210526 1.16631579
 1.27052632 1.37473684 1.47894737 1.58315789 1.68736842 1.79157895
 1.89578947 2.        ]

geomspace produces a geometric sequence:

[0.02, 0.0254855, 0.03247553, 0.04138276, 0.05273302,
 0.06719637, 0.08562665, 0.1091119, 0.13903856, 0.17717336,
 0.22576758, 0.28768998, 0.36659614, 0.46714429, 0.59527029,
 0.75853804, 0.96658605, 1.23169642, 1.56951994, 2.]

logspace first computes $base^{start}$ and $base^{end}$ and uses them as the endpoints of a geometric sequence. base defaults to 10, so here the endpoints are $10^{0.02} \approx 1.047$ and $10^{2} = 100$:

[  1.04712855   1.33109952   1.69208062   2.15095626   2.73427446
   3.47578281   4.41838095   5.61660244   7.13976982   9.07600522
  11.53732863  14.66613875  18.64345144  23.69937223  30.12640904
  38.29639507  48.68200101  61.88408121  78.6664358  100.        ]
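
Coming back to log=True: as I understand it, sampling "in the log domain" means drawing uniformly between log(low) and log(high) and then exponentiating. The sketch below shows that idea in plain Python. It is not Optuna's implementation, and the helper names are made up. It compares how many samples land in the lowest decade of the range [1e-5, 1e-2].

```python
import math
import random


def sample_linear(rng, low, high):
    # Uniform in the linear domain.
    return rng.uniform(low, high)


def sample_log(rng, low, high):
    # Uniform in the log domain, mapped back with exp().
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def fraction_below(samples, threshold):
    return sum(1 for s in samples if s < threshold) / len(samples)


rng = random.Random(0)
low, high = 1e-5, 1e-2
linear_samples = [sample_linear(rng, low, high) for _ in range(10_000)]
log_samples = [sample_log(rng, low, high) for _ in range(10_000)]

# Share of samples in the lowest decade, [1e-5, 1e-4).
linear_frac = fraction_below(linear_samples, 1e-4)
log_frac = fraction_below(log_samples, 1e-4)
print(f"linear: {linear_frac:.3f}, log: {log_frac:.3f}")
```

With linear sampling only about 1% of the draws fall below 1e-4, while log sampling spreads them evenly so that roughly a third land in each decade. That is why log=True suits parameters such as a learning rate that span several orders of magnitude.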

Code example:


import optuna


def objective(trial):
    # Categorical parameter
    optimizer = trial.suggest_categorical("optimizer", ["MomentumSGD", "Adam"])

    # Integer parameter
    num_layers = trial.suggest_int("num_layers", 1, 3)

    # Integer parameter (log)
    num_channels = trial.suggest_int("num_channels", 32, 512, log=True)

    # Integer parameter (discretized)
    num_units = trial.suggest_int("num_units", 10, 100, step=5)

    # Floating point parameter
    dropout_rate = trial.suggest_float("dropout_rate", 0.0, 1.0)

    # Floating point parameter (log)
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)

    # Floating point parameter (discretized)
    drop_path_rate = trial.suggest_float("drop_path_rate", 0.0, 1.0, step=0.1)

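Defining the parameter space

In Optuna, the search space is defined with ordinary Python syntax, including conditionals and loops.
Likewise, you can branch or loop depending on the parameter values.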

# Branches
import sklearn.ensemble
import sklearn.svm


def objective(trial):
    classifier_name = trial.suggest_categorical("classifier", ["SVC", "RandomForest"])
    if classifier_name == "SVC":
        svc_c = trial.suggest_float("svc_c", 1e-10, 1e10, log=True)
        classifier_obj = sklearn.svm.SVC(C=svc_c)
    else:
        rf_max_depth = trial.suggest_int("rf_max_depth", 2, 32, log=True)
        classifier_obj = sklearn.ensemble.RandomForestClassifier(max_depth=rf_max_depth)


# Loops
import torch
import torch.nn as nn


def create_model(trial, in_size):
    n_layers = trial.suggest_int("n_layers", 1, 3)

    layers = []
    for i in range(n_layers):
        n_units = trial.suggest_int("n_units_l{}".format(i), 4, 128, log=True)
        layers.append(nn.Linear(in_size, n_units))
        layers.append(nn.ReLU())
        in_size = n_units
    layers.append(nn.Linear(in_size, 10))

    return nn.Sequential(*layers)

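A note on the number of parameters
The difficulty of optimization grows roughly exponentially with the number of parameters. In other words, each parameter you add increases the number of trials needed exponentially, so it is best not to add unnecessary parameters.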
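
A quick way to see this growth is to count the points of a full grid. The sketch below assumes, purely for illustration, that every parameter has 10 candidate values. The helper name grid_size is made up.

```python
from itertools import product


def grid_size(values_per_param, n_params):
    # Enumerate every combination of candidate values and count them.
    grid = product(range(values_per_param), repeat=n_params)
    return sum(1 for _ in grid)


# Number of grid points for 1 to 4 parameters with 10 candidate values each.
sizes = {n: grid_size(10, n) for n in range(1, 5)}
print(sizes)
```

Each extra parameter multiplies the number of combinations by 10, which is the exponential growth described above.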
