0. Previous Posts in This Series

[1] Deep Learning with PyTorch - Tensor Definition and Tensor Creation

[2] Deep Learning with PyTorch - Tensor Operations: Concatenation, Splitting, Indexing, and Transformation

[3] Deep Learning with PyTorch - Tensor Math Operations

[4] Deep Learning with PyTorch - Linear Regression

[5] Deep Learning with PyTorch - Computational Graphs and the Dynamic Graph Mechanism

[6] Deep Learning with PyTorch - autograd and Logistic Regression

[7] Deep Learning with PyTorch - DataLoader and Dataset (with an RMB Binary Classification Example)

[8] Deep Learning with PyTorch - Image Preprocessing with transforms

[9] Deep Learning with PyTorch - Image Augmentation with transforms (Cropping, Flipping, Rotation)

[10] Deep Learning with PyTorch - transforms Image Operations and Custom Methods

[11] Deep Learning with PyTorch - Model Creation and nn.Module

[12] Deep Learning with PyTorch - Model Containers and Building AlexNet

[13] Deep Learning with PyTorch - Convolution Layers (1D/2D/3D Convolution, nn.Conv2d, Transposed Convolution nn.ConvTranspose)

[14] Deep Learning with PyTorch - Pooling, Linear, and Activation Function Layers

[15] Deep Learning with PyTorch - Weight Initialization

[16] Deep Learning with PyTorch - 18 Loss Functions

[17] Deep Learning with PyTorch - Optimizer

Deep Learning with PyTorch - Optimizer

  • 0. Previous Posts in This Series
  • 1. Optimizer Definition
  • 2. Basic Attributes of the Optimizer
  • 3. Basic Methods of the Optimizer
  • 4. Learning Rate
  • 5. Momentum
  • 6. 10 Common Optimizers
    • 6.1 optim.SGD
    • 6.2 Other Common Optimizers
  • 7. Code Examples

1. Optimizer Definition


A PyTorch optimizer manages and updates the model's learnable parameters so that the model output gets closer to the ground-truth labels:
(1) Manage + update: the optimizer holds the parameters it is responsible for and applies the update rule to them.
(2) Learnable parameters: the weights and the biases.
(3) Closer to the ground-truth labels: i.e., the loss decreases.
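
A minimal, self-contained sketch of this manage-and-update cycle, using a toy nn.Linear model and random data purely for illustration (the full training script is create_optimizer.py in section 7):

import torch
import torch.nn as nn
import torch.optim as optim

net = nn.Linear(2, 2)                             # toy model; its weight and bias are the learnable parameters
criterion = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=0.01)  # the optimizer now manages net's parameters

inputs, targets = torch.randn(4, 2), torch.randn(4, 2)
for _ in range(5):
    outputs = net(inputs)               # forward pass
    optimizer.zero_grad()               # clear old gradients
    loss = criterion(outputs, targets)  # compare outputs with the target labels
    loss.backward()                     # compute gradients w.r.t. the managed parameters
    optimizer.step()                    # update the parameters; the loss decreases over iterations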

2. Basic Attributes of the Optimizer

param_groups is a list whose elements are dicts; each dict stores one group of parameters under the key 'params' together with that group's hyperparameters (e.g. 'lr', 'momentum').
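
A minimal sketch of inspecting param_groups (the key set shown is for the PyTorch 1.x SGD used throughout this series; newer releases add a few more keys):

import torch
import torch.optim as optim

w = torch.randn(2, 2, requires_grad=True)
optimizer = optim.SGD([w], lr=0.1, momentum=0.9)

print(type(optimizer.param_groups))      # <class 'list'>
print(optimizer.param_groups[0].keys())
# dict_keys(['params', 'lr', 'momentum', 'dampening', 'weight_decay', 'nesterov'])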

3. Basic Methods of the Optimizer


The basic methods, all demonstrated in optimizer_methods.py below, are zero_grad() (clear the gradients of the managed parameters), step() (perform one parameter update), add_param_group() (add a group of parameters), and state_dict()/load_state_dict() (save and restore the optimizer state). add_param_group() lets you configure several parameter groups, e.g. to give different parts of the model different learning speeds at different training stages; see the sketch below.
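
A hedged sketch of that use case, with a hypothetical two-part model (the layer shapes here are placeholders): a pretrained backbone gets a small learning rate while a newly added head gets a larger one:

import torch.nn as nn
import torch.optim as optim

backbone = nn.Linear(8, 8)   # hypothetical pretrained part
head = nn.Linear(8, 2)       # hypothetical newly added classifier

optimizer = optim.SGD(backbone.parameters(), lr=0.001)                 # group 0: small LR for the backbone
optimizer.add_param_group({'params': head.parameters(), 'lr': 0.01})   # group 1: 10x larger LR for the head

print(len(optimizer.param_groups))  # 2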

4. Learning Rate



The learning rate scales the gradient in each update step and thus controls how fast the parameters change. As the multi-learning-rate experiment in learning_rate.py below shows, too large a value makes the loss oscillate or diverge, while too small a value converges slowly; in practice the learning rate is commonly initialized to 0.01.
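
The gradient-descent update rule behind this is

w_{i+1} = w_i - LR * g(w_i)

where g(w_i) is the gradient at w_i. As a worked step with the demo function y = (2x)^2 = 4x^2 from learning_rate.py (dy/dx = 8x): starting from x = 2 with LR = 0.01, one update gives x = 2 - 0.01 * (8 * 2) = 1.84.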

5. Momentum

beta acts like a memory horizon: the smaller beta, the shorter the memory. For example, with beta = 0.8 the contribution of a value fades to essentially nothing after about 20 steps, while beta = 0.98 keeps contributions alive out to about 80 steps (compare the weight curves plotted by momentum.py below).
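
Behind this is the exponentially weighted average

v_t = beta * v_{t-1} + (1 - beta) * theta_t

which weights the value from n steps ago by (1 - beta) * beta^n, exactly what exp_w_func in momentum.py computes. PyTorch's SGD applies the same idea to gradients: with momentum coefficient m and learning rate lr (and default dampening), its update is

v_t = m * v_{t-1} + g(w_t)
w_{t+1} = w_t - lr * v_t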

(Figures: loss curves of gradient descent without momentum vs. with momentum, as produced by the SGD-momentum demo in momentum.py below.)

6. 10 Common Optimizers

6.1 optim.SGD
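
A hedged sketch of a typical optim.SGD construction, with every keyword argument of the PyTorch 1.x constructor spelled out (net is assumed to be an already-defined model; newer releases add further keyword arguments):

optimizer = optim.SGD(net.parameters(),  # iterable of tensors, or of dicts defining parameter groups
                      lr=0.01,           # learning rate (required, no default)
                      momentum=0.9,      # momentum coefficient m (default 0)
                      dampening=0,       # dampening applied to the momentum buffer (default 0)
                      weight_decay=0,    # L2 penalty (default 0)
                      nesterov=False)    # enable Nesterov momentum (default False)

This is how create_optimizer.py in section 7 builds its optimizer, with the defaults left implicit.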

6.2 Other Common Optimizers
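
The nine optimizers usually grouped with optim.SGD to make up the ten are all provided by torch.optim: optim.ASGD, optim.Rprop, optim.Adagrad, optim.Adadelta, optim.RMSprop, optim.Adam, optim.Adamax, optim.SparseAdam, and optim.LBFGS. They follow the same construction pattern shown above, e.g. optim.Adam(net.parameters(), lr=0.001).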

7. Code Examples

create_optimizer.py

# -*- coding: utf-8 -*-
"""
# @file name  : create_optimizer.py
# @brief      : RMB classification model training
"""
import os
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
import torchvision.transforms as transforms
import torch.optim as optim
from matplotlib import pyplot as plt
from model.lenet import LeNet
from tools.my_dataset import RMBDataset
from tools.common_tools import transform_invert, set_seed

set_seed(1)  # set the random seed
rmb_label = {"1": 0, "100": 1}

# hyperparameters
MAX_EPOCH = 10
BATCH_SIZE = 16
LR = 0.01
log_interval = 10
val_interval = 1

# ============================ step 1/5 data ============================
split_dir = os.path.join("..", "..", "data", "rmb_split")
train_dir = os.path.join(split_dir, "train")
valid_dir = os.path.join(split_dir, "valid")

norm_mean = [0.485, 0.456, 0.406]
norm_std = [0.229, 0.224, 0.225]

train_transform = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.RandomCrop(32, padding=4),
    transforms.RandomGrayscale(p=0.8),
    transforms.ToTensor(),
    transforms.Normalize(norm_mean, norm_std),
])

valid_transform = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
    transforms.Normalize(norm_mean, norm_std),
])

# build MyDataset instances
train_data = RMBDataset(data_dir=train_dir, transform=train_transform)
valid_data = RMBDataset(data_dir=valid_dir, transform=valid_transform)

# build DataLoaders
train_loader = DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)
valid_loader = DataLoader(dataset=valid_data, batch_size=BATCH_SIZE)

# ============================ step 2/5 model ============================
net = LeNet(classes=2)
net.initialize_weights()

# ============================ step 3/5 loss function ============================
criterion = nn.CrossEntropyLoss()                                                   # choose the loss function

# ============================ step 4/5 optimizer ============================
optimizer = optim.SGD(net.parameters(), lr=LR, momentum=0.9)                        # choose the optimizer
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)     # set the learning-rate decay policy

# ============================ step 5/5 training ============================
train_curve = list()
valid_curve = list()

for epoch in range(MAX_EPOCH):

    loss_mean = 0.
    correct = 0.
    total = 0.

    net.train()
    for i, data in enumerate(train_loader):

        # forward
        inputs, labels = data
        outputs = net(inputs)

        # backward
        optimizer.zero_grad()
        loss = criterion(outputs, labels)
        loss.backward()

        # update weights
        optimizer.step()

        # count classification results
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).squeeze().sum().numpy()

        # print training information
        loss_mean += loss.item()
        train_curve.append(loss.item())
        if (i+1) % log_interval == 0:
            loss_mean = loss_mean / log_interval
            print("Training:Epoch[{:0>3}/{:0>3}] Iteration[{:0>3}/{:0>3}] Loss: {:.4f} Acc:{:.2%}".format(
                epoch, MAX_EPOCH, i+1, len(train_loader), loss_mean, correct / total))
            loss_mean = 0.

    scheduler.step()  # update the learning rate

    # validate the model
    if (epoch+1) % val_interval == 0:

        correct_val = 0.
        total_val = 0.
        loss_val = 0.
        net.eval()
        with torch.no_grad():
            for j, data in enumerate(valid_loader):
                inputs, labels = data
                outputs = net(inputs)
                loss = criterion(outputs, labels)

                _, predicted = torch.max(outputs.data, 1)
                total_val += labels.size(0)
                correct_val += (predicted == labels).squeeze().sum().numpy()

                loss_val += loss.item()

            valid_curve.append(loss_val)
            print("Valid:\t Epoch[{:0>3}/{:0>3}] Iteration[{:0>3}/{:0>3}] Loss: {:.4f} Acc:{:.2%}".format(
                epoch, MAX_EPOCH, j+1, len(valid_loader), loss_val, correct_val / total_val))

train_x = range(len(train_curve))
train_y = train_curve

train_iters = len(train_loader)
valid_x = np.arange(1, len(valid_curve)+1) * train_iters * val_interval  # valid_curve records one epoch-level loss per validation, so convert its record points to iterations
valid_y = valid_curve

plt.plot(train_x, train_y, label='Train')
plt.plot(valid_x, valid_y, label='Valid')
plt.legend(loc='upper right')
plt.ylabel('loss value')
plt.xlabel('Iteration')
plt.show()

# ============================ inference ============================

BASE_DIR = os.path.dirname(os.path.abspath(__file__))
test_dir = os.path.join(BASE_DIR, "test_data")

test_data = RMBDataset(data_dir=test_dir, transform=valid_transform)
valid_loader = DataLoader(dataset=test_data, batch_size=1)

for i, data in enumerate(valid_loader):

    # forward
    inputs, labels = data
    outputs = net(inputs)
    _, predicted = torch.max(outputs.data, 1)

    rmb = 1 if predicted.numpy()[0] == 0 else 100

    img_tensor = inputs[0, ...]  # C H W
    img = transform_invert(img_tensor, train_transform)
    plt.imshow(img)
    plt.title("LeNet got {} Yuan".format(rmb))
    plt.show()
    plt.pause(0.5)
    plt.close()

optimizer_methods.py

# -*- coding: utf-8 -*-
"""
# @file name  : optimizer_methods.py
# @brief      : optimizer's methods
"""
import os
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
import torch
import torch.optim as optim
from tools.common_tools import set_seed

set_seed(1)  # set the random seed

weight = torch.randn((2, 2), requires_grad=True)  # create a 2x2 weight
weight.grad = torch.ones((2, 2))                  # set the weight's gradient to all ones

optimizer = optim.SGD([weight], lr=0.1)

# ----------------------------------- step -----------------------------------
flag = 0
# flag = 1
if flag:
    print("weight before step:{}".format(weight.data))
    # prints [[0.6614, 0.2669], [0.0617, 0.6213]]
    optimizer.step()
    print("weight after step:{}".format(weight.data))
    # prints [[0.5614, 0.1669], [-0.0383, 0.5213]]
    # 0.5614 = 0.6614 - lr (0.1) * weight.grad (1) = 0.6614 - 0.1; every other entry likewise drops by 0.1

# ----------------------------------- zero_grad -----------------------------------
flag = 0
# flag = 1
if flag:
    print("weight before step:{}".format(weight.data))
    optimizer.step()        # change lr between 1 and 0.1 and compare the results
    print("weight after step:{}".format(weight.data))

    print("weight in optimizer:{}\nweight in weight:{}\n".format(
        id(optimizer.param_groups[0]['params'][0]), id(weight)))  # the two ids are identical
    # the parameter managed by the optimizer shares the same memory address as weight

    print("weight.grad is {}\n".format(weight.grad))  # prints [[1., 1.], [1., 1.]]
    optimizer.zero_grad()
    print("after optimizer.zero_grad(), weight.grad is\n{}".format(weight.grad))  # prints [[0., 0.], [0., 0.]]

# ----------------------------------- add_param_group -----------------------------------
flag = 0
# flag = 1
if flag:
    print("optimizer.param_groups is\n{}".format(optimizer.param_groups))

    w2 = torch.randn((3, 3), requires_grad=True)
    optimizer.add_param_group({"params": w2, 'lr': 0.0001})  # add a parameter group with its own hyperparameters
    print("optimizer.param_groups is\n{}".format(optimizer.param_groups))  # now there are two groups

# ----------------------------------- state_dict -----------------------------------
flag = 0
# flag = 1
if flag:
    optimizer = optim.SGD([weight], lr=0.1, momentum=0.9)
    opt_state_dict = optimizer.state_dict()

    print("state_dict before step:\n", opt_state_dict)
    for i in range(10):
        optimizer.step()
    print("state_dict after step:\n", optimizer.state_dict())

    torch.save(optimizer.state_dict(), os.path.join(BASE_DIR, "optimizer_state_dict.pkl"))

# ----------------------------------- load state_dict -----------------------------------
flag = 0
# flag = 1
if flag:
    optimizer = optim.SGD([weight], lr=0.1, momentum=0.9)
    state_dict = torch.load(os.path.join(BASE_DIR, "optimizer_state_dict.pkl"))

    print("state_dict before load state:\n", optimizer.state_dict())
    optimizer.load_state_dict(state_dict)
    print("state_dict after load state:\n", optimizer.state_dict())

learning_rate.py

# -*- coding:utf-8 -*-
"""
@file name  : learning_rate.py
@brief      : learning-rate demo for gradient descent
"""
import torch
import numpy as np
import matplotlib.pyplot as plt
torch.manual_seed(1)


def func(x_t):
    """
    y = (2x)^2 = 4*x^2      dy/dx = 8x
    """
    return torch.pow(2*x_t, 2)


# init
x = torch.tensor([2.], requires_grad=True)

# ------------------------------ plot data ------------------------------
flag = 0
# flag = 1
if flag:
    x_t = torch.linspace(-3, 3, 100)
    y = func(x_t)
    plt.plot(x_t.numpy(), y.numpy(), label="y = 4*x^2")
    plt.grid()
    plt.xlabel("x")
    plt.ylabel("y")
    plt.legend()
    plt.show()

# ------------------------------ gradient descent ------------------------------
flag = 0
# flag = 1
if flag:
    iter_rec, loss_rec, x_rec = list(), list(), list()

    lr = 0.01            # other values tried: 1. / .5 / .2 / .1 / .125
    max_iteration = 20   # e.g. 4 iterations for lr=1. or .5; 20 or 200 for lr=.2

    for i in range(max_iteration):
        y = func(x)
        y.backward()

        print("Iter:{}, X:{:8}, X.grad:{:8}, loss:{:10}".format(
            i, x.detach().numpy()[0], x.grad.detach().numpy()[0], y.item()))

        x_rec.append(x.item())

        x.data.sub_(lr * x.grad)  # x -= lr * x.grad, i.e. x = x - lr * x.grad
        x.grad.zero_()

        iter_rec.append(i)
        loss_rec.append(y.item())  # store a plain float so matplotlib can plot it

    plt.subplot(121).plot(iter_rec, loss_rec, '-ro')
    plt.xlabel("Iteration")
    plt.ylabel("Loss value")

    x_t = torch.linspace(-3, 3, 100)
    y = func(x_t)
    plt.subplot(122).plot(x_t.numpy(), y.numpy(), label="y = 4*x^2")
    plt.grid()
    y_rec = [func(torch.tensor(i)).item() for i in x_rec]
    plt.subplot(122).plot(x_rec, y_rec, '-ro')
    plt.legend()
    plt.show()

# ------------------------------ multi learning rate ------------------------------
# flag = 0
flag = 1
if flag:
    iteration = 100
    num_lr = 10
    lr_min, lr_max = 0.01, 0.2  # other max values tried: .5 / .3 / .2

    lr_list = np.linspace(lr_min, lr_max, num=num_lr).tolist()
    loss_rec = [[] for l in range(len(lr_list))]
    iter_rec = list()

    for i, lr in enumerate(lr_list):
        x = torch.tensor([2.], requires_grad=True)
        for iter in range(iteration):
            y = func(x)
            y.backward()
            x.data.sub_(lr * x.grad)  # x.data -= lr * x.grad
            x.grad.zero_()

            loss_rec[i].append(y.item())

    for i, loss_r in enumerate(loss_rec):
        plt.plot(range(len(loss_r)), loss_r, label="LR: {}".format(lr_list[i]))
    plt.legend()
    plt.xlabel('Iterations')
    plt.ylabel('Loss value')
    plt.show()

momentum.py

# -*- coding:utf-8 -*-
"""
@file name  : momentum.py
@brief      : momentum demo for gradient descent
"""
import torch
import numpy as np
import torch.optim as optim
import matplotlib.pyplot as plt
torch.manual_seed(1)


def exp_w_func(beta, time_list):
    return [(1 - beta) * np.power(beta, exp) for exp in time_list]


beta = 0.9
num_point = 100
time_list = np.arange(num_point).tolist()

# ------------------------------ exponential weight ------------------------------
flag = 0
# flag = 1
if flag:
    weights = exp_w_func(beta, time_list)

    plt.plot(time_list, weights, '-ro', label="Beta: {}\ny = B^t * (1-B)".format(beta))
    plt.xlabel("time")
    plt.ylabel("weight")
    plt.legend()
    plt.title("exponentially weighted average")
    plt.show()

    print(np.sum(weights))

# ------------------------------ multi weights ------------------------------
flag = 0
# flag = 1
if flag:
    beta_list = [0.98, 0.95, 0.9, 0.8]
    w_list = [exp_w_func(beta, time_list) for beta in beta_list]
    for i, w in enumerate(w_list):
        plt.plot(time_list, w, label="Beta: {}".format(beta_list[i]))
    plt.xlabel("time")
    plt.ylabel("weight")
    plt.legend()
    plt.show()

# ------------------------------ SGD momentum ------------------------------
# flag = 0
flag = 1
if flag:
    def func(x):
        return torch.pow(2*x, 2)  # y = (2x)^2 = 4*x^2        dy/dx = 8x

    iteration = 100
    m = 0.9     # other value tried: .63

    lr_list = [0.01, 0.03]

    momentum_list = list()
    loss_rec = [[] for l in range(len(lr_list))]
    iter_rec = list()

    for i, lr in enumerate(lr_list):
        x = torch.tensor([2.], requires_grad=True)

        momentum = 0. if lr == 0.03 else m  # the larger LR runs without momentum for comparison
        momentum_list.append(momentum)

        optimizer = optim.SGD([x], lr=lr, momentum=momentum)

        for iter in range(iteration):
            y = func(x)
            y.backward()

            optimizer.step()
            optimizer.zero_grad()

            loss_rec[i].append(y.item())

    for i, loss_r in enumerate(loss_rec):
        plt.plot(range(len(loss_r)), loss_r, label="LR: {} M:{}".format(lr_list[i], momentum_list[i]))
    plt.legend()
    plt.xlabel('Iterations')
    plt.ylabel('Loss value')
    plt.show()
