Implementing L2 and L1 Regularization in PyTorch

Author | pan_jinquan

Source | CSDN博文精选 (CSDN Selected Blog Posts)

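Contents
1. Implementing L2 regularization with torch.optim optimizers
2. How to tell whether regularization is affecting the model
2.1 Loss and accuracy without regularization
2.2 Loss and accuracy with regularization
2.3 Notes on regularization
3. A custom regularization method
3.1 A custom Regularization class
3.2 Using the Regularization class
4. GitHub project source code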

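1. Implementing L2 regularization with torch.optim optimizers

torch.optim bundles many optimizers, such as SGD, Adadelta, Adam, Adagrad, and RMSprop. Each of them takes a weight_decay argument that sets the weight decay rate, which plays the role of the λ coefficient in L2 regularization. Note that the optimizers in torch.optim only offer an L2 penalty. The docstring describes weight_decay as follows: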

weight_decay (float, optional): weight decay (L2 penalty) (default: 0)

With a torch.optim optimizer, L2 regularization can be enabled like this:

optimizer = optim.Adam(model.parameters(),lr=learning_rate,weight_decay=0.01)
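As a sanity check on what weight_decay actually does, the sketch below compares one SGD step that uses weight_decay with one step where the penalty (λ/2)·Σw² is added to the loss by hand. The toy nn.Linear model, the random data, and the values of lr and lam are assumptions chosen for illustration. Plain SGD is used on purpose: with Adam the decay term is still added to the gradient, but the adaptive scaling means the two variants need not produce identical updates.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
lr, lam = 0.1, 0.01                      # illustrative values, not from the article
x, y = torch.randn(8, 4), torch.randn(8, 1)

model_a = nn.Linear(4, 1)
model_b = copy.deepcopy(model_a)         # identical starting parameters

# (a) let the optimizer apply the weight decay
opt_a = torch.optim.SGD(model_a.parameters(), lr=lr, weight_decay=lam)
opt_a.zero_grad()
nn.functional.mse_loss(model_a(x), y).backward()
opt_a.step()

# (b) add (lam / 2) * sum(p^2) over ALL parameters (weights and biases) by hand
opt_b = torch.optim.SGD(model_b.parameters(), lr=lr)
penalty = sum((p ** 2).sum() for p in model_b.parameters())
loss_b = nn.functional.mse_loss(model_b(x), y) + 0.5 * lam * penalty
opt_b.zero_grad()
loss_b.backward()
opt_b.step()

# Both variants should land on the same parameters after one step
same = all(torch.allclose(pa, pb, atol=1e-6)
           for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print("updates match:", same)
```

Variant (b) has to include the bias in the penalty to match, which is a direct way to see that the optimizer decays every parameter it is given.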

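This approach has a few issues, however:

(1) Regularization normally penalizes only the weights W and leaves the biases b alone. The weight_decay argument of a torch.optim optimizer, by contrast, is applied to every parameter handed to the optimizer, so when you pass model.parameters() both the weights w and the biases b are decayed. Applying L2 regularization to b can lead to underfitting, so it is usually better to regularize only w. (The docstring just says "weight decay (L2 penalty)", which makes this easy to miss. The decay is configured per parameter group, so biases can be excluded by placing them in a separate group with weight_decay=0.)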

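(2) The torch.optim optimizers implement only the L2 penalty and cannot do L1 regularization. If you need L1 regularization, you have to add the penalty term to the loss yourself, as in Section 3.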

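(3) By the regularization formula, adding the penalty makes the loss larger than before. For example, if the loss is 10 at weight_decay=1, then at weight_decay=100 the penalty term, and with it the reported loss, should grow roughly 100-fold. With a torch.optim optimizer, though, if you still compute the loss with loss_fun = nn.CrossEntropyLoss(), you will find that no matter how you change weight_decay, the reported loss stays about the same as without regularization. That is because the optimizer applies the decay inside the parameter update, and loss_fun never adds the penalty on the weights W to the loss value.

(4) There is nothing wrong with implementing regularization through a torch.optim optimizer! It is just easy to misread. Personally I prefer the way TensorFlow does it: you only need tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES), and the implementation maps almost directly onto the regularization formula.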

(5) GitHub project source code:

https://github.com/PanJinquan/pytorch-learning-tutorials/blob/master/image_classification/train_resNet.py (a star would be appreciated)
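Regarding point (1), one way to keep the built-in weight_decay but stop it from touching the biases is to use per-parameter-group options. The sketch below is a minimal example under assumptions: the two-layer toy model, the learning rate, and the rule "names ending in weight get decayed" are all illustrative. In a real network the same rule would also decay BatchNorm scale parameters, which are named weight too, so the filter may need adjusting.

```python
import torch.nn as nn
import torch.optim as optim

# Toy model for illustration only
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

decay, no_decay = [], []
for name, param in model.named_parameters():
    # Parameters named "...weight" are decayed; everything else (biases) is not
    (decay if name.endswith("weight") else no_decay).append(param)

optimizer = optim.Adam(
    [
        {"params": decay, "weight_decay": 0.01},
        {"params": no_decay, "weight_decay": 0.0},
    ],
    lr=1e-3,
)

# Inspect the resulting groups: number of tensors and the decay applied to them
for group in optimizer.param_groups:
    print(len(group["params"]), group["weight_decay"])
```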

To address these issues, I wrote a custom regularization method, similar to how TensorFlow implements regularization.

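2. How to tell whether regularization is affecting the model

Generally speaking, the main purpose of regularization is to keep the model from overfitting, although overfitting itself is sometimes hard to diagnose. Telling whether regularization is affecting the model, however, is easy. Below are two sets of loss and accuracy logs produced during training, one without regularization and one with it:

2.1 Loss and accuracy without regularization

The optimizer is Adam with weight_decay=0.0, i.e. no regularization: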

optimizer = optim.Adam(model.parameters(),lr=learning_rate,weight_decay=0.0)

Loss and accuracy printed during training:

step/epoch:0/0,Train Loss: 2.418065, Acc: [0.15625]
step/epoch:10/0,Train Loss: 5.194936, Acc: [0.34375]
step/epoch:20/0,Train Loss: 0.973226, Acc: [0.8125]
step/epoch:30/0,Train Loss: 1.215165, Acc: [0.65625]
step/epoch:40/0,Train Loss: 1.808068, Acc: [0.65625]
step/epoch:50/0,Train Loss: 1.661446, Acc: [0.625]
step/epoch:60/0,Train Loss: 1.552345, Acc: [0.6875]
step/epoch:70/0,Train Loss: 1.052912, Acc: [0.71875]
step/epoch:80/0,Train Loss: 0.910738, Acc: [0.75]
step/epoch:90/0,Train Loss: 1.142454, Acc: [0.6875]
step/epoch:100/0,Train Loss: 0.546968, Acc: [0.84375]
step/epoch:110/0,Train Loss: 0.415631, Acc: [0.9375]
step/epoch:120/0,Train Loss: 0.533164, Acc: [0.78125]
step/epoch:130/0,Train Loss: 0.956079, Acc: [0.6875]
step/epoch:140/0,Train Loss: 0.711397, Acc: [0.8125]

2.2 Loss and accuracy with regularization

The optimizer is Adam with weight_decay=10.0, i.e. a regularization weight of lambda = 10.0:

optimizer = optim.Adam(model.parameters(),lr=learning_rate,weight_decay=10.0)

The loss and accuracy printed during training are now:

step/epoch:0/0,Train Loss: 2.467985, Acc: [0.09375]
step/epoch:10/0,Train Loss: 5.435320, Acc: [0.40625]
step/epoch:20/0,Train Loss: 1.395482, Acc: [0.625]
step/epoch:30/0,Train Loss: 1.128281, Acc: [0.6875]
step/epoch:40/0,Train Loss: 1.135289, Acc: [0.6875]
step/epoch:50/0,Train Loss: 1.455040, Acc: [0.5625]
step/epoch:60/0,Train Loss: 1.023273, Acc: [0.65625]
step/epoch:70/0,Train Loss: 0.855008, Acc: [0.65625]
step/epoch:80/0,Train Loss: 1.006449, Acc: [0.71875]
step/epoch:90/0,Train Loss: 0.939148, Acc: [0.625]
step/epoch:100/0,Train Loss: 0.851593, Acc: [0.6875]
step/epoch:110/0,Train Loss: 1.093970, Acc: [0.59375]
step/epoch:120/0,Train Loss: 1.699520, Acc: [0.625]
step/epoch:130/0,Train Loss: 0.861444, Acc: [0.75]
step/epoch:140/0,Train Loss: 0.927656, Acc: [0.625]

With weight_decay=10000.0:

step/epoch:0/0,Train Loss: 2.337354, Acc: [0.15625]
step/epoch:10/0,Train Loss: 2.222203, Acc: [0.125]
step/epoch:20/0,Train Loss: 2.184257, Acc: [0.3125]
step/epoch:30/0,Train Loss: 2.116977, Acc: [0.5]
step/epoch:40/0,Train Loss: 2.168895, Acc: [0.375]
step/epoch:50/0,Train Loss: 2.221143, Acc: [0.1875]
step/epoch:60/0,Train Loss: 2.189801, Acc: [0.25]
step/epoch:70/0,Train Loss: 2.209837, Acc: [0.125]
step/epoch:80/0,Train Loss: 2.202038, Acc: [0.34375]
step/epoch:90/0,Train Loss: 2.192546, Acc: [0.25]
step/epoch:100/0,Train Loss: 2.215488, Acc: [0.25]
step/epoch:110/0,Train Loss: 2.169323, Acc: [0.15625]
step/epoch:120/0,Train Loss: 2.166457, Acc: [0.3125]
step/epoch:130/0,Train Loss: 2.144773, Acc: [0.40625]
step/epoch:140/0,Train Loss: 2.173397, Acc: [0.28125]

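2.3 Notes on regularization

Comparing the loss and accuracy logs of the regularized and unregularized models as a whole, we can see that with regularization the loss falls more slowly and the accuracy rises more slowly. The unregularized model's loss and accuracy also fluctuate more (higher variance), while the regularized model's training loss and accuracy are smoother, and the larger the regularization weight lambda, the smoother they get. This is the penalty effect of regularization: it constrains the model so that it behaves more smoothly, which is how regularization helps against overfitting. Note that at weight_decay=10000.0 the accuracy stays low throughout the log, so an overly large lambda pushes the model toward underfitting instead.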

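3. A custom regularization method

To work around the limitations that torch.optim optimizers only implement L2 regularization and penalize every parameter in the network, this section implements regularization in a way similar to TensorFlow.

3.1 A custom Regularization class

The logic is wrapped in a Regularization class. Every method is commented, so read through it at your own pace and leave a comment if you have questions.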

import torch

# Check whether a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# device = 'cuda'
print("-----device:{}".format(device))
print("-----Pytorch version:{}".format(torch.__version__))


class Regularization(torch.nn.Module):
    def __init__(self, model, weight_decay, p=2):
        '''
        :param model: the model to regularize
        :param weight_decay: regularization coefficient
        :param p: order of the norm, 2 by default;
                  p=2 gives L2 regularization, p=1 gives L1 regularization
        '''
        super(Regularization, self).__init__()
        if weight_decay <= 0:
            raise ValueError("param weight_decay can not <=0")
        self.model = model
        self.weight_decay = weight_decay
        self.p = p
        self.weight_list = self.get_weight(model)
        self.weight_info(self.weight_list)

    def to(self, device):
        '''
        Select the device to run on
        :param device: cuda or cpu
        :return: self
        '''
        self.device = device
        super().to(device)
        return self

    def forward(self, model):
        self.weight_list = self.get_weight(model)  # fetch the latest weights
        reg_loss = self.regularization_loss(self.weight_list, self.weight_decay, p=self.p)
        return reg_loss

    def get_weight(self, model):
        '''
        Collect the model's weights (biases are skipped)
        :param model:
        :return: list of (name, parameter) pairs
        '''
        weight_list = []
        for name, param in model.named_parameters():
            if 'weight' in name:
                weight = (name, param)
                weight_list.append(weight)
        return weight_list

    def regularization_loss(self, weight_list, weight_decay, p=2):
        '''
        Sum the p-norms of all weights and scale by weight_decay
        :param weight_list:
        :param p: order of the norm, 2 by default
        :param weight_decay:
        :return: the regularization loss
        '''
        reg_loss = 0
        for name, w in weight_list:
            # torch.norm with p=2 is the (unsquared) L2 norm; with p=1 the L1 norm
            norm = torch.norm(w, p=p)
            reg_loss = reg_loss + norm
        reg_loss = weight_decay * reg_loss
        return reg_loss

    def weight_info(self, weight_list):
        '''
        Print the names of the regularized weights
        :param weight_list:
        :return:
        '''
        print("---------------regularization weight---------------")
        for name, w in weight_list:
            print(name)
        print("---------------------------------------------------")
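If a full class feels heavy, the same computation fits in a few lines. The sketch below is an assumed functional equivalent, not the author's code: norm_penalty is a hypothetical helper name, and the tiny nn.Linear layer with hand-set values is only there so the numbers can be checked by hand. The bias is set to a large value to show that it does not enter the penalty.

```python
import torch
import torch.nn as nn


def norm_penalty(model: nn.Module, weight_decay: float, p: int = 2) -> torch.Tensor:
    """Return weight_decay * (sum of p-norms of parameters whose name contains 'weight')."""
    norms = [torch.norm(param, p=p)
             for name, param in model.named_parameters() if "weight" in name]
    return weight_decay * torch.stack(norms).sum()


layer = nn.Linear(2, 1)
with torch.no_grad():
    layer.weight.copy_(torch.tensor([[3.0, -4.0]]))
    layer.bias.fill_(100.0)                      # large bias, ignored by the penalty

l2 = norm_penalty(layer, weight_decay=0.5, p=2)  # 0.5 * sqrt(3^2 + 4^2) = 2.5
l1 = norm_penalty(layer, weight_decay=0.5, p=1)  # 0.5 * (|3| + |-4|)   = 3.5
print(l2.item(), l1.item())
```

Because the result is an ordinary tensor that depends on the parameters, adding it to the data loss lets autograd propagate gradients through the penalty as well.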

3.2 Using the Regularization class

Usage is simple: treat it like any other PyTorch module. For example:

import torch
import torch.nn as nn
import torch.optim as optim

# Check whether a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("-----device:{}".format(device))
print("-----Pytorch version:{}".format(torch.__version__))

weight_decay = 100.0  # regularization coefficient
model = my_net().to(device)  # my_net and learning_rate are defined elsewhere

# Initialize the regularizer
if weight_decay > 0:
    reg_loss = Regularization(model, weight_decay, p=2).to(device)
else:
    print("no regularization")

criterion = nn.CrossEntropyLoss().to(device)  # CrossEntropyLoss = softmax + cross entropy
optimizer = optim.Adam(model.parameters(), lr=learning_rate)  # no weight_decay needed here

# train
batch_train_data = ...
batch_train_label = ...
out = model(batch_train_data)

# loss and regularization
loss = criterion(input=out, target=batch_train_label)
if weight_decay > 0:
    loss = loss + reg_loss(model)
total_loss = loss.item()  # plain float, for logging only

# backprop
optimizer.zero_grad()  # clear all accumulated gradients
loss.backward()  # backward() must be called on the tensor, not on the float from item()
optimizer.step()
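For a version of the training step that runs end to end, here is a minimal sketch on made-up data. The nn.Linear stand-in for my_net, the random batch, and the hyperparameter values are all assumptions for illustration, and the penalty is written inline rather than through the Regularization class so that the snippet is self-contained. It also records the data loss and the penalty separately, which keeps the data loss comparable with the logs in Section 2.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(10, 3)                         # stand-in for my_net (assumption)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-2)  # no weight_decay on the optimizer
weight_decay = 10.0

data = torch.randn(32, 10)                       # fake batch
labels = torch.randint(0, 3, (32,))

history = []
for step in range(5):
    out = model(data)
    data_loss = criterion(out, labels)
    # Penalize only the weight matrix, not the bias
    reg_loss = weight_decay * torch.norm(model.weight, p=2)
    loss = data_loss + reg_loss

    optimizer.zero_grad()
    loss.backward()                              # backward on the tensor, not on loss.item()
    optimizer.step()

    history.append((data_loss.item(), reg_loss.item()))
    print(f"step {step}: data loss {data_loss.item():.4f}, "
          f"reg loss {reg_loss.item():.4f}")
```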

Loss and accuracy printed during training:

(1) With weight_decay=0.0, no regularization

step/epoch:0/0,Train Loss: 2.379627, Acc: [0.09375]
step/epoch:10/0,Train Loss: 1.473092, Acc: [0.6875]
step/epoch:20/0,Train Loss: 0.931847, Acc: [0.8125]
step/epoch:30/0,Train Loss: 0.625494, Acc: [0.875]
step/epoch:40/0,Train Loss: 2.241885, Acc: [0.53125]
step/epoch:50/0,Train Loss: 1.132131, Acc: [0.6875]
step/epoch:60/0,Train Loss: 0.493038, Acc: [0.8125]
step/epoch:70/0,Train Loss: 0.819410, Acc: [0.78125]
step/epoch:80/0,Train Loss: 0.996497, Acc: [0.71875]
step/epoch:90/0,Train Loss: 0.474205, Acc: [0.8125]
step/epoch:100/0,Train Loss: 0.744587, Acc: [0.8125]
step/epoch:110/0,Train Loss: 0.502217, Acc: [0.78125]
step/epoch:120/0,Train Loss: 0.531865, Acc: [0.8125]
step/epoch:130/0,Train Loss: 1.016807, Acc: [0.875]
step/epoch:140/0,Train Loss: 0.411701, Acc: [0.84375]

(2) With weight_decay=10.0, regularization enabled

step/epoch:0/0,Train Loss: 1563.402832, Acc: [0.09375]
step/epoch:10/0,Train Loss: 1530.002686, Acc: [0.53125]
step/epoch:20/0,Train Loss: 1495.115234, Acc: [0.71875]
step/epoch:30/0,Train Loss: 1461.114136, Acc: [0.78125]
step/epoch:40/0,Train Loss: 1427.868164, Acc: [0.6875]
step/epoch:50/0,Train Loss: 1395.430054, Acc: [0.6875]
step/epoch:60/0,Train Loss: 1363.358154, Acc: [0.5625]
step/epoch:70/0,Train Loss: 1331.439697, Acc: [0.75]
step/epoch:80/0,Train Loss: 1301.334106, Acc: [0.625]
step/epoch:90/0,Train Loss: 1271.505005, Acc: [0.6875]
step/epoch:100/0,Train Loss: 1242.488647, Acc: [0.75]
step/epoch:110/0,Train Loss: 1214.184204, Acc: [0.59375]
step/epoch:120/0,Train Loss: 1186.174561, Acc: [0.71875]
step/epoch:130/0,Train Loss: 1159.148438, Acc: [0.78125]
step/epoch:140/0,Train Loss: 1133.020020, Acc: [0.65625]

(3) With weight_decay=10000.0, regularization enabled

step/epoch:0/0,Train Loss: 1570211.500000, Acc: [0.09375]
step/epoch:10/0,Train Loss: 1522952.125000, Acc: [0.3125]
step/epoch:20/0,Train Loss: 1486256.125000, Acc: [0.125]
step/epoch:30/0,Train Loss: 1451671.500000, Acc: [0.25]
step/epoch:40/0,Train Loss: 1418959.750000, Acc: [0.15625]
step/epoch:50/0,Train Loss: 1387154.000000, Acc: [0.125]
step/epoch:60/0,Train Loss: 1355917.500000, Acc: [0.125]
step/epoch:70/0,Train Loss: 1325379.500000, Acc: [0.125]
step/epoch:80/0,Train Loss: 1295454.125000, Acc: [0.3125]
step/epoch:90/0,Train Loss: 1266115.375000, Acc: [0.15625]
step/epoch:100/0,Train Loss: 1237341.000000, Acc: [0.0625]
step/epoch:110/0,Train Loss: 1209186.500000, Acc: [0.125]
step/epoch:120/0,Train Loss: 1181584.250000, Acc: [0.125]
step/epoch:130/0,Train Loss: 1154600.125000, Acc: [0.1875]
step/epoch:140/0,Train Loss: 1128239.875000, Acc: [0.125]

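Compared with the L2 regularization built into the torch.optim optimizers, the Regularization class achieves the same regularizing effect, and, as in TensorFlow, the reported loss now includes the regularization term.

You can also change the parameter p: p=2 gives L2 regularization and p=1 gives L1 regularization.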
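To make the meaning of p concrete, the short check below evaluates torch.norm on a small hand-written tensor (the values are arbitrary and chosen so the results are easy to verify). It also points out one detail that is easy to overlook: with p=2, torch.norm returns the square root of the sum of squares, whereas the textbook L2 penalty, and the one weight_decay corresponds to in plain SGD, uses the squared norm.

```python
import torch

w = torch.tensor([[1.0, -2.0], [0.0, 2.0]])

l1_norm = torch.norm(w, p=1)   # sum of absolute values: 1 + 2 + 0 + 2 = 5
l2_norm = torch.norm(w, p=2)   # sqrt of the sum of squares: sqrt(1 + 4 + 0 + 4) = 3

print(l1_norm.item(), w.abs().sum().item())
print(l2_norm.item(), w.pow(2).sum().sqrt().item())

# The classic L2 (ridge) penalty uses the squared norm instead: 9
squared_l2 = w.pow(2).sum()
print(squared_l2.item())
```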

4. GitHub project source code

GitHub project source code: https://github.com/PanJinquan/pytorch-learning-tutorials/blob/master/image_classification/train_resNet.py

A star would be appreciated.

(*This article was reposted by AI科技大本营; to repost it, please contact the original author.)
