I have been working on semantic segmentation projects recently, and while looking for loss functions I found that the implementations people share online vary widely and rarely explain how to actually use them, so this series records my own experience using loss functions during training. I work with PyTorch, so the whole series is implemented in PyTorch.

First up is the cross-entropy loss. Semantic segmentation is really per-pixel classification, so anyone who has done image classification should already be familiar with it.

PyTorch ships with a ready-made cross-entropy loss; you only have to call it:

loss_func = nn.CrossEntropyLoss()

The loss functions in the torch.nn module are all written as classes: declare an instance up front, then call it when needed.
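For a quick one-off computation there is also a functional counterpart, torch.nn.functional.cross_entropy, which the class's forward delegates to (see the source below). A minimal sketch:

import torch
import torch.nn.functional as F

input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
loss = F.cross_entropy(input, target)  # same computation, no module instance needed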

PyTorch's implementation of the cross-entropy loss:

class CrossEntropyLoss(_WeightedLoss):
    r"""This criterion computes the cross entropy loss between input and target.

    It is useful when training a classification problem with `C` classes.
    If provided, the optional argument :attr:`weight` should be a 1D `Tensor`
    assigning weight to each of the classes.
    This is particularly useful when you have an unbalanced training set.

    The `input` is expected to contain raw, unnormalized scores for each class.
    `input` has to be a Tensor of size either :math:`(minibatch, C)` or
    :math:`(minibatch, C, d_1, d_2, ..., d_K)` with :math:`K \geq 1` for the
    `K`-dimensional case. The latter is useful for higher dimension inputs, such
    as computing cross entropy loss per-pixel for 2D images.

    The `target` that this criterion expects should contain either:

    - Class indices in the range :math:`[0, C-1]` where :math:`C` is the number of
      classes; if `ignore_index` is specified, this loss also accepts this class index
      (this index may not necessarily be in the class range). The unreduced (i.e. with
      :attr:`reduction` set to ``'none'``) loss for this case can be described as:

      .. math::
          \ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad
          l_n = - w_{y_n} \log \frac{\exp(x_{n,y_n})}{\sum_{c=1}^C \exp(x_{n,c})}
          \cdot \mathbb{1}\{y_n \not= \text{ignore\_index}\}

      where :math:`x` is the input, :math:`y` is the target, :math:`w` is the weight,
      :math:`C` is the number of classes, and :math:`N` spans the minibatch dimension
      as well as :math:`d_1, ..., d_k` for the `K`-dimensional case. If
      :attr:`reduction` is not ``'none'`` (default ``'mean'``), then

      .. math::
          \ell(x, y) = \begin{cases}
              \sum_{n=1}^N \frac{1}{\sum_{n=1}^N w_{y_n} \cdot \mathbb{1}\{y_n \not= \text{ignore\_index}\}} l_n, &
              \text{if reduction} = \text{`mean';}\\
              \sum_{n=1}^N l_n, &
              \text{if reduction} = \text{`sum'.}
          \end{cases}

      Note that this case is equivalent to the combination of :class:`~torch.nn.LogSoftmax`
      and :class:`~torch.nn.NLLLoss`.

    - Probabilities for each class; useful when labels beyond a single class per minibatch
      item are required, such as for blended labels, label smoothing, etc. The unreduced
      (i.e. with :attr:`reduction` set to ``'none'``) loss for this case can be described as:

      .. math::
          \ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad
          l_n = - \sum_{c=1}^C w_c \log \frac{\exp(x_{n,c})}{\sum_{i=1}^C \exp(x_{n,i})} y_{n,c}

      where :math:`x` is the input, :math:`y` is the target, :math:`w` is the weight,
      :math:`C` is the number of classes, and :math:`N` spans the minibatch dimension
      as well as :math:`d_1, ..., d_k` for the `K`-dimensional case. If
      :attr:`reduction` is not ``'none'`` (default ``'mean'``), then

      .. math::
          \ell(x, y) = \begin{cases}
              \frac{\sum_{n=1}^N l_n}{N}, &
              \text{if reduction} = \text{`mean';}\\
              \sum_{n=1}^N l_n, &
              \text{if reduction} = \text{`sum'.}
          \end{cases}

    .. note::
        The performance of this criterion is generally better when `target` contains class
        indices, as this allows for optimized computation. Consider providing `target` as
        class probabilities only when a single class label per minibatch item is too restrictive.

    Args:
        weight (Tensor, optional): a manual rescaling weight given to each class.
            If given, has to be a Tensor of size `C`
        size_average (bool, optional): Deprecated (see :attr:`reduction`). By default,
            the losses are averaged over each loss element in the batch. Note that for
            some losses, there are multiple elements per sample. If the field
            :attr:`size_average` is set to ``False``, the losses are instead summed for
            each minibatch. Ignored when :attr:`reduce` is ``False``. Default: ``True``
        ignore_index (int, optional): Specifies a target value that is ignored
            and does not contribute to the input gradient. When :attr:`size_average` is
            ``True``, the loss is averaged over non-ignored targets. Note that
            :attr:`ignore_index` is only applicable when the target contains class indices.
        reduce (bool, optional): Deprecated (see :attr:`reduction`). By default, the
            losses are averaged or summed over observations for each minibatch depending
            on :attr:`size_average`. When :attr:`reduce` is ``False``, returns a loss per
            batch element instead and ignores :attr:`size_average`. Default: ``True``
        reduction (string, optional): Specifies the reduction to apply to the output:
            ``'none'`` | ``'mean'`` | ``'sum'``. ``'none'``: no reduction will
            be applied, ``'mean'``: the weighted mean of the output is taken,
            ``'sum'``: the output will be summed. Note: :attr:`size_average`
            and :attr:`reduce` are in the process of being deprecated, and in
            the meantime, specifying either of those two args will override
            :attr:`reduction`. Default: ``'mean'``
        label_smoothing (float, optional): A float in [0.0, 1.0]. Specifies the amount
            of smoothing when computing the loss, where 0.0 means no smoothing. The targets
            become a mixture of the original ground truth and a uniform distribution as
            described in `Rethinking the Inception Architecture for Computer Vision
            <https://arxiv.org/abs/1512.00567>`__. Default: :math:`0.0`.

    Shape:
        - Input: :math:`(N, C)` where `C = number of classes`, or
          :math:`(N, C, d_1, d_2, ..., d_K)` with :math:`K \geq 1`
          in the case of `K`-dimensional loss.
        - Target: If containing class indices, shape :math:`(N)` where each value is
          :math:`0 \leq \text{targets}[i] \leq C-1`, or :math:`(N, d_1, d_2, ..., d_K)` with
          :math:`K \geq 1` in the case of K-dimensional loss. If containing class
          probabilities, same shape as the input.
        - Output: If :attr:`reduction` is ``'none'``, shape :math:`(N)` or
          :math:`(N, d_1, d_2, ..., d_K)` with :math:`K \geq 1` in the case of
          K-dimensional loss. Otherwise, scalar.

    Examples::

        >>> # Example of target with class indices
        >>> loss = nn.CrossEntropyLoss()
        >>> input = torch.randn(3, 5, requires_grad=True)
        >>> target = torch.empty(3, dtype=torch.long).random_(5)
        >>> output = loss(input, target)
        >>> output.backward()
        >>>
        >>> # Example of target with class probabilities
        >>> input = torch.randn(3, 5, requires_grad=True)
        >>> target = torch.randn(3, 5).softmax(dim=1)
        >>> output = loss(input, target)
        >>> output.backward()
    """
    __constants__ = ['ignore_index', 'reduction', 'label_smoothing']
    ignore_index: int
    label_smoothing: float

    def __init__(self, weight: Optional[Tensor] = None, size_average=None, ignore_index: int = -100,
                 reduce=None, reduction: str = 'mean', label_smoothing: float = 0.0) -> None:
        super(CrossEntropyLoss, self).__init__(weight, size_average, reduce, reduction)
        self.ignore_index = ignore_index
        self.label_smoothing = label_smoothing

    def forward(self, input: Tensor, target: Tensor) -> Tensor:
        return F.cross_entropy(input, target, weight=self.weight,
                               ignore_index=self.ignore_index, reduction=self.reduction,
                               label_smoothing=self.label_smoothing)
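As the docstring notes, with class-index targets this criterion is equivalent to LogSoftmax followed by NLLLoss. A minimal sketch to verify that numerically (the tensors are random placeholders):

import torch
import torch.nn as nn

input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)

# CrossEntropyLoss fuses the two steps below into a single call
ce = nn.CrossEntropyLoss()(input, target)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(input), target)
print(torch.allclose(ce, nll))  # True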

When declaring the loss function you can pass a few arguments. The most important is weight: Optional[Tensor] = None. weight assigns each class its own weight during the computation; it is a tensor whose length must equal the number of classes. Another is label_smoothing, which is built into the cross-entropy loss: with label_smoothing = ε, the hard targets are mixed with a uniform distribution, so the true class becomes 1 - ε + ε/C and every other class gets ε/C (for a binary problem with ε = 0.1, that is 0.95 and 0.05). This makes the loss smoother, eases convergence, and keeps a confidently wrong prediction from incurring an outsized loss. A sketch follows below.
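A minimal sketch of both arguments; the class count and the weight values here are hypothetical, chosen only for illustration (larger weights up-weight rarer classes):

import torch
import torch.nn as nn

# hypothetical weights for a 5-class problem; length must equal the number of classes C
class_weights = torch.tensor([0.2, 1.0, 3.0, 1.0, 0.5])
loss_func = nn.CrossEntropyLoss(weight=class_weights, label_smoothing=0.1)

logits = torch.randn(4, 5, requires_grad=True)
target = torch.randint(0, 5, (4,))
loss = loss_func(logits, target)  # heavily weighted classes now contribute more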

Computing a loss value takes two tensors: input, your model's prediction, and target, the ground-truth annotation. For example:

import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=True)
print("input:", input)
target = torch.empty(3, dtype=torch.long).random_(5)
print("target:", target)
output = loss_func(input, target)
print("loss:", output)

input: tensor([[ 1.6738,  0.0526,  0.6329, -0.8809,  1.4822],
        [-0.5908,  1.5717,  1.3402,  0.4227, -0.3498],
        [-0.3359, -2.3797, -1.6206, -2.3070,  0.6010]], requires_grad=True)
target: tensor([3, 4, 1])
loss: tensor(3.2306, grad_fn=<NllLossBackward0>)
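To connect that number to the formula in the docstring: with class-index targets and the default 'mean' reduction, the loss is the average of the negative log-softmax scores picked out at each target index. A minimal sketch recomputing it by hand:

import torch
import torch.nn as nn

input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)

# pick each sample's log-softmax score at its target class, negate, and average
log_probs = torch.log_softmax(input, dim=1)
manual = -log_probs[torch.arange(3), target].mean()
print(torch.allclose(manual, nn.CrossEntropyLoss()(input, target)))  # True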

The above is essentially a five-class classification problem: input is the output of the model's final fully connected layer, or, for a fully convolutional network, its final output.
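Since this series is about segmentation, here is a minimal per-pixel sketch; the shapes (batch of 2, 5 classes, a 4x4 "image") are placeholders:

import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()
logits = torch.randn(2, 5, 4, 4, requires_grad=True)  # (N, C, H, W) raw scores from an FCN
mask = torch.randint(0, 5, (2, 4, 4))                 # (N, H, W) per-pixel class indices
loss = loss_func(logits, mask)                        # averaged over every pixel by default
loss.backward()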

Note:

If you take the argmax of the per-class outputs before computing the loss, the call fails. For example:

import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()
# input = torch.randn(3, 5, requires_grad=True)
input = torch.randn(3, requires_grad=True)
print("input:", input)
target = torch.empty(3, dtype=torch.long).random_(5)
print("target:", target)
output = loss_func(input, target)
print("loss:", output)

input: tensor([-0.3463,  1.2289,  0.2517], requires_grad=True)
target: tensor([3, 4, 3])
Traceback (most recent call last):
  File "/home/lwf/Project/MRI-Segmentation/tets.py", line 19, in <module>
    output = loss_func(input, target)
  File "/home/lwf/anaconda3/envs/torch3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lwf/anaconda3/envs/torch3.7/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 1152, in forward
    label_smoothing=self.label_smoothing)
  File "/home/lwf/anaconda3/envs/torch3.7/lib/python3.7/site-packages/torch/nn/functional.py", line 2846, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: Expected floating point type for target with class probabilities, got Long

In other words, input must keep a value for every class: the raw, unnormalized score (logit) per class for each sample, exactly as the docstring says. After argmax the input collapses to one index per sample and ends up with the same shape as target, so PyTorch falls into the "class probabilities" target branch and then complains that the Long-typed target is not floating point.
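A minimal sketch of the working pattern: feed the loss the full (N, C) scores, and reserve argmax for producing discrete predictions afterwards:

import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()
logits = torch.randn(3, 5, requires_grad=True)  # keep all 5 per-class scores
target = torch.empty(3, dtype=torch.long).random_(5)
loss = loss_func(logits, target)                # the loss needs the raw scores
preds = logits.argmax(dim=1)                    # argmax only when reporting predictions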
