Original paper: CosFace: Large Margin Cosine Loss for Deep Face Recognition
Original paper: Additive Margin Softmax for Face Verification

Official code for the second paper (GitHub): https://github.com/happynear/AMSoftmax

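Two papers that improve on SphereFace appeared at around the same time:

One is Large Margin Cosine Loss (CosFace), published by Tencent AI Lab at CVPR 2018;
the other is AM-Softmax Loss, proposed by happynear.

Both rest on the same core idea and differ only in what each paper chooses to emphasize.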

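I. Large Margin Cosine Loss (LMCL)

SphereFace normalizes only the weights $W$, whereas CosFace normalizes both $W$ and the features $x$. Because fully normalized logits are plain cosines confined to $[-1, 1]$, a scale parameter $s$ (here $s = 30$) is added so that training can converge.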

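1. Softmax Loss

We first revisit the softmax loss from a cosine perspective.

The softmax loss separates features of different classes by maximizing the posterior probability of the ground-truth class. Given an input feature vector $x_i$ with label $y_i$, the softmax loss can be written as:

$$L_s = \frac{1}{N}\sum_{i=1}^{N} -\log p_i = \frac{1}{N}\sum_{i=1}^{N} -\log \frac{e^{f_{y_i}}}{\sum_{j=1}^{C} e^{f_j}}$$

where:

$p_i$: the posterior probability that $x_i$ is classified correctly;
$N$: the number of training samples;
$C$: the number of classes;
$f_j$: usually the activation of a fully connected layer with weight vector $W_j$ and bias $B_j$. For simplicity the bias is fixed to $B_j = 0$, so that $f_j = W_j^T x = \lVert W_j \rVert \cdot \lVert x \rVert \cdot \cos\theta_j$, where $\theta_j$ is the angle between $W_j$ and $x$. This formula shows that both the norms and the angle of the vectors contribute to the posterior probability.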
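The decomposition of $f_j$ can be checked numerically. The following is a minimal pure-Python sketch; the toy weights, the feature, and the helper names `dot`, `norm`, and `softmax_loss` are illustrative assumptions, not taken from either paper.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def softmax_loss(logits, label):
    # -log p_label, computed with the log-sum-exp trick for numerical stability
    top = max(logits)
    lse = top + math.log(sum(math.exp(z - top) for z in logits))
    return lse - logits[label]

# Toy data: C = 3 class weight vectors and one feature with label y = 0.
W = [[2.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [1.0, 1.0]
y = 0

logits = [dot(w, x) for w in W]                         # f_j = W_j^T x (bias fixed to 0)
cosines = [dot(w, x) / (norm(w) * norm(x)) for w in W]  # cos(theta_j)
recomposed = [norm(w) * norm(x) * c for w, c in zip(W, cosines)]

loss = softmax_loss(logits, y)
print(logits, recomposed, loss)
```

The two lists agree up to floating-point error, which is exactly the statement that the logit is the product of two norms and a cosine.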

2. Large Margin Softmax Loss

L-Softmax narrows the feasible angle range of each class, which produces an angular margin between classes:

$$L_i = -\log\left(\frac{e^{\lVert W_{y_i} \rVert \lVert x_i \rVert \psi(\theta_{y_i})}}{e^{\lVert W_{y_i} \rVert \lVert x_i \rVert \psi(\theta_{y_i})} + \sum_{j \ne y_i} e^{\lVert W_j \rVert \lVert x_i \rVert \cos(\theta_j)}}\right)$$

3. SphereFace (Angular Softmax Loss)

SphereFace's refinement: for effective feature learning, the norm of $W$ should be invariant. To this end, $\lVert W_j \rVert = 1$ is fixed by $L_2$ normalization:

$$L_{ang} = \frac{1}{N}\sum_{i} -\log\left(\frac{e^{\lVert x_i \rVert \psi(\theta_{y_i,i})}}{e^{\lVert x_i \rVert \psi(\theta_{y_i,i})} + \sum_{j \ne y_i} e^{\lVert x_i \rVert \cos(\theta_{j,i})}}\right)$$

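4. Normalized Softmax Loss (NSL)

CosFace's further refinement: at test time, the score of a face pair is usually computed as the cosine similarity between the two feature vectors. This means the norm of the feature vector $x$ does not contribute to the scoring function.

Therefore $\lVert x \rVert = s$ is fixed during training, and the posterior probability depends only on the cosine of the angle.

The modified loss can be written as:

$$L_{ns} = \frac{1}{N}\sum_{i} -\log \frac{e^{s\cos(\theta_{y_i}, i)}}{\sum_{j} e^{s\cos(\theta_j, i)}}$$

Fixing $\lVert x \rVert = s$ removes the variation in the radial direction, so the resulting model learns features that are separable in angular space. The CosFace paper calls this loss the Normalized version of Softmax Loss (NSL).

In summary, NSL:

fixes $\lVert W_j \rVert = 1$ by $L_2$ normalization;
fixes $\lVert x \rVert = s$.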
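As a minimal sketch of what these two constraints mean in practice (pure Python; the toy vectors and the helper names `l2_normalize` and `nsl_loss` are assumptions made for illustration), the loss below is unchanged when the feature or the weights are rescaled, because only the angles survive normalization:

```python
import math

def l2_normalize(v):
    n = math.sqrt(sum(p * p for p in v))
    return [p / n for p in v]

def nsl_loss(x, W, label, s=30.0):
    # Normalize the feature and every class weight, so each logit is s * cos(theta_j).
    xn = l2_normalize(x)
    logits = [s * sum(p * q for p, q in zip(l2_normalize(w), xn)) for w in W]
    top = max(logits)
    lse = top + math.log(sum(math.exp(z - top) for z in logits))
    return lse - logits[label]

W = [[2.0, 0.5], [-0.3, 1.0], [-1.0, -1.0]]
x = [1.0, 0.4]

loss_a = nsl_loss(x, W, label=0)
loss_b = nsl_loss([10.0 * p for p in x], W, label=0)              # rescaled feature
loss_c = nsl_loss(x, [[5.0 * p for p in w] for w in W], label=0)  # rescaled weights
print(loss_a, loss_b, loss_c)
```

All three values coincide up to floating-point error, which illustrates that the radial direction no longer carries any information.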

5. Large Margin Cosine Loss (CosFace)

Features learned with NSL are not sufficiently discriminative, because NSL only emphasizes correct classification. To address this, a cosine margin is introduced into the classification boundary, where it fits naturally into the cosine formulation of softmax.

Consider a binary scenario, and let $\theta_i$ denote the angle between the learned feature vector and the weight vector of class $C_i$ ($i = 1, 2$). NSL requires $\cos(\theta_1) > \cos(\theta_2)$ for $C_1$, and similarly for $C_2$, so that features from different classes are classified correctly.

To build a large-margin classifier, we further require $\cos(\theta_1) - m > \cos(\theta_2)$ for $C_1$ and $\cos(\theta_2) - m > \cos(\theta_1)$ for $C_2$, where $m \ge 0$ is a fixed parameter that controls the size of the cosine margin. Since $\cos(\theta_i) - m$ is lower than $\cos(\theta_i)$, the constraint on classification is stricter.

This analysis generalizes directly to the multi-class case. The modified loss therefore strengthens the discrimination of the learned features by enforcing an extra margin in cosine space.

The final Large Margin Cosine Loss (LMCL) is:

$$
\begin{aligned}
L_{LMC} &= -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\,(W_{y_i}^{T}x_i - m)}}{e^{s\,(W_{y_i}^{T}x_i - m)} + \sum_{j \ne y_i} e^{s\,W_j^{T}x_i}} \\
&= -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\,[\cos(\theta_{y_i}, i) - m]}}{e^{s\,[\cos(\theta_{y_i}, i) - m]} + \sum_{j \ne y_i} e^{s\cos(\theta_j, i)}}
\end{aligned} \tag{4}
$$

subject to:

$W = \cfrac{W^*}{\lVert W^* \rVert}$
$x = \cfrac{x^*}{\lVert x^* \rVert}$
$\cos(\theta_j, i) = W_j^T x_i$

where:

$N$: the number of training samples;
$x_i$: the $i$-th feature vector, whose ground-truth class is $y_i$;
$W_j$: the weight vector of the $j$-th class;
$\theta_j$: the angle between $W_j$ and $x_i$.
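The loss above can be sketched in a few lines of pure Python. The batch, the weights, and the function name `lmcl_loss` are toy assumptions for illustration; setting `m = 0` recovers NSL, so the comparison shows how the margin makes the same features more expensive.

```python
import math

def l2_normalize(v):
    n = math.sqrt(sum(p * p for p in v))
    return [p / n for p in v]

def lmcl_loss(features, labels, W, s=30.0, m=0.35):
    """Mean LMCL over a batch; m = 0 reduces it to NSL."""
    Wn = [l2_normalize(w) for w in W]
    total = 0.0
    for x, y in zip(features, labels):
        xn = l2_normalize(x)
        cos = [sum(p * q for p, q in zip(w, xn)) for w in Wn]
        # Subtract the margin from the ground-truth cosine only, then scale by s.
        logits = [s * (c - m) if j == y else s * c for j, c in enumerate(cos)]
        top = max(logits)
        lse = top + math.log(sum(math.exp(z - top) for z in logits))
        total += lse - logits[y]
    return total / len(features)

W = [[1.0, 0.0], [0.0, 1.0]]          # two toy class weights
features = [[1.0, 0.6], [0.5, 1.0]]   # one sample per class
labels = [0, 1]

loss_nsl = lmcl_loss(features, labels, W, m=0.0)
loss_lmcl = lmcl_loss(features, labels, W, m=0.35)
print(loss_nsl, loss_lmcl)
```

Both samples are on the correct side of the NSL boundary, yet the LMCL value is larger: the margin keeps penalizing features until their target cosine exceeds the others by at least $m$.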

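II. Comparing Softmax, NSL, L-Softmax, A-Softmax, and LMCL

The decision boundary of LMCL is compared with those of Softmax, NSL, and A-Softmax, as shown in the figure below.

To simplify the analysis, consider a binary scenario with classes $C_1$ and $C_2$, and let $W_1$ and $W_2$ denote the weight vectors of $C_1$ and $C_2$ respectively.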

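1. Softmax

The softmax loss defines the decision boundary: $\lVert W_1 \rVert \cos(\theta_1) = \lVert W_2 \rVert \cos(\theta_2)$

The boundary depends on both the magnitudes of the weight vectors and the cosines of the angles, which results in an overlapping decision region (margin < 0) in cosine space.

At test time, usually only the cosine similarity between face feature vectors is considered. A classifier trained with the softmax loss therefore cannot classify test samples perfectly in cosine space.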
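A small numeric example makes the mismatch visible. This sketch uses invented toy values (weight norms 3 and 1, a unit-norm feature at 60° from $W_1$ and 30° from $W_2$) and a hypothetical helper `softmax_score`:

```python
import math

def softmax_score(w_norm, theta):
    # Softmax logit with zero bias and ||x|| = 1: ||W_j|| * cos(theta_j)
    return w_norm * math.cos(theta)

# Two classes with orthogonal weight vectors but different norms (toy values).
w1_norm, w2_norm = 3.0, 1.0
theta1 = math.radians(60.0)  # angle between x and W1
theta2 = math.radians(30.0)  # angle between x and W2, so x is closer to W2

softmax_pick = 1 if softmax_score(w1_norm, theta1) > softmax_score(w2_norm, theta2) else 2
cosine_pick = 1 if math.cos(theta1) > math.cos(theta2) else 2
print(softmax_pick, cosine_pick)  # prints: 1 2
```

The softmax rule assigns the sample to $C_1$ because of the larger weight norm, while a cosine-similarity test assigns it to $C_2$.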

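2. Normalized Softmax Loss (NSL)

NSL normalizes the weight vectors $W_1$ and $W_2$ so that both have constant magnitude 1, which gives the decision boundary: $\cos(\theta_1) = \cos(\theta_2)$

The NSL decision boundary is shown in the second subplot of the figure above. By removing the radial variation, NSL can classify test samples perfectly in cosine space, with margin = 0.

However, it is not very robust to noise, because there is no decision margin: any small perturbation around the decision boundary can change the decision.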

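3. Angular Softmax Loss (A-Softmax) / SphereFace

Angular Softmax Loss improves on the softmax loss by introducing an extra margin, so that its decision boundaries are given by:

$$C_1: \cos(m\theta_1) \ge \cos(\theta_2) \\ C_2: \cos(m\theta_2) \ge \cos(\theta_1)$$

For $C_1$ this requires $\theta_1 \le \theta_2 / m$, and similarly for $C_2$.

The third subplot of the figure above depicts this decision region, where the gray area is the decision margin.

However, the margin of A-Softmax is not consistent over all values of $\theta$:

the margin becomes smaller as $\theta$ decreases,
and it vanishes completely when $\theta = 0$.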
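To see the shrinking margin numerically, here is a minimal sketch. It assumes the two-class, two-dimensional case in which the feature lies between $W_1$ and $W_2$, so that $\theta_1 + \theta_2 = \alpha$ with $\alpha$ the angle between the two weight vectors; `alpha` and the function name are introduced here for illustration only.

```python
import math

def a_softmax_angular_margin(alpha, m):
    # For a feature lying between W1 and W2 (theta1 + theta2 = alpha), the C1 boundary
    # m * theta1 = theta2 sits at theta1 = alpha / (m + 1), and the C2 boundary at
    # theta1 = m * alpha / (m + 1); the gap between them is the angular margin.
    return (m - 1) * alpha / (m + 1)

m = 4
for deg in (90, 45, 10, 0):
    alpha = math.radians(deg)
    print(deg, round(math.degrees(a_softmax_angular_margin(alpha, m)), 2))
```

Under this assumption the margin is proportional to the angle between the class centers, so it is largest for well-separated classes and disappears as the centers approach each other.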

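This leads to two potential problems:

First, for different classes $C_1$ and $C_2$ that are visually similar, and therefore have a small angle between $W_1$ and $W_2$, the margin is correspondingly small;
second, technically, a special piecewise function has to be used to work around the non-monotonicity of the cosine function.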
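The piecewise function in question is the $\psi(\theta)$ that appears in the L-Softmax and A-Softmax losses. The sketch below uses the definition from those papers, $\psi(\theta) = (-1)^k \cos(m\theta) - 2k$ for $\theta \in [k\pi/m, (k+1)\pi/m]$, which this article does not spell out; the grid resolution is an arbitrary choice.

```python
import math

def psi(theta, m):
    # k indexes the interval [k*pi/m, (k+1)*pi/m] that contains theta.
    k = min(int(theta * m / math.pi), m - 1)
    return (-1) ** k * math.cos(m * theta) - 2 * k

m = 4
grid = [i * math.pi / 200 for i in range(201)]  # theta from 0 to pi
plain = [math.cos(m * t) for t in grid]         # cos(m * theta): oscillates
fixed = [psi(t, m) for t in grid]               # psi: decreases monotonically

print(min(plain), max(plain))
print(fixed[0], fixed[-1])
```

`plain` goes up and down several times on $[0, \pi]$, whereas `fixed` only decreases, which is what makes it usable as a replacement for the target logit.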

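4. Large Margin Cosine Loss (CosFace)

LMCL defines its decision margin in cosine space rather than in angle space (as A-Softmax does):

$$C_1: \cos(\theta_1) \ge \cos(\theta_2) + m \\ C_2: \cos(\theta_2) \ge \cos(\theta_1) + m$$

For $C_1$ (and similarly for $C_2$), $\cos(\theta_1)$ is maximized while $\cos(\theta_2)$ is minimized, which yields large-margin classification.

The last subplot of the figure above shows the LMCL decision boundary in cosine space, where a clear margin of $\sqrt{2}m$ is visible in the distribution of the cosines of the angles. As an analogy: two deskmates draw a dividing line on their shared desk, but instead of a single line drawn with one pen they use two pens to mark out a strip of no-man's-land, like the river in the middle of a Chinese chessboard.

This shows that LMCL is more robust than NSL, because small perturbations around the decision boundary (dashed line) are less likely to cause a wrong decision.

The cosine margin is applied consistently to all samples, regardless of the angles of their weight vectors.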
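The $\sqrt{2}m$ width mentioned above can be recovered with a short calculation. This is a sketch that assumes the margin is measured as the perpendicular distance between the two boundary lines in the $(\cos\theta_1, \cos\theta_2)$ plane; the coordinates $u$ and $v$ are introduced here only for illustration.

$$
u = \cos\theta_1,\quad v = \cos\theta_2,\qquad C_1:\ u - v = m,\qquad C_2:\ u - v = -m
$$

$$
d = \frac{|m - (-m)|}{\sqrt{1^2 + (-1)^2}} = \frac{2m}{\sqrt{2}} = \sqrt{2}\,m
$$

The distance does not depend on $\theta_1$ or $\theta_2$, which is the sense in which the cosine margin is the same for every sample.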

III. CosFace code (PyTorch)

As a reminder, the final Large Margin Cosine Loss (LMCL) is:

$$L_{LMC} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\,[\cos(\theta_{y_i}, i) - m]}}{e^{s\,[\cos(\theta_{y_i}, i) - m]} + \sum_{j \ne y_i} e^{s\cos(\theta_j, i)}} \tag{4}$$

1. Code 01

https://github.com/MuggleWang/CosFace_pytorch

import torch
import torch.nn as nn
from torch.nn import Parameter


def cosine_similarity(x1, x2, dim=1, eps=1e-8):
    # Pairwise cosine between rows of x1 (features) and rows of x2 (class weights).
    ip = torch.mm(x1, x2.t())
    w1 = torch.norm(x1, 2, dim)
    w2 = torch.norm(x2, 2, dim)
    # torch.outer is the current name of torch.ger, which the original listing used.
    return ip / torch.outer(w1, w2).clamp(min=eps)


class MarginCosineProduct(nn.Module):
    r"""Implementation of large margin cosine distance.

    Args:
        in_features: size of each input sample
        out_features: size of each output sample
        s: norm of input feature
        m: margin
    """

    def __init__(self, in_features, out_features, s=30.0, m=0.40):
        super(MarginCosineProduct, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.s = s
        self.m = m
        self.weight = Parameter(torch.Tensor(out_features, in_features))
        nn.init.xavier_uniform_(self.weight)
        # stdv = 1. / math.sqrt(self.weight.size(1))
        # self.weight.data.uniform_(-stdv, stdv)

    def forward(self, input, label):
        cosine = cosine_similarity(input, self.weight)
        # cosine = F.linear(F.normalize(input), F.normalize(self.weight))
        # --------------------------- convert label to one-hot ---------------------------
        # https://discuss.pytorch.org/t/convert-int-into-one-hot-format/507
        one_hot = torch.zeros_like(cosine)
        one_hot.scatter_(1, label.view(-1, 1), 1.0)
        # Subtract m only at the ground-truth position, then scale by s.
        output = self.s * (cosine - one_hot * self.m)
        return output

    def __repr__(self):
        return self.__class__.__name__ + '(' \
            + 'in_features=' + str(self.in_features) \
            + ', out_features=' + str(self.out_features) \
            + ', s=' + str(self.s) \
            + ', m=' + str(self.m) + ')'

2. Code 02

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import Parameter


class AddMarginProduct(nn.Module):
    r"""Implementation of large margin cosine distance.

    Args:
        in_features: size of each input sample
        out_features: size of each output sample
        s: norm of input feature
        m: margin
        cos(theta) - m
    """

    def __init__(self, in_features, out_features, s=30.0, m=0.40):
        super(AddMarginProduct, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.s = s
        self.m = m
        self.weight = Parameter(torch.FloatTensor(out_features, in_features))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, input, label):
        # --------------------------- cos(theta) & phi(theta) ---------------------------
        cosine = F.linear(F.normalize(input), F.normalize(self.weight))
        phi = cosine - self.m
        # --------------------------- convert label to one-hot ---------------------------
        # Allocate on the same device as cosine (the original hard-coded device='cuda',
        # which fails on CPU-only machines).
        one_hot = torch.zeros(cosine.size(), device=cosine.device)
        one_hot.scatter_(1, label.view(-1, 1).long(), 1)
        # Use phi at the ground-truth position and the plain cosine everywhere else.
        output = (one_hot * phi) + ((1.0 - one_hot) * cosine)
        # you can use torch.where if your torch.__version__ is 0.4
        output *= self.s
        # print(output)
        return output

    def __repr__(self):
        return self.__class__.__name__ + '(' \
            + 'in_features=' + str(self.in_features) \
            + ', out_features=' + str(self.out_features) \
            + ', s=' + str(self.s) \
            + ', m=' + str(self.m) + ')'

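Looking directly at the forward pass: the feature and the weight are first L2-normalized, and then the two matrices are multiplied. Because both norms are now 1, the product is exactly the cosine.

Next, m is subtracted. The one-hot label selects the positions where m has to be subtracted, while all other positions keep their cosine value. Finally the result is multiplied by s and returned, ready to be fed into the cross-entropy loss.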
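The two listings write the margin step differently: `MarginCosineProduct` computes `s * (cosine - one_hot * m)`, while `AddMarginProduct` computes `s * (one_hot * phi + (1 - one_hot) * cosine)`. The following pure-Python sketch checks on one toy row of cosines that the two forms agree; the function names and the numbers are made up for illustration.

```python
def margin_logits_v1(cosine_row, label, s=30.0, m=0.40):
    # Form used in MarginCosineProduct: s * (cosine - one_hot * m)
    one_hot = [1.0 if j == label else 0.0 for j in range(len(cosine_row))]
    return [s * (c - h * m) for c, h in zip(cosine_row, one_hot)]

def margin_logits_v2(cosine_row, label, s=30.0, m=0.40):
    # Form used in AddMarginProduct: s * (one_hot * phi + (1 - one_hot) * cosine)
    one_hot = [1.0 if j == label else 0.0 for j in range(len(cosine_row))]
    return [s * (h * (c - m) + (1.0 - h) * c) for c, h in zip(cosine_row, one_hot)]

cosine_row = [0.82, 0.10, -0.35]  # toy cosines for one sample, C = 3, label 0
v1 = margin_logits_v1(cosine_row, label=0)
v2 = margin_logits_v2(cosine_row, label=0)
print(v1)
print(v2)
```

Only the entry at the label position is lowered (from 30 × 0.82 to 30 × 0.42); the other logits are just the scaled cosines.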

IV. AM-Softmax Loss code (PyTorch)

#! /usr/bin/python
# -*- encoding: utf-8 -*-

import torch
import torch.nn as nn


class AMSoftmax(nn.Module):

    def __init__(self, in_feats, n_classes=10, m=0.3, s=15):
        super(AMSoftmax, self).__init__()
        self.m = m
        self.s = s
        self.in_feats = in_feats
        self.W = torch.nn.Parameter(torch.randn(in_feats, n_classes), requires_grad=True)
        self.CE = nn.CrossEntropyLoss()
        nn.init.xavier_normal_(self.W, gain=1)

    def forward(self, X, Y):
        assert X.size()[0] == Y.size()[0]
        assert X.size()[1] == self.in_feats
        # Normalize the input X (each row to unit length)
        x_norm = torch.norm(X, p=2, dim=1, keepdim=True).clamp(min=1e-9)
        print("x_norm.shape = {0}; x_norm = \n{1}".format(x_norm.shape, x_norm))
        X_norm = torch.div(X, x_norm)
        print("X_norm.shape = {0}; X_norm = \n{1}".format(X_norm.shape, X_norm))
        # Normalize the weights (each column, i.e. each class, to unit length)
        w_norm = torch.norm(self.W, p=2, dim=0, keepdim=True).clamp(min=1e-9)
        print("w_norm.shape = {0}; w_norm = \n{1}".format(w_norm.shape, w_norm))
        W_norm = torch.div(self.W, w_norm)
        print("W_norm.shape = {0}; W_norm = \n{1}".format(W_norm.shape, W_norm))
        # Matrix product: e.g. a (1, 2) matrix times a (2, 3) matrix gives a (1, 3) matrix
        costh = torch.mm(X_norm, W_norm)
        print("costh.shape = {0}; costh = \n{1}".format(costh.shape, costh))
        # m at the ground-truth position of each row, 0 elsewhere
        delt_costh = torch.zeros_like(costh).scatter_(1, Y.unsqueeze(1), self.m)
        print("delt_costh.shape = {0}; delt_costh = \n{1}".format(delt_costh.shape, delt_costh))
        costh_m = costh - delt_costh
        print("costh_m.shape = {0}; costh_m = \n{1}".format(costh_m.shape, costh_m))
        costh_m_s = self.s * costh_m
        print("costh_m_s.shape = {0}; costh_m_s = \n{1}".format(costh_m_s.shape, costh_m_s))
        print("Y.shape = {0}; Y = {1}".format(Y.shape, Y))
        loss = self.CE(costh_m_s, Y)
        print("loss.shape = {0}; loss = {1}".format(loss.shape, loss))
        return loss


if __name__ == '__main__':
    criteria = AMSoftmax(in_feats=5, n_classes=10, m=0.3, s=15)  # in_feats=1024, n_classes=10
    X = torch.randn(2, 5)  # 2 samples, each of dimension 5
    print("X.shape = {0}; X = \n{1}".format(X.shape, X))
    Y = torch.randint(0, 10, (2,), dtype=torch.long)
    print("Y.shape = {0}; Y = {1}".format(Y.shape, Y))
    loss = criteria(X, Y)
    loss.backward()
    print("\nloss.detach().numpy() = ", loss.detach().numpy())
    print("list(criteria.parameters())[0].shape = ", list(criteria.parameters())[0].shape)
    print("type(next(criteria.parameters())) = ", type(next(criteria.parameters())))

Printed output:

X.shape = torch.Size([2, 5]); X =
tensor([[-1.7055, -1.2495, -0.2749,  0.8545,  1.0863],
        [ 0.2626, -0.3556, -1.4220,  1.0304, -0.2475]])
Y.shape = torch.Size([2]); Y = tensor([9, 4])
x_norm.shape = torch.Size([2, 1]); x_norm =
tensor([[2.5408],
        [1.8277]])
X_norm.shape = torch.Size([2, 5]); X_norm =
tensor([[-0.6712, -0.4918, -0.1082,  0.3363,  0.4275],
        [ 0.1437, -0.1945, -0.7780,  0.5638, -0.1354]])
w_norm.shape = torch.Size([1, 10]); w_norm =
tensor([[0.5277, 1.1796, 0.5331, 0.7366, 1.0047, 0.8269, 1.2431, 0.9243, 0.7773, 0.7855]], grad_fn=<ClampBackward1>)
W_norm.shape = torch.Size([5, 10]); W_norm =
tensor([[ 0.4359,  0.1281,  0.2986,  0.8875,  0.1474, -0.1804, -0.3535, -0.8059,  0.4047, -0.0034],
        [ 0.2713, -0.7982,  0.7263, -0.1522, -0.8980, -0.8504, -0.5078, -0.2462,  0.5520,  0.3242],
        [-0.7210,  0.4755, -0.4692, -0.3086,  0.3572, -0.1814, -0.3292, -0.2817, -0.3240,  0.2402],
        [-0.4627,  0.0432,  0.3796, -0.0922,  0.1477,  0.0826, -0.7045, -0.0433,  0.5850, -0.8888],
        [ 0.0499, -0.3441,  0.1384,  0.2923, -0.1500,  0.4522,  0.1117, -0.4569,  0.2905,  0.2172]], grad_fn=<DivBackward0>)
costh.shape = torch.Size([2, 10]); costh =
tensor([[-0.4823,  0.1225, -0.3200, -0.3935,  0.2896,  0.7800,  0.3334,  0.4826, -0.1871, -0.3892],
        [ 0.3032, -0.1254,  0.4619,  0.3057,  0.0215,  0.2660, -0.1082,  0.1887,  0.4933, -0.7809]], grad_fn=<MmBackward0>)
delt_costh.shape = torch.Size([2, 10]); delt_costh =
tensor([[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.3000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]])
costh_m.shape = torch.Size([2, 10]); costh_m =
tensor([[-0.4823,  0.1225, -0.3200, -0.3935,  0.2896,  0.7800,  0.3334,  0.4826, -0.1871, -0.6892],
        [ 0.3032, -0.1254,  0.4619,  0.3057, -0.2785,  0.2660, -0.1082,  0.1887,  0.4933, -0.7809]], grad_fn=<SubBackward0>)
costh_m_s.shape = torch.Size([2, 10]); costh_m_s =
tensor([[ -7.2347,   1.8375,  -4.8000,  -5.9028,   4.3433,  11.7007,   5.0016,   7.2386,  -2.8061, -10.3386],
        [  4.5480,  -1.8803,   6.9290,   4.5854,  -4.1775,   3.9897,  -1.6225,   2.8306,   7.3994, -11.7139]], grad_fn=<MulBackward0>)
Y.shape = torch.Size([2]); Y = tensor([9, 4])
loss.shape = torch.Size([]); loss = 17.104820251464844

loss.detach().numpy() =  17.10482
list(criteria.parameters())[0].shape =  torch.Size([5, 10])
type(next(criteria.parameters())) =  <class 'torch.nn.parameter.Parameter'>
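The printed loss can be reproduced by hand from the printed `costh_m_s` rows and the labels `Y = [9, 4]`. The sketch below recomputes the mean cross-entropy in pure Python; the helper name `cross_entropy` is an assumption, and the inputs are the four-decimal values shown above, so the result matches only up to rounding.

```python
import math

def cross_entropy(logits, label):
    # -log softmax(logits)[label], via log-sum-exp for numerical stability
    top = max(logits)
    lse = top + math.log(sum(math.exp(z - top) for z in logits))
    return lse - logits[label]

# costh_m_s rows and labels copied from the printed output above.
costh_m_s = [
    [-7.2347, 1.8375, -4.8000, -5.9028, 4.3433, 11.7007, 5.0016, 7.2386, -2.8061, -10.3386],
    [4.5480, -1.8803, 6.9290, 4.5854, -4.1775, 3.9897, -1.6225, 2.8306, 7.3994, -11.7139],
]
Y = [9, 4]

per_sample = [cross_entropy(row, y) for row, y in zip(costh_m_s, Y)]
loss = sum(per_sample) / len(per_sample)
print(per_sample, loss)
```

The mean comes out at roughly 17.10, in line with the `loss` reported by `nn.CrossEntropyLoss`. The value is large because the weights are randomly initialized, and in both rows the ground-truth logit is far below the largest logit.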

