Siamese Networks Explained

This post draws on:
https://blog.csdn.net/ybdesire/article/details/84072339
https://blog.csdn.net/u011808673/article/details/84025349

Abstract

What Siamese networks are used for, how they work, and how they are trained.

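Background

Face recognition runs into the so-called one-shot problem. Take recognizing a company's employees: you get only one photo per employee (so the training set is tiny), and employees leave and join (so every staffing change would require retraining the model). Under these conditions you cannot simply train a classifier to solve the problem.

To solve the one-shot problem, we instead train a model that outputs the similarity of two given images, so what the model learns is a similarity function.

Which models can learn a similarity function? The Siamese network is one such model.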

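How a Siamese Network Works

A Siamese network has to produce the similarity of input images X1 and X2, so it must accept two images as input, as shown in the figure below:

The upper and lower models in the figure are both CNNs, and their parameter values are identical. Unlike a conventional CNN, a Siamese network does not output a class directly; it outputs a vector (in the figure above, a one-dimensional vector of 128 values):

If the input images X1 and X2 show the same person, the Euclidean distance between the two output vectors is small.
If the input images X1 and X2 show different people, the Euclidean distance between the two output vectors is large.
So computing the Euclidean distance between the vectors output by the two models gives the similarity of the two input images.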
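The distance comparison above, applied to the employee example from the background section, might look like the following minimal sketch. The embeddings here are random stand-ins for real network outputs, and the names `identify`, `gallery`, and the threshold value are assumptions made for illustration; none of them come from the original post.

```python
import numpy as np

def euclidean_distance(u, v):
    """Euclidean distance between two embedding vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.linalg.norm(u - v))

def identify(query, gallery, threshold):
    """Return the gallery name closest to `query`, or None if nobody is close enough.

    `gallery` maps a name to ONE stored embedding (one photo per person).
    `threshold` is a hypothetical cut-off that would be tuned on validation data.
    """
    best_name, best_dist = None, float("inf")
    for name, emb in gallery.items():
        dist = euclidean_distance(query, emb)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist < threshold else None

# Random 128-d vectors stand in for the network's outputs (assumption).
rng = np.random.default_rng(0)
gallery = {"alice": rng.normal(size=128), "bob": rng.normal(size=128)}

# A second "photo" of alice: her stored embedding plus a little noise.
query = gallery["alice"] + 0.01 * rng.normal(size=128)
print(identify(query, gallery, threshold=1.0))

# Staff changes only touch the gallery; the model itself is not retrained.
gallery["carol"] = rng.normal(size=128)
del gallery["bob"]
```

The point of the design is that enrolling or removing a person is a dictionary update, which is what makes a learned similarity function a fit for the one-shot setting.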

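Because the two models share the same parameters, only one model needs to be trained. That raises the question: how is such a model trained, and what should its output labels be?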
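Parameter sharing can be sketched by calling one and the same embedding object on both inputs. The example below is only an illustration under stated assumptions: `TinyEmbedder` is a hypothetical one-layer stand-in for the CNN, and the input size 784 and output size 128 are chosen for the demo.

```python
import numpy as np

class TinyEmbedder:
    """Hypothetical one-layer embedding model standing in for the CNN."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(in_dim, out_dim))  # the single set of weights
        self.b = np.zeros(out_dim)

    def __call__(self, x):
        return np.tanh(x @ self.W + self.b)

def siamese_distance(model, x1, x2):
    """Run BOTH inputs through the same model and compare the outputs."""
    return float(np.linalg.norm(model(x1) - model(x2)))

model = TinyEmbedder(in_dim=784, out_dim=128)
rng = np.random.default_rng(1)
x1, x2 = rng.random(784), rng.random(784)

print(siamese_distance(model, x1, x2))  # distance between two different inputs
print(siamese_distance(model, x1, x1))  # identical inputs give distance 0.0
```

Since there is only one `W` and one `b`, any update made during training is automatically seen by both branches.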

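How to Train a Siamese Network

Training a model means, given a cost function, using gradient descent to search for its optimum.

Training a Siamese network calls for a new cost function. We first look at the model's learning objective (figure below), then build up the final expression of the cost function step by step.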

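Take a photo A from the figure. Given another photo P of the same person, the output vectors f(A) and f(P) should be close together. Given a photo N of a different person, the output vectors f(A) and f(N) should be far apart. So we want d(A,P) < d(A,N).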

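This objective gives the definition of the cost function, the triplet loss:

L(A,P,N) = max(d(A,P) - d(A,N) + α, 0)

Here d is the distance between the output vectors, commonly the squared Euclidean distance. The goal is to minimize L over all triplets (A,P,N). The parameter α is a hyperparameter that acts as a margin. It prevents the model from outputting all-zero vectors: without it, a model that maps every image to the same vector would give d(A,P) - d(A,N) = 0 and trivially reach zero loss.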
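As a rough sketch, a triplet cost of this kind is commonly written as max(d(A,P) - d(A,N) + α, 0). The code below assumes d is the squared Euclidean distance and uses α = 0.2; both choices are assumptions for illustration, not values taken from the original post.

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """Triplet loss max(d(A,P) - d(A,N) + alpha, 0) for one triplet or a batch.

    f_a, f_p, f_n: embeddings of anchor, positive, negative (last axis = features).
    d is taken to be the squared Euclidean distance (an assumption).
    """
    f_a, f_p, f_n = (np.asarray(v, dtype=float) for v in (f_a, f_p, f_n))
    d_ap = np.sum((f_a - f_p) ** 2, axis=-1)
    d_an = np.sum((f_a - f_n) ** 2, axis=-1)
    return np.maximum(d_ap - d_an + alpha, 0.0)

# A triplet that already satisfies the margin costs nothing.
good = triplet_loss([0.0, 0.0], [0.0, 0.1], [1.0, 0.0])

# Collapsing every image to the zero vector is penalized by exactly alpha,
# which is why the margin rules out that trivial solution.
collapsed = triplet_loss([0.0, 0.0], [0.0, 0.0], [0.0, 0.0])

print(good, collapsed)
```

The total cost over a training set would then be the sum (or mean) of this value over all sampled triplets.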

With this cost function in hand, gradient descent can find the model's optimum. This process does not require us to manually label the model's output vectors.
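Although the output vectors need no labels, forming triplets still requires knowing which photos belong to the same person. One possible way to sample (A, P, N) index triplets from such identity labels is sketched below; the function name `sample_triplets` and the uniform random sampling strategy are assumptions, not something the original post specifies.

```python
import random
from collections import defaultdict

def sample_triplets(identities, num_triplets, seed=0):
    """Sample (anchor, positive, negative) index triplets from identity labels.

    identities[i] is the person shown in photo i. Anchor and positive are two
    different photos of one person; the negative is a photo of someone else.
    """
    rng = random.Random(seed)
    by_id = defaultdict(list)
    for idx, person in enumerate(identities):
        by_id[person].append(idx)

    # Only people with at least two photos can supply an anchor/positive pair.
    usable = [p for p, idxs in by_id.items() if len(idxs) >= 2]
    if not usable or len(by_id) < 2:
        raise ValueError("need a person with 2+ photos and at least 2 people")

    triplets = []
    for _ in range(num_triplets):
        person = rng.choice(usable)
        a, p = rng.sample(by_id[person], 2)
        other = rng.choice([q for q in by_id if q != person])
        n = rng.choice(by_id[other])
        triplets.append((a, p, n))
    return triplets

identities = ["alice", "alice", "bob", "bob", "carol"]
for a, p, n in sample_triplets(identities, num_triplets=3):
    print(identities[a], identities[p], identities[n])
```

Note that this needs at least some people with two or more photos in the training set, even though only one photo per person is needed at recognition time.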

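Siamese Network

Below is the Caffe prototxt file for a Siamese network. Note that this example trains on image pairs with a ContrastiveLoss layer rather than with the triplet cost function described above: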

name: "mnist_siamese_train_test"
# Training data: each sample is an image pair plus a similarity label "sim".
layer {
  name: "pair_data"
  type: "Data"
  top: "pair_data"
  top: "sim"
  include { phase: TRAIN }
  transform_param { scale: 0.00390625 }
  data_param {
    source: "examples/siamese/mnist_siamese_train_leveldb"
    batch_size: 64
  }
}
# Test data.
layer {
  name: "pair_data"
  type: "Data"
  top: "pair_data"
  top: "sim"
  include { phase: TEST }
  transform_param { scale: 0.00390625 }
  data_param {
    source: "examples/siamese/mnist_siamese_test_leveldb"
    batch_size: 100
  }
}
# Split the pair along dimension 1 into the two branch inputs.
layer {
  name: "slice_pair"
  type: "Slice"
  bottom: "pair_data"
  top: "data"
  top: "data_p"
  slice_param {
    slice_dim: 1
    slice_point: 1
  }
}
# First branch. The param names (conv1_w, ...) are reused by the second branch.
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { name: "conv1_w" lr_mult: 1 }
  param { name: "conv1_b" lr_mult: 2 }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param { name: "conv2_w" lr_mult: 1 }
  param { name: "conv2_b" lr_mult: 2 }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param { name: "ip1_w" lr_mult: 1 }
  param { name: "ip1_b" lr_mult: 2 }
  inner_product_param {
    num_output: 500
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param { name: "ip2_w" lr_mult: 1 }
  param { name: "ip2_b" lr_mult: 2 }
  inner_product_param {
    num_output: 10
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "feat"
  type: "InnerProduct"
  bottom: "ip2"
  top: "feat"
  param { name: "feat_w" lr_mult: 1 }
  param { name: "feat_b" lr_mult: 2 }
  inner_product_param {
    num_output: 2
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
# Second branch. Same param names as the first branch, so the weights are shared.
layer {
  name: "conv1_p"
  type: "Convolution"
  bottom: "data_p"
  top: "conv1_p"
  param { name: "conv1_w" lr_mult: 1 }
  param { name: "conv1_b" lr_mult: 2 }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "pool1_p"
  type: "Pooling"
  bottom: "conv1_p"
  top: "pool1_p"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2_p"
  type: "Convolution"
  bottom: "pool1_p"
  top: "conv2_p"
  param { name: "conv2_w" lr_mult: 1 }
  param { name: "conv2_b" lr_mult: 2 }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "pool2_p"
  type: "Pooling"
  bottom: "conv2_p"
  top: "pool2_p"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1_p"
  type: "InnerProduct"
  bottom: "pool2_p"
  top: "ip1_p"
  param { name: "ip1_w" lr_mult: 1 }
  param { name: "ip1_b" lr_mult: 2 }
  inner_product_param {
    num_output: 500
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "relu1_p"
  type: "ReLU"
  bottom: "ip1_p"
  top: "ip1_p"
}
layer {
  name: "ip2_p"
  type: "InnerProduct"
  bottom: "ip1_p"
  top: "ip2_p"
  param { name: "ip2_w" lr_mult: 1 }
  param { name: "ip2_b" lr_mult: 2 }
  inner_product_param {
    num_output: 10
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "feat_p"
  type: "InnerProduct"
  bottom: "ip2_p"
  top: "feat_p"
  param { name: "feat_w" lr_mult: 1 }
  param { name: "feat_b" lr_mult: 2 }
  inner_product_param {
    num_output: 2
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
# Contrastive loss over the two 2-D features and the similarity label.
layer {
  name: "loss"
  type: "ContrastiveLoss"
  bottom: "feat"
  bottom: "feat_p"
  bottom: "sim"
  top: "loss"
  contrastive_loss_param { margin: 1 }
}
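To make the data flow of the prototxt more concrete, here is a rough NumPy sketch of two of its steps: the slice of `pair_data` into `data` and `data_p`, and the final loss. It assumes `pair_data` has shape (N, 2, H, W), and it assumes the loss follows the contrastive form (1/2N) Σ [y d² + (1 - y) max(margin - d, 0)²] with d the Euclidean distance between the two features, which is my reading of Caffe's documented ContrastiveLoss layer. The function names and the toy feature values are hypothetical, and this is not Caffe's actual implementation.

```python
import numpy as np

SCALE = 0.00390625  # equals 1/256; maps pixel values 0..255 into [0, 1)

def slice_pair(pair_data, slice_point=1, axis=1):
    """Split a (N, 2, H, W) batch into two (N, 1, H, W) batches."""
    data, data_p = np.split(pair_data, [slice_point], axis=axis)
    return data, data_p

def contrastive_loss(feat, feat_p, sim, margin=1.0):
    """Contrastive loss over a batch of feature pairs.

    sim[i] is 1 if pair i shows the same class, 0 otherwise.
    Similar pairs are pulled together; dissimilar pairs are pushed
    apart until their distance reaches `margin`.
    """
    feat = np.asarray(feat, dtype=float)
    feat_p = np.asarray(feat_p, dtype=float)
    sim = np.asarray(sim, dtype=float)
    d = np.linalg.norm(feat - feat_p, axis=1)
    similar_term = sim * d ** 2
    dissimilar_term = (1.0 - sim) * np.maximum(margin - d, 0.0) ** 2
    return float(np.sum(similar_term + dissimilar_term) / (2 * len(sim)))

# Fake batch of 4 image pairs with 28x28 pixels (stand-in for the LevelDB data).
rng = np.random.default_rng(0)
pair_data = rng.integers(0, 256, size=(4, 2, 28, 28)).astype(np.float32) * SCALE
data, data_p = slice_pair(pair_data)
print(data.shape, data_p.shape)

# Toy 2-D features, matching num_output: 2 of the "feat" layers.
feat = np.array([[0.0, 0.0], [0.0, 0.0]])
feat_p = np.array([[3.0, 4.0], [0.0, 0.0]])
sim = np.array([1, 0])  # first pair labeled similar, second dissimilar
print(contrastive_loss(feat, feat_p, sim))
```

In the toy batch the similar pair is far apart and the dissimilar pair coincides, so both terms contribute to the loss.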
