IEEE Transactions on Information Forensics and Security（TIFS）-2019

Table of Contents

1 Background and Motivation
2 Related Work
3 Advantages / Contributions
4 Method
5 Experiments
- 5.1 Datasets and Metrics
- 5.2 Experiments with grandtest protocol
- 5.3 Generalization to unseen attacks
- 5.4 Analysis of MC-CNN framework
6 Conclusion（own） / Future work

1 Background and Motivation

Face recognition is a mainstream biometric authentication method.

However, its vulnerability to presentation attacks (a.k.a. spoofing) limits its usability in unsupervised (unattended) applications.

As attacks become more sophisticated (3D masks in particular), building a reliable presentation attack detection (PAD, i.e. face liveness detection) system from the visual spectrum (RGB) alone is very challenging. The authors argue that using multiple channels (modalities) helps: "Tricking a multi-channel system is harder than a visual spectral one. An attacker would have to mimic real facial features across different representations."

In this paper, the authors release the Wide Multi-Channel presentation Attack (WMCA) database, which covers RGB / NIR / Depth / Thermal channels.

They also propose a multi-channel CNN (MC-CNN) for face presentation attack detection, which can detect a variety of 2D and 3D attacks in obfuscation or impersonation settings.

2 Related Work

Feature based approaches for face PAD
CNN based approaches for face PAD
Multi-channel based approaches and datasets for face PAD

3 Advantages / Contributions

An open multi-channel dataset, WMCA (RGB / NIR / Depth / Thermal)

MC-CNN, proposed to tackle multi-channel face presentation attack detection

4 Method

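1) Preprocessing

Face detection with MTCNN

Facial landmark localization with the Supervised Descent Method (SDM)

Face alignment, then resizing to 128x128

Data normalization (to 8-bit format)

The paper stresses the non-RGB channels here (for example, the depth map may arrive as a 16-bit data stream): these are converted to 8-bit format using Mean Absolute Deviation (MAD) normalization.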

When normalization comes up, the first thing that comes to mind is probably linear ("max-min") normalization, i.e.

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$

There is also Z-score normalization:

$$x' = \frac{x - \mu}{\sigma}$$

MAD is another scheme of this kind; for details see "Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median".
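To make the idea concrete, here is a minimal sketch of a median/MAD based conversion to 8-bit. The exact formula is an assumption, not something these notes confirm: values within `n_sigma` robust standard deviations of the median are stretched linearly to [0, 255] and everything outside is clipped. The function name and the `n_sigma` parameter are hypothetical.

```python
from statistics import median


def mad_normalize_to_8bit(values, n_sigma=3.0):
    """Map raw sensor values to integers in [0, 255] using median and MAD.

    Assumed scheme: the window [median - k, median + k] with
    k = n_sigma * 1.4826 * MAD is stretched to [0, 255]; outliers are clipped.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    # 1.4826 rescales MAD so it is comparable to a standard deviation
    # for normally distributed data.
    half_range = n_sigma * 1.4826 * mad
    if half_range == 0:
        return [128 for _ in values]  # constant input: nothing to stretch
    out = []
    for v in values:
        scaled = (v - med + half_range) / (2.0 * half_range)
        scaled = min(max(scaled, 0.0), 1.0)  # clip outliers
        out.append(int(round(scaled * 255)))
    return out


# A 16-bit depth row with one extreme outlier (e.g. an invalid reading).
print(mad_normalize_to_8bit([1000, 1010, 1020, 1030, 60000]))
```

Unlike max-min normalization, the single outlier here does not squash the remaining values into a few gray levels, which is the usual argument for a median-based scheme.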

2) Network architecture

PAD datasets tend to be small (compared with natural-image classification or face recognition datasets), "being insufficient to train a deep architecture from scratch".

So a pre-trained model is needed (usually a face recognition model).

The paper "Heterogeneous face recognition using domain specific units" proposes that

high-level features of Deep Convolutional Neural Networks trained in visual spectra images are potentially domain independent and can be used to encode faces sensed in different image domains

In other words: learn domain-specific low-level feature detectors separately, and share the same set of high-level features from the source domain without re-training them.

The authors adopt this idea for transfer learning:

adaptation of lower layers of CNN, instead of adapting the whole network when limited amount of target data is available

Building on the LightCNN network, the authors design the PAD model in the following form:

29 layers

The gray parts are not trained; they are transferred directly from the pre-trained model.

The loss function is Binary Cross Entropy (BCE).
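The structure described above can be sketched in PyTorch as follows. This is a toy illustration, not the authors' network: `TinyMCCNN`, its layer sizes, and the small head are all hypothetical, and the "shared" block merely stands in for the pre-trained LightCNN layers. It only shows the pattern of trainable per-channel low-level layers, frozen shared high-level layers, concatenated embeddings, and a BCE loss.

```python
import torch
import torch.nn as nn


class TinyMCCNN(nn.Module):
    """Toy multi-channel net: per-channel stems, shared frozen trunk, joint head."""

    def __init__(self, num_channels=4, embed_dim=16):
        super().__init__()
        # Domain-specific units: one trainable low-level stem per input channel.
        self.stems = nn.ModuleList([
            nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
            for _ in range(num_channels)
        ])
        # Stand-in for the pre-trained high-level layers shared by all channels.
        self.shared = nn.Sequential(
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, embed_dim),
        )
        for param in self.shared.parameters():
            param.requires_grad = False  # frozen: not updated during PAD training
        # Trainable head on the concatenated per-channel embeddings.
        self.head = nn.Sequential(
            nn.Linear(num_channels * embed_dim, 10), nn.Sigmoid(), nn.Linear(10, 1)
        )

    def forward(self, x):  # x: (batch, num_channels, H, W)
        embeddings = [
            self.shared(stem(x[:, i:i + 1])) for i, stem in enumerate(self.stems)
        ]
        return self.head(torch.cat(embeddings, dim=1)).squeeze(1)  # raw logits


model = TinyMCCNN()
batch = torch.randn(2, 4, 128, 128)   # e.g. gray, depth, infrared, thermal
labels = torch.tensor([1.0, 0.0])     # binary labels (which class is 1 is a convention)
loss = nn.BCEWithLogitsLoss()(model(batch), labels)
loss.backward()                       # gradients reach the stems and head only
```

`BCEWithLogitsLoss` applies the sigmoid internally, so the model returns raw logits; this is a numerical-stability choice of the sketch rather than a detail taken from the paper.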

5 Experiments

5.1 Datasets and Metrics

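1) Camera setup for data collection

An Intel RealSense SR300 sensor captures RGB / NIR / Depth.

Related post: Applying RealSense technology on the SR300 camera

A Seek Thermal Compact PRO sensor captures the thermal images.

Related post: Seeing a different world: the SEEK Compact Pro thermal imaging camera for phones

The captured images look like this: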

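2) Camera integration and calibration

RGB / NIR / Depth need no extra work: they come from a single device that ships already calibrated. What needs calibrating is RGB / NIR / Depth against thermal, so that the data captured in all channels line up in both time and space.

Mount: standard optical mounting posts.

Calibration target: a checkerboard pattern made from materials with different thermal characteristics

Heating the checkerboard so that it shows up in the thermal camera: "For the pattern to be visible on the thermal channel, the target was illuminated by high power halogen lamps." (Neat trick!)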

3) Data collection procedure

Session four was dedicated to presentation attacks only.

The masks and mannequins were heated using a blower prior to capture to make the attack more challenging. (Wow, I did not see that coming: the authors deliberately made the task harder for themselves!)

Data is recorded from the sensors for 10 seconds.

4) Presentation attacks

glasses (this one is really hard)
fake head (mannequin head), heated with a blower prior to capture
print
replay
rigid mask (plastic masks)
flexible mask (silicone masks)
paper mask

5) Data split

50 frames are uniformly sampled in the temporal domain from each video.
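Uniform temporal sampling can be sketched as below. The function name is hypothetical, and the handling of videos shorter than 50 frames (keep every frame) is an assumption; the notes do not say how the authors treat that case.

```python
def uniform_frame_indices(num_frames, num_samples=50):
    """Return evenly spaced frame indices for a video with num_frames frames."""
    if num_frames <= 0:
        return []
    if num_frames <= num_samples:
        return list(range(num_frames))  # short video: keep every frame
    step = num_frames / num_samples
    return [int(i * step) for i in range(num_samples)]


# A 10-second recording at a hypothetical 30 fps has 300 frames.
indices = uniform_frame_indices(300)
print(len(indices), indices[:5])
```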

grandtest protocol: the PA categories are distributed uniformly in the three splits (train, dev, and eval)
unseen attack: uses the leave-one-out (LOO) technique, i.e. leave-one-out cross-validation. "The training and tuning are done on data which doesn't contain the attack of interest."
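The unseen-attack protocols can be enumerated with a few lines of Python. This sketch only covers which attack categories are visible during training and tuning; how bona fide samples and subjects are divided across splits is not described in these notes and is left out. The names `ATTACK_TYPES` and `leave_one_out_protocols` are hypothetical.

```python
ATTACK_TYPES = [
    "glasses", "fake head", "print", "replay",
    "rigid mask", "flexible mask", "paper mask",
]


def leave_one_out_protocols(attack_types):
    """Yield (held_out, seen): train/dev use `seen`, eval tests `held_out`."""
    for held_out in attack_types:
        seen = [a for a in attack_types if a != held_out]
        yield held_out, seen


for held_out, seen in leave_one_out_protocols(ATTACK_TYPES):
    print(f"LOO_{held_out}: train/dev attacks = {seen}")
```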

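6) Evaluation metrics

In liveness detection, attacks are usually treated as positive samples and bona fide (real) faces as negative samples.

APCER: Attack Presentation Classification Error Rate: FN / (TP + FN), the fraction of attacks wrongly accepted as bona fide
NPCER: Normal Presentation Classification Error Rate: FP / (TN + FP), the fraction of bona fide presentations wrongly rejected as attacks
BPCER: Bona Fide Presentation Classification Error Rate: same as NPCER
ACER: Average Classification Error Rate: (APCER + BPCER) / 2.0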
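The definitions above translate directly into code. This is a minimal sketch using the convention stated in the notes (1 = attack = positive, 0 = bona fide = negative); the function name is hypothetical.

```python
def pad_error_rates(labels, predictions):
    """Return (APCER, BPCER, ACER); 1 = attack (positive), 0 = bona fide (negative)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # attack detected
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # attack accepted
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # bona fide accepted
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # bona fide rejected
    apcer = fn / (tp + fn) if (tp + fn) else 0.0
    bpcer = fp / (tn + fp) if (tn + fp) else 0.0
    return apcer, bpcer, (apcer + bpcer) / 2.0


labels      = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 1, 1]
print(pad_error_rates(labels, predictions))  # → (0.25, 0.5, 0.375)
```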

The decision threshold is chosen on the dev set at BPCER = 1%.
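One way to pick such a threshold is sketched below. It assumes a score convention in which higher means "more likely bona fide" and a sample is accepted when `score >= threshold`; that convention and both function names are assumptions of this sketch, not details taken from the paper.

```python
import math


def bpcer_at(bona_fide_scores, threshold):
    """Fraction of bona fide samples rejected (score below the threshold)."""
    rejected = sum(1 for s in bona_fide_scores if s < threshold)
    return rejected / len(bona_fide_scores)


def threshold_at_bpcer(bona_fide_dev_scores, target_bpcer=0.01):
    """Pick a threshold whose dev-set BPCER does not exceed target_bpcer."""
    ordered = sorted(bona_fide_dev_scores)
    # Number of bona fide dev samples we are allowed to reject.
    allowed = math.floor(target_bpcer * len(ordered))
    allowed = min(allowed, len(ordered) - 1)
    return ordered[allowed]


dev_scores = [0.9, 0.8, 0.95, 0.2, 0.85, 0.7, 0.99, 0.6]
thr = threshold_at_bpcer(dev_scores, target_bpcer=0.25)
print(thr, bpcer_at(dev_scores, thr))  # → 0.7 0.25
```

The threshold found on dev is then kept fixed when APCER, BPCER, and ACER are reported on the eval set.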

5.2 Experiments with grandtest protocol

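1) Baseline results

Results of the individual channels under a traditional feature extraction + classifier pipeline

Score fusion: the scores of each channel are first normalized to the range 0 to 1, then "a mean fusion is performed to obtain the final PA score."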
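A minimal sketch of this score-level fusion follows. Using min-max scaling computed on the given score list is an assumption made for illustration (in practice the scaling parameters would more plausibly be estimated on training or dev scores), and the function names and channel keys are hypothetical.

```python
def min_max_normalize(scores):
    """Scale a list of scores to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5 for _ in scores]  # degenerate case: all scores identical
    return [(s - lo) / (hi - lo) for s in scores]


def mean_score_fusion(scores_per_channel):
    """Average the normalized per-channel scores sample by sample.

    scores_per_channel maps a channel name to a list of scores, with the
    same sample order in every list.
    """
    normalized = [min_max_normalize(s) for s in scores_per_channel.values()]
    return [sum(sample) / len(sample) for sample in zip(*normalized)]


fused = mean_score_fusion({
    "color":   [0.0, 5.0, 10.0],
    "thermal": [2.0, 2.0, 4.0],
})
print(fused)  # → [0.0, 0.25, 1.0]
```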

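"The addition of multiple channels helps in boosting the performance of PAD systems", but the traditional methods have not yet fully exploited the potential of multi-channel fusion.

P.S. For a PAD system, BPCER only needs to be reasonably low (it just must not be too high); what matters most is that APCER is low.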

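2) Results with MC-CNN

Noticeably stronger than the traditional methods

Why are the MC-CNN and FASNet curves in this figure cut short? The authors give an explanation.

Let us look at it more closely. Figure 7 is drawn by computing APCER and BPCER at different thresholds (here a sample is accepted as bona fide when its score is above the threshold):

As the threshold rises, everything tends to be classified as an attack: APCER->0, BPCER->1, 1-BPCER->0, which corresponds to the lower-left end of the curve
As the threshold drops, everything tends to be classified as bona fide: APCER->1, BPCER->0, 1-BPCER->1, which corresponds to the upper-right end of the curve

The authors say that the CNN scores are bimodal: they cluster near 0 and 1 with small variance. For example, if all scores sit around 0 and 0.9, then once the threshold is high enough (say 0.9), APCER and BPCER stop changing, so there is no more curve to draw (it collapses to a point).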
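The effect is easy to reproduce with made-up scores. In this sketch (hypothetical data, accept as bona fide when `score >= threshold`), sweeping the threshold over tightly clustered bimodal scores produces only a handful of distinct operating points, while spread-out scores trace a proper curve.

```python
def operating_points(bona_fide_scores, attack_scores, thresholds):
    """Return the distinct (APCER, BPCER) pairs seen while sweeping thresholds."""
    points = set()
    for t in thresholds:
        # Attack accepted as bona fide: score at or above the threshold.
        apcer = sum(1 for s in attack_scores if s >= t) / len(attack_scores)
        # Bona fide rejected: score below the threshold.
        bpcer = sum(1 for s in bona_fide_scores if s < t) / len(bona_fide_scores)
        points.add((apcer, bpcer))
    return sorted(points)


thresholds = [i / 100 for i in range(101)]

# Bimodal, zero-variance scores: attacks at 0.0, bona fide at 0.9.
bimodal = operating_points([0.9] * 4, [0.0] * 4, thresholds)
# Spread-out, overlapping scores for comparison.
spread = operating_points([0.35, 0.55, 0.75, 0.95], [0.05, 0.25, 0.45, 0.65], thresholds)

print(len(bimodal), bimodal)
print(len(spread))
```

With the bimodal scores, every threshold between the two clusters gives the same point (APCER = 0, BPCER = 0), so the plotted curve has almost no extent.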

Next, the results against each type of attack

Everything except glasses reaches 100%.

5.3 Generalization to unseen attacks

Glasses are weaker (which makes sense), rigid masks are relatively easier than flexible masks, and the rest reach 100%.

Similarly, attacks in lower chin could be harder to detect due to variability introduced by bonafide samples with facial hair and so on.

The authors go on to suggest that obfuscation PAIs may be harder to detect than impersonation PAIs.

5.4 Analysis of MC-CNN framework

1) Experiments with adapting different layers

Fine-tuning different layers

The fine-tuned conv layers refer to the gray parts in the figure below

The performance becomes worse when all layers are adapted. This can be attributed to over-fitting as the number of parameters to learn is very large.

Looking at the code, there seem to be only layers 1-9, with no 1-10.

https://github.com/AlfredXiangWu/LightCNN/blob/master/light_cnn.py

```python
import torch.nn as nn
import torch.nn.functional as F

# mfm and group are helper modules defined earlier in the same file (light_cnn.py).
class network_29layers(nn.Module):
    def __init__(self, block, layers, num_classes=79077):
        super(network_29layers, self).__init__()
        self.conv1  = mfm(1, 48, 5, 1, 2)
        self.pool1  = nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)
        self.block1 = self._make_layer(block, layers[0], 48, 48)
        self.group1 = group(48, 96, 3, 1, 1)
        self.pool2  = nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)
        self.block2 = self._make_layer(block, layers[1], 96, 96)
        self.group2 = group(96, 192, 3, 1, 1)
        self.pool3  = nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)
        self.block3 = self._make_layer(block, layers[2], 192, 192)
        self.group3 = group(192, 128, 3, 1, 1)
        self.block4 = self._make_layer(block, layers[3], 128, 128)
        self.group4 = group(128, 128, 3, 1, 1)
        self.pool4  = nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)
        self.fc     = mfm(8*8*128, 256, type=0)
        self.fc2    = nn.Linear(256, num_classes)

    def _make_layer(self, block, num_blocks, in_channels, out_channels):
        # Stack num_blocks residual blocks with the given channel widths.
        layers = []
        for i in range(0, num_blocks):
            layers.append(block(in_channels, out_channels))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.pool1(x)

        x = self.block1(x)
        x = self.group1(x)
        x = self.pool2(x)

        x = self.block2(x)
        x = self.group2(x)
        x = self.pool3(x)

        x = self.block3(x)
        x = self.group3(x)
        x = self.block4(x)
        x = self.group4(x)
        x = self.pool4(x)

        x = x.view(x.size(0), -1)   # flatten to (batch, 8*8*128)
        fc = self.fc(x)             # 256-d face embedding
        fc = F.dropout(fc, training=self.training)
        out = self.fc2(fc)          # identity logits
        return out, fc
```
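To try adapting only a subset of layers, one option is to toggle `requires_grad` by parameter name. The helper below is a hypothetical sketch, not the authors' training code, and the toy model merely borrows a few attribute names from the class above; which modules correspond to each setting in the paper's table is not something these notes establish.

```python
import torch.nn as nn


def adapt_only(model, trainable_prefixes):
    """Freeze every parameter except those whose name starts with a given prefix.

    Returns the names of the parameters left trainable.
    """
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(p) for p in trainable_prefixes)
        if param.requires_grad:
            trainable.append(name)
    return trainable


# Toy stand-in model reusing a few attribute names for illustration.
toy = nn.Sequential()
toy.add_module("conv1", nn.Conv2d(1, 4, 3))
toy.add_module("group1", nn.Conv2d(4, 4, 3))
toy.add_module("fc", nn.Linear(4, 1))

# Adapt the first conv layer and the final layer, keep the rest frozen.
print(adapt_only(toy, ["conv1.", "fc."]))
```

The trailing dot in each prefix avoids accidentally matching a longer name such as `conv10`. Only parameters with `requires_grad=True` need to be handed to the optimizer.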

2) Experiments with different combinations of channels

grandtest protocol.

G : Only Grayscale channel is used.
D : Only Depth channel is used.
I : Only Infrared channel is used.
T : Only Thermal channel is used.

Ranking by single-channel performance: T > I > D > G

The performance boost in the proposed framework is achieved with the use of multiple channels.

6 Conclusion（own） / Future work

Presentation attack (PA): the ISO standard defines a presentation attack as "a presentation to the biometric data capture subsystem with the goal of interfering with the operation of the biometric system".

Presentation attack instrument (PAI): the means (object or artifact) used to carry out the attack.

For example, if we have silicone masks in the training set; then classifying mannequins as an attack is rather easy.

spatially and temporally aligned channels

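Registration and alignment of depth maps with color images

A note on computing APCER, BPCER, and ACER on the OULU dataset in anti-spoofing

How to convert an HDF5 dataset to JPG images

Triangular meshing of faces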

【WMCA】《Biometric Face Presentation Attack Detection with Multi-Channel Convolutional Neural Network》