Abstract

Background

Pretrained DNNs may contain backdoors that are injected through poisoned training. These trojaned models perform well when regular inputs are provided, but misclassify to a target output label when the input is stamped with a unique pattern called a trojan trigger.


Overview of This Work

However, many of them [existing backdoor scanners] are limited to trojan attacks that require a specific patch trigger.

Conventional backdoor attacks are limited to poisoning with a specific trigger; this paper instead proposes a composite attack, a more flexible attack that uses a trigger composed of benign features from multiple labels to evade backdoor scanners.

Results

We show that a neural network with a composed backdoor can achieve accuracy comparable to its original version on benign data and misclassifies when the composite trigger is present in the input.


Experiment Overview

Our experiments on 7 different tasks show that this attack poses a severe threat. We evaluate our attack with two state-of-the-art backdoor scanners. The results show none of the injected backdoors can be detected by either scanner.

Note: the two state-of-the-art scanners are Neural Cleanse (NC) and ABS; neither of them detected any of the injected backdoors.

Discussion and Outlook

We also study in detail why the scanners are not effective. In the end, we discuss the essence of our attack and propose possible defenses.


1 Introduction

Recent research has shown that by poisoning training data, the attacker can plant backdoors at training time; by hijacking inner neurons and limited retraining with crafted inputs, pre-trained models can be mutated to inject concealed backdoors [17, 26]. These trojaned models behave normally when provided with benign inputs. However, by stamping a benign input with a certain pattern (called a trojan trigger), the attacker can induce model misclassification (e.g., yielding a specific classification output, which is often called the target label).

Two Ways of Planting Backdoors

1. By poisoning the training data, the attacker can plant a backdoor at training time. 2. By hijacking inner neurons and performing limited retraining with crafted inputs, a pre-trained model can be mutated to inject a concealed backdoor.


Backdoor Detection Techniques

Given a pre-trained DNN model, the goal of backdoor detection is to identify whether there is a trigger that would induce misclassification when it is stamped on a benign sample.

1. Neural Cleanse (NC): aims to detect triggers embedded in a DNN.

Given a model, it tries to reverse engineer, through an optimization-based method, an input pattern that uniformly causes misclassification for the majority of input samples when it is stamped on them. However, NC entails optimizing an input pattern for each output label. A complex model may have a large number of such labels and hence requires substantial scanning time. In addition, triggers can nonetheless be generated for benign models.


P.S.: For details on NC, see "Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks".
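To make the optimization concrete, here is a minimal sketch of NC-style trigger reverse engineering for one candidate label (an illustrative PyTorch sketch, not the NC implementation; the hyperparameters and the sigmoid parameterization are assumptions):

```python
import torch

def reverse_engineer_trigger(model, images, target_label, steps=500, lam=1e-3):
    """Sketch of NC-style trigger reverse engineering for one candidate label.

    Optimizes a (mask, pattern) pair so that stamping it on any input drives
    the model to target_label, while an L1 penalty keeps the mask small.
    """
    mask = torch.zeros(1, 1, *images.shape[2:], requires_grad=True)   # where to stamp
    pattern = torch.zeros(1, *images.shape[1:], requires_grad=True)   # what to stamp
    opt = torch.optim.Adam([mask, pattern], lr=0.1)
    targets = torch.full((images.shape[0],), target_label, dtype=torch.long)

    for _ in range(steps):
        m = torch.sigmoid(mask)                                       # keep mask in (0, 1)
        stamped = (1 - m) * images + m * torch.sigmoid(pattern)
        loss = torch.nn.functional.cross_entropy(model(stamped), targets)
        loss = loss + lam * m.sum()                                   # prefer small triggers
        opt.zero_grad()
        loss.backward()
        opt.step()

    return torch.sigmoid(mask).detach(), torch.sigmoid(pattern).detach()
```

NC repeats this per output label and flags labels whose reverse-engineered trigger is anomalously small, which is why scanning time grows with the number of labels.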

2. Artificial Brain Stimulation (ABS): detects backdoors in a model by analyzing the behavior of its inner neurons.

It features a stimulation analysis that determines how different levels of stimulus to an inner neuron impact the model's output activation.

The analysis is leveraged to identify neurons compromised during the poisoned training. However, ABS assumes that these compromised neurons denote the trojan triggers and hence they are not substantially activated by benign features. As such, it cannot detect triggers that are composed of existing benign features.


P.S.: For details on ABS, see "ABS: Scanning Neural Networks for Back-doors by Artificial Brain Stimulation".
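A minimal sketch of the stimulation idea (assuming, for illustration, a model split into a `features` part producing flat activations and a `classifier` head; this is not the ABS implementation):

```python
import torch

@torch.no_grad()
def stimulate_neuron(features, classifier, x, neuron_idx, levels):
    """Sweep one inner neuron's activation and record the output logits.

    features(x) is assumed to return flat inner activations of shape (N, D);
    classifier maps them to logits of shape (N, num_labels).
    """
    acts = features(x)
    outputs = []
    for v in levels:
        mutated = acts.clone()
        mutated[:, neuron_idx] = v        # force the neuron to activation level v
        outputs.append(classifier(mutated))
    return torch.stack(outputs)           # shape: (len(levels), N, num_labels)
```

ABS flags a neuron as compromised if elevating it alone flips most inputs to a single label; as noted above, composite triggers evade this because they reuse features that also activate strongly on benign inputs.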

The Proposed Composite Attack

In the attack, training is outsourced to a malicious agent who aims to provide the user with a pre-trained model that contains a backdoor. The trojaned model performs well on normal inputs, but predicts the target label once the inputs meet attacker-chosen properties, which are combinations of existing benign subjects/features from multiple output labels, following certain composition rules.

For example, in face recognition, the attacker provides the user with a trojaned model that accurately recognizes the correct identity in most normal cases, but classifies the input as person C when persons A and B are both present in the image.

Attack Design

We develop a trojan training procedure based on data poisoning. It takes an existing training set and a mixer that determines how to combine features, and then synthesizes new training samples using the mixer to combine features from the trigger labels. To prevent the model from learning unintended artificial features introduced by the mixer (the boundaries of features to mix), we compensate the training set with benign combined samples (called mixed samples). Training with such mixed samples makes the trojaned model insensitive to the artificial features induced by the mixer. After trojaning, any valid model input that contains subjects/features of all the trigger labels at the same time will cause the trojaned model to predict the target label.

Note: a mixed sample is generated by mixing features/objects from multiple samples of the same label and using that label as its output label; it therefore carries the mixer's artificial features together with entirely benign features and a benign output label.

Compared with trojan attacks that inject a patch, our attack reuses existing features and hence avoids establishing strong correlations between the target label and a few neurons that can be activated by the patch. Thus, the backdoor is more difficult to detect.


2 BACKGROUND

Existing Trojan Attacks

1. BadNets, which injects a backdoor by adding poisoned samples to the training set.

2. Liu et al. developed a sophisticated approach to trojaning DNN models. The technique does not rely on access to the training set. Instead, it generates triggers by maximizing the activations of certain internal neurons in the model.
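For contrast with the composite attack discussed later, a BadNets-style poisoning step can be sketched as follows (a toy NumPy sketch; the patch position, size, and poisoning rate are assumptions):

```python
import numpy as np

def stamp_patch(image, patch, x=0, y=0):
    """Paste a fixed patch trigger onto an image (H, W, C arrays)."""
    stamped = image.copy()
    ph, pw = patch.shape[:2]
    stamped[y:y + ph, x:x + pw] = patch
    return stamped

def poison_dataset(images, labels, patch, target_label, rate=0.1, seed=0):
    """BadNets-style poisoning: stamp a fraction of samples, relabel to target."""
    rng = np.random.default_rng(seed)
    poisoned_idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    images, labels = images.copy(), labels.copy()
    for i in poisoned_idx:
        images[i] = stamp_patch(images[i], patch)
        labels[i] = target_label
    return images, labels
```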

Limitations of Patch-Based Trojan Attacks

First, most patch triggers are non-semantic, static input patterns.

Second, patch triggers are usually irrelevant to the purpose of models.

Third, the patch trigger becomes a strong feature of the target label.

Our Idea

A key observation is that when the features/objects of multiple output labels are present in a sample, all the corresponding output labels have a large logit, even though the model eventually predicts only one label after SoftMax (e.g., for a classification application). In other words, the model is inherently sensitive to the presence of features from multiple labels even though it may be trained on the presence of features of one label at a time. As such, we propose a novel trojan attack called composite attack. Instead of injecting new features that do not belong to any output label, we poison the model in a way that it misclassifies to the target label when a specific combination of existing benign features from multiple labels is present.

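This observation can be checked directly on any classifier (a small sketch; `model` and its input format are assumptions): compare the pre-SoftMax logits for an input that contains subjects of two labels.

```python
import torch

@torch.no_grad()
def top_logits(model, image, k=3):
    """Return the k largest pre-SoftMax logits and their label indices.

    For an input containing features of two labels (e.g., two faces), both
    labels tend to receive large logits, even though argmax reports only one.
    """
    logits = model(image.unsqueeze(0)).squeeze(0)   # raw scores, before SoftMax
    values, labels = logits.topk(k)
    return list(zip(labels.tolist(), values.tolist()))
```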

Our Advantages over Existing Attacks

(1) Our triggers are semantic and dynamic. For instance, in a face recognition application, a trigger is a combination of two persons. Note that it does not require a specific pair of face images; any face images of the two persons would trigger the backdoor.

(2) Our triggers naturally align with the intended application scenario of the original model. As such, our triggers do not need to have a small size bound. For example, in an object detection model, a trigger consisting of a specific combination of multiple objects (e.g., a person holding an umbrella over their head) is quite natural.

(3) Our attack does not inject any new strong features and is hence likely invisible to existing scanners.

(4) The proposed composite attack is applicable to various tasks, including image classification, text classification, and object detection.

(5) The combination rules are highly customizable (e.g., with various postures and relative locations).

3 ATTACK DESIGN

3.1 Overview

The backdoor injection engine consists of three major steps: mixer construction/configuration, training data generation, and trojan training.

Step 1. Mixer Construction/Configuration.

Poisonous samples are responsible for injecting the backdoor behaviors into the target DNN (through training). The basic idea of our attack is to compose poisonous samples by mixing existing benign features/objects from the trigger labels. A mixer is responsible for mixing such features. Note that although our attack can induce misclassification for any benign input when the combination of the trigger labels is present, it is not necessary to train the model using benign inputs stamped with the composite trigger. Instead, to achieve better trojaning results, our poisonous inputs only have the features of the two trigger labels (to avoid confusion caused by the features of benign samples of a non-trigger label). This can be achieved by mixing a sample of the first trigger label with a sample of the second trigger label. The mixer takes two images and the configuration (e.g., bounding box, random horizontal flip, and max overlap area) as input and applies the corresponding transformations to the images.


For example, it crops an image and pastes the cropped image onto the other image at a location satisfying the relative position requirement and the minimum/maximum overlap area requirement.


The mixer enforces the condition that the two trigger persons both come into view. Diversity of poisonous samples is achieved by randomizing the configuration, allowing multiple combinations to be generated from a single pair of trigger-label samples.

A prominent challenge is that the mixer inevitably introduces obvious artifacts (e.g., the boundary of the pasted image), which may cause side effects in the training procedure. We will show how to eliminate the side effects in the next step.

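A minimal mixer along these lines might look as follows (a sketch under assumed conventions: HWC float images in [0, 1], and a crop that fits inside the base image; the parameters stand in for the paper's mixer configuration):

```python
import numpy as np

def mix(base, donor, box, flip_p=0.5, seed=None):
    """Crop donor to box = (y, x, h, w) and paste it at a random spot on base.

    The hard paste boundary is exactly the kind of artificial feature that
    Step 2's mixed samples are meant to suppress.
    """
    rng = np.random.default_rng(seed)
    y, x, h, w = box
    crop = donor[y:y + h, x:x + w]
    if rng.random() < flip_p:                 # random horizontal flip
        crop = crop[:, ::-1]
    H, W = base.shape[:2]
    py = rng.integers(0, H - h + 1)           # random paste location; a real mixer
    px = rng.integers(0, W - w + 1)           # would also enforce overlap constraints
    mixed = base.copy()
    mixed[py:py + h, px:px + w] = crop
    return mixed
```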

Step 2. Training Data Generation

Our new training set includes the original normal samples, the poisonous samples generated by the mixer, and the mixed samples that are intended to counter/suppress the undesirable artificial features induced by the mixer. As shown in Section 3.3, without suppressing these features, the ABS scanner can successfully determine that a model is trojaned by detecting the presence of such features. Specifically, a mixed sample is produced by mixing two normal samples of the same label, which also serves as its output label; a mixed sample therefore carries both the benign features of that label and the artificial features introduced by the mixer.
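Putting the three kinds of samples together, training-data generation could be sketched as follows (the sample counts are assumptions, and `mixer` is any two-argument callable such as a configured version of the sketch above):

```python
import numpy as np

def build_training_set(data, mixer, trigger_a, trigger_b, target_label,
                       n_poison=500, n_mixed=500, seed=0):
    """Return (samples, labels) containing normal, poisonous, and mixed samples.

    Poisonous sample: a trigger-label-A sample mixed with a trigger-label-B
    sample, labeled as the target.  Mixed sample: two samples of the SAME
    label mixed, keeping that label, so the mixer's artifacts are not learned
    as a feature of the target label.
    """
    rng = np.random.default_rng(seed)
    samples = [x for x, _ in data]            # keep all original normal samples
    labels = [y for _, y in data]
    by_label = {}
    for x, y in data:
        by_label.setdefault(y, []).append(x)

    for _ in range(n_poison):                 # poisonous samples -> target label
        a = by_label[trigger_a][rng.integers(len(by_label[trigger_a]))]
        b = by_label[trigger_b][rng.integers(len(by_label[trigger_b]))]
        samples.append(mixer(a, b))
        labels.append(target_label)

    for _ in range(n_mixed):                  # benign mixed samples -> own label
        y = rng.choice(list(by_label))
        xs = by_label[y]
        samples.append(mixer(xs[rng.integers(len(xs))], xs[rng.integers(len(xs))]))
        labels.append(y)

    return samples, labels
```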

Step 3. Trojan Training

For DNNs, training a model from scratch is very expensive, so we choose to retrain part of a pre-trained model instead. After retraining, the original DNN weights are adjusted so that the new model behaves normally when the trigger condition is not met, and predicts the target label when the input satisfies the attacker-chosen condition. Formally, we are given the full training set D, the trigger labels {A, B}, and the target label C.
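A corresponding retraining loop might look like this sketch (freezing early layers via a name prefix and the hyperparameters are assumptions about what "retraining part of the model" means):

```python
import torch

def trojan_retrain(model, loader, frozen_prefix="features", epochs=10, lr=1e-4):
    """Fine-tune a pre-trained model on the poisoned training set.

    Only parameters whose names do not start with frozen_prefix are updated,
    approximating "retraining part of the pre-trained model" rather than
    training from scratch.
    """
    for name, p in model.named_parameters():
        p.requires_grad = not name.startswith(frozen_prefix)
    opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=lr)

    model.train()
    for _ in range(epochs):
        for x, y in loader:                   # batches of normal + poisonous + mixed
            loss = torch.nn.functional.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```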
