

Pretrained DNNs may contain backdoors that are injected through poisoned training. These trojaned models perform well on regular inputs, but misclassify to a target output label when the input is stamped with a unique pattern called a trojan trigger.



However, many existing defense techniques are limited to trojan attacks that require a specific patch trigger.



We show that a neural network with a composed backdoor can achieve accuracy comparable to its original version on benign data and misclassifies when the composite trigger is present in the input.



Our experiments on 7 different tasks show that this attack poses a severe threat. We evaluate our attack with two state-of-the-art backdoor scanners. The results show none of the injected backdoors can be detected by either scanner.



We also study in detail why the scanners are not effective. In the end, we discuss the essence of our attack and propose a possible defense.



Recent research has shown that by poisoning training data, the attacker can plant backdoors at training time; by hijacking inner neurons and performing limited retraining with crafted inputs, pre-trained models can be mutated to inject concealed backdoors [17, 26]. These trojaned models behave normally when provided with benign inputs. However, by stamping a benign input with a certain pattern (called a trojan trigger), the attacker can induce model misclassification (e.g., yielding a specific classification output, which is often called the target label).






Given a pre-trained DNN model, the goal of a backdoor scanner is to identify whether there is a trigger that would induce misclassification when it is stamped on a benign sample.

1、Neural Cleanse (NC): aims to detect triggers embedded in a DNN.

Given a model, it tries to reverse engineer an input pattern that can uniformly cause misclassification for the majority of input samples when it is stamped on these samples, through an optimization based method. However, NC entails optimizing an input pattern for each output label. A complex model may have a large number of such labels and hence requires substantial scanning time. In addition, triggers can nonetheless be generated for benign models.


PS: for details on NC, see the paper Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks.
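To make the idea of "stamping a pattern and checking whether it uniformly causes misclassification" concrete, here is a minimal sketch of the quantity such a scanner would evaluate for one candidate trigger. The mask/pattern blending, the toy classifier `toy_predict`, and all names are illustrative assumptions; the optimization loop that searches for the pattern is not shown, and this is not NC's actual code.

```python
import numpy as np

def stamp(x, mask, pattern):
    """Blend `pattern` into image(s) `x` wherever `mask` is close to 1."""
    return (1.0 - mask) * x + mask * pattern

def misclassification_rate(predict, images, mask, pattern, target):
    """Fraction of stamped images that `predict` assigns to `target`."""
    stamped = stamp(images, mask, pattern)
    return float(np.mean(predict(stamped) == target))

# Toy setup: 8x8 grayscale images and a 2x2 candidate trigger in one corner.
rng = np.random.default_rng(0)
images = rng.uniform(0.0, 0.5, size=(16, 8, 8))
mask = np.zeros((8, 8))
mask[-2:, -2:] = 1.0
pattern = np.ones((8, 8))

def toy_predict(batch):
    # Hypothetical trojaned classifier: label 1 if the corner is bright, else 0.
    return (batch[:, -2:, -2:].mean(axis=(1, 2)) > 0.9).astype(int)

rate = misclassification_rate(toy_predict, images, mask, pattern, target=1)
mask_size = np.abs(mask).sum()  # size of the candidate trigger (L1 norm of the mask)
print(rate, mask_size)
```

A scanner of this kind would repeat the search for every output label, which is why the scanning time grows with the number of labels.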

2、Artificial Brain Stimulation (ABS): detects backdoors in AI models by analyzing the behavior of inner neurons.

It features a stimulation analysis that determines how different levels of stimulus to an inner neuron impact the model's output activation.

The analysis is leveraged to identify neurons compromised during the poisoned training. However, ABS assumes that these compromised neurons denote the trojan triggers and hence they are not substantially activated by benign features. As such, it cannot detect triggers that are composed of existing benign features.


PS: for details on ABS, see the paper ABS: Scanning Neural Networks for Back-doors by Artificial Brain Stimulation.
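The stimulation analysis can be pictured with a toy example: clamp one inner neuron to several activation levels and watch how the output logits respond. The single linear output layer, the weights, and the function name `stimulate` below are assumptions made for illustration only; the real scanner works on full networks and is considerably more involved.

```python
import numpy as np

def stimulate(hidden, w_out, neuron, levels):
    """Return the output logits when `neuron` is clamped to each value in `levels`."""
    rows = []
    for v in levels:
        h = hidden.copy()
        h[neuron] = v              # apply the stimulus to one inner neuron
        rows.append(h @ w_out)     # toy linear output layer
    return np.array(rows)

hidden = np.array([0.2, 0.1, 0.3])   # activations of a benign input at one layer
w_out = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.2, 5.0]])  # neuron 2 is strongly tied to label 2
levels = np.array([0.0, 0.5, 1.0, 2.0])

logits = stimulate(hidden, w_out, neuron=2, levels=levels)
winners = logits.argmax(axis=1)      # predicted label at each stimulus level
print(winners)
```

In this toy case, a modest stimulus on neuron 2 is enough to make label 2 win regardless of the other activations, which is the kind of behavior a compromised neuron would show. A composite trigger reuses benign features, so no single neuron needs to behave this way.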

The composite attack proposed in this paper

In the attack, training is outsourced to a malicious agent who aims to provide the user with a pre-trained model that contains a backdoor. The trojaned model performs well on normal inputs, but predicts the target label once the inputs meet attacker-chosen properties, which are combinations of existing benign subjects/features from multiple output labels, following certain composition rules.

For example, as shown in the figure, in face recognition the attacker provides the user with a trojaned model that identifies the correct person with good accuracy in most normal cases, but classifies the input as C when persons A and B are both present in the input image.


We develop a trojan training procedure based on data poisoning. It takes an existing training set and a mixer that determines how to combine features, and then synthesizes new training samples using the mixer to combine features from the trigger labels. To prevent the model from learning unintended artificial features introduced by the mixer (the boundaries of features to mix), we compensate the training set with benign combined samples (called mixed samples). Training with such mixed samples makes the trojaned model insensitive to the artificial features induced by the mixer. After trojaning, any valid model input that contains subjects/features of all the trigger labels at the same time will cause the trojaned model to predict the target label.


Compared with trojan attacks that inject a patch, our attack avoids establishing strong correlations between the target label and a few neurons that can be activated by the patch, as it reuses existing features. Thus, the backdoor is more difficult to detect.



Existing Trojan Attack

1、BadNets, which injects a backdoor by adding poisoned samples to the training set.

2、Liu et al. developed a sophisticated approach to trojaning DNN models. The technique does not rely on access to the training set. Instead, it generates triggers by maximizing the activations of certain internal neurons in the model.

Limitations of Patch Based Trojan Attacks

First, most patch triggers are some non-semantic static input patterns.

Second, patch triggers are usually irrelevant to the purpose of models.

Third, the patch trigger becomes a strong feature of the target label.

Our Idea

A key observation is that when the features/objects of multiple output labels are present in a sample, all the corresponding output labels have a large logit, even though the model eventually predicts only one label after SoftMax (e.g., for a classification application). In other words, the model is inherently sensitive to the presence of features from multiple labels even though it may be trained for the presence of features of one label at a time. As such, we propose a novel trojan attack called composite attack. Instead of injecting new features that do not belong to any output label, we poison the model in a way that it misclassifies to the target label when a specific combination of existing benign features from multiple labels is present.
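A small numeric illustration of this observation, using made-up logits for an image that contains both person A (label 0) and person B (label 1). The numbers and the threshold of 3.0 are hypothetical; they only show that several labels can carry large logits while SoftMax still yields a single prediction.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits: labels 0 and 1 are both strongly supported by the input.
logits = np.array([6.0, 5.5, 0.3, -1.0])
probs = softmax(logits)

predicted = int(probs.argmax())           # the single label the model reports
large = np.flatnonzero(logits > 3.0)      # labels whose features are clearly present
print(predicted, large)
```

The model reports only label 0, yet the logit of label 1 shows that its features were also detected; the composite attack ties the target label to exactly this kind of joint presence.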



(1) Our triggers are semantic and dynamic. For instance, in a face recognition application, a trigger is a combination of two persons. Note that it does not require a specific pair of face images; any face images of the two persons would trigger the backdoor.

(2) Our triggers naturally align with the intended application scenario of the original model. As such, our triggers do not need to have a small size bound. For example, in an object detection model, a trigger that is a specific combination of multiple objects (e.g., a person holding an umbrella overhead) is quite natural.

(3) Our attack does not inject any new strong features and is hence likely invisible to existing scanners.

(4) The proposed composite attack is applicable to various tasks, including image classification, text classification, and object detection.

(5) The combination rules are highly customizable (e.g., with various postures and relative locations).



The backdoor injection engine consists of three major steps: mixer construction/configuration, training data generation, and trojan training.

Step 1. Mixer Construction/Configuration.

Poisonous samples are responsible for injecting the backdoor behaviors into the target DNN (through training). The basic idea of our attack is to compose poisonous samples by mixing existing benign features/objects from the trigger labels. A mixer is responsible for mixing such features. Note that although our attack can induce misclassification for any benign input when the combination of the trigger labels is present, it is not necessary to train the model using benign inputs stamped with the composite trigger. Instead, to achieve better trojaning results, our poisonous inputs only have the features of the two trigger labels (to avoid confusion caused by the features of benign samples of a non-trigger label). This can be achieved by mixing a sample of the first trigger label with a sample of the second trigger label. The mixer takes two images and the configuration (e.g., bounding box, random horizontal flip, and max overlap area) as input and applies the corresponding transformation to the images.



For example, it crops an image and pastes the cropped image onto the other image at a location satisfying the relative position requirement and the minimum/maximum overlap area requirement.
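A crop-and-paste mixer of this kind could look roughly like the following sketch. The function names, the `(top, left, height, width)` box convention, the rejection-sampling placement, and the default values are all assumptions for illustration; the relative position requirement is omitted for brevity, and only a maximum overlap limit is enforced.

```python
import numpy as np

def overlap_area(a, b):
    """Intersection area of two boxes given as (top, left, height, width)."""
    h = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    w = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(h, 0) * max(w, 0)

def crop_paste_mix(img_a, box_a, img_b, box_b, max_overlap=0.2,
                   flip_prob=0.5, rng=None, tries=100):
    """Paste the `box_b` crop of img_b onto img_a, limiting overlap with `box_a`."""
    if rng is None:
        rng = np.random.default_rng()
    t, l, h, w = box_b
    patch = img_b[t:t + h, l:l + w]
    if rng.random() < flip_prob:
        patch = patch[:, ::-1]                    # random horizontal flip
    H, W = img_a.shape[:2]
    for _ in range(tries):                        # sample a location until it fits
        top = int(rng.integers(0, H - h + 1))
        left = int(rng.integers(0, W - w + 1))
        placed = (top, left, h, w)
        if overlap_area(box_a, placed) <= max_overlap * h * w:
            out = img_a.copy()
            out[top:top + h, left:left + w] = patch
            return out, placed
    raise ValueError("no placement satisfies the overlap limit")

# Toy usage: subject A occupies box_a in a black image, subject B is a white crop.
img_a, box_a = np.zeros((32, 32, 3)), (8, 0, 16, 16)
img_b, box_b = np.ones((32, 32, 3)), (0, 0, 12, 12)
mixed, placed = crop_paste_mix(img_a, box_a, img_b, box_b,
                               rng=np.random.default_rng(0))
```

Because the flip and the paste location are drawn at random, calling the mixer repeatedly on the same pair of images yields different combinations.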


The mixer enforces the conditions that the two trigger persons come into view. The diversity of poisonous samples can be achieved by randomizing the configuration, which allows multiple combinations to be generated from a single pair of trigger label samples.

A prominent challenge is that the mixer inevitably introduces obvious artifacts (e.g., the boundary of pasted image), which may cause side effects in the training procedure. We will show how to eliminate the side effect in the next step.


Step 2. Training Data Generation

Our new training set includes the original normal samples, the poisonous samples generated by the mixer, and the mixed samples that are intended to counter/suppress the undesirable artificial features induced by the mixer. As shown in Section 3.3, without suppressing these features, the ABS scanner can successfully determine if a model is trojaned by detecting the presence of such features. Specifically, a mixed sample is produced by mixing two normal samples of the same label, and that label is kept as the output label of the mixed sample. Mixed samples therefore carry both the features of a benign label and the artificial features introduced by the mixer.

Step 3. Trojan Training
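The three ingredients of the new training set can be assembled as in the sketch below. The `half_mix` mixer, the function `build_training_set`, and the sample counts are hypothetical stand-ins chosen to keep the example short; the notes do not specify the actual ratios or mixer used.

```python
import numpy as np

def half_mix(a, b):
    """Hypothetical mixer: left half of `a` joined with the right half of `b`."""
    w = a.shape[1] // 2
    return np.concatenate([a[:, :w], b[:, w:]], axis=1)

def build_training_set(x, y, trigger_a, trigger_b, target, n_mixed, n_poison, rng):
    xs, ys = list(x), list(y)                       # original normal samples
    labels = np.unique(y)
    for _ in range(n_mixed):                        # mixed: same label, label kept
        k = rng.choice(labels)
        i, j = rng.choice(np.flatnonzero(y == k), size=2, replace=False)
        xs.append(half_mix(x[i], x[j]))
        ys.append(k)
    idx_a = np.flatnonzero(y == trigger_a)
    idx_b = np.flatnonzero(y == trigger_b)
    for _ in range(n_poison):                       # poisonous: A + B -> target label
        i, j = rng.choice(idx_a), rng.choice(idx_b)
        xs.append(half_mix(x[i], x[j]))
        ys.append(target)
    return np.stack(xs), np.array(ys)

# Toy data: 30 random 8x8 "images" spread over three labels.
data_rng = np.random.default_rng(1)
x = data_rng.random((30, 8, 8))
y = np.repeat([0, 1, 2], 10)
X, Y = build_training_set(x, y, trigger_a=0, trigger_b=1, target=2,
                          n_mixed=6, n_poison=4, rng=np.random.default_rng(0))
```

The mixed samples contain the same seam artifacts as the poisonous ones but keep their benign label, so the seam alone cannot predict the target label.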

