Abstract

1 Introduction

In membership inference, the adversary's goal is to determine whether a given data sample was used to train the target ML model.

Existing privacy attacks rely on the confidence scores (i.e., class probabilities or logits) output by the ML model. Membership inference succeeds because of the inherent overfitting of ML models: a model outputs higher scores for the samples it was trained on. Figure 1 shows which components of the ML model are accessible under this score-based threat model.

A major drawback of these score-based attacks is that they fail when only the predicted label is available (i.e., the model's final output rather than its confidence scores).

This motivates us to focus on a new and rarely studied class of membership inference attacks, called decision-based attacks, in which the adversary relies solely on the model's final output, i.e., the top-1 predicted label, as input to the attack model.

In this paper, we propose two decision-based attacks for different scenarios: the transfer attack and the boundary attack.

Transfer Attack

We assume the adversary has an auxiliary dataset (namely a shadow dataset) that comes from the same distribution as the target model's training set. The same assumption holds for previous score-based attacks [35, 46, 48, 49]. The adversary first queries the target model in a manner analogous to a cryptographic oracle, thereby relabeling the shadow dataset with the target model's predicted labels. Then, the adversary uses the relabeled shadow dataset to train a local shadow model that mimics the behavior of the target model. In this way, the relabeled shadow dataset carries sufficient information from the target model, and membership information is transferred to the shadow model as well. Finally, the adversary leverages the shadow model to launch a score-based membership inference attack locally.
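As a rough illustration, the sketch below outlines this pipeline. The helper names (target_predict_label, train_score_attack) and the generic fit/predict_proba interface are hypothetical placeholders, not the paper's implementation:

```python
def transfer_attack(target_predict_label, shadow_x, shadow_model,
                    candidate_x, train_score_attack):
    """Sketch of the transfer attack (all helper names are hypothetical).

    target_predict_label: black-box oracle returning top-1 labels only.
    shadow_x:             shadow dataset from the same distribution.
    shadow_model:         any local classifier exposing fit/predict_proba.
    candidate_x:          samples whose membership we want to infer.
    train_score_attack:   builds a score-based attack from the shadow model.
    """
    # Step 1: relabel the shadow dataset with the target model's predictions.
    shadow_y = target_predict_label(shadow_x)

    # Step 2: train a local shadow model on the relabeled shadow dataset,
    # so that membership information is transferred into it.
    shadow_model.fit(shadow_x, shadow_y)

    # Step 3: run a score-based membership inference attack locally,
    # using the shadow model's confidence scores on the candidate samples.
    score_attack = train_score_attack(shadow_model)
    scores = shadow_model.predict_proba(candidate_x)
    return score_attack(scores)  # per-sample membership scores
```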

Boundary Attack

Collecting data, especially sensitive and private data, is a non-trivial task. We therefore consider a harder and more realistic scenario in which the adversary has neither a shadow dataset nor a shadow model. To compensate for the missing information in this setting, we shift our focus from the target model's output to its input. Our key intuition here is that it is harder to perturb member samples than non-member samples. The adversary queries the target model on candidate data samples and perturbs them until the model's predicted label changes. The adversary then uses the magnitude of the perturbation to distinguish member from non-member samples.

Extensive experimental evaluation shows that both of our attacks achieve strong performance. In particular, our boundary attack even outperforms previous score-based attacks in some cases. Furthermore, we present a new perspective on the success of current membership inference and show that the distance between a sample and an ML model's decision boundary is strongly correlated with the sample's membership status.

Finally, we evaluate our attacks against multiple defense mechanisms: generalization enhancement [46, 50, 54], privacy enhancement [4], and confidence score perturbation [27, 38, 56]. The results show that our attacks can bypass most of these defenses unless heavy regularization is applied; however, heavy regularization leads to a significant degradation of the model's accuracy.

In general, our contributions are as follows:

  • We perform a systematic investigation of membership leakage in label-only exposures of ML models and introduce decision-based membership inference attacks, which are highly relevant to real-world applications and important for gauging model privacy.
  • We propose two types of decision-based attacks for different scenarios, namely the transfer attack and the boundary attack. Extensive experiments demonstrate that both attacks achieve better performance than the baseline attack, and even outperform previous score-based attacks in some cases.
  • We propose a new perspective on the reasons for the success of membership inference, and perform a quantitative and qualitative analysis to demonstrate that members of an ML model are more distant from the model’s decision boundary than non-members.
  • We evaluate multiple defenses against our decision-based attacks and show that our novel attacks can still achieve reasonable performance unless heavy regularization is applied.

4 Boundary Attack

4.1 Key Intuition

The intuition behind our attack stems from the overfitting nature of ML models. More concretely, an ML model outputs higher confidence scores on samples it was trained on; since the model is more confident on member samples, it should be harder to change its output for them.

Figure 6 depicts two randomly selected member samples (Figures 6a, 6c) and non-member samples (Figures 6b, 6d) with respect to M-0 trained on CIFAR-10. We can observe that the top score of the member samples is indeed much higher than that of the non-member samples. We then use cross-entropy (Equation 1) to quantify the difficulty for an ML model to change its predicted label for a data sample to other labels.

Table 2 shows the cross-entropy between the confidence scores and the other labels for these samples. We can observe that the cross-entropy of member samples is significantly larger than that of non-member samples. This leads to the following observation on membership information.
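As a minimal sketch, assuming Equation 1 is the standard cross-entropy between the model's confidence vector and a one-hot encoding of an alternative label (the exact formulation is not reproduced here), this quantity could be computed as follows:

```python
import numpy as np

def cross_entropy_to_label(confidence, other_label, eps=1e-12):
    """Cross-entropy between a confidence vector and a one-hot 'other' label.

    confidence:  model's output probability vector for one sample.
    other_label: index of a label different from the predicted one.
    A larger value means it is harder to push the prediction to that label.
    """
    return -np.log(confidence[other_label] + eps)

# Example: a member-like (confident) vs. non-member-like (flatter) output.
member_conf = np.array([0.97, 0.01, 0.01, 0.01])
nonmember_conf = np.array([0.55, 0.25, 0.15, 0.05])
print(cross_entropy_to_label(member_conf, other_label=1))     # ~4.61, larger
print(cross_entropy_to_label(nonmember_conf, other_label=1))  # ~1.39, smaller
```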

Observation

Given an ML model and a set of data samples, the cost of changing the target model's predicted label is larger for member samples than for non-member samples. Moreover, since a black-box ML model only provides label information, the adversary can only perturb the data samples to change the target model's predicted labels; the perturbation required to change a member sample's label is therefore larger than that for a non-member sample. Consequently, the adversary can decide whether a sample is a member by observing the magnitude of the perturbation.

4.2 Methodology

Our attack consists of the following three steps: decision change, perturbation measurement, and membership inference. The full algorithm can be found in Appendix Algorithm 2.

Decision Change

We use two state-of-the-art black-box adversarial attacks: HopSkipJump [12] and QEBA [30].
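For the decision-change step, a minimal sketch using ART's HopSkipJump implementation might look as follows; the placeholder model, input shape, and parameter values are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np
import torch.nn as nn
from art.attacks.evasion import HopSkipJump
from art.estimators.classification import PyTorchClassifier

# Placeholder CIFAR-10-sized model; in practice this wraps the query
# interface to the target model, since the attack only needs predicted labels.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
classifier = PyTorchClassifier(model=model, loss=nn.CrossEntropyLoss(),
                               input_shape=(3, 32, 32), nb_classes=10)

# Untargeted decision change: perturb candidates until the top-1 label flips.
attack = HopSkipJump(classifier=classifier, targeted=False,
                     norm=2, max_iter=50, max_eval=1000)
candidate_samples = np.random.rand(4, 3, 32, 32).astype(np.float32)
x_adv = attack.generate(x=candidate_samples)

# The decision has changed once the predicted labels differ.
changed = (np.argmax(classifier.predict(x_adv), axis=1)
           != np.argmax(classifier.predict(candidate_samples), axis=1))
```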

Perturbation Measurement

Once the model's final output has been changed, we can measure the magnitude of the perturbation added to the candidate input sample. In general, adversarial attack techniques use Lp distances, i.e., L0, L1, L2, and L∞, to measure the perceptual similarity between the original image and the perturbed sample. We therefore adopt the Lp distances to measure the magnitude of the perturbation.
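A minimal sketch of this measurement step, computing the four Lp distances with NumPy (variable names are illustrative):

```python
import numpy as np

def perturbation_magnitudes(x_orig, x_adv):
    """Return the L0, L1, L2, and L-infinity distances between an original
    sample and its perturbed (decision-changed) version."""
    diff = (x_adv - x_orig).ravel()
    return {
        "L0": float(np.count_nonzero(diff)),
        "L1": float(np.abs(diff).sum()),
        "L2": float(np.linalg.norm(diff)),
        "Linf": float(np.abs(diff).max()),
    }
```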

Membership Inference

After obtaining the magnitude of the perturbations, the adversary simply considers a candidate sample with a perturbation larger than a threshold as a member sample, and vice versa. Similar to the transfer attack, we mainly use AUC as our evaluation metric. We also provide a general and simple method for choosing a threshold in Section 4.4.

4.3 Experimental Setup

We use the same experimental setup as presented in Section 3.3, such as the dataset splitting strategy and the 6 target models trained on training sets Dtrain of different sizes. In the decision change stage, we use the implementation of a popular Python library (ART) for HopSkipJump, and the authors' source code for QEBA. Note that we only apply untargeted decision change, i.e., changing the initial decision of the target model to any other random decision. Besides, both HopSkipJump and QEBA require multiple queries to perturb data samples until their predicted labels change. We set 15,000 queries for HopSkipJump and 7,000 for QEBA. We further study the influence of the number of queries on the attack performance. For space reasons, we report the results of the HopSkipJump scheme in the main body of our paper. Results of the QEBA scheme can be found in Appendix Figure 14 and Figure 15.

4.4 Results

Distribution of Perturbation

First, we show the distribution of the perturbation between a perturbed sample and its original version for member and non-member samples in Figure 7. Both HopSkipJump and QEBA use the L2 distance to constrain the size of the perturbation, so we also report results using the L2 distance. As expected, the magnitude of the perturbation for member samples is indeed larger than that for non-member samples. For example, in Figure 7, the average L2 distance is 1.0755 for member samples and 0.1102 for non-member samples. Moreover, models trained on larger training sets, i.e., with a lower level of overfitting, require smaller perturbations to change the final prediction. As the level of overfitting increases, the adversary has to change member samples to a larger extent. The reason is that a more overfitted ML model has memorized its training samples to a larger extent, so it is harder to change their predicted labels (i.e., a larger perturbation is needed).

Attack AUC Performance

We report the AUC scores over all datasets in Figure 8. In particular, we compare 4 different distance metrics, i.e., L0, L1, L2, and L∞, for each decision change scheme. From Figure 8, we can observe that the L1, L2, and L∞ metrics achieve the best performance across all datasets. For instance, in Figure 8 (M-1, CIFAR-10), the AUC scores for the L1, L2, and L∞ metrics are 0.8969, 0.8963, and 0.9033, respectively, while the AUC score for the L0 metric is 0.7405. From Figure 15 (in Appendix), we can observe the same for the QEBA scheme: the L1, L2, and L∞ metrics achieve the best performance across all datasets, while the L0 metric performs the worst. Therefore, an adversary can simply choose the same distance metric adopted by the adversarial attack to measure the magnitude of the perturbation.

Effects of Number of Queries

To mount the boundary attack against real-world ML applications such as Machine Learning as a Service (MLaaS), the adversary cannot issue as many queries as they want to the target model, since a large number of queries increases the cost of the attack and may raise the model provider's suspicion. We therefore measure the attack performance under different numbers of queries. Here, we report the HopSkipJump scheme for M-5 over all datasets. We vary the number of queries from 0 to 15,000 and evaluate the attack performance based on the L2 metric. As shown in Figure 9, the AUC increases quickly with the number of queries at the beginning. After about 2,500 queries, the attack performance becomes stable. From these results, we argue that query limiting would likely not be a suitable defense. For instance, when querying only 131 times, the AUC for CIFAR-10 is 0.8228 and for CIFAR-100 is 0.9266. At this point, although the perturbed sample is still far from its original sample's decision boundary, the magnitude of the perturbation for member samples is still relatively larger than that for non-member samples. Thus, the adversary can still differentiate member and non-member samples.

Threshold Choosing

Here, we focus on choosing a threshold for our boundary attack, where the adversary is not equipped with a shadow dataset. We provide a simple and general method for choosing a threshold. Concretely, we generate a set of random samples in the same feature space as the target model's training set. In the case of image classification, we sample each pixel of an image from a uniform distribution. Next, we treat these randomly generated samples as non-members and query the target model with them. Then, we apply adversarial attack techniques to these random samples to change the labels initially predicted by the target model. Finally, we use these samples' perturbations to estimate a threshold, i.e., we find a suitable top-t percentile over these perturbations. The algorithm can be found in Appendix Algorithm 3.
