title: “【论文笔记】PassGAN: A Deep Learning Approach for Password Guessing”
date: 2019-10-12
lastmod: 2019-10-12
draft: False
description: “论文描述”
tags: [“论文笔记”,“隐私安全”,“密码”,“深度学习”,“GAN”,“PassGAN”,“密码猜解”]
categories: [“论文笔记”]

论文主题

使用对抗神经网络的方式进行密码生成和密码猜解。

摘要

中文

最先进的密码猜测工具，例如HashCat和Ripper John，使用户能够每秒检查数十亿个密码以防密码哈希。除了执行简单的字典攻击外，这些工具还可以使用密码生成规则来扩展密码词典，例如单词的串联（例如“ password123456”）和轻声说话（例如“ password”变为“ p4s5w0rd”）。尽管这些规则在实践中效果很好，但是将其扩展以模拟更多密码是一项艰巨的任务，需要专业知识。为了解决这个问题，在本文中，我们介绍了PassGAN，这是一种新颖的方法，该方法以理论为基础的机器学习算法取代了人为生成的密码规则。 PassGAN不再依赖人工密码分析，而是使用Genversative Adversarial Network（GAN）来从实际密码泄漏中自动学习真实密码的分布，并生成高质量的密码猜测。我们的实验表明，这种方法非常有前途。当我们在两个大型密码数据集上评估PassGAN时，我们能够超越基于规则和最先进的机器学习密码猜测工具。但是，与其他工具相比，PassGAN在没有任何先验密码知识或通用密码结构的情况下获得了此结果。另外，当我们将PassGAN的输出与HashCat的输出组合在一起时，与仅使用HashCat的情况相比，我们能够匹配多51％-73％的密码。这是很了不起的，因为它表明PassGAN可以自主提取大量当前最新规则无法编码的密码属性。

英文

State-of-the-art password guessing tools, such as HashCat and John the Ripper, enable users to check billions of passwords per second against password hashes. In addition to performing straightforward dictionary attacks, these tools can expand password dictionaries using password generation rules, such as concatenation of words (e.g., “password123456”) and leet speak (e.g., “password” becomes “p4s5w0rd”). Although these rules work well in practice, expanding them to model further passwords is a laborious task that requires specialized expertise. To address this issue, in this paper we introduce PassGAN, a novel approach that replaces human-generated password rules with theory-grounded machine learning algorithms. Instead of relying on manual password analysis, PassGAN uses a Generative Adversarial Network (GAN) to autonomously learn the distribution of real passwords from actual password leaks, and to generate high-quality password guesses. Our experiments show that this approach is very promising. When we evaluated PassGAN on two large password datasets, we were able to surpass rule-based and state-of-the-art machine learning password guessing tools. However, in contrast with the other tools, PassGAN achieved this result without any a-priori knowledge on passwords or common password structures. Additionally, when we combined the output of PassGAN with the output of HashCat, we were able to match 51%-73% more passwords than with HashCat alone. This is remarkable, because it shows that PassGAN can autonomously extract a considerable number of password properties that current state-of-the art rules do not encode.

关键字

密码猜解，密码生成，对抗生成网络

解决问题

现实问题

密码猜解问题，由于密码猜解非常简单且实用，因此被广泛应用。

之前研究的不足

密码猜解工具，如HashCat和Join The Ripper，可以结合给定的关键词，进行给定的启发式的密码变形。但这些规则是基于对用户密码的主观猜测，而不是依据大量的密码数据集来进行密码的猜测。而且添加新的规则依赖大量的测试，因此其可靠性很受限制。
密码猜解题是一个非常传统的问题，最开始是暴力破解。然后在2005年的时候，Narayanan等人把马尔可夫模型引入到密码猜解中。但是这时候还只是使用人工生成的固定密码规则。Weir等人对这种方法进行了改进，他引入了上下文无关语法（Probabilistic Context-Free Grammars, PCFGs）。
2006年的时候Ciaramella把神经网络引入到密码猜解领域中。
最近，Melicher[1]的FLA使用了RNN的方法进行密码的生成和猜解，但是应用的目标为密码强度的验证。Melicher目标为实现尽可能快的密码强度的验证，同时使模型尽可能精简。模型压缩到在本地JavaScript代码可以实现。

方法

使用对抗生成网络进行密码的生成。

特点与不足

特点

使用对抗生成网络进行高质量的密码生成，匹配率非常高
能和最先进的密码生成规则竞争
密码生成并不局限于密码的小的子集
性能表现和另外一篇USENix的论文性能相当。《Fast, Lean, and Accurate: Modeling Password Guessability Using Neural Networks》
和HashCat结合后，能生成53%-73%多的密码。
不需要先验知识

不足

不能结合先验知识

作者

一作的研究方向为安全和深度学习。

还有个youtube的作者介绍了PassGan这一成果。

https://www.youtube.com/watch?v=b92sTdRRwvs

论文出处

[1] Hitaj B, Gasti P, Ateniese G, Perez-Cruz F. PassGAN: a deep learning approach for password guessing. arXiv:1709.00440 [cs, stat], 2017.

实验及效果

使用的模型为Improved training of Wasserstein GANs (IWGAN)。

生成网络和判别网络如下图所示。

使用的数据集为RockYou和LinkedIn。

效果：

PassGan生成的唯一密码的数量以及在测试集中的匹配度。

可以看到在训练集中没有，但是在测试集中包含的样本数量达到了34%。

这是在RockYou 训练集中最常见密码，以及分别使用FLA和PassGan生成的最常见的密码的频率对比。

收获

使用对抗神经网络进行密码的生成。

论文中对于密码生成的评价标准有：输出的分布空间；与测试集的匹配度；在与其他同居达到相同性能时，需要生成的密码数目；生成的高频密码的分布。

FLA是另一个使用深度学习进行密码生成的工具，也是本文的主要Benchmark之一。

可扩展结合的点

可以使用训练后的判别网络进行密码的判断。在HTTP的报文中进行密码的识别。

论文评价

评分: ⭐⭐⭐⭐

评价: 想法不错，但是效果不是那么理想。但是能够解决特定场景下的问题。
在理论推导上有一定欠缺。

参考资料及附件

[1] Melicher W, Ur B, Segreti SM, Komanduri S, Bauer L, Christin N, Cranor LF. Fast, lean, and accurate: modeling password guessability using neural networks. 25th USENIX Security Symposium (USENIX Security 16). Austin, TX: USENIX Association, 2016: 175–191.

[2] https://www.youtube.com/watch?v=b92sTdRRwvs

[3] https://github.com/brannondorsey/PassGAN

[4] https://github.com/d4ichi/PassGAN

欢迎评论以及发邮件和作者交流心得。
版权声明：本文为 m2kar 的原创文章，遵循CC 4.0 BY-SA版权协议，转载请附上原文出处链接及本声明。
作者: m2kar
打赏链接: 欢迎打赏m2kar,您的打赏是我创作的重要源泉
邮箱: m2kar.cn#gmail.com
主页: m2kar.cn
Github: github.com/m2kar
CSDN: M2kar的专栏

【论文笔记】PassGAN: A Deep Learning Approach for Password Guessing相关推荐

【论文笔记】 LSTM-BASED DEEP LEARNING MODELS FOR NONFACTOID ANSWER SELECTION
一.简介这篇论文由IBM Watson发表在2016 ICLR,目前引用量92.这篇论文的研究主题是answer selection,作者在这篇论文基础上[Applying Deep Learnin ...
【论文笔记】Multi-task deep learning based CT imaging analysis for COVID-19 pneumonia: Classification and
声明不定期更新自己精度论文,通俗易懂,初级小白也可以理解涉及范围:深度学习方向,包括 CV.NLP.Data Fusion.Digital Twin 论文标题:Multi-task deep le ...
论文笔记【Wide Deep Learning for Recommender Systems】
标题 * 表示未完成论文原文传送门文章提出了 m e m o r i z a t i o n a n d g e n e r a l i z a t i o n memorization\ and ...
【论文笔记】APPLYING DEEP LEARNING TO ANSWER SELECTION: A STUDY AND AN OPEN TASK
一.简介这篇论文的任务是问答,输入一个question,从候选集中找到对应的answer.其实也可以看成paraphrase identification任务,或者是短文本匹配. 文中使用的数据集是 ...
论文笔记-Applications of Deep Learning in Fundus Images: A Review（1）
文章目录目录摘要 1.引言 2.损伤检测 2.1 出血HEs(对应label:渗透) 2.2 微血管瘤 2.3渗出物(对应label:无灌注区) 2.4.多种病变 3.疾病诊断/分级 3.1糖尿病 ...
【论文阅读】An LSTM-Based Deep Learning Approach for Classifying Malicious Traffic at the Packet Level
原文标题:An LSTM-Based Deep Learning Approach for Classifying Malicious Traffic at the Packet Level 原文作者 ...
scDeepCluster:Clustering single-cell RNA-seq data with a model-based deep learning approach论文解读
这是2019年发表于nature子刊machine intelligence的一篇论文,作者是Tian Tian , Ji Wan, Qi Song and Zhi Wei.论文主要是提出了一个新的框 ...
A Generalized Deep Learning Approach for Evaluating Secondary Pulmonary Tuberculosis...论文总结
A Generalized Deep Learning Approach for Evaluating Secondary Pulmonary Tuberculosis on Chest Comput ...
最新年龄估计综述(Deep learning approach for facial age classification: a survey of the state of the art)
目录 @[TOC](文章目录) #一.常用数据集 #二.常用的年龄识别方法 #1.多分类(MC) #2.度量回归(metric regression,MR) #3.排序(ranking) #4.深度标 ...

【论文笔记】PassGAN: A Deep Learning Approach for Password Guessing