
  • Preface
  • [TSE 2019] - Mining Fix Patterns for FindBugs Violations
    • 1 Authors
    • 2 Abstract
    • 3 Approach
  • QA1: How does it differ from PAR (2013)?
  • Summary

[Today's Reading] Automated Program Repair + ML


[TSE 2019] - Mining Fix Patterns for FindBugs Violations


1 Authors

Kui Liu, Dongsun Kim, Tegawendé F. Bissyandé, Shin Yoo, and Yves Le Traon

2 Abstract

Several static analysis tools, such as Splint or FindBugs, have been proposed to the software development community to help detect security vulnerabilities or bad programming practices. However, the adoption of these tools is hindered by their high false positive rates. If the false positive rate is too high, developers may get acclimated to violation reports from these tools, causing concrete and severe bugs being overlooked.

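Static analysis tools today have high false positive rates. A false positive is a case where the subject is actually healthy, but the test reports a disease. For a static analyzer, this means the code is actually fine, but a violation is reported anyway.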

Reference: 假阳性率 (false positive rate), Baidu Baike: https://baike.baidu.com/item/假阳性率/6345776?fr=aladdin

Fortunately, some violations are actually addressed and resolved by developers. We claim that those violations that are recurrently fixed are likely to be true positives, and an automated approach can learn to repair similar unseen violations. However, there is lack of a systematic way to investigate the distributions on existing violations and fixed ones in the wild, that can provide insights into prioritizing violations for developers, and an effective way to mine code and fix patterns which can help developers easily understand the reasons of leading violations and how to fix them.

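However, developers have already resolved some violations. The authors therefore first learn how these violations were fixed. They then mine code patterns and fix patterns that help developers understand what causes a violation and how to fix it.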

True positive: the code really is faulty, and the tool also reports it as faulty.

References: [Repost] The meaning and translation of True (False) Positives (Negatives): http://blog.sciencenet.cn/blog-605185-617068.html
Concepts related to True (False) Positives (Negatives): https://blog.csdn.net/opensourcesdr/article/details/73334302
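The sketch below pins down these terms using made-up counts; the `ConfusionCounts` class and its numbers are purely illustrative. It computes two quantities:

- The statistical false positive rate, FP / (FP + TN). This matches the definition given above.
- FP / (FP + TP), the share of reported warnings that are false alarms.

Discussions of static analyzers sometimes use "false positive rate" in the second sense, so it is worth checking which one a paper means.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts for a binary 'is this a real bug?' decision (made-up example numbers)."""
    tp: int  # reported, and really a bug (true positive)
    fp: int  # reported, but not a bug (false positive / false alarm)
    tn: int  # not reported, and not a bug (true negative)
    fn: int  # not reported, but really a bug (false negative)


def false_positive_rate(c: ConfusionCounts) -> float:
    """Statistical definition: share of actual negatives that get flagged, FP / (FP + TN)."""
    return c.fp / (c.fp + c.tn)


def false_discovery_rate(c: ConfusionCounts) -> float:
    """Share of reported warnings that are false alarms, FP / (FP + TP) = 1 - precision."""
    return c.fp / (c.fp + c.tp)


if __name__ == "__main__":
    counts = ConfusionCounts(tp=30, fp=70, tn=900, fn=10)
    print(f"FPR = {false_positive_rate(counts):.3f}")   # FPR = 0.072  (70 / 970)
    print(f"FDR = {false_discovery_rate(counts):.3f}")  # FDR = 0.700  (70 / 100)
```

With these numbers the two quantities differ by an order of magnitude. A tool can look accurate by one measure and still flood developers with false alarms by the other.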

In this paper, we first collect and track a large number of fixed and unfixed violations across revisions of software. The empirical analyses reveal that there are discrepancies in the distributions of violations that are detected and those that are fixed, in terms of occurrences, spread and categories, which can provide insights into prioritizing violations.
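One simple way to turn the gap between detected and fixed violations into a prioritization signal is to rank violation types by how often their detected instances end up fixed. This is a hypothetical sketch, not necessarily the metric the paper uses. It assumes two lists of FindBugs violation-type names, with one entry per detected or fixed instance; the sample data is made up.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_by_fix_ratio(detected: Iterable[str], fixed: Iterable[str],
                      min_detected: int = 1) -> List[Tuple[str, float]]:
    """Hypothetical priority metric: (#fixed instances / #detected instances) per type."""
    det = Counter(detected)
    fix = Counter(fixed)  # a Counter returns 0 for types that were never fixed
    ratios = [(vtype, fix[vtype] / n) for vtype, n in det.items() if n >= min_detected]
    # Highest fix ratio first; ties broken by type name so the order is deterministic.
    return sorted(ratios, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up data: three detected SE_BAD_FIELD warnings, one fixed; two DM_STRING_CTOR, both fixed.
    detected = ["SE_BAD_FIELD"] * 3 + ["DM_STRING_CTOR"] * 2
    fixed = ["SE_BAD_FIELD", "DM_STRING_CTOR", "DM_STRING_CTOR"]
    print(rank_by_fix_ratio(detected, fixed))
    # [('DM_STRING_CTOR', 1.0), ('SE_BAD_FIELD', 0.3333333333333333)]
```

A type that is detected often but rarely fixed ends up at the bottom of this ranking. This reflects the intuition from the abstract that recurrently fixed violations are more likely to be true positives.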


To automatically identify patterns in violations and their fixes, we propose an approach that utilizes convolutional neural networks to learn features and clustering to regroup similar instances. We then evaluate the usefulness of the identified fix patterns by applying them to unfixed violations. The results show that developers will accept and merge a majority (69/116) of fixes generated from the inferred fix patterns. It is also noteworthy that the yielded patterns are applicable to four real bugs in the Defects4J major benchmark for software testing and automated repair.


The mined fix patterns are then applied to violations that have not been fixed yet.

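Developers accepted and merged 69 of the 116 fixes (about 59%) generated from the inferred fix patterns. The patterns also apply to four real bugs in the Defects4J benchmark.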



3 Approach

After collecting violation fixing changes from a large number of projects using an AST differencing tool [16], we mine developer fix patterns for static analysis violations. The approach encodes a fixing change into a vector space using Word2Vec [17], extracts discriminating features using Convolutional Neural Networks (CNNs) [18] and regroups similar changes into a cluster using X-means clustering algorithm [19]. We then evaluate the suitability of the mined fix patterns by applying them to 1) a subset of unfixed violations in our subjects, to 2) a subset of faults in Defects4J [20] and to 3) a subset of violations in 10 open source Java projects.
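The encoding and feature-extraction steps quoted above can be sketched as follows. The sketch rests on three assumptions:

- The edit-script tokens are hypothetical.
- A seeded random lookup table stands in for a trained Word2Vec model.
- The convolution filters are random rather than learned.

It only illustrates the data flow. Each fix becomes a fixed-size `MAX_LEN x EMBED_DIM` matrix (zero-padded). A 1-D convolution with ReLU and max-over-time pooling then turns that matrix into one feature vector per fix.

```python
import random
from typing import Dict, List

Vector = List[float]

EMBED_DIM = 4      # hypothetical; real Word2Vec vectors are much wider
MAX_LEN = 6        # hypothetical fixed length after padding/truncation
KERNEL_WIDTH = 2   # how many consecutive tokens one filter covers
NUM_FILTERS = 3    # one pooled feature per filter


def embed(tokens: List[str], table: Dict[str, Vector], rng: random.Random) -> List[Vector]:
    """One vector per token (random stand-in for Word2Vec), zero-padded to MAX_LEN rows."""
    rows = []
    for tok in tokens[:MAX_LEN]:
        if tok not in table:  # unseen token: give it a random vector
            table[tok] = [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
        rows.append(table[tok])
    rows += [[0.0] * EMBED_DIM for _ in range(MAX_LEN - len(rows))]
    return rows


def make_filters(rng: random.Random) -> List[List[Vector]]:
    """Random (untrained) convolution filters, each KERNEL_WIDTH x EMBED_DIM."""
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(KERNEL_WIDTH)]
        for _ in range(NUM_FILTERS)
    ]


def conv_max_pool(matrix: List[Vector], filters: List[List[Vector]]) -> Vector:
    """Slide each filter along the token axis, apply ReLU, keep the maximum per filter."""
    features = []
    for f in filters:
        best = 0.0  # ReLU outputs are never negative, so 0.0 is a safe starting maximum
        for start in range(len(matrix) - KERNEL_WIDTH + 1):
            activation = sum(
                f[i][j] * matrix[start + i][j]
                for i in range(KERNEL_WIDTH)
                for j in range(EMBED_DIM)
            )
            best = max(best, activation)  # equals ReLU followed by max-over-time pooling
        features.append(best)
    return features


if __name__ == "__main__":
    rng = random.Random(0)
    table: Dict[str, Vector] = {}
    # Hypothetical edit-script tokens for one violation-fixing change.
    fix_tokens = ["UPD", "IfStatement", "INS", "InfixExpression", "!=", "null"]
    features = conv_max_pool(embed(fix_tokens, table, rng), make_filters(rng))
    print(features)  # one non-negative value per filter
```

With filters that are trained, as in the paper's CNN, the pooled values become discriminative features. Here the random filters only show the shapes involved.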

1) Collect violation-fixing patches from a large number of projects, using an AST differencing tool to extract the changes.
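The sketch below simplifies this collection step and flags its assumptions:

- Violations are keyed by (type, file, enclosing method). This is a hypothetical simplification of how warnings are matched across revisions.
- A violation counts as fixed when it disappears after a commit while its file still exists.
- Python's `difflib.SequenceMatcher` over whitespace-separated tokens stands in for a real AST differencing tool such as GumTree. It yields a flat token-level edit script, not AST-level actions.

```python
import difflib
from typing import List, NamedTuple, Set, Tuple


class Violation(NamedTuple):
    """A static-analysis warning; these key fields are a hypothetical simplification."""
    vtype: str   # bug-pattern name, e.g. "NP_NULL_ON_SOME_PATH"
    file: str
    method: str  # enclosing method; raw line numbers shift between revisions


def fixed_violations(before: Set[Violation], after: Set[Violation],
                     files_after: Set[str]) -> Set[Violation]:
    """Violations reported before a commit and gone after it, skipping deleted files."""
    return {v for v in before - after if v.file in files_after}


EditOp = Tuple[str, List[str], List[str]]


def token_edit_script(old_code: str, new_code: str) -> List[EditOp]:
    """Flat token-level edit script (a stand-in for an AST differencing tool)."""
    old_toks, new_toks = old_code.split(), new_code.split()
    matcher = difflib.SequenceMatcher(a=old_toks, b=new_toks, autojunk=False)
    return [
        (tag, old_toks[i1:i2], new_toks[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only insert / delete / replace actions
    ]


if __name__ == "__main__":
    before = {Violation("NP_NULL_ON_SOME_PATH", "A.java", "run"),
              Violation("DM_STRING_CTOR", "B.java", "init")}
    after = {Violation("DM_STRING_CTOR", "B.java", "init")}
    print(fixed_violations(before, after, {"A.java", "B.java"}))
    print(token_edit_script("return s . length ( ) ;",
                            "return s == null ? 0 : s . length ( ) ;"))
    # [('insert', [], ['s', '==', 'null', '?', '0', ':'])]
```

Skipping deleted files matters: removing a whole file also makes its violations vanish, but that is not a fix worth learning from.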


Code patterns: used to see which kinds of violation-inducing code have already been fixed and which have not.
Fix patterns: used to extract how violations are fixed.
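Both kinds of pattern come from grouping similar instances. The paper does this with X-means [19] over the CNN features. X-means chooses the number of clusters itself, scoring candidate splits with the Bayesian Information Criterion (BIC).

The sketch below is a simplified stand-in, not X-means proper. It runs plain k-means for every k up to a bound and keeps the k whose BIC, under a hard-assignment spherical Gaussian model, is lowest. The feature vectors are assumed to be short lists of floats, such as pooled CNN outputs, and the sample points are made up.

```python
import math
from typing import List, Tuple

Point = List[float]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: List[Point], centroids: List[Point]) -> List[int]:
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j])) for p in points]


def kmeans(points: List[Point], k: int, iters: int = 100) -> Tuple[List[int], List[Point]]:
    """Plain k-means with deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            updated.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


def bic(points: List[Point], labels: List[int], centroids: List[Point]) -> float:
    """BIC of a hard-assignment spherical Gaussian model (lower is better)."""
    n, d, k = len(points), len(points[0]), len(centroids)
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    var = max(sse / (n * d), 1e-12)  # MLE of the shared variance, floored to avoid log(0)
    sizes = [labels.count(j) for j in range(k)]
    log_lik = sum(s * math.log(s / n) for s in sizes if s > 0)  # mixing-weight term
    log_lik -= (n * d / 2) * math.log(2 * math.pi * var) + sse / (2 * var)
    n_params = k * d + 1 + (k - 1)  # means, shared variance, mixing weights
    return -2 * log_lik + n_params * math.log(n)


def choose_k(points: List[Point], k_max: int) -> Tuple[int, List[int]]:
    """Try k = 1..k_max and keep the clustering with the lowest BIC."""
    best = None
    for k in range(1, min(k_max, len(points)) + 1):
        labels, centroids = kmeans(points, k)
        score = bic(points, labels, centroids)
        if best is None or score < best[0]:
            best = (score, k, labels)
    return best[1], best[2]


if __name__ == "__main__":
    # Three made-up, well-separated groups of 2-D "feature vectors".
    pts = [[0, 0], [0.1, 0], [0, 0.1],
           [10, 10], [10.1, 10], [10, 10.1],
           [0, 10], [0.1, 10], [0, 10.1]]
    k, labels = choose_k(pts, k_max=4)
    print(k, labels)  # expected: k = 3, one label per group
```

The BIC penalty term grows with k. This penalty is what stops the search from splitting every tight group into singletons.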


For each relevant bug, we consider the fix patterns associated to their violation types, and manually generate the patches. When the generated patch candidate can (1) pass the failed test cases of the corresponding bug and (2) FindBugs cannot identify any violation at the same position, then the matched fix pattern is regarded as a positive fix pattern for this bug.
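The acceptance criterion in this paragraph reduces to a two-part check. In this sketch, the failing tests and FindBugs are assumed to have been re-run already. Their outcomes are passed in as a hypothetical `PatchResult`. Positions are simplified to (file, line) pairs, ignoring any line shifts the patch introduces.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

Position = Tuple[str, int]  # (file, line): hypothetical, ignores line shifts caused by the patch


@dataclass
class PatchResult:
    """Outcome of checking one manually written patch (inputs assumed precomputed)."""
    failing_tests_pass: bool  # (1) do the bug's previously failing tests now pass?
    # FindBugs reports on the patched code, as (violation type, position) pairs.
    violations_after: Set[Tuple[str, Position]] = field(default_factory=set)


def is_positive_fix_pattern(result: PatchResult, position: Position) -> bool:
    """A matched fix pattern counts as positive only if both conditions hold."""
    # (2) FindBugs must report no violation of any type at the original position.
    clean_at_position = all(pos != position for _, pos in result.violations_after)
    return result.failing_tests_pass and clean_at_position


if __name__ == "__main__":
    pos = ("Foo.java", 42)
    print(is_positive_fix_pattern(PatchResult(True), pos))  # True
    print(is_positive_fix_pattern(PatchResult(True, {("NP_NULL_ON_SOME_PATH", pos)}), pos))  # False
```

Condition (2) checks for any violation at that position, not just one of the original type. A patch that swaps one warning for another therefore does not count.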


QA1: How does it differ from PAR (2013)?


We manually inspected more than 60,000 human-written patches and found there are several common fix patterns.


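PAR extracted 10 fix templates in total. For a given suspicious statement, it first performs a context check to decide whether a particular template is applicable. By contrast, the TSE 2019 paper does not derive its patterns by manual inspection. It mines fix patterns automatically from violation-fixing changes (Word2Vec encoding, CNN features, X-means clustering).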
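A context check of this kind can be sketched as a template that first decides whether it applies to a statement and only then rewrites it. The example is loosely modelled on PAR's null-pointer-checking template. The regex-based dereference detection is a crude, hypothetical stand-in for a real AST-based analysis, and the function names are illustrative.

```python
import re
from typing import Optional

# Matches "receiver.member(": a crude, regex-based stand-in for inspecting the AST.
DEREF = re.compile(r"\b([A-Za-z_]\w*)\s*\.\s*[A-Za-z_]\w*\s*\(")


def null_check_context(stmt: str) -> Optional[str]:
    """Context check: return the dereferenced variable, or None if the template does not apply."""
    m = DEREF.search(stmt)
    return m.group(1) if m else None


def apply_null_checker(stmt: str) -> Optional[str]:
    """Wrap a suspicious Java statement in a null check, only if the context check passes."""
    var = null_check_context(stmt)
    if var is None:
        return None  # template not applicable to this statement
    return f"if ({var} != null) {{ {stmt} }}"


if __name__ == "__main__":
    print(apply_null_checker("name.length();"))  # if (name != null) { name.length(); }
    print(apply_null_checker("x = 1;"))          # None
```

Running the check first keeps a template from producing nonsensical patches. For example, a statement with no dereference gets no null check.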




