This paper is to appear in FSE 2016 (The 24th ACM SIGSOFT International Symposium on the Foundations of Software Engineering).
Companion tech-report with bug pattern groups, change types and basic change types:
The data for our clusterer tuning can be downloaded here. Contents:
The data for our empirical study can be downloaded here. Contents:
Source code used to analyze the subject systems can be accessed here:
For our study we use two types of subject systems: modules and applications. Modules are used as dependencies and are not run by themselves. Applications are standalone programs that are typically not used as dependencies.
To find Node.js modules, we use npm, a popular package manager for Node.js. npm has become the largest software package repository with over 170,000 packages, now surpassing Maven Central and RubyGems. The npm website provides lists of the most depended-upon packages and the most starred-by-users packages. From each of these two lists, we take the top 50 packages and merge them into a single list. We remove 27 duplicates that occur in both lists and three projects that are written in CoffeeScript, resulting in a final set of 70 modules. All of these use Git and are hosted on GitHub.
In total, these 70 modules have been downloaded more than 262 million times over the last month, have around 425,000 stargazers (users who starred the repository) on GitHub, and contain around 20,000 bug-fixing commits.
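The module selection described above (merge the two top-50 lists, drop duplicates, drop CoffeeScript projects) can be sketched roughly as follows. This is a minimal illustration, not the tooling used in the study: the function name `select_modules`, the `language_of` lookup, and the package names are all hypothetical placeholders.

```python
def select_modules(most_depended, most_starred, excluded_languages, language_of):
    """Merge two ranked lists, dropping duplicates and excluded languages.

    All names here are illustrative assumptions, not the study's scripts.
    """
    merged = []
    seen = set()
    for name in list(most_depended) + list(most_starred):
        if name in seen:
            continue  # package appears in both lists; keep the first occurrence
        seen.add(name)
        merged.append(name)
    # Packages missing from language_of are assumed to be plain JavaScript.
    return [n for n in merged if language_of.get(n) not in excluded_languages]


# Toy data with hypothetical package names (not the study's actual lists).
most_depended = ["pkg-a", "pkg-b", "pkg-c"]
most_starred = ["pkg-b", "pkg-d", "pkg-e"]
language_of = {"pkg-d": "CoffeeScript"}

modules = select_modules(most_depended, most_starred, {"CoffeeScript"}, language_of)
print(modules)
```

With the numbers reported above, the same procedure gives 50 + 50 - 27 - 3 = 70 modules.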
Finding popular Node.js applications is not as easy as finding packages, due to the lack of a central repository. Therefore, we rely on lists curated by users, such as blog posts and wiki pages that collect popular Node.js applications.
After we remove duplicates and projects whose commit messages are not written in English, we end up with a list of 64 Node.js applications with over 4,400 bug-fixing commits. These applications have more than 17,000 stargazers on GitHub.
A BCT is the smallest unit of change in our method. This table lists the 582 unique BCTs in the raw data and the number of occurrences of each BCT.
Change types are groups of commits that share a similar set of BCTs and a similar number of modified statements. This table lists the 219 change types discovered in our empirical study.
The last four columns show: Commits: # of commits in the group, Avg Modified Statements: average # of modified statements in the group, BCTs: # of basic change types in the group, Projects: # of unique projects in the group.
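As a rough illustration of how the four summary columns relate to the commits in a group, the sketch below computes them for one change type. The `Commit` record, the `summarize` function, and the BCT labels are hypothetical. It also assumes that the "BCTs" column counts the distinct basic change types across the group, which is an interpretation and not something the table description confirms.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Commit:
    project: str              # repository the commit belongs to
    modified_statements: int  # number of statements changed by the commit
    bcts: frozenset           # basic change types found in the commit


def summarize(group):
    """Compute the four summary columns for one non-empty change type group."""
    return {
        "commits": len(group),
        "avg_modified_statements": mean([c.modified_statements for c in group]),
        # Assumption: count distinct BCTs across all commits in the group.
        "bcts": len(set().union(*(c.bcts for c in group))),
        "projects": len({c.project for c in group}),
    }


# Toy group with placeholder project names and BCT labels.
group = [
    Commit("project-x", 1, frozenset({"bct-1", "bct-2"})),
    Commit("project-x", 2, frozenset({"bct-1"})),
    Commit("project-y", 3, frozenset({"bct-2", "bct-3"})),
]
summary = summarize(group)
print(summary)
```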
Interactive: Clicking on a change type opens a new table listing all commits in the dataset that belong to that group. Each commit links to GitHub, where more information about it is available.
This table has been filtered to show the groups that belong to the Bug Pattern Group #.
Click here to remove the filter and see all change types.
This table lists the 12 bug patterns we highlighted as good candidates for detection using static analysis tools. Many more patterns can be discovered with the "Change Types" table.
Interactive: Clicking on a bug pattern group filters the table above to show only the relevant change types.