This paper is to appear in FSE 2016 (The 24th ACM SIGSOFT International Symposium on the Foundations of Software Engineering).

Abstract

JavaScript has become the most popular language used by developers for client- and server-side programming. The language, however, still lacks proper support in the form of warnings about potential bugs in the code. Most bug-finding tools in use today cover bug patterns that are discovered by reading best practices or through developer intuition and anecdotal observation. As such, it is still unclear which bugs happen frequently in practice and which are important for developers to fix. We propose a novel semi-automatic technique, called BugAID, for discovering the most prevalent and detectable bug patterns. BugAID is based on unsupervised machine learning using language-construct-based changes distilled from AST differencing of bug fixes in the code. We present a large-scale study of common bug patterns by mining 105K commits from 134 JavaScript projects. We discover 219 bug-fixing change types and discuss 12 pervasive bug patterns that occur across multiple projects and can likely be prevented with better tool support. Our findings are useful for improving tools and techniques to prevent common bugs in JavaScript, guiding tool integration for IDEs, and making developers aware of common mistakes involved with programming in JavaScript.

Downloads

Companion tech-report with bug pattern groups, change types and basic change types:

Appendix


The data for our clusterer tuning can be downloaded here. Contents:

  1. Database (.csv) and change types (.arff) for language construct feature vector
  2. Database (.csv) and change types (.arff) for statement feature vector
  3. Database (.csv) and change types (.arff) for node feature vector

Download Tuning Data


The data for our empirical study can be downloaded here. Contents:

  1. Database of basic change types (.csv)
  2. Dataset of change types (.arff)

Download Empirical Data
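
A quick way to see which features a downloaded .arff file declares is to read its header. The sketch below is a minimal, standard-library-only reader for the ARFF header syntax (`@relation`, `@attribute`, `@data`). It assumes unquoted attribute names, and the sample text and its attribute names are hypothetical placeholders, not the contents of the files above.

```python
def arff_attributes(text):
    """Return (name, type) pairs declared in an ARFF header.

    Assumes attribute names are unquoted and contain no spaces.
    """
    attributes = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and ARFF comments, which start with '%'.
        if not stripped or stripped.startswith("%"):
            continue
        keyword = stripped.split(None, 1)[0].lower()
        if keyword == "@data":
            break  # the header ends where the data section begins
        if keyword == "@attribute":
            _, name, kind = stripped.split(None, 2)
            attributes.append((name, kind))
    return attributes


# Hypothetical sample; the real files define their own attributes.
SAMPLE = """\
% illustrative header only
@relation sample
@attribute ModifiedStatements numeric
@attribute ExampleFeature numeric
@data
3,1
"""

sample_attributes = arff_attributes(SAMPLE)
for name, kind in sample_attributes:
    print(name, kind)
```

To inspect a downloaded file, the same function could be applied to the text returned by reading that file from disk.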


Source code used to analyze the subject systems can be accessed here:

Source Code

Subject Systems

For our study we use two types of subject systems: modules and applications. Modules are used as dependencies and are not run by themselves. Applications are standalone programs that are typically not used as dependencies.

npm Modules

To find Node.js modules, we use npm, a popular package manager for Node.js. npm has become the largest software package repository, with over 170,000 packages, surpassing Maven Central and RubyGems. The npm website provides lists of the most depended-upon packages and the most starred-by-users packages. We take the top 50 packages from each of these two lists and merge them into a single list. We remove 27 duplicates that occur in both lists and three projects written in CoffeeScript, resulting in a final set of 70 modules. All of these use Git and are hosted on GitHub.
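
The selection arithmetic above (50 + 50 entries, minus 27 duplicates, minus 3 CoffeeScript projects, leaving 70) can be sketched as a merge with de-duplication and an exclusion filter. This is only an illustration: the package names are synthetic placeholders and `merge_package_lists` is a hypothetical helper, not part of the study's tooling.

```python
def merge_package_lists(most_depended, most_starred, excluded):
    """Merge two ranked lists, dropping duplicates and excluded packages."""
    merged = []
    seen = set()
    for name in list(most_depended) + list(most_starred):
        if name in seen or name in excluded:
            continue
        seen.add(name)
        merged.append(name)
    return merged


# Synthetic stand-ins: 50 names per list, 27 of which appear in both.
most_depended = [f"pkg-{i}" for i in range(50)]
most_starred = [f"pkg-{i}" for i in range(23, 73)]
# Three hypothetical packages assumed to be written in CoffeeScript.
coffeescript = {"pkg-0", "pkg-30", "pkg-72"}

modules = merge_package_lists(most_depended, most_starred, coffeescript)
print(len(modules))  # 70
```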

In total, these 70 modules have been downloaded more than 262 million times over the last month, have around 425,000 stargazers on GitHub (users who starred the repository), and contain around 20,000 bug-fixing commits.


Applications

Finding popular Node.js applications is not as easy as finding packages, due to the lack of a central repository. Therefore, we rely on lists curated by users, such as blog posts and wiki pages that collect popular Node.js applications.

After removing duplicates and projects whose commit messages are not written in English, we end up with a list of 64 Node.js applications with over 4,400 bug-fixing commits. These applications have more than 17,000 stargazers on GitHub.


Basic Change Types

A basic change type (BCT) is the smallest unit of change in our method. This table lists the 582 unique BCTs in the raw data and the number of occurrences of each BCT.


Change Types

Change types are groups of commits that share a similar set of BCTs and a similar number of modified statements. This table lists the 219 change types discovered in our empirical study.

The last four columns show: Commits (number of commits in the group), Avg Modified Statements (average number of modified statements in the group), BCTs (number of basic change types in the group), and Projects (number of unique projects in the group).
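
As a rough illustration of how these four columns relate to the underlying commits, the sketch below aggregates a handful of made-up records per change type. The `Commit` record, its field names, and the BCT labels are assumptions for this example, not the schema of the downloadable data, and the BCTs column is interpreted here as the number of distinct BCTs seen across the group's commits.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    project: str               # hypothetical project identifier
    change_type: int           # id of the group the commit was assigned to
    modified_statements: int   # number of statements the commit modified
    bcts: frozenset            # labels of the BCTs found in the commit


def summarize(commits):
    """Compute the four summary columns for each change type."""
    groups = defaultdict(list)
    for commit in commits:
        groups[commit.change_type].append(commit)

    summary = {}
    for change_type, members in groups.items():
        distinct_bcts = set()
        for commit in members:
            distinct_bcts |= commit.bcts
        total_statements = sum(c.modified_statements for c in members)
        summary[change_type] = {
            "commits": len(members),
            "avg_modified_statements": total_statements / len(members),
            "bcts": len(distinct_bcts),
            "projects": len({c.project for c in members}),
        }
    return summary


# Made-up records for illustration only.
commits = [
    Commit("project-a", 1, 2, frozenset({"bct-x", "bct-y"})),
    Commit("project-b", 1, 4, frozenset({"bct-x"})),
    Commit("project-a", 2, 1, frozenset({"bct-z"})),
]

summary = summarize(commits)
for change_type, row in sorted(summary.items()):
    print(change_type, row)
```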

Interactive: Clicking on a change type opens a new table listing all commits in the dataset that belong to this group. A link to GitHub is available to see more information about each commit.


Bug Pattern Groups

This table lists the 12 bug patterns we highlighted as good candidates for detection using static analysis tools. Many more patterns can be discovered with the "Change Types" table.

Interactive: Clicking on a bug pattern group filters the "Change Types" table above to show the relevant change types.