An Empirical Study on Bash Language Usage in Github

Date

2021-05-27

Authors

Li, Zheyang

Advisor

Sun, Chengnian

Publisher

University of Waterloo

Abstract

The Bourne-again shell (Bash) is a prevalent scripting language for orchestrating shell commands and managing resources in Unix-like environments. At the time of writing, it is one of the mainstream shell dialects and is available on most GNU/Linux systems. However, the unique syntax and semantics of shell languages can easily lead to unintended behavior if used carelessly. Prior studies have primarily focused on replacing Bash with other languages, and there is little empirical evidence on how Bash itself is used in practice. In this study, we perform a wide-ranging empirical study of Bash usage, based on an analysis of over one million open-source Bash scripts found in GitHub repositories. We identify and discuss which features and utilities of Bash are most often used. Using static analysis, we find that Bash scripts are often error-prone, and that error-proneness has a moderate positive correlation with the size of the script. We also find that the most common problem areas concern quoting, resource management, command options, permissions, and error handling. We envision that the findings of this study can benefit those learning Bash as well as future studies that aim to improve shell and command-line productivity and reliability. In addition, we provide a large dataset containing the source code, parse tree, and code smell report of each collected Bash script to facilitate future research on the Bash language.
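
As a minimal illustration of the quoting problem area named in the abstract, the sketch below shows how an unquoted variable expansion is split into several arguments when its value contains a space. The function name `count_args` and the sample file name are hypothetical and are not taken from the study or its dataset.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of a quoting pitfall; names are not from the study.

# Print the number of arguments the function received.
count_args() {
  echo "$#"
}

file="my report.txt"   # a value that contains a space

# Unquoted expansion: word splitting turns the value into two arguments.
unquoted_count=$(count_args $file)

# Quoted expansion: the value is passed as a single argument.
quoted_count=$(count_args "$file")

echo "unquoted: $unquoted_count, quoted: $quoted_count"
```

With the default field separators, the unquoted call receives two arguments and the quoted call receives one, which is why a command such as `rm $file` can act on the wrong paths.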

Keywords

software engineering, empirical study