An Empirical Study on Bash Language Usage in Github
Date
2021-05-27
Authors
Li, Zheyang
Advisor
Sun, Chengnian
Publisher
University of Waterloo
Abstract
The Bourne-again shell (Bash) is a prevalent scripting
language for orchestrating shell commands and managing
resources in Unix-like environments. At the time of
writing, it is one of the mainstream shell dialects and is
available on most GNU/Linux systems. However, the unique
syntax and semantics of shell languages can easily lead to
unintended behavior when used carelessly. Prior studies
primarily focused on replacing Bash with other languages,
and there is little empirical evidence on how Bash itself
is used in practice.
In this work, we perform a wide-ranging empirical study of
Bash usage by analyzing over one million open-source Bash
scripts found in GitHub repositories. We
identify and discuss which features and utilities of Bash
are most often used. Using static analysis, we find that
Bash scripts are often error-prone, and that their error-proneness
has a moderate positive correlation with the size of the
script. We also find that the most common problem areas
concern quoting, resource management, command options,
permissions, and error handling. We envision that the
findings of this study can be beneficial both for learning
Bash and for future studies that aim to improve shell and
command-line productivity and reliability. In addition, we
provide a large dataset containing the source code, parse
tree, and code smell report of each collected Bash script
to facilitate future research on the Bash language.
Keywords
software engineering, empirical study