Abstract
The Bourne-again shell (Bash) is a prevalent scripting language for orchestrating shell commands and managing resources in Unix-like environments. It is one of the mainstream shell dialects and is available on most GNU/Linux systems. However, the unique syntax and semantics of Bash can easily lead to unintended behavior when used carelessly. Prior studies have primarily focused on improving the reliability of Bash scripts or on making them easier to write; there is not yet an empirical study of the characteristics of Bash programs written in practice, e.g., frequently used language features, common code smells, and bugs.
In this article, we perform a large-scale empirical study of Bash usage, based on analyses of over one million open source Bash scripts found in GitHub repositories. We identify and discuss which features and utilities of Bash are most often used. Using static analysis, we find that Bash scripts are often error-prone, and that error-proneness has a moderately positive correlation with script size. We also find that the most common problem areas concern quoting, resource management, command options, permissions, and error handling. We envision that these findings can benefit both those learning Bash and future research that aims to improve shell and command-line productivity and reliability.
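To make the quoting problem area concrete, the following is a minimal sketch of how an unquoted variable expansion changes the number of arguments a command receives. It is a hypothetical illustration, not an example drawn from the studied corpus; the helper `count_args` and the file name are assumptions introduced only for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of a quoting pitfall; not taken from the study's data.

# count_args (assumed helper) prints how many arguments it received.
count_args() {
  echo "$#"
}

file="my report.txt"   # a value that contains a space

# Unquoted expansion undergoes word splitting, so the value becomes two arguments.
unquoted=$(count_args $file)

# Quoted expansion keeps the value intact as a single argument.
quoted=$(count_args "$file")

echo "unquoted expansion passed $unquoted argument(s)"
echo "quoted expansion passed $quoted argument(s)"
```

With the default `IFS`, the unquoted call passes two arguments and the quoted call passes one, which is why a command such as `rm $file` can silently act on the wrong paths.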
Index Terms
- Bash in the Wild: Language Usage, Code Smells, and Bugs