
Bash in the Wild: Language Usage, Code Smells, and Bugs

Authors:
- Yiwen Dong, University of Waterloo, Waterloo, ON, Canada (ORCID 0000-0002-3205-9010)
- Zheyang Li, University of Waterloo, Waterloo, ON, Canada
- Yongqiang Tian, University of Waterloo, Waterloo, ON, Canada (ORCID 0000-0003-1644-2965)
- Chengnian Sun, University of Waterloo, Waterloo, ON, Canada
- Michael W. Godfrey, University of Waterloo, Waterloo, ON, Canada
- Meiyappan Nagappan, University of Waterloo, Waterloo, ON, Canada

ACM Transactions on Software Engineering and Methodology, Volume 32, Issue 1, Article No. 8, pp. 1–22. https://doi.org/10.1145/3517193

Published: 13 February 2023


Abstract

The Bourne-again shell (Bash) is a prevalent scripting language for orchestrating shell commands and managing resources in Unix-like environments. It is one of the mainstream shell dialects and is available on most GNU/Linux systems. However, Bash's unusual syntax and semantics can easily lead to unintended behavior when used carelessly. Prior studies have primarily focused on improving the reliability of Bash scripts or on making Bash scripts easier to write; there is as yet no empirical study of the characteristics of Bash programs written in practice, e.g., frequently used language features, common code smells, and bugs.

In this article, we perform a large-scale empirical study of Bash usage, based on analyses of over one million open source Bash scripts found in GitHub repositories. We identify and discuss which features and utilities of Bash are most often used. Using static analysis, we find that Bash scripts are often error-prone and that error-proneness has a moderate positive correlation with script size. We also find that the most common problem areas concern quoting, resource management, command options, permissions, and error handling. We envision that these findings can benefit those learning Bash as well as future research that aims to improve shell and command-line productivity and reliability.
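To make two of these problem areas concrete, here is a minimal illustrative sketch (not taken from the study, its dataset, or its tooling) of two well-known Bash pitfalls: an unquoted variable expansion that undergoes word splitting, and a `cd` whose failure would otherwise go unchecked. The helper functions `count_args` and `enter_dir` and the path `/nonexistent-dir-for-demo` are hypothetical names chosen only for this demonstration.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; count_args, enter_dir, and the demo path are hypothetical.

# Quoting: report how many arguments a command actually receives.
count_args() {
  echo "$#"
}

file="my report.txt"              # a file name containing a space
unquoted=$(count_args $file)      # word splitting passes "my" and "report.txt"
quoted=$(count_args "$file")      # quoting passes one argument, as intended
echo "unquoted: $unquoted"
echo "quoted: $quoted"

# Error handling: return early if cd fails instead of continuing
# in the wrong working directory.
enter_dir() {
  cd -- "$1" || return 1
  pwd
}

if ! enter_dir "/nonexistent-dir-for-demo" 2>/dev/null; then
  echo "cd failed; skipping the rest of this step"
fi
```

Assuming the default `IFS` and that the demo directory does not exist, the script prints `unquoted: 2`, `quoted: 1`, and then the skip message. Checking `cd` with `|| return 1` is one common idiom; `set -e` is an alternative, but its behavior inside conditionals and functions is subtle.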


Index Terms

1. General and reference
  1. Cross-computing tools and techniques
    1. Empirical studies
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features


Published in
ACM Transactions on Software Engineering and Methodology Volume 32, Issue 1
January 2023
954 pages
ISSN:1049-331X
EISSN:1557-7392
DOI:10.1145/3572890
Editor:
Mauro Pezzè
USI Università della Svizzera italiana and SIT Schaffhausen Institute of Technology, Switzerland
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 13 February 2023
- Online AM: 23 April 2022
- Accepted: 7 February 2022
- Revised: 28 November 2021
- Received: 8 July 2021
Author Tags
Empirical studies
shell scripts
bash
language features
code smells
bugs
Qualifiers
- research-article
- Refereed

Article Metrics
- Total Citations: 1
- Total Downloads: 697
- Downloads (last 12 months): 423
- Downloads (last 6 weeks): 44