research-article

Bash in the Wild: Language Usage, Code Smells, and Bugs

Published: 13 February 2023

Abstract

The Bourne-again shell (Bash) is a prevalent scripting language for orchestrating shell commands and managing resources in Unix-like environments. It is one of the mainstream shell dialects and is available on most GNU/Linux systems. However, the unique syntax and semantics of Bash can easily lead to unintended behavior when used carelessly. Prior studies have primarily focused on improving the reliability of Bash scripts or on making Bash scripts easier to write; there is not yet an empirical study of the characteristics of Bash programs written in practice, e.g., frequently used language features, common code smells, and bugs.
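As a minimal illustration of the kind of unintended behavior meant here (the example is ours, not taken from the study), an unquoted parameter expansion undergoes word splitting, so a single value containing a space reaches a command as two separate arguments. The helper `count_args` is hypothetical and only reports how many arguments it received.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the number of arguments it was called with.
count_args() { echo "$#"; }

name="my notes.txt"

count_args $name    # unquoted: word splitting yields "my" and "notes.txt", prints 2
count_args "$name"  # quoted: the value stays a single argument, prints 1
```

Static analyzers such as ShellCheck report the unquoted form as SC2086 ("Double quote to prevent globbing and word splitting").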

In this article, we perform a large-scale empirical study of Bash usage, based on analyses of over one million open source Bash scripts found in GitHub repositories. We identify and discuss which features and utilities of Bash are used most often. Using static analysis, we find that Bash scripts are often error-prone and that their error-proneness has a moderate positive correlation with script size. We also find that the most common problem areas concern quoting, resource management, command options, permissions, and error handling. We envision that these findings can benefit people learning Bash and future research that aims to improve the productivity and reliability of the shell and the command line.
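Error handling is one of the problem areas listed above. A common instance, sketched here under our own assumptions rather than taken from the study, is a `cd` whose failure goes unchecked, so that later commands run in the wrong directory. The function `remove_objects` and the path `/nonexistent/build` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: delete object files inside a given build directory.
remove_objects() {
  local dir="$1"
  # Code smell: `cd "$dir"; rm -f -- *.o` keeps going if cd fails and
  # deletes *.o files in whatever directory the script happens to be in.
  # Checking the exit status of cd stops the function instead.
  cd -- "$dir" || return 1
  rm -f -- *.o
}

# Run in a subshell so the caller's working directory is left unchanged.
( remove_objects /nonexistent/build ) 2>/dev/null || echo "cd failed; nothing deleted"
```

ShellCheck flags an unchecked `cd` as SC2164. The explicit `|| return 1` is used here rather than `set -e`, whose behavior has several exceptions (for example, it is suppressed inside `if` conditions).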

REFERENCES

  [1] [n.d.]. Advanced Bash-Scripting Guide. Retrieved June 2, 2021 from https://tldp.org/LDP/abs/html/internalvariables.html.
  [2] Mayank Agarwal, Jorge J. Barroso, Tathagata Chakraborti, Eli M. Dow, Kshitij Fadnis, Borja Godoy, Madhavan Pallan, and Kartik Talamadupula. 2020. Project CLAI: Instrumenting the Command Line as a New Environment for AI Agents. arXiv:2002.00762 [cs.HC].
  [3] Baishakhi Ray, Daryl Posnett, Premkumar Devanbu, and Vladimir Filkov. 2017. A large-scale study of programming languages and code quality in GitHub. Commun. ACM 60, 10 (Sept. 2017), 91–100.
  [4] Pamela Bhattacharya and Iulian Neamtiu. 2011. Assessing programming language impact on development and maintenance: A study on C and C++. In Proceedings of the 2011 33rd International Conference on Software Engineering (ICSE’11). 171–180.
  [5] Stephen R. Bourne. 1978. An Introduction to the UNIX Shell. Bell Laboratories. Computing Science.
  [6] Daniel B. Carr, Richard J. Littlefield, W. L. Nicholson, and J. S. Littlefield. 1987. Scatterplot matrix techniques for large N. J. Amer. Statist. Assoc. 82, 398 (1987), 424–436.
  [7] Andy Chou, Junfeng Yang, Benjamin Chelf, Seth Hallem, and Dawson Engler. 2001. An empirical study of operating systems errors. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP’01). Association for Computing Machinery, New York, NY, 73–88.
  [8] Christian Collberg, Ginger Myles, and Michael Stepp. 2007. An empirical study of Java bytecode programs. Softw: Pract. Exper. 37, 6 (2007), 581–641.
  [9] Loris D’Antoni, Rishabh Singh, and Michael Vaughn. 2017. NoFAQ: Synthesizing command repairs from examples. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE’17). Association for Computing Machinery, New York, NY, 582–592.
  [10] Ian J. Davis, Mike Wexler, Cheng Zhang, Richard C. Holt, and Theresa Weber. 2015. Bash2py: A bash to Python translator. In Proceedings of the 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER’15). 508–511.
  [11] Saikat Dutta, Owolabi Legunsen, Zixin Huang, and Sasa Misailovic. 2018. Testing probabilistic programming systems. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE’18). Association for Computing Machinery, New York, NY, 574–586.
  [12] Robert Dyer, Hridesh Rajan, Hoan Anh Nguyen, and Tien N. Nguyen. 2014. Mining billions of AST nodes to study actual and potential usage of Java language features. In Proceedings of the 36th International Conference on Software Engineering (ICSE’14). Association for Computing Machinery, New York, NY, 779–790.
  [13] Free Software Foundation. 2020. Bash. Retrieved February 2, 2021 from https://www.gnu.org/software/bash/.
  [14] Free Software Foundation. 2020. GNU Bash Manual. Retrieved February 15, 2021 from https://www.gnu.org/software/bash/manual/.
  [15] Free Software Foundation. 2020. GNU Core Utilities. Retrieved February 15, 2021 from https://www.gnu.org/software/coreutils/.
  [16] GitHub. 2020. The 2020 State of the Octoverse. Retrieved February 2, 2021 from https://octoverse.github.com/.
  [17] Michael Greenberg and Austin J. Blatt. 2019. Executable formal semantics for the POSIX shell. Proc. ACM Program. Lang. 4, POPL (Dec. 2019), Article 43, 30 pages.
  [18] Greg. 2021. Bash Pitfalls. Retrieved February 23, 2021 from https://mywiki.wooledge.org/BashPitfalls/.
  [19] Rui Gu, Guoliang Jin, Linhai Song, Linjie Zhu, and Shan Lu. 2015. What change history tells us about thread synchronization. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE’15). Association for Computing Machinery, New York, NY, 426–438.
  [20] Mark Hills, Paul Klint, and Jurgen Vinju. 2013. An empirical study of PHP feature usage: A static analysis perspective. In Proceedings of the 2013 International Symposium on Software Testing and Analysis (ISSTA’13). Association for Computing Machinery, New York, NY, 325–335.
  [21] Vidar Holen. 2021. ShellCheck. Retrieved February 2, 2021 from https://www.shellcheck.net/.
  [22] Guoliang Jin, Linhai Song, Xiaoming Shi, Joel Scherpelz, and Shan Lu. 2012. Understanding and detecting real-world performance bugs. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’12). Association for Computing Machinery, New York, NY, 77–88.
  [23] M. Jones. 2011. Evolution of Shells in Linux. Retrieved April 11, 2021 from https://web.archive.org/web/20210411144653/https://developer.ibm.com/technologies/linux/tutorials/l-linux-shells/.
  [24] Ralf Lämmel, Ekaterina Pek, and Jürgen Starek. 2011. Large-scale, AST-based API-usage analysis of open-source Java projects. In Proceedings of the 2011 ACM Symposium on Applied Computing (SAC’11). Association for Computing Machinery, New York, NY, 1317–1324.
  [25] Zheyang Li, Yiwen Dong, Yongqiang Tian, Chengnian Sun, Michael W. Godfrey, and Meiyappan Nagappan. 2022. Bash in the Wild: Language Usage, Code Smells, and Bugs.
  [26] Xi Victoria Lin, Chenglong Wang, Luke Zettlemoyer, and Michael D. Ernst. 2018. NL2Bash: A corpus and semantic parser for natural language interface to the Linux operating system. In LREC: Language Resources and Evaluation Conference.
  [27] Lanyue Lu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, and Shan Lu. 2014. A study of Linux file system evolution. ACM Trans. Storage 10, 1 (Jan. 2014), Article 3, 32 pages.
  [28] Shan Lu, Soyeon Park, Eunsoo Seo, and Yuanyuan Zhou. 2008. Learning from mistakes: A comprehensive study on real world concurrency bug characteristics. In Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XIII). Association for Computing Machinery, New York, NY, 329–339.
  [29] John R. Mashey. 1976. Using a command language as a high-level programming language. In Proceedings of the 2nd International Conference on Software Engineering (ICSE’76). IEEE Computer Society Press, Washington, DC, 169–176.
  [30] Karl Mazurak and Steve Zdancewic. 2007. ABASH: Finding bugs in bash scripts. In Proceedings of the 2007 Workshop on Programming Languages and Analysis for Security (PLAS’07). Association for Computing Machinery, New York, NY, 105–114.
  [31] Boqin Qin, Yilun Chen, Zeming Yu, Linhai Song, and Yiying Zhang. 2020. Understanding memory and thread safety practices and issues in real-world Rust programs. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’20). Association for Computing Machinery, New York, NY, 763–779.
  [32] Chengnian Sun, Vu Le, Qirun Zhang, and Zhendong Su. 2016. Toward understanding compiler bugs in GCC and LLVM. In Proceedings of the 25th International Symposium on Software Testing and Analysis (ISSTA’16). Association for Computing Machinery, New York, NY, 294–305.
  [33] Ubuntu. 2019. Bash-Builtins. Retrieved February 15, 2021 from http://manpages.ubuntu.com/manpages/bionic/man7/bash-builtins.7.html.
  [34] Yuhao Zhang, Yifan Chen, Shing-Chi Cheung, Yingfei Xiong, and Lu Zhang. 2018. An empirical study on TensorFlow program bugs. In Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA’18). ACM, New York, NY, 129–140.
  [35] Hao Zhong and Zhendong Su. 2015. An empirical study on real bug fixes. In 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, Vol. 1. 913–923.
  [36] Thomas Zimmermann. 2016. Card-sorting: From text to themes. In Perspectives on Data Science for Software Engineering. Elsevier, 137–141.

      Published in

        ACM Transactions on Software Engineering and Methodology, Volume 32, Issue 1
        January 2023
        954 pages
        ISSN: 1049-331X
        EISSN: 1557-7392
        DOI: 10.1145/3572890
        Editor: Mauro Pezzè

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 13 February 2023
        • Online AM: 23 April 2022
        • Accepted: 7 February 2022
        • Revised: 28 November 2021
        • Received: 8 July 2021
        Qualifiers

        • Refereed
