DOI: 10.1145/3417113.3422155
short-paper

A vision to mitigate bioinformatics software development challenges

Published: 22 January 2021

ABSTRACT

Developers build bioinformatics software to automate crucial analysis and research in the biological sciences. However, challenges encountered while developing this software can impede progress in biological research. Through a human-centric systematic analysis, we identify challenges related to bioinformatics software development and envision future research directions. From a qualitative analysis of 221 Stack Overflow questions, we identify six categories of challenges: file operations, searching genetic entities, defect resolution, configuration management, sequence alignment, and translation of genetic information. To mitigate these challenges, we envision three research directions that require synergies between bioinformatics and automated software engineering: (i) automated configuration recommendation using optimization algorithms, (ii) automated and comprehensive defect categorization, and (iii) intelligent task assistance with active and reinforcement learning.


    • Published in

      ASE '20: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering
      September 2020
      195 pages
      ISBN: 9781450381284
      DOI: 10.1145/3417113

      Copyright © 2020 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Acceptance Rates

      Overall Acceptance Rate: 82 of 337 submissions, 24%
