DOI: 10.1145/3387904.3389260
research-article

Supporting Program Comprehension through Fast Query response in Large-Scale Systems

Published: 12 September 2020

ABSTRACT

Software traceability supports various engineering activities, including program comprehension; however, achieving it can be challenging and arduous in large industrial projects. Researchers have proposed automated traceability techniques to create, maintain, and leverage trace links. Computationally intensive techniques, such as repository mining and deep learning, have been shown to deliver accurate trace links. However, trusted, automated tracing at industrial scale has not yet been achieved because of practical performance challenges. This paper evaluates high-performance solutions for deploying effective but computationally expensive traceability algorithms in large-scale industrial projects, and uses the generated trace links to answer program comprehension queries. We comparatively evaluate four platforms for supporting industrial-scale tracing solutions capable of handling software projects with millions of artifacts. We demonstrate that tracing solutions built on big data frameworks scale well for large projects and that our Spark implementation outperforms relational database, graph database (GraphDB), and plain Java implementations. These findings contradict earlier results suggesting that GraphDB solutions should be adopted for large-scale tracing problems.
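To make the idea of "answering a program comprehension query over trace links" concrete, here is a minimal single-process sketch. It is not the authors' implementation (their best-performing variant runs on Spark). The `TraceGraph` class, the artifact IDs, and the `src/` path-prefix convention for code artifacts are hypothetical, and the query shown (which code files are reachable from a requirement through one or more trace links) is only one illustrative example of such a query.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


class TraceGraph:
    """Hypothetical in-memory store of directed trace links between artifacts."""

    def __init__(self, links: Iterable[Tuple[str, str]]) -> None:
        # Adjacency list: source artifact ID -> set of target artifact IDs.
        self._out: Dict[str, Set[str]] = defaultdict(set)
        for src, dst in links:
            self._out[src].add(dst)

    def reachable(self, start: str) -> Set[str]:
        """Return every artifact reachable from `start` via one or more links (BFS)."""
        seen: Set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            # .get() avoids inserting empty entries into the defaultdict.
            for nxt in self._out.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        # Exclude the start artifact itself, even if a cycle leads back to it.
        seen.discard(start)
        return seen

    def code_for(self, requirement: str, code_prefix: str = "src/") -> List[str]:
        """Example query: which code files (hypothetical path prefix) trace to a requirement?"""
        return sorted(a for a in self.reachable(requirement) if a.startswith(code_prefix))


if __name__ == "__main__":
    graph = TraceGraph([
        ("REQ-1", "DESIGN-7"),
        ("DESIGN-7", "src/Billing.java"),
        ("DESIGN-7", "src/Invoice.java"),
        ("REQ-2", "src/Report.java"),
    ])
    print(graph.code_for("REQ-1"))  # ['src/Billing.java', 'src/Invoice.java']
```

A breadth-first traversal over an adjacency list touches each relevant link at most once per query. At the scale the abstract describes (millions of artifacts), the same kind of traversal would instead be executed on one of the evaluated platforms rather than in a single Python process.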


Published in
ICPC '20: Proceedings of the 28th International Conference on Program Comprehension
July 2020
481 pages
ISBN: 9781450379588
DOI: 10.1145/3387904

Copyright © 2020 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States

