ABSTRACT
Software traceability supports a variety of engineering activities, including Program Comprehension; however, establishing it can be challenging and arduous in large industrial projects. Researchers have proposed automated traceability techniques to create, maintain, and leverage trace links. Computationally intensive techniques, such as repository mining and deep learning, have shown the capability to deliver accurate trace links. Even so, trusted, automated tracing at industrial scale has not yet been achieved because of practical performance challenges. This paper evaluates high-performance solutions for deploying effective, computationally expensive traceability algorithms in large-scale industrial projects, and leverages the generated trace links to answer Program Comprehension queries. We comparatively evaluate four platforms for supporting industrial-scale tracing solutions capable of handling software projects with millions of artifacts. We demonstrate that tracing solutions built on big data frameworks scale well for large projects, and that our Spark implementation outperforms relational database, graph database (GraphDB), and plain Java implementations. These findings contradict earlier results suggesting that GraphDB solutions should be adopted for large-scale tracing problems.
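One common shape of Program Comprehension query over trace links is a reachability question (in the sense of LaToza and Myers, ICSE 2010), for example "which source files are transitively linked to this requirement?". The sketch below is a minimal single-machine illustration, not the authors' implementation. It assumes trace links are stored as scored (source, target) triples; the artifact IDs, scores, the 0.5 acceptance threshold, and the helper names `build_index`, `reachable`, and `code_for_requirement` are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical trace links: (source artifact, target artifact, similarity score).
# Artifact IDs and scores are illustrative only.
LINKS = [
    ("REQ-1", "DES-3", 0.82),
    ("DES-3", "src/Order.java", 0.74),
    ("DES-3", "src/Invoice.java", 0.31),
    ("REQ-2", "src/Invoice.java", 0.66),
]


def build_index(links, threshold):
    """Keep links scoring at or above `threshold`, indexed by source artifact."""
    index = defaultdict(list)
    for src, dst, score in links:
        if score >= threshold:
            index[src].append(dst)
    return index


def reachable(index, start):
    """Breadth-first traversal: every artifact transitively traced from `start`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in index.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def code_for_requirement(links, req, threshold=0.5):
    """Example query (hypothetical): which source files trace to requirement `req`?"""
    index = build_index(links, threshold)
    return sorted(a for a in reachable(index, req) if a.endswith(".java"))
```

With the default threshold, `code_for_requirement(LINKS, "REQ-1")` follows REQ-1 to DES-3 and then to `src/Order.java`, while the weak 0.31 link to `src/Invoice.java` is filtered out. At the scale described above (millions of artifacts), the filtering and traversal steps would have to run on a platform such as Spark or a graph database rather than an in-memory dictionary, which is the trade-off the evaluation addresses.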
Index Terms
- Supporting Program Comprehension through Fast Query Response in Large-Scale Systems