Supporting Program Comprehension through Fast Query response in Large-Scale Systems

Authors:
Jinfeng Lin, University of Notre Dame, Notre Dame, IN
Yalin Liu, University of Notre Dame, Notre Dame, IN
Jane Cleland-Huang, University of Notre Dame, Notre Dame, IN

ICPC '20: Proceedings of the 28th International Conference on Program Comprehension, July 2020, Pages 285–295
https://doi.org/10.1145/3387904.3389260

Published: 12 September 2020

ABSTRACT

Software traceability supports various engineering activities, including Program Comprehension; however, establishing it can be challenging and arduous in large industrial projects. Researchers have proposed automated traceability techniques to create, maintain, and leverage trace links. Computationally intensive techniques, such as repository mining and deep learning, have shown that they can deliver accurate trace links. Nevertheless, trusted, automated tracing at industrial scale has not yet been achieved because of practical performance challenges. This paper evaluates high-performance solutions for deploying effective, computationally expensive traceability algorithms in large-scale industrial projects, and leverages the generated trace links to answer Program Comprehension queries. We comparatively evaluate four different platforms for supporting industrial-scale tracing solutions capable of handling software projects with millions of artifacts. We demonstrate that tracing solutions built on big data frameworks scale well for large projects and that our Spark implementation outperforms relational database, graph database (GraphDB), and plain Java implementations. These findings contradict earlier results, which suggested that GraphDB solutions should be adopted for large-scale tracing problems.

Published in

ICPC '20: Proceedings of the 28th International Conference on Program Comprehension
July 2020
481 pages
ISBN: 9781450379588
DOI: 10.1145/3387904

Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Software project queries
performance
scalability
traceability
Qualifiers
- research-article
- Research
- Refereed limited