Abstract
Understanding the domain topics of a software system is important for maintenance and reuse. However, as software continually evolves and grows in size, it becomes difficult for engineers to understand. Recent studies have sought to address this problem by extracting domain topics using information retrieval techniques such as latent semantic indexing (LSI) and latent Dirichlet allocation (LDA), with research on LDA-based techniques being particularly active. However, because these studies have used only unstructured information, such as identifiers or comments, and not structured information, such as method call information, the extracted topics differ according to the program’s characteristics. This paper proposes a method of generating documents and extracting topics using both structured and unstructured information. In addition, indexes are generated based on the frequency of identifier occurrences in the source code, and association rules are extracted based on method co-occurrence. This paper also proposes an information retrieval system that provides highly reliable search results for user queries by combining domain topics, scored indexes, and association rule information. We develop the Topic EXtract And Search System (TEXAS2) for this research and confirm, in a performance test, high user satisfaction with the search results returned for queries.
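To make the pipeline in the abstract more concrete, the following is a minimal Python sketch of three of its steps: building one document per method from both unstructured terms and call information, building a frequency-scored index, and deriving association rules from method co-occurrence. It is not the TEXAS2 implementation. The toy corpus `METHODS`, the function names, the camelCase splitting rule, and the 0.5 confidence threshold are all assumptions made for illustration, and the LDA topic-extraction step is omitted.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: method name -> (body terms, names of called methods).
METHODS = {
    "openAccount": ("create account customer balance",
                    ["validateCustomer", "saveAccount"]),
    "closeAccount": ("close account customer",
                     ["validateCustomer", "saveAccount"]),
    "printReport": ("report account print", ["formatReport"]),
}


def split_identifier(name):
    """Split a camelCase identifier into lowercase terms."""
    return [t.lower() for t in re.findall(r"[A-Za-z][a-z]*", name)]


def build_documents(methods):
    """One document per method: its own terms (unstructured information)
    plus the terms of the methods it calls (structured information)."""
    docs = {}
    for name, (body, callees) in methods.items():
        terms = split_identifier(name) + body.split()
        for callee in callees:
            terms += split_identifier(callee)
        docs[name] = terms
    return docs


def build_index(docs):
    """Map each term to {method: occurrence count} as a simple score."""
    index = {}
    for name, terms in docs.items():
        for term, count in Counter(terms).items():
            index.setdefault(term, {})[name] = count
    return index


def association_rules(methods, min_confidence=0.5):
    """Rules x -> y over callees that co-occur in the same caller.
    Confidence is count(x and y together) / count(x)."""
    single, pair = Counter(), Counter()
    for _, callees in methods.values():
        unique = sorted(set(callees))
        single.update(unique)
        pair.update(combinations(unique, 2))
    rules = {}
    for (a, b), together in pair.items():
        for x, y in ((a, b), (b, a)):
            confidence = together / single[x]
            if confidence >= min_confidence:
                rules[(x, y)] = confidence
    return rules


docs = build_documents(METHODS)
index = build_index(docs)
rules = association_rules(METHODS)
```

In this sketch a query term such as `account` can be ranked against methods through `index["account"]`, and a rule such as `("validateCustomer", "saveAccount")` suggests a related method to show alongside a hit. A real system would combine these scores with the extracted topics, which is the part this example leaves out.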
Acknowledgements
This research was supported by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning (NRF-2014M3C4A7030505), and in part by the Yonsei University Research Fund of 2014.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Hwang, S.W., Lee, Y.S. & Nam, Y.K. System for extracting domain topic using link analysis and searching for relevant features. J Ambient Intell Human Comput 15, 1429–1441 (2024). https://doi.org/10.1007/s12652-018-1046-2
DOI: https://doi.org/10.1007/s12652-018-1046-2