System for extracting domain topic using link analysis and searching for relevant features

  • Original Research
  • Journal of Ambient Intelligence and Humanized Computing

Abstract

Understanding the domain topic of software is important for maintenance and reuse. However, continual development and growth in size make software difficult for engineers to understand. Recent studies have sought to solve this problem by extracting the domain topic with information retrieval techniques such as latent semantic indexing (LSI) and latent Dirichlet allocation (LDA), with research on LDA-based techniques being particularly active. However, because this research has used only unstructured information, such as identifiers or notes, and not structured information, such as method call information, the extracted topics differ according to the program’s characteristics. This paper proposes a method of generating documents and extracting topics using both structured and unstructured information. In addition, indexes are generated based on the frequency with which identifiers occur in the source code, and association rules are extracted based on the co-occurrence of methods. Moreover, this paper proposes an information retrieval system that provides highly reliable search results for user queries by combining domain topics, scored indexes, and association rule information. We also develop the Topic EXtract And Search System (TEXAS2) for this research and confirm high user satisfaction with its search results in a performance test.
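
As a minimal illustration of two ideas named in the abstract, the sketch below builds a frequency-scored index from identifiers and derives pairwise association rules from method co-occurrence. It is not the TEXAS2 implementation: the function names (`build_index`, `association_rules`), the identifier-splitting rule, the thresholds, and the assumption that a "transaction" is the set of methods called together within one caller are all hypothetical choices made for this example.

```python
import re
from collections import Counter
from itertools import combinations

# Assumed tokenizer: splits camelCase, snake_case, and acronyms into terms.
TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def build_index(methods):
    """Map each term to {method name: term frequency}.

    `methods` maps a method name to the identifiers found in its body.
    """
    index = {}
    for method, identifiers in methods.items():
        counts = Counter(
            term.lower() for ident in identifiers for term in TOKEN.findall(ident)
        )
        for term, freq in counts.items():
            index.setdefault(term, {})[method] = freq
    return index


def association_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for method pairs.

    Each transaction is assumed to be the methods called by one caller.
    support = share of transactions containing both methods;
    confidence = co-occurrence count / occurrence count of the antecedent.
    """
    n = len(transactions)
    item_count, pair_count = Counter(), Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        item_count.update(items)
        pair_count.update(combinations(items, 2))

    rules = []
    for (a, b), together in pair_count.items():
        support = together / n
        if support < min_support:
            continue
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = together / item_count[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return rules


if __name__ == "__main__":
    index = build_index({
        "FileReader.open": ["openFile", "file_path"],
        "FileReader.close": ["closeFile"],
    })
    print(index["file"])

    calls = [["open", "read", "close"], ["open", "close"],
             ["open", "write", "close"], ["read"]]
    for rule in association_rules(calls):
        print(rule)
```

In a retrieval setting, a query term would be looked up in the index to rank methods by frequency, and the rules could then suggest methods that tend to be called alongside the top results. Only pairs are considered here; a full frequent-itemset miner would extend this to larger sets.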

Notes

  1. https://lucene.apache.org/core/.

  2. https://lucene.apache.org/solr/.

Acknowledgements

This research was supported by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning (NRF-2014M3C4A7030505), and in part by the Yonsei University Research Fund of 2014.

Author information

Corresponding author

Correspondence to Sang Won Hwang.

About this article

Cite this article

Hwang, S.W., Lee, Y.S. & Nam, Y.K. System for extracting domain topic using link analysis and searching for relevant features. J Ambient Intell Human Comput 15, 1429–1441 (2024). https://doi.org/10.1007/s12652-018-1046-2

  • DOI: https://doi.org/10.1007/s12652-018-1046-2
