TEXAS2: A System for Extracting Domain Topic Using Link Analysis and Searching for Relevant Features

Hwang, SangWon; Lee, YongSeok; Nam, YoungKwang

doi:10.1007/978-981-10-3023-9_113

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 421)

Abstract

Understanding the domain topics of a software system is essential for maintaining and reusing it. However, as software is continually developed and grows in size, it becomes difficult to understand. To address this problem, recent research has extracted domain topics using information retrieval techniques, with work on LDA-based techniques being especially active. However, because most studies use only unstructured information, such as identifiers and comments, and ignore structured information such as call relationships, the extracted topics can differ from the actual characteristics of the program. In this paper, we propose a method that generates documents and extracts topics using both structured and unstructured information. We also generate indexes based on the frequency of identifiers in the source code, and propose a system that extracts association rules based on the co-occurrence of methods. We further build a system that provides highly reliable search results for user queries by combining domain topics, scored indexes, and association rule information. The resulting system, TEXAS2, was implemented, and a performance test confirmed high user satisfaction with its search results for the queries.
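
To make two of the ingredients named in the abstract more concrete, the following is a minimal sketch of a frequency-based identifier index and of pairwise association rules mined from methods that occur together. It is not the TEXAS2 implementation: the function names (`split_identifier`, `build_index`, `mine_rules`), the input shapes, and the support and confidence thresholds are all assumptions chosen for illustration, and the rule mining is a plain pairwise count rather than any specific algorithm the authors may have used.

```python
import re
from collections import Counter
from itertools import permutations


def split_identifier(identifier):
    """Split a camelCase or snake_case identifier into lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [p.lower() for p in parts]


def build_index(methods):
    """Build a term -> {method name: term frequency} index.

    `methods` is a hypothetical input shape: a dict mapping a method
    name to the list of identifiers that appear in its body.
    """
    index = {}
    for name, identifiers in methods.items():
        counts = Counter(
            term for ident in identifiers for term in split_identifier(ident)
        )
        for term, freq in counts.items():
            index.setdefault(term, {})[name] = freq
    return index


def mine_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return pairwise rules (a, b, support, confidence) meaning a -> b.

    `transactions` is a list of sets of method names that occur together
    (for example, the callees of one caller). Thresholds are illustrative.
    """
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        item_count.update(t)
        # Count every ordered pair so that both a -> b and b -> a are tested.
        pair_count.update(permutations(sorted(t), 2))
    rules = []
    for (a, b), together in pair_count.items():
        support = together / n
        confidence = together / item_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)


# Hypothetical sample data, not taken from the paper.
methods = {
    "FileReader.open": ["openFile", "file_path"],
    "Parser.parse": ["parseHTTPRequest", "file"],
}
transactions = [
    {"open", "read", "close"},
    {"open", "read"},
    {"open", "close"},
    {"read", "parse"},
]
index = build_index(methods)
rules = mine_rules(transactions)
```

In a search setting, the per-term frequencies in `index` could serve as a simple relevance score for a query term, and the mined rules could be used to suggest methods related to a hit; how TEXAS2 actually combines these signals with the extracted topics is not specified in the abstract.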

Acknowledgement

This research was supported by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning (NRF-2014M3C4A7030505).

Author information

Authors and Affiliations

Department of Computer Science, Yonsei University, Wonju, South Korea
SangWon Hwang, YongSeok Lee & YoungKwang Nam

Corresponding author

Correspondence to SangWon Hwang.

Editor information

Editors and Affiliations

Computer Science and Engineering, Seoul National University of Science and Technology, Seoul, Korea (Republic of)
James J. (Jong Hyuk) Park
Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
Yi Pan
Computer Science and Engineering, Gangneung-Wonju National University, Wonju, Korea (Republic of)
Gangman Yi
University of Salerno, Fisciano, Italy
Vincenzo Loia

About this paper

Cite this paper

Hwang, S., Lee, Y., Nam, Y. (2017). TEXAS2: A System for Extracting Domain Topic Using Link Analysis and Searching for Relevant Features. In: Park, J., Pan, Y., Yi, G., Loia, V. (eds) Advances in Computer Science and Ubiquitous Computing. UCAWSN 2016, CUTE 2016, CSA 2016. Lecture Notes in Electrical Engineering, vol 421. Springer, Singapore. https://doi.org/10.1007/978-981-10-3023-9_113

DOI: https://doi.org/10.1007/978-981-10-3023-9_113
Published: 23 November 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3022-2
Online ISBN: 978-981-10-3023-9
eBook Packages: Engineering, Engineering (R0)
