TEXAS2: A System for Extracting Domain Topic Using Link Analysis and Searching for Relevant Features

  • Conference paper
Advances in Computer Science and Ubiquitous Computing (UCAWSN 2016, CUTE 2016, CSA 2016)

Abstract

Understanding the domain topics of a software system is essential for maintaining and reusing it. However, as software is continually developed and grows in size, it becomes harder to understand. To address this problem, recent research has extracted domain topics using information retrieval techniques, and work based on Latent Dirichlet Allocation (LDA) has been especially active. Most of this research, however, uses only unstructured information such as identifiers and comments and omits structured information such as call information, so the extracted topics can differ from the actual characteristics of the program. In this paper, we propose a method that generates documents and extracts topics using both structured and unstructured information. We also build indexes based on the frequency of identifiers in the source code, and propose a system that extracts association rules based on the co-occurrence of methods. By combining the domain topics, the scored indexes, and the association rule information, the system provides highly reliable search results for user queries. We implemented this approach as the TEXAS2 system, and a performance test confirmed high user satisfaction with its search results.
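The indexing, association-rule, and search-combination steps described above can be illustrated with a minimal sketch. It is not the TEXAS2 implementation: the function names, the sample methods, the thresholds, and the `boost` weight are hypothetical, topic extraction with LDA is left out, and "co-occurrence" is assumed here to mean methods that appear together in the same call context.

```python
import re
from collections import Counter
from itertools import permutations


def split_identifier(name):
    """Split camelCase, snake_case, or space-separated text into lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def build_index(methods):
    """Map each term to {method: frequency} over the identifiers used in that method."""
    index = {}
    for method, identifiers in methods.items():
        counts = Counter(t for ident in identifiers for t in split_identifier(ident))
        for term, freq in counts.items():
            index.setdefault(term, {})[method] = freq
    return index


def association_rules(transactions, min_support, min_confidence):
    """Return {(a, b): confidence} for pairwise rules a -> b."""
    n = len(transactions)
    single = Counter(m for t in transactions for m in set(t))
    pair = Counter(p for t in transactions for p in permutations(sorted(set(t)), 2))
    rules = {}
    for (a, b), count in pair.items():
        support = count / n
        confidence = count / single[a]
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


def search(query, index, rules, boost=0.5):
    """Score methods by term frequency, then add a share of each score to associated methods."""
    scores = Counter()
    for term in split_identifier(query):
        for method, freq in index.get(term, {}).items():
            scores[method] += freq
    direct = dict(scores)  # snapshot, so boosts are not chained
    for (a, b), confidence in rules.items():
        if a in direct:
            scores[b] += boost * confidence * direct[a]
    return scores.most_common()


# Hypothetical input: identifiers seen in each method (unstructured information)
methods = {
    "openFile": ["openFile", "filePath", "fileHandle"],
    "readBuffer": ["readBuffer", "buffer", "fileHandle"],
    "closeFile": ["closeFile", "fileHandle"],
    "drawWindow": ["drawWindow", "windowRect"],
}
# Hypothetical input: sets of methods that occur together (structured information)
transactions = [
    {"openFile", "readBuffer", "closeFile"},
    {"openFile", "closeFile"},
    {"openFile", "readBuffer"},
    {"drawWindow"},
]

index = build_index(methods)
rules = association_rules(transactions, min_support=0.5, min_confidence=0.6)
results = search("read buffer", index, rules)
```

In this sketch a query that matches only `readBuffer` by term frequency also surfaces `openFile`, because the two methods co-occur often enough to form a rule. A fuller system would additionally weight each result by its topic membership.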

Acknowledgement

This research was supported by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning (NRF-2014M3C4A7030505).

Author information

Corresponding author

Correspondence to SangWon Hwang.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Hwang, S., Lee, Y., Nam, Y. (2017). TEXAS2: A System for Extracting Domain Topic Using Link Analysis and Searching for Relevant Features. In: Park, J., Pan, Y., Yi, G., Loia, V. (eds) Advances in Computer Science and Ubiquitous Computing. UCAWSN 2016, CUTE 2016, CSA 2016. Lecture Notes in Electrical Engineering, vol 421. Springer, Singapore. https://doi.org/10.1007/978-981-10-3023-9_113

  • DOI: https://doi.org/10.1007/978-981-10-3023-9_113

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-3022-2

  • Online ISBN: 978-981-10-3023-9

  • eBook Packages: Engineering, Engineering (R0)