A Comparison of Methods for Automatic Term Extraction for Domain Analysis

  • Conference paper
Software Reuse for Dynamic Systems in the Cloud and Beyond (ICSR 2015)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 8919)

Abstract

Fourteen word frequency metrics were tested to evaluate their effectiveness in identifying the vocabulary of a domain. Fifteen domain-engineering projects were examined to measure how closely the vocabularies selected by the fourteen metrics matched the vocabularies produced by domain engineers. Stemming and stopword removal were also evaluated to measure their impact on selecting proper vocabulary terms. The results of the experiment show that stemming and stopword removal improve performance and that term frequency is a valuable contributor to performance. Most word frequency metrics gave similar results; a few performed poorly compared to the others.
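The pipeline the abstract describes (filter stopwords, stem, rank words by frequency, compare against an expert vocabulary) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not name the fourteen metrics or the comparison measure, so raw term frequency stands in for the metrics and precision/recall stands in for the comparison. The stopword list is a hypothetical toy list, and `crude_stem` is a deliberately simple suffix stripper, not Porter's algorithm.

```python
import re
from collections import Counter

# Hypothetical toy stopword list; a real study would use a much fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "on", "by"}


def crude_stem(word):
    """Strip a few common suffixes (illustrative stand-in, NOT Porter's algorithm)."""
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def top_terms(text, k, use_stopwords=True, use_stemming=True):
    """Return the k most frequent terms, ranked by raw term frequency."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if use_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if use_stemming:
        tokens = [crude_stem(t) for t in tokens]
    counts = Counter(tokens)
    # Descending frequency, ties broken alphabetically for a deterministic order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def precision_recall(selected, reference):
    """Compare a selected vocabulary with a reference (expert) vocabulary."""
    selected, reference = set(selected), set(reference)
    hits = len(selected & reference)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    text = "The parser parses tokens. The lexer emits tokens to the parser."
    expert_vocabulary = ["parser", "lexer", "token", "grammar"]  # made-up reference
    terms = top_terms(text, k=2)
    print(terms, precision_recall(terms, expert_vocabulary))
```

Switching `use_stopwords` and `use_stemming` off on the same text lets "the" take the top slot, which gives a small-scale feel for why the two preprocessing steps might matter.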


Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Frakes, W.B., Kulczycki, G., Tilley, J. (2014). A Comparison of Methods for Automatic Term Extraction for Domain Analysis. In: Schaefer, I., Stamelos, I. (eds) Software Reuse for Dynamic Systems in the Cloud and Beyond. ICSR 2015. Lecture Notes in Computer Science, vol 8919. Springer, Cham. https://doi.org/10.1007/978-3-319-14130-5_19

  • DOI: https://doi.org/10.1007/978-3-319-14130-5_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14129-9

  • Online ISBN: 978-3-319-14130-5

  • eBook Packages: Computer Science, Computer Science (R0)
