Abstract
Terminology recognition, a fundamental task for Technology Opportunity Discovery (TOD), has been studied intensively in only a limited range of domains, most notably the biomedical domain. Previous approaches were difficult to apply to general domains because they relied on domain-specific resources. We therefore propose a domain-independent terminology recognition system based on machine learning that uses a dictionary, syntactic features, and Web search results. The proposed approach achieved an F-score of 80.4, a 6.4% improvement over C-value, a widely used related approach based on local domain frequencies. In a second experiment with various combinations of unithood features, the combination that included NGD (Normalized Google Distance) performed best, with an F-score of 81.5. We applied two machine learning methods, logistic regression and SVMs, and SVMs gave the best score.
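To make the NGD unithood feature concrete, the following is a minimal sketch of the standard Normalized Google Distance formula computed from search hit counts. The function name, its parameters, and the example counts are illustrative assumptions; the abstract does not describe how the authors obtained or smoothed their counts.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Standard NGD from hit counts (illustrative helper, not the authors' code).

    fx, fy -- number of pages containing term x, term y
    fxy    -- number of pages containing both terms
    n      -- assumed total number of indexed pages (a rough constant)
    """
    # With no (co-)occurrence evidence the distance is treated as infinite.
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    numerator = max(log_fx, log_fy) - log_fxy
    denominator = math.log(n) - min(log_fx, log_fy)
    return numerator / denominator


# Hypothetical hit counts for the two words of a candidate multi-word term.
distance = normalized_google_distance(fx=120_000, fy=95_000, fxy=40_000, n=10**10)
print(round(distance, 3))
```

A smaller distance means the words co-occur on the Web more often than their individual frequencies would suggest, which is the kind of evidence a unithood feature is meant to capture.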
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Song, SK., Choi, YS., Chun, HW., Jeong, CH., Choi, SP., Sung, WK. (2011). Multi-words Terminology Recognition Using Web Search. In: Kim, Th., et al. U- and E-Service, Science and Technology. UNESST 2011. Communications in Computer and Information Science, vol 264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27210-3_29
DOI: https://doi.org/10.1007/978-3-642-27210-3_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27209-7
Online ISBN: 978-3-642-27210-3
eBook Packages: Computer Science (R0)