A term normalization method for efficient knowledge acquisition through text processing

Multimedia Tools and Applications

Abstract

The importance of research on knowledge management is growing with the recent rise of Big Data. One of the most fundamental steps in knowledge management is the extraction of terminology. Terms are often expressed in various forms, and these variations become an obstacle that causes knowledge systems to extract unnecessary terms. To solve this problem, we propose a term normalization method that finds the normalized form (the original, standard form defined in dictionaries) of variant terms. The method employs two characteristics of terms: appearance similarity, which measures how similar the terms look, and context similarity, which measures how many clue words they share. Through experiments, we show that both similarities have a positive influence on term normalization.
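The idea of combining the two similarities can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: appearance similarity is approximated with the character-matching ratio from the standard library's `difflib`, context similarity with the Jaccard overlap of clue-word sets, and the two are mixed with a hypothetical weight `alpha`. The function names and the toy dictionary are likewise invented.

```python
from difflib import SequenceMatcher


def appearance_similarity(a, b):
    """Surface similarity in [0, 1]. Assumption: difflib's matching ratio
    stands in for the paper's appearance measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def context_similarity(clues_a, clues_b):
    """Shared clue words in [0, 1]. Assumption: Jaccard overlap of the
    two clue-word sets stands in for the paper's context measure."""
    if not clues_a or not clues_b:
        return 0.0
    return len(clues_a & clues_b) / len(clues_a | clues_b)


def normalize(variant, variant_clues, dictionary, alpha=0.5):
    """Return the dictionary term with the highest combined score.

    `dictionary` maps each normalized term to its set of clue words.
    `alpha` is a hypothetical weight between the two similarities.
    """
    def score(term):
        return (alpha * appearance_similarity(variant, term)
                + (1 - alpha) * context_similarity(variant_clues, dictionary[term]))

    return max(dictionary, key=score)


# Toy data, invented for this example.
dictionary = {
    "data mining": {"pattern", "database", "knowledge"},
    "data mart": {"warehouse", "business", "storage"},
}
best = normalize("data-mining", {"pattern", "knowledge", "cluster"}, dictionary)
print(best)
```

With this toy data the variant "data-mining" is mapped to "data mining", because it both looks more like that entry and shares two clue words with it. A real system would replace both measures with the ones defined in the paper and would likely apply a threshold below which no normalized form is assigned.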

Notes

  1. Big Data - Wikipedia, The Free Encyclopedia: http://en.wikipedia.org/wiki/Big_data

  2. Wikipedia: The Free Encyclopedia: http://en.wikipedia.org/wiki/Main_Page

  3. This paper is an extension of the paper [9] presented at a workshop (IWEKSS 2012) held as part of an international conference (ICAISC 2012). It is a qualitative upgrade overall, with additional content such as more examples for easier understanding and new experimental results on abundant test sets.

  4. NDSL (National Discovery for Science Leaders): http://www.ndsl.kr/index.do

  5. DBPedia: http://dbpedia.org/About

  6. The Stanford Natural Language Processing Group: http://nlp.stanford.edu/software/tagger.shtml

  7. The Porter Stemming Algorithm: http://tartarus.org/~martin/PorterStemmer/

References

  1. Bawakid A, Oussalah M (2010) Using features extracted from Wikipedia for the task of word sense disambiguation. In: Proceedings of IEEE international conference on cybernetic intelligent systems, pp 1–6

  2. Brank J, Mladenic D, Grobelnik M, Milic-Frayling N (2008) Feature selection for the classification of large document collections. J Univers Comput Sci 14(10):1562–1596

  3. Dowdal J, Rinaldi F, Ibekwe-SanJuan F, SanJuan E (2003) Complex structuring of term variants for question answering. In: Proceedings of the ACM workshop on multiword expressions: analysis, acquisition and treatment, vol 18, pp 1–8

  4. Duong TH, Jo G, Jung JJ, Nguyen NT (2009) Complexity analysis of ontology integration methodologies: a comparative study. J Univers Comput Sci 15(4):877–897

  5. Fogarolli A (2009) Word sense disambiguation based on Wikipedia link structure. In: Proceedings of IEEE international conference on semantic computing, pp 77–82

  6. Hwang M, Kim P (2009) A new similarity measure for automatic construction of the unknown word lexical dictionary. International Journal on Semantic Web and Information Systems (IJSWIS) 5(1):48–64

  7. Hwang M, Choi D, Kim P (2010) A method for knowledge base enrichment using Wikipedia document information. Information (An International Interdisciplinary Journal) 13(5):1599–1612

  8. Hwang M, Choi D, Choi J, Kim H, Kim P (2010) Similarity measure for semantic document interconnections. Information (An International Interdisciplinary Journal) 13(2):253–267

  9. Hwang M, Jeong D-H, Jung H, Sung W-K, Shin J, Kim P (2012) A term normalization method for better performance of terminology construction. In: International conference on artificial intelligence and soft computing, pp 682–690

  10. Ibekwe-Sanjuan F (1998) Terminological variation, a means of identifying research topics from texts. In: Proceedings of international conference on computational linguistics, vol 1, pp 564–570

  11. Jung JJ (2009) Semantic business process integration based on ontology alignment. Expert Syst Appl 36(8):11013–11020

  12. Jung JJ (2010) Reusing ontology mappings for query segmentation and routing in semantic peer-to-peer environment. Inf Sci 180(7):3248–3257

  13. Jung JJ (2012) Discovering community of lingual practice for matching multilingual tags from folksonomies. Comput J 55(3):337–346

  14. Jung JJ (2012) Online named entity recognition method for microtexts in social networking services: a case study of Twitter. Expert Syst Appl 39(9):8066–8070

  15. Jung H, Yi E, Kim D, Lee GG (2005) Information extraction with automatic knowledge expansion. Inf Process Manag 41(2):217–242

  16. Porter MF (1980) An algorithm for suffix stripping. J Program 14(3):130–137

  17. Song S-K, Choi Y-S, Chun H-W, Jeong C-H, Choi S-P, Sung W-K (2011) Multi-words terminology recognition using Web search. Commun Comput Inf Sci 264:233–238

  18. Toutanova K, Manning CD (2000) Enriching the knowledge sources used in a maximum entropy part-of-speech tagger. In: Proceedings of joint SIGDAT conference on empirical methods in natural language processing and very large corpora (EMNLP/VLC ’00), pp 63–70

  19. Tsai RT-H, Sung C-L, Dai H-J, Hung H-C, Sung T-Y, Hsu W-L (2006) NERBio: using selected word conjunctions, term normalization, and global patterns to improve biomedical named entity recognition. BMC Bioinformatics 7(Suppl 5):S11

  20. Velardi P, Cucchiarelli A, Petit M (2007) A taxonomy learning method and its application to characterize a scientific Web community. IEEE Trans Knowl Data Eng 19(2):180–191

Author information

Corresponding author

Correspondence to Do-Heon Jeong.

About this article

Cite this article

Hwang, M., Jeong, DH., Kim, J. et al. A term normalization method for efficient knowledge acquisition through text processing. Multimed Tools Appl 65, 75–91 (2013). https://doi.org/10.1007/s11042-012-1144-7

  • DOI: https://doi.org/10.1007/s11042-012-1144-7
