An Integrated Approach to Automatic Synonym Detection in Turkish Corpus

  • Conference paper
Advances in Natural Language Processing (NLP 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8686)

Abstract

In this study, we designed a model to determine synonymy. Our main assumption is that synonym pairs, by definition, show similar semantic and dependency relations: they share the same meronym/holonym and hypernym/hyponym relations. Unlike synonymy, hypernymy and meronymy relations can likely be acquired by applying lexico-syntactic patterns to a large corpus, and such acquisition can be used to ease the detection of synonymy. Likewise, we used particular dependency relations, such as being the object or subject of a verb. Machine learning algorithms were applied to all of these acquired features. The first aim is to find out which dependency and semantic features are the most informative and contribute most to the model. The performance of each feature is evaluated individually with cross-validation. The model that combines all features shows promising results and successfully detects the synonymy relation. The main contribution of the study is the integration of both semantic and dependency relations from a distributional perspective. The second contribution is that it is the first major attempt at Turkish synonym identification based on a corpus-driven approach.
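
The idea of describing a candidate word pair by how similarly the two words behave under each relation type can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name a similarity measure or a learning algorithm, so cosine similarity is used here as a stand-in, and the `CONTEXTS` table, the relation names, and all counts are invented toy data.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data (not from the paper): for each word, counts of the
# contexts it was observed with, grouped by relation type.
CONTEXTS = {
    "otomobil": {  # "automobile"
        "hypernym": Counter({"araç": 5, "taşıt": 3}),
        "meronym": Counter({"tekerlek": 4, "motor": 6}),
        "object_of": Counter({"sür": 7, "al": 2}),
    },
    "araba": {  # "car"
        "hypernym": Counter({"araç": 4, "taşıt": 2}),
        "meronym": Counter({"tekerlek": 3, "motor": 5, "kapı": 1}),
        "object_of": Counter({"sür": 6, "sat": 1}),
    },
    "elma": {  # "apple"
        "hypernym": Counter({"meyve": 6}),
        "meronym": Counter({"çekirdek": 2, "kabuk": 3}),
        "object_of": Counter({"ye": 8, "al": 1}),
    },
}

# Assumed relation inventory; the paper's actual feature set may differ.
RELATIONS = ("hypernym", "meronym", "object_of")


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pair_features(w1: str, w2: str) -> dict:
    """Return one similarity score per relation type for a candidate pair."""
    return {r: cosine(CONTEXTS[w1][r], CONTEXTS[w2][r]) for r in RELATIONS}


synonym_like = pair_features("otomobil", "araba")
unrelated = pair_features("otomobil", "elma")

for relation in RELATIONS:
    print(relation, round(synonym_like[relation], 3), round(unrelated[relation], 3))
```

In a setup like this, each pair's scores would form the feature vector passed to a classifier, and evaluating one relation at a time under cross-validation would indicate which features are most informative.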

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Yıldız, T., Yıldırım, S., Diri, B. (2014). An Integrated Approach to Automatic Synonym Detection in Turkish Corpus. In: Przepiórkowski, A., Ogrodniczuk, M. (eds) Advances in Natural Language Processing. NLP 2014. Lecture Notes in Computer Science, vol 8686. Springer, Cham. https://doi.org/10.1007/978-3-319-10888-9_12

  • DOI: https://doi.org/10.1007/978-3-319-10888-9_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10887-2

  • Online ISBN: 978-3-319-10888-9

  • eBook Packages: Computer Science (R0)
