An Integrated Approach to Automatic Synonym Detection in Turkish Corpus

Yıldız, Tuğba; Yıldırım, Savaş; Diri, Banu

doi:10.1007/978-3-319-10888-9_12

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8686)

Included in the following conference series:

International Conference on Natural Language Processing

Abstract

In this study, we designed a model to detect synonymy. Our main assumption is that synonym pairs, by definition, show similar semantic and dependency relations: they share the same meronym/holonym and hypernym/hyponym relations. Unlike synonymy, hypernymy and meronymy relations can likely be acquired by applying lexico-syntactic patterns to a large corpus, and such acquisition can ease the detection of synonymy. Likewise, we used particular dependency relations, such as being the object or subject of a verb. Machine learning algorithms were applied to all of these acquired features. The first aim is to find out which dependency and semantic features are the most informative and contribute most to the model. The performance of each feature was evaluated individually with cross-validation. The model that combines all features shows promising results and successfully detects the synonymy relation. The main contribution of the study is the integration of semantic and dependency relations within a distributional framework. The second contribution is that it is the first major attempt at Turkish synonym identification based on a corpus-driven approach.
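The core idea of the abstract, that synonyms should share hypernyms, holonyms, meronyms, and verb contexts, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy counts, the English example words, the relation names, and the choice of cosine similarity are not taken from the paper, which does not specify its feature computation in the abstract.

```python
from math import sqrt

# Hypothetical toy data (not from the paper): for each noun, counts of the
# context words observed under each relation type. In a real setting these
# would be harvested from a large corpus with lexico-syntactic patterns and
# a dependency parser.
CONTEXTS = {
    "car": {
        "hypernym": {"vehicle": 12, "machine": 3},
        "holonym": {"garage": 4, "fleet": 2},
        "meronym": {"wheel": 9, "engine": 7, "door": 5},
        "object_of": {"drive": 20, "park": 8, "buy": 6},
    },
    "automobile": {
        "hypernym": {"vehicle": 10},
        "holonym": {"garage": 3},
        "meronym": {"wheel": 6, "engine": 8},
        "object_of": {"drive": 15, "buy": 4},
    },
    "banana": {
        "hypernym": {"fruit": 11},
        "holonym": {"bunch": 5},
        "meronym": {"peel": 6},
        "object_of": {"eat": 14, "buy": 5},
    },
}

# Assumed relation inventory; the paper may use a different set.
RELATIONS = ("hypernym", "holonym", "meronym", "object_of")


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty context vector carries no evidence
    return dot / (norm_a * norm_b)


def pair_features(w1, w2, contexts=CONTEXTS):
    """One similarity score per relation type for a candidate word pair."""
    return {
        rel: cosine(contexts[w1].get(rel, {}), contexts[w2].get(rel, {}))
        for rel in RELATIONS
    }


if __name__ == "__main__":
    for pair in (("car", "automobile"), ("car", "banana")):
        print(pair, pair_features(*pair))
```

Each candidate pair is thus reduced to a small feature vector with one score per relation. Such vectors could then be passed to any standard classifier and assessed with cross-validation, one feature at a time or all together, which is the kind of evaluation the abstract describes.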

Author information

Authors and Affiliations

Department of Computer Engineering, Istanbul Bilgi University, Eski Silahtarağa Elektrik Santrali, Kazım Karabekir Cad. No: 2/13, Eyüp, 34060, Istanbul, Turkey
Tuğba Yıldız & Savaş Yıldırım
Department of Computer Engineering, Yildiz Technical University, Davutpasa, 34349, Istanbul, Turkey
Banu Diri

Editor information

Editors and Affiliations

Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248, Warsaw, Poland
Adam Przepiórkowski & Maciej Ogrodniczuk

Cite this paper

Yıldız, T., Yıldırım, S., Diri, B. (2014). An Integrated Approach to Automatic Synonym Detection in Turkish Corpus. In: Przepiórkowski, A., Ogrodniczuk, M. (eds) Advances in Natural Language Processing. NLP 2014. Lecture Notes in Computer Science, vol 8686. Springer, Cham. https://doi.org/10.1007/978-3-319-10888-9_12

DOI: https://doi.org/10.1007/978-3-319-10888-9_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10887-2
Online ISBN: 978-3-319-10888-9
eBook Packages: Computer Science, Computer Science (R0)
