research-article

Synonymy Expansion Using Link Prediction Methods: A Case Study of Assamese WordNet

Authors:

Bornali Phukon,

Sanasam Ranbir Singh,

Priyankoo Sarmah

Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 1

Article No.: 15, Pages 1 - 21

https://doi.org/10.1145/3467966

Published: 02 November 2021

Abstract

WordNets for low-resource languages, such as Assamese, are often built with the expansion methodology, which may leave lexical entries and synonymy relations missing. The Assamese WordNet was built by expansion from the Hindi WordNet, and it too has missing synonymy relations. Because a WordNet can be viewed as a network of unique words connected by synonymy relations, link prediction from complex network analysis is a natural way to predict its missing relations. In the current work, link prediction methods were used to predict missing synonyms in the Assamese WordNet, and they proved effective. It is also observed that, for discovering missing relations in the Assamese WordNet, simple local proximity-based methods might be more effective than global methods and complex supervised models that use network embedding. Further, some retrieved words are not synonyms per se but are semantically related to the target word, and they may be categorized as semantic cohorts.
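To make the idea of local proximity-based link prediction concrete, the following is a minimal sketch rather than the authors' implementation. It assumes three standard local indices (common neighbors, Jaccard, and Adamic-Adar), which the abstract does not name explicitly, and it uses a hypothetical toy graph of English words in place of the Assamese WordNet. Every unlinked word pair is scored by how much the two neighborhoods overlap, and high-scoring pairs are proposed as candidate synonyms.

```python
import math
from itertools import combinations

# Hypothetical toy synonymy graph (an assumption for illustration only):
# each word maps to the set of words it is already linked to as a synonym.
# The graph is undirected, so every link appears in both directions.
graph = {
    "big": {"large", "huge"},
    "large": {"big", "huge", "vast"},
    "huge": {"big", "large", "vast"},
    "vast": {"large", "huge"},
    "tiny": {"small"},
    "small": {"tiny"},
}

def common_neighbors(g, u, v):
    """Number of words linked to both u and v."""
    return len(g[u] & g[v])

def jaccard(g, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = g[u] | g[v]
    return len(g[u] & g[v]) / len(union) if union else 0.0

def adamic_adar(g, u, v):
    """Shared neighbors weighted by 1 / log(degree), favoring rare hubs less.

    In an undirected graph a shared neighbor of two distinct words has
    degree >= 2, so the logarithm is never zero.
    """
    return sum(1.0 / math.log(len(g[z])) for z in g[u] & g[v])

def rank_missing_links(g, score):
    """Score every unlinked pair and return the nonzero ones, best first."""
    scored = [
        (score(g, u, v), u, v)
        for u, v in combinations(sorted(g), 2)
        if v not in g[u]
    ]
    return sorted((s for s in scored if s[0] > 0), key=lambda t: -t[0])

if __name__ == "__main__":
    for name, fn in [("common neighbors", common_neighbors),
                     ("Jaccard", jaccard),
                     ("Adamic-Adar", adamic_adar)]:
        print(name, rank_missing_links(graph, fn))
```

In this toy graph the only unlinked pair with overlapping neighborhoods is "big" and "vast", so all three indices propose it as a missing synonymy link. On a real WordNet the ranked list would be cut off at some threshold or top-k and then checked by a lexicographer.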

Cited By

Mu L, Jin P, Zhang Y, Zhong H, Zhao J. (2023). Synonym recognition from short texts. Expert Systems with Applications: An International Journal, 224:C. DOI: 10.1016/j.eswa.2023.119966. Online publication date: 15-Aug-2023.
https://dl.acm.org/doi/10.1016/j.eswa.2023.119966

Index Terms

1. Information systems
  1. Information retrieval
    1. Document representation
      1. Dictionaries

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing Volume 21, Issue 1

January 2022

442 pages

ISSN:2375-4699

EISSN:2375-4702

DOI:10.1145/3494068

Editor:
Imed Zitouni
Google, USA

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 November 2021

Accepted: 01 May 2021

Revised: 01 December 2020

Received: 01 July 2020

Qualifiers

Research-article
Refereed
