Synonymy Expansion Using Link Prediction Methods: A Case Study of Assamese WordNet

Published: 02 November 2021

Abstract

WordNets for low-resource languages such as Assamese are often built using the expansion methodology, which can leave lexical entries and synonymy relations missing. The Assamese WordNet was built by expansion from the Hindi WordNet and therefore also has missing synonymy relations. Because a WordNet can be viewed as a network of unique words connected by synonymy relations, link prediction from complex network analysis is a natural way to predict missing relations in it. In the current work, link prediction methods were used to predict missing synonyms in the Assamese WordNet, and they proved effective. The results also suggest that, for discovering missing relations in the Assamese WordNet, simple local proximity-based methods may be more effective than global methods and complex supervised models using network embedding. Further, although some of the retrieved words are not synonyms per se, they are semantically related to the target word and may be categorized as semantic cohorts.
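To make the idea of local proximity-based link prediction concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say which proximity scores were used, so the sketch assumes two standard ones (common neighbors and the Adamic-Adar index). The toy English word list, the `EDGES` data, and all function names are hypothetical and only stand in for a real synonymy network.

```python
import math
from itertools import combinations

# Hypothetical toy synonymy network (NOT taken from the Assamese WordNet):
# nodes are words, edges are known synonym pairs.
EDGES = [
    ("big", "large"), ("big", "huge"), ("large", "huge"),
    ("large", "vast"), ("huge", "vast"),
    ("small", "tiny"), ("small", "little"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map: word -> set of known synonyms."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Number of words that are synonyms of both u and v."""
    return len(adj[u] & adj[v])


def adamic_adar(adj, u, v):
    """Adamic-Adar index: shared neighbors weighted by 1 / log(degree).

    A shared neighbor of two distinct words has degree >= 2,
    so log(degree) is always positive here.
    """
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])


def rank_missing_links(adj, score):
    """Score every unconnected word pair and return the non-zero ones, best first."""
    scored = []
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already a known synonym pair
        s = score(adj, u, v)
        if s > 0:
            scored.append((s, u, v))
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda t: (-t[0], t[1], t[2]))


adj = build_adjacency(EDGES)
cn_ranking = rank_missing_links(adj, common_neighbors)
aa_ranking = rank_missing_links(adj, adamic_adar)

for s, u, v in aa_ranking:
    print(f"{u} - {v}: Adamic-Adar = {s:.3f}")
```

Both scores use only the immediate neighborhood of a candidate pair, which is what makes them "local": no random walks, embeddings, or training are involved. In this toy network the unconnected pair that shares the most synonyms is ranked first as a candidate missing synonymy link.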

Cited By

  • (2023) Synonym recognition from short texts. Expert Systems with Applications: An International Journal 224:C. DOI: 10.1016/j.eswa.2023.119966. Online publication date: 15-Aug-2023.

    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 21, Issue 1
    January 2022
    442 pages
    ISSN:2375-4699
    EISSN:2375-4702
    DOI:10.1145/3494068

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 02 November 2021
    Accepted: 01 May 2021
    Revised: 01 December 2020
    Received: 01 July 2020
    Published in TALLIP Volume 21, Issue 1

    Author Tags

    1. Automatic extraction
    2. synonymy
    3. social network analysis
    4. synonymy network
    5. Assamese
    6. neural networks
    7. Indian languages
    8. low resource languages
    9. semantic cohort

    Qualifiers

    • Research-article
    • Refereed
