DOI: 10.1145/3297280.3297376

Detecting reliable novel word senses: a network-centric approach

Published: 08 April 2019

Abstract

In the era of Big Data, the rapid exchange of information on the web leads words to take on new meanings, causing linguistic shift. The recent availability of large amounts of digitized text has made automated analysis of language evolution possible. Our study focuses on improving the detection of new word senses. This paper presents a proposal based on network features to improve the precision of new word sense detection. For a candidate word for which a new sense (a "birth") has been detected by comparing the sense clusters induced at two different time points, we further compare the network properties of the subgraphs induced from the novel sense cluster across these two time points. Using the mean fractional change in edge density, structural similarity, and average path length as features in an SVM classifier, manual evaluation gives precision values of 0.86 and 0.74 for new sense detection on two distinct time-point pairs, compared with precision values in the range 0.23-0.32 when the proposed scheme is not used. The method can therefore serve as a post-hoc step that improves the precision of novel word sense detection wherever the underlying framework uses a graph structure. Although our proposal is designed as a post-hoc step, it can also be used in isolation, where it still achieves a precision of 0.54-0.62. Finally, we show that our method detects well-known historical shifts in 80% of cases.
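The feature computation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define structural similarity or say what the "mean" in mean fractional change is taken over, so the definitions below are assumptions. Here structural similarity is assumed to be the number of common neighbours of two nodes divided by the square root of the product of their degrees, averaged over node pairs, and the fractional change is computed for a single induced subgraph with no further averaging. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations
from math import sqrt


def induced_subgraph(edges, nodes):
    """Adjacency sets of the undirected subgraph induced by `nodes`."""
    nodes = set(nodes)
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u in nodes and v in nodes and u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def edge_density(adj):
    """Number of edges divided by the number of possible node pairs."""
    n = len(adj)
    if n < 2:
        return 0.0
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return m / (n * (n - 1) / 2)


def avg_path_length(adj):
    """Mean BFS distance over reachable pairs (assumption: unreachable pairs are skipped)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def structural_similarity(adj):
    """Assumed definition: mean of |common neighbours| / sqrt(deg(u) * deg(v)) over node pairs."""
    scores = [
        len(adj[u] & adj[v]) / sqrt(len(adj[u]) * len(adj[v]))
        for u, v in combinations(adj, 2)
        if adj[u] and adj[v]
    ]
    return sum(scores) / len(scores) if scores else 0.0


def fractional_change(old, new):
    """(new - old) / old, with 0.0 returned when the old value is zero (an assumption)."""
    return (new - old) / old if old else 0.0


def feature_vector(edges_t1, edges_t2, cluster):
    """Fractional change of each network measure between the two time points."""
    g1 = induced_subgraph(edges_t1, cluster)
    g2 = induced_subgraph(edges_t2, cluster)
    measures = (edge_density, structural_similarity, avg_path_length)
    return [fractional_change(f(g1), f(g2)) for f in measures]
```

Calling `feature_vector(edges_t1, edges_t2, cluster)` with the edge lists of the networks at the two time points and the words of the novel sense cluster yields a three-element vector. Vectors of this kind could then be passed, together with manually assigned labels, to any SVM implementation; that classification step is omitted here.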

Cited By

  • (2023) Semantic micro-dynamics as a reflex of occurrence frequency: a semantic networks approach. Cognitive Linguistics 34:3-4 (533-568). DOI: 10.1515/cog-2022-0008. Online publication date: 19-Oct-2023
  • (2019) Mapping Lexical Knowledge to Distributed Models for Ontology Concept Invention. AI*IA 2019 – Advances in Artificial Intelligence (572-587). DOI: 10.1007/978-3-030-35166-3_40. Online publication date: 12-Nov-2019

Published In

SAC '19: Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing
April 2019
2682 pages
ISBN:9781450359337
DOI:10.1145/3297280
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. complex network measures
  2. distributional thesaurus network
  3. novel sense detection

Qualifiers

  • Research-article

Conference

SAC '19
Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
