Semantic word shifts in a scientific domain

Chen, Baitong; Ding, Ying; Ma, Feicheng

doi:10.1007/s11192-018-2843-2

Published: 13 July 2018

Scientometrics, volume 117, pages 211–226 (2018)


Abstract

Understanding semantic word shifts in scientific domains is essential for facilitating interdisciplinary communication. Using a data set of published papers in the field of information retrieval (IR), this paper studies the semantic shifts of words in IR by mining per-word topic distributions over time. We propose that semantic word shifts occur not only over time but also across topics. The shifts are examined from two perspectives: the topic level and the context level. Based on the word-topic distribution over time, words are classified as stable or unstable. The diverging and converging trends among unstable words reveal characteristics of the topic evolution process. Context-level shifts are further detected using similarities between word vectors. Our work associates semantic word shifts with topic evolution, which facilitates a better understanding of semantic word shifts from the perspectives of both topics and contexts.
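The two perspectives named in the abstract can be illustrated with a minimal sketch. The abstract does not state which measures the authors used, so everything here is an assumption: Jensen-Shannon divergence stands in for comparing a word's topic distributions between periods (topic level), cosine similarity stands in for comparing a word's vectors between periods (context level), and the toy numbers and period labels are hypothetical.

```python
import math


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [w / total for w in weights]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two topic distributions.

    Returns 0.0 for identical distributions and 1.0 for disjoint ones.
    This measure is an assumption, not necessarily the paper's choice.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    p, q = normalize(p), normalize(q)
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def cosine_similarity(u, v):
    """Cosine similarity between two word vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical per-period topic distribution of one word over three topics.
topic_dist_by_period = {
    "period_1": [0.70, 0.20, 0.10],
    "period_2": [0.40, 0.35, 0.25],
    "period_3": [0.10, 0.30, 0.60],
}

# Topic-level shift between consecutive periods.
periods = list(topic_dist_by_period)
topic_shifts = {
    (a, b): js_divergence(topic_dist_by_period[a], topic_dist_by_period[b])
    for a, b in zip(periods, periods[1:])
}

# Hypothetical vectors of the same word in two periods (context level).
context_similarity = cosine_similarity([0.9, 0.1, 0.3], [0.2, 0.8, 0.4])
```

Under this sketch, a word whose divergence stays near zero across consecutive periods would count as stable, while larger values would mark it as unstable. For the context-level comparison, vectors trained separately on each period generally need to be aligned into a common space before their cosine similarity is meaningful.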


References

Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828.
Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77–84.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. The Journal of Machine Learning Research, 3, 993–1022.
Bouma, G. (2009). Normalized (pointwise) mutual information in collocation extraction. In Proceedings of GSCL (pp. 31–40).
Chen, B., Ding, Y., & Ma, F. (2017a). Mapping the semantic word shifts in topics in the field of information retrieval. In Proceedings of ISSI 2017—The 16th international conference on scientometrics and informetrics (pp. 1335–1341). Wuhan University, China.
Chen, B., Tsutsui, S., Ding, Y., & Ma, F. (2017b). Understanding the topic evolution in a scientific domain: an exploratory study for the field of information retrieval. Journal of Informetrics, 11(4), 1175–1189.
Collobert, R., & Weston, J. (2008). A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th international conference on machine learning (pp. 160–167). New York, NY, USA: ACM.
Ding, Y., & Stirling, K. (2016). Data-driven discovery: a new era of exploiting the literature and data. Journal of Data and Information Science, 1(4), 1–9.
Griffiths, T. L., & Steyvers, M. (2003). Prediction and semantic association. In Advances in Neural Information Processing Systems (pp. 11–18). Cambridge, MA, USA: MIT Press.
Gulordava, K., & Baroni, M. (2011). A distributional similarity approach to the detection of semantic change in the Google Books Ngram Corpus. In Proceedings of the GEMS 2011 workshop on geometrical models of natural language semantics (pp. 67–71). Stroudsburg, PA, USA: Association for Computational Linguistics.
Hamilton, W. L., Leskovec, J., & Jurafsky, D. (2016). Diachronic word embeddings reveal statistical laws of semantic change. arXiv:1605.09096 [Cs].
Harris, Z. S. (1954). Distributional structure. Word, 10, 146–162.
Hoffman, M., Bach, F. R., & Blei, D. M. (2010). Online learning for latent Dirichlet allocation. In Advances in neural information processing systems (pp. 856–864). Cambridge, MA, USA: MIT Press.
Kenter, T., Wevers, M., Huijnen, P., & de Rijke, M. (2015). Ad hoc monitoring of vocabulary shifts over time. In Proceedings of the 24th ACM international on conference on information and knowledge management (pp. 1191–1200). New York, NY, USA: ACM.
Kim, Y., Chiu, Y.-I., Hanaki, K., Hegde, D., & Petrov, S. (2014). Temporal analysis of language through neural language models. arXiv:1405.3515 [Cs].
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211.
Lehmann, W. P. (1993). Historical linguistics: An introduction (3rd edition). London; New York: Routledge.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013a). Efficient estimation of word representations in vector space. arXiv:1301.3781 [Cs].
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013b). Distributed representations of words and phrases and their compositionality. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in neural information processing systems 26 (pp. 3111–3119). New York: Curran Associates Inc.
Rehurek, R., & Sojka, P. (2010). Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 workshop on new challenges for NLP frameworks (pp. 45–50).
Tang, J., Liu, J., Zhang, M., & Mei, Q. (2016). Visualizing large-scale and high-dimensional data. In Proceedings of the 25th international conference on world wide web (pp. 287–297). Republic and Canton of Geneva, Switzerland: International World Wide Web Conferences Steering Committee.
Wang, S., Schlobach, S., & Klein, M. (2011). Concept drift and how to identify it. Web Semantics: Science, Services and Agents on the World Wide Web, 9(3), 247–265.
Wijaya, D. T., & Yeniterzi, R. (2011). Understanding semantic change of words over centuries. In Proceedings of the 2011 international workshop on detecting and exploiting cultural diversity on the social web (pp. 35–40). New York, NY, USA: ACM.
Xu, J., Ding, Y., & Malic, V. (2015). Author credit for transdisciplinary collaboration. PLoS ONE, 10(9), e0137968.
Yan, E., Ding, Y., Milojević, S., & Sugimoto, C. R. (2012). Topics in dynamic research communities: an exploratory study for the field of information retrieval. Journal of Informetrics, 6(1), 140–153.

Acknowledgements

This work is funded by the National Natural Science Foundation of China (Grant Nos. 71420107026 and 71704138). The present study is an extended version of an article presented at the 16th International Conference on Scientometrics and Informetrics, Wuhan (China), 16–20 October 2017 (Chen et al. 2017a).

Author information

Authors and Affiliations

Shanghai University, Shanghai, China
Baitong Chen
Indiana University, Bloomington, USA
Ying Ding
Wuhan University, Wuhan, China
Ying Ding & Feicheng Ma
Tianjin Normal University, Tianjin, China
Ying Ding


Corresponding author

Correspondence to Baitong Chen.

About this article

Cite this article

Chen, B., Ding, Y. & Ma, F. Semantic word shifts in a scientific domain. Scientometrics 117, 211–226 (2018). https://doi.org/10.1007/s11192-018-2843-2

Received: 14 January 2018
Published: 13 July 2018
Issue Date: October 2018
DOI: https://doi.org/10.1007/s11192-018-2843-2
