Semantic word shifts in a scientific domain

Scientometrics

Abstract

Understanding semantic word shifts in scientific domains is essential for facilitating interdisciplinary communication. Using a data set of published papers in the field of information retrieval (IR), this paper studies the semantic shifts of words in IR by mining per-word topic distributions over time. We propose that semantic word shifts occur not only over time but also across topics. The shifts are examined from two perspectives: the topic level and the context level. Based on the word-topic distributions over time, words are classified as stable or unstable. The diverging and converging trends among unstable words reveal characteristics of the topic evolution process. Context-level shifts are further detected through similarities between word vectors. Our work associates semantic word shifts with topic evolution, which facilitates a better understanding of semantic word shifts in terms of both topics and contexts.
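
The two perspectives named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Jensen-Shannon divergence, the `threshold` value, the function names, and the toy distributions are all assumptions chosen for illustration, and the cosine comparison assumes that word vectors from different periods already live in a shared space.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with zero probability contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2

def is_stable(topic_dists, threshold=0.1):
    """Topic level: a word counts as stable if its topic distribution changes
    by at most `threshold` (hypothetical cutoff) between consecutive periods."""
    return all(
        js_divergence(p, q) <= threshold
        for p, q in zip(topic_dists, topic_dists[1:])
    )

def cosine(u, v):
    """Context level: cosine similarity between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy per-period topic distributions for two hypothetical words (3 topics, 3 periods).
stable_word = [[0.80, 0.10, 0.10], [0.79, 0.11, 0.10], [0.80, 0.10, 0.10]]
unstable_word = [[0.90, 0.05, 0.05], [0.50, 0.40, 0.10], [0.10, 0.80, 0.10]]

print(is_stable(stable_word))
print(is_stable(unstable_word))
```

In this sketch a word whose probability mass moves from one topic to another across periods is flagged as unstable, while a low cosine similarity between a word's vectors from two periods would suggest a context-level shift.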

References

  • Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828.

  • Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77–84.

  • Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. The Journal of Machine Learning Research, 3, 993–1022.

  • Bouma, G. (2009). Normalized (pointwise) mutual information in collocation extraction. In Proceedings of GSCL (pp. 31–40).

  • Chen, B., Ding, Y., & Ma, F. (2017a). Mapping the semantic word shifts in topics in the field of information retrieval. In Proceedings of ISSI 2017: The 16th international conference on scientometrics and informetrics (pp. 1335–1341). Wuhan University, China.

  • Chen, B., Tsutsui, S., Ding, Y., & Ma, F. (2017b). Understanding the topic evolution in a scientific domain: an exploratory study for the field of information retrieval. Journal of Informetrics, 11(4), 1175–1189.

  • Collobert, R., & Weston, J. (2008). A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th international conference on machine learning (pp. 160–167). New York, NY, USA: ACM.

  • Ding, Y., & Stirling, K. (2016). Data-driven discovery: a new era of exploiting the literature and data. Journal of Data and Information Science, 1(4), 1–9.

  • Griffiths, T. L., & Steyvers, M. (2003). Prediction and semantic association. In Advances in Neural Information Processing Systems (pp. 11–18). Cambridge, MA, USA: MIT Press.

  • Gulordava, K., & Baroni, M. (2011). A distributional similarity approach to the detection of semantic change in the Google Books Ngram Corpus. In Proceedings of the GEMS 2011 workshop on geometrical models of natural language semantics (pp. 67–71). Stroudsburg, PA, USA: Association for Computational Linguistics.

  • Hamilton, W. L., Leskovec, J., & Jurafsky, D. (2016). Diachronic word embeddings reveal statistical laws of semantic change. arXiv:1605.09096 [Cs].

  • Harris, Z. S. (1954). Distributional structure. Word, 10, 146–162.

  • Hoffman, M., Bach, F. R., & Blei, D. M. (2010). Online learning for latent Dirichlet allocation. In Advances in neural information processing systems (pp. 856–864). Cambridge, MA, USA: MIT Press.

  • Kenter, T., Wevers, M., Huijnen, P., & de Rijke, M. (2015). Ad hoc monitoring of vocabulary shifts over time. In Proceedings of the 24th ACM international on conference on information and knowledge management (pp. 1191–1200). New York, NY, USA: ACM.

  • Kim, Y., Chiu, Y.-I., Hanaki, K., Hegde, D., & Petrov, S. (2014). Temporal analysis of language through neural language models. arXiv:1405.3515 [Cs].

  • Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211.

  • Lehmann, W. P. (1993). Historical linguistics: An introduction (3rd edition). London; New York: Routledge.

  • Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013a). Efficient estimation of word representations in vector space. arXiv:1301.3781 [Cs].

  • Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013b). Distributed representations of words and phrases and their compositionality. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in neural information processing systems 26 (pp. 3111–3119). New York: Curran Associates Inc.

  • Rehurek, R., & Sojka, P. (2010). Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 workshop on new challenges for NLP frameworks (pp. 45–50).

  • Tang, J., Liu, J., Zhang, M., & Mei, Q. (2016). Visualizing large-scale and high-dimensional data. In Proceedings of the 25th international conference on world wide web (pp. 287–297). Republic and Canton of Geneva, Switzerland: International World Wide Web Conferences Steering Committee.

  • Wang, S., Schlobach, S., & Klein, M. (2011). Concept drift and how to identify it. Web Semantics: Science, Services and Agents on the World Wide Web, 9(3), 247–265.

  • Wijaya, D. T., & Yeniterzi, R. (2011). Understanding semantic change of words over centuries. In Proceedings of the 2011 international workshop on detecting and exploiting cultural diversity on the social web (pp. 35–40). New York, NY, USA: ACM.

  • Xu, J., Ding, Y., & Malic, V. (2015). Author credit for transdisciplinary collaboration. PLoS ONE, 10(9), e0137968.

  • Yan, E., Ding, Y., Milojević, S., & Sugimoto, C. R. (2012). Topics in dynamic research communities: an exploratory study for the field of information retrieval. Journal of Informetrics, 6(1), 140–153.

Acknowledgements

This work is funded by the National Natural Science Foundation of China (Grant Nos. 71420107026 and 71704138). The present study is an extended version of an article presented at the 16th International Conference on Scientometrics and Informetrics, Wuhan (China), 16–20 October 2017 (Chen et al. 2017a).

Author information

Corresponding author

Correspondence to Baitong Chen.

About this article

Cite this article

Chen, B., Ding, Y. & Ma, F. Semantic word shifts in a scientific domain. Scientometrics 117, 211–226 (2018). https://doi.org/10.1007/s11192-018-2843-2

  • DOI: https://doi.org/10.1007/s11192-018-2843-2
