Abstract
In large language modeling, incremental learning plays an important role for evolving data such as streaming text. We introduce an incremental learning approach for dynamic contextualized word embeddings in the streaming-data setting; we call the embeddings generated by our model Incremental Dynamic Contextualized Word Embeddings (iDCWE). Our model introduces incremental BERT (iBERT), where BERT stands for Bidirectional Encoder Representations from Transformers, to perform incremental training with a dynamic model, and it further captures the semantic drift of words using dynamic graphs. Our paper is the first in the line of research on (incremental) dynamic modeling of streaming text, which we also refer to as Neural Dynamic Language Modeling. On the benchmark datasets, our model performs on par with, and often outperforms, dynamic contextualized word embeddings (DCWE), which, to the best of our knowledge, was the first approach to combine contextualization with dynamic word embeddings. Moreover, our model is more compute-time efficient than DCWE.
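The abstract names two components: incremental masked-language-model updates on newly arriving text (iBERT) and tracking semantic drift across time slices. The following is a minimal sketch of both ideas, assuming the HuggingFace transformers library and a bert-base-uncased backbone; the function names (incremental_update, snapshot_embedding) and the toy sentences are illustrative assumptions, not the authors' released code (see Note 1 for that).

```python
# A minimal sketch (not the authors' implementation; see Note 1) of:
# (a) one incremental MLM update on a freshly arrived batch of streaming text,
# (b) measuring a word's semantic drift between two time slices.
# Assumes `pip install torch transformers`.
import torch
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling)

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").to(device)
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def incremental_update(texts):
    """One gradient step of masked-LM training on a newly arrived batch."""
    model.train()
    enc = tokenizer(texts, truncation=True, max_length=128)
    features = [{"input_ids": ids, "attention_mask": am}
                for ids, am in zip(enc["input_ids"], enc["attention_mask"])]
    batch = {k: v.to(device) for k, v in collator(features).items()}
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

@torch.no_grad()
def snapshot_embedding(word, contexts):
    """Mean contextual embedding of `word` over the sentences of one time slice."""
    model.eval()
    wid = tokenizer.convert_tokens_to_ids(word)
    vecs = []
    for text in contexts:
        enc = tokenizer(text, return_tensors="pt", truncation=True).to(device)
        hidden = model.bert(**enc).last_hidden_state[0]   # (seq_len, hidden_dim)
        hits = enc["input_ids"][0] == wid
        if hits.any():
            vecs.append(hidden[hits].mean(dim=0))
    return torch.stack(vecs).mean(dim=0)

# Drift of "cell" between two toy time slices = cosine distance of its embeddings.
e_old = snapshot_embedding("cell", ["the prisoner sat alone in his cell"])
e_new = snapshot_embedding("cell", ["she charges her cell phone every night"])
drift = 1.0 - torch.cosine_similarity(e_old, e_new, dim=0).item()
```

In the paper's full setting, per-slice embeddings like these would feed the dynamic-graph component rather than a pairwise cosine check; the pairwise distance is shown only to make the drift signal concrete.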
Notes
- 1. Our implementation is available at the GitHub link: https://github.com/srivastavaashish/ndlmsd.
- 6. More details about pre-processing are available at the Anonymous GitHub link: https://anonymous.4open.science/r/ndlmsd-C1FB.
Acknowledgement
This work was supported by a J.C. Bose Fellowship, the Walmart Centre for Tech Excellence, Indian Institute of Science, and the Robert Bosch Centre for Cyber Physical Systems at Indian Institute of Science.
Copyright information
Ā© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Srivastava, A., Bhatnagar, S., Narasimha Murty, M., Aravinda Raman, J. (2025). Learning Dynamic Representations in Large Language Models for Evolving Data Streams. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, C.-L., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15305. Springer, Cham. https://doi.org/10.1007/978-3-031-78169-8_16
DOI: https://doi.org/10.1007/978-3-031-78169-8_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78168-1
Online ISBN: 978-3-031-78169-8
eBook Packages: Computer Science (R0)