Learning Dynamic Representations in Large Language Models for Evolving Data Streams

  • Conference paper
Pattern Recognition (ICPR 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15305)


Abstract

In large language modeling, incremental learning plays an important role for evolving data such as streaming text. We introduce an incremental learning approach for dynamic contextualized word embeddings in the streaming-data setting, and we call the embeddings generated by our model Incremental Dynamic Contextualized Word Embeddings (iDCWE). Our model introduces incremental BERT (iBERT), where BERT stands for Bidirectional Encoder Representations from Transformers, to perform dynamic, incremental training, and it further captures the semantic drift of words using dynamic graphs. This paper is the first in a line of research on (incremental) dynamic modeling of streaming text, which we also refer to as Neural Dynamic Language Modeling. On benchmark datasets, our model performs on par with, and often outperforms, dynamic contextualized word embeddings, which, to the best of our knowledge, was the first approach to combine contextualization with dynamic word embeddings, while also being more compute-time efficient.
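
To make the incremental-training idea concrete, below is a minimal sketch of masked-language-model fine-tuning on successive chunks of a text stream, in the spirit of the iBERT component described above. This is an illustration only, not the authors' implementation (their code is at the GitHub link in the Notes): the base checkpoint ("bert-base-uncased"), the chunking of the stream, and all hyperparameters are assumptions.

    # Minimal sketch: incrementally fine-tune a masked language model on
    # successive chunks of a text stream, warm-starting from the weights
    # learned on all earlier chunks instead of retraining from scratch.
    # Illustrative only; NOT the paper's iBERT code.
    import torch
    from torch.utils.data import DataLoader
    from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                              DataCollatorForLanguageModeling)

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    # Randomly masks 15% of tokens to build masked-LM training targets.
    collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

    def update_on_chunk(texts, epochs=1):
        """One incremental update: fine-tune the current model in place
        on a single chunk (time slice) of the stream."""
        enc = tokenizer(texts, truncation=True, max_length=128)
        examples = [{"input_ids": ids} for ids in enc["input_ids"]]
        loader = DataLoader(examples, batch_size=16, shuffle=True,
                            collate_fn=collator)
        model.train()
        for _ in range(epochs):
            for batch in loader:
                loss = model(**batch).loss
                loss.backward()
                optimizer.step()
                optimizer.zero_grad()

    # As each new time slice arrives, the same model is updated rather
    # than retrained, so cost scales with the chunk, not the history.
    stream = [["the stock market crashed today"],
              ["the server crashed during the deploy"]]
    for chunk in stream:
        update_on_chunk(chunk)

The point mirrored from the abstract is the warm start: each arriving slice updates the same weights, which is what makes incremental training cheaper in compute time than refitting a dynamic model over the whole history. The semantic-drift tracking with dynamic graphs would then operate on the embeddings such a loop produces.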


Notes

  1. Our implementation is available at the GitHub link, https://github.com/srivastavaashish/ndlmsd.

  2. https://www.kaggle.com/Cornell-University/arxiv.

  3. https://www.cse.msu.edu/~tangjili/trust.

  4. https://www.kaggle.com/datasets/kaggle/reddit-comments-may-2015.

  5. https://www.yelp.com/dataset.

  6. More details about pre-processing are available at the anonymous GitHub link, https://anonymous.4open.science/r/ndlmsd-C1FB.

References

  1. Manning, C., Raghavan, P., Schütze, H.: Introduction to information retrieval. Nat. Lang. Eng. 16(1), 100–103 (2010)

  2. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)

  3. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, vol. 26 (2013)

  4. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)

  5. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)

  6. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017)

  7. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  8. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners. OpenAI Blog 1(8), 9 (2019)

  9. Rudolph, M., Blei, D.: Dynamic embeddings for language evolution. In: Proceedings of the 2018 World Wide Web Conference, pp. 1003–1011 (2018)

  10. Bamler, R., Mandt, S.: Dynamic word embeddings. In: International Conference on Machine Learning, pp. 380–389. PMLR (2017)

  11. Hofmann, V., Pierrehumbert, J.B., Schütze, H.: Dynamic contextualized word embeddings. arXiv preprint arXiv:2010.12684 (2020)

  12. Amba Hombaiah, S., Chen, T., Zhang, M., Bendersky, M., Najork, M.: Dynamic language models for continuously evolving content. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 2514–2524 (2021)

  13. Rosin, G.D., Guy, I., Radinsky, K.: Time masking for temporal language models. In: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, pp. 833–841 (2022)

  14. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017)

  15. Zeng, Z., Liu, X., Song, Y.: Biased random walk based social regularization for word embeddings. In: IJCAI, pp. 4560–4566 (2018)

  16. Zeng, Z., Yin, Y., Song, Y., Zhang, M.: Socialized word embeddings. In: IJCAI, pp. 3915–3921 (2017)

  17. McCann, B., Bradbury, J., Xiong, C., Socher, R.: Learned in translation: contextualized word vectors. In: Advances in Neural Information Processing Systems, vol. 30 (2017)

  18. Peters, M.E., et al.: Deep contextualized word representations. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, Louisiana, pp. 2227–2237. Association for Computational Linguistics (2018)

  19. Clark, K., Luong, M.-T., Le, Q.V., Manning, C.D.: ELECTRA: pre-training text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555 (2020)

  20. Raffel, C., et al.: Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683 (2019)

  21. Peters, M.E., Neumann, M., Zettlemoyer, L., Yih, W.: Dissecting contextual word embeddings: architecture and representation. arXiv preprint arXiv:1808.08949 (2018)

  22. Lin, Y., Tan, Y.C., Frank, R.: Open Sesame: getting inside BERT's linguistic knowledge. arXiv preprint arXiv:1906.01698 (2019)

  23. Liu, N.F., Gardner, M., Belinkov, Y., Peters, M.E., Smith, N.A.: Linguistic knowledge and transferability of contextual representations. arXiv preprint arXiv:1903.08855 (2019)

  24. Hofmann, V., Pierrehumbert, J.B., Schütze, H.: DagoBERT: generating derivational morphology with a pretrained language model. arXiv preprint arXiv:2005.00672 (2020)

  25. Ethayarajh, K.: How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings. arXiv preprint arXiv:1909.00512 (2019)

  26. Mickus, T., Paperno, D., Constant, M., Van Deemter, K.: What do you mean, BERT? Assessing BERT as a distributional semantics model. arXiv preprint arXiv:1911.05758 (2019)

  27. Reif, E., et al.: Visualizing and measuring the geometry of BERT. In: Advances in Neural Information Processing Systems, vol. 32 (2019)

  28. Rosenfeld, A., Erk, K.: Deep neural models of semantic shift. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 474–484 (2018)

  29. Yao, Z., Sun, Y., Ding, W., Rao, N., Xiong, H.: Dynamic word embeddings for evolving semantic discovery. In: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 673–681 (2018)

  30. Gong, H., Bhat, S., Viswanath, P.: Enriching word embeddings with temporal and spatial information. arXiv preprint arXiv:2010.00761 (2020)

  31. Welch, C., Kummerfeld, J.K., Pérez-Rosas, V., Mihalcea, R.: Compositional demographic word embeddings. arXiv preprint arXiv:2010.02986 (2020)

  32. Welch, C., Kummerfeld, J.K., Pérez-Rosas, V., Mihalcea, R.: Exploring the value of personalized word embeddings. arXiv preprint arXiv:2011.06057 (2020)

  33. Yao, J., Dou, Z., Wen, J.-R.: Employing personal word embeddings for personalized search. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1359–1368 (2020)

  34. Jawahar, G., Seddah, D.: Contextualized diachronic word representations. In: 1st International Workshop on Computational Approaches to Historical Language Change 2019 (colocated with ACL 2019) (2019)

  35. Lukes, J., Søgaard, A.: Sentiment analysis under temporal shift. In: Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp. 65–71 (2018)

  36. Mishra, P., Del Tredici, M., Yannakoudakis, H., Shutova, E.: Abusive language detection with graph convolutional networks. arXiv preprint arXiv:1904.04073 (2019)

  37. Li, C., Goldwasser, D.: Encoding social information with graph convolutional networks for political perspective detection in news media. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 2594–2604 (2019)

  38. Del Tredici, M., Marcheggiani, D., Schulte im Walde, S., Fernández, R.: You shall know a user by the company it keeps: dynamic representations for social media users in NLP. arXiv preprint arXiv:1909.00412 (2019)

  39. Mishra, P., Del Tredici, M., Yannakoudakis, H., Shutova, E.: Author profiling for abuse detection. In: Proceedings of the 27th International Conference on Computational Linguistics, pp. 1088–1098 (2018)

  40. Hazarika, D., Poria, S., Gorantla, S., Cambria, E., Zimmermann, R., Mihalcea, R.: CASCADE: contextual sarcasm detection in online discussion forums. arXiv preprint arXiv:1805.06413 (2018)

  41. Schlechtweg, D., McGillivray, B., Hengchen, S., Dubossarsky, H., Tahmasebi, N.: SemEval-2020 task 1: unsupervised lexical semantic change detection. arXiv preprint arXiv:2007.11464 (2020)

  42. Kutuzov, A., Øvrelid, L., Szymanski, T., Velldal, E.: Diachronic word embeddings and semantic shifts: a survey. arXiv preprint arXiv:1806.03537 (2018)

  43. Dubossarsky, H., Hengchen, S., Tahmasebi, N., Schlechtweg, D.: Time-out: temporal referencing for robust modeling of lexical semantic change. arXiv preprint arXiv:1906.01688 (2019)

  44. Kulkarni, V., Al-Rfou, R., Perozzi, B., Skiena, S.: Statistically significant detection of linguistic change. In: Proceedings of the 24th International Conference on World Wide Web, pp. 625–635 (2015)

  45. Kim, Y., Chiu, Y.I., Hanaki, K., Hegde, D., Petrov, S.: Temporal analysis of language through neural language models. arXiv preprint arXiv:1405.3515 (2014)

  46. Hamilton, W.L., Leskovec, J., Jurafsky, D.: Diachronic word embeddings reveal statistical laws of semantic change. arXiv preprint arXiv:1605.09096 (2016)

  47. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems, vol. 27 (2014)

  48. Pinter, Y., Guthrie, R., Eisenstein, J.: Mimicking word embeddings using subword RNNs. arXiv preprint arXiv:1707.06961 (2017)

  49. Wu, Y., et al.: Google's neural machine translation system: bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144 (2016)

  50. Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864 (2016)

  51. Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., Bengio, Y.: Graph attention networks. arXiv preprint arXiv:1710.10903 (2017)

  52. Vashishth, S., Yadav, P., Bhandari, M., Rai, P., Bhattacharyya, C., Talukdar, P.: Graph convolutional networks based word embeddings. arXiv preprint arXiv:1809.04283 (2018)

  53. Shalev-Shwartz, S.: Online learning and online convex optimization. Found. Trends® Mach. Learn. 4(2), 107–194 (2012)

  54. Cesa-Bianchi, N., Lugosi, G.: Prediction, Learning, and Games. Cambridge University Press, Cambridge (2006)

  55. Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., Yu, P.S.: A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 32(1), 4–24 (2020)

  56. Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710 (2014)

  57. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)

  58. Hamilton, W., Ying, Z., Leskovec, J.: Inductive representation learning on large graphs. In: Advances in Neural Information Processing Systems, vol. 30 (2017)

  59. Kazemi, S.M., et al.: Representation learning for dynamic graphs: a survey. J. Mach. Learn. Res. 21(70), 1–73 (2020)

  60. You, J., Du, T., Leskovec, J.: ROLAND: graph learning framework for dynamic graphs. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 2358–2366 (2022)

Acknowledgement

This work was supported by a J.C. Bose Fellowship, the Walmart Centre for Tech Excellence, Indian Institute of Science, and the Robert Bosch Centre for Cyber Physical Systems at Indian Institute of Science.

Author information

Corresponding author

Correspondence to Ashish Srivastava.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Srivastava, A., Bhatnagar, S., Narasimha Murty, M., Aravinda Raman, J. (2025). Learning Dynamic Representations in Large Language Models for Evolving Data Streams. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, C.-L., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15305. Springer, Cham. https://doi.org/10.1007/978-3-031-78169-8_16

  • DOI: https://doi.org/10.1007/978-3-031-78169-8_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-78168-1

  • Online ISBN: 978-3-031-78169-8

  • eBook Packages: Computer Science, Computer Science (R0)
