Abstract
In large language modeling, incremental learning plays an important role for evolving data such as streaming text. We introduce an incremental learning approach for dynamic contextualized word embeddings in the streaming-data setting; we call the embeddings generated by our model Incremental Dynamic Contextualized Word Embeddings (iDCWE). Our model introduces incremental BERT (iBERT), where BERT stands for Bidirectional Encoder Representations from Transformers, to perform incremental training with a dynamic model, and it further captures the semantic drift of words using dynamic graphs. Our paper is the first in the line of research on (incremental) dynamic modeling of streaming text, which we also refer to as Neural Dynamic Language Modeling. On the benchmark datasets, our model performs on par with, and often outperforms, dynamic contextualized word embeddings (DCWE), which, to the best of our knowledge, was the first approach to combine contextualization with dynamic word embeddings. Moreover, our model is more compute-time efficient than DCWE.
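The abstract names two components: incremental masked-language-model updates on newly arriving text (iBERT) and tracking semantic drift across time slices. The following is a minimal sketch of both ideas, assuming the HuggingFace transformers library and a bert-base-uncased backbone; the function names (incremental_update, snapshot_embedding) and the toy sentences are illustrative assumptions, not the authors' released code (see Note 1 for that).

```python
# A minimal sketch (not the authors' implementation; see Note 1) of:
# (a) one incremental MLM update on a freshly arrived batch of streaming text,
# (b) measuring a word's semantic drift between two time slices.
# Assumes `pip install torch transformers`.
import torch
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling)

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").to(device)
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def incremental_update(texts):
    """One gradient step of masked-LM training on a newly arrived batch."""
    model.train()
    enc = tokenizer(texts, truncation=True, max_length=128)
    features = [{"input_ids": ids, "attention_mask": am}
                for ids, am in zip(enc["input_ids"], enc["attention_mask"])]
    batch = {k: v.to(device) for k, v in collator(features).items()}
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

@torch.no_grad()
def snapshot_embedding(word, contexts):
    """Mean contextual embedding of `word` over the sentences of one time slice."""
    model.eval()
    wid = tokenizer.convert_tokens_to_ids(word)
    vecs = []
    for text in contexts:
        enc = tokenizer(text, return_tensors="pt", truncation=True).to(device)
        hidden = model.bert(**enc).last_hidden_state[0]   # (seq_len, hidden_dim)
        hits = enc["input_ids"][0] == wid
        if hits.any():
            vecs.append(hidden[hits].mean(dim=0))
    return torch.stack(vecs).mean(dim=0)

# Drift of "cell" between two toy time slices = cosine distance of its embeddings.
e_old = snapshot_embedding("cell", ["the prisoner sat alone in his cell"])
e_new = snapshot_embedding("cell", ["she charges her cell phone every night"])
drift = 1.0 - torch.cosine_similarity(e_old, e_new, dim=0).item()
```

In the paper's full setting, per-slice embeddings like these would feed the dynamic-graph component rather than a pairwise cosine check; the pairwise distance is shown only to make the drift signal concrete.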
Notes
- 1. Our implementation is available at the GitHub link: https://github.com/srivastavaashish/ndlmsd.
- 6. More details about pre-processing are available at the Anonymous GitHub link: https://anonymous.4open.science/r/ndlmsd-C1FB.
Acknowledgement
This work was supported by a J.C. Bose Fellowship, the Walmart Centre for Tech Excellence, Indian Institute of Science, and the Robert Bosch Centre for Cyber Physical Systems at Indian Institute of Science.
Copyright information
Ā© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Srivastava, A., Bhatnagar, S., Narasimha Murty, M., Aravinda Raman, J. (2025). Learning Dynamic Representations in Large Language Models for Evolving Data Streams. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, C.-L., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15305. Springer, Cham. https://doi.org/10.1007/978-3-031-78169-8_16
DOI: https://doi.org/10.1007/978-3-031-78169-8_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78168-1
Online ISBN: 978-3-031-78169-8
eBook Packages: Computer Science (R0)