Continual Vision-Language Retrieval via Dynamic Knowledge Rectification

Authors

  • Zhenyu Cui, Peking University
  • Yuxin Peng, Peking University
  • Xun Wang, ByteDance Inc
  • Manyu Zhu, ByteDance Inc
  • Jiahuan Zhou, Peking University

DOI:

https://doi.org/10.1609/aaai.v38i10.29054

Keywords:

ML: Life-Long and Continual Learning, CV: Language and Vision

Abstract

Large-scale pre-trained models such as CLIP have attracted great attention in vision-language tasks. However, when required to match image-text data collected in a streaming manner, a setting known as Continual Vision-Language Retrieval (CVRL), their performance remains limited by catastrophic forgetting of previously learned knowledge. To address this issue, advanced methods distill the affinity knowledge between images and texts from the old model to the new one to counter forgetting. Unfortunately, existing approaches neglect the impact of incorrect affinities, which upsets the balance between retaining old knowledge and acquiring new knowledge. Therefore, we propose a novel framework called Dynamic Knowledge Rectification (DKR) that performs incorrect knowledge filtering and rectification simultaneously. Specifically, we first filter out the incorrect affinity knowledge calculated by the old model on the new data. Then, a knowledge rectification method is designed to rectify the incorrect affinities while preserving the correct ones. In particular, for new data that can be correctly retrieved only by the new model, we rectify the affinities with the corresponding new affinity to protect them from negative transfer. Additionally, for data that cannot be retrieved by either the old or the new model, we introduce paired ground-truth labels to promote the acquisition of both old and new knowledge. Extensive experiments on several benchmark datasets demonstrate the effectiveness of DKR and its superiority over state-of-the-art methods.
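
The three cases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify how affinities are computed or how "correctly retrieved" is decided, so the sketch assumes that an affinity is a softmax over image-to-text similarity scores within a batch, that image i is paired with text i, and that a sample counts as correctly retrieved when its paired text ranks first. The names `softmax` and `rectified_targets` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def rectified_targets(old_sim, new_sim):
    """Build per-image distillation targets over the texts in a batch.

    old_sim, new_sim: (N, N) image-to-text similarity matrices from the old
    and new models. Assumption: image i is paired with text i, and
    "correct" means the paired text has the highest similarity (top-1).
    """
    n = old_sim.shape[0]
    labels = np.arange(n)

    # Filtering step: which rows does each model retrieve correctly?
    old_ok = old_sim.argmax(axis=1) == labels
    new_ok = new_sim.argmax(axis=1) == labels

    # Default case: neither model is correct, so fall back to the
    # paired ground-truth labels (one-hot rows).
    targets = np.eye(n)

    # Old model is correct: preserve its affinity as the target.
    targets[old_ok] = softmax(old_sim[old_ok])

    # Only the new model is correct: use the new affinity instead,
    # so the wrong old affinity is not distilled.
    only_new = ~old_ok & new_ok
    targets[only_new] = softmax(new_sim[only_new])

    return targets, old_ok, new_ok
```

In a training loop, such targets would typically replace the raw old-model affinities in a distillation loss (for example, a KL divergence against the new model's softmax output). That choice of loss is also an assumption here, since the abstract does not name one.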

Published

2024-03-24

How to Cite

Cui, Z., Peng, Y., Wang, X., Zhu, M., & Zhou, J. (2024). Continual Vision-Language Retrieval via Dynamic Knowledge Rectification. Proceedings of the AAAI Conference on Artificial Intelligence, 38(10), 11704-11712. https://doi.org/10.1609/aaai.v38i10.29054

Issue

Section

AAAI Technical Track on Machine Learning I