Continual Vision-Language Retrieval via Dynamic Knowledge Rectification

Zhenyu Cui; Yuxin Peng; Xun Wang; Manyu Zhu; Jiahuan Zhou

doi:10.1609/aaai.v38i10.29054

Authors

Zhenyu Cui, Peking University
Yuxin Peng, Peking University
Xun Wang, ByteDance Inc
Manyu Zhu, ByteDance Inc
Jiahuan Zhou, Peking University

DOI:

https://doi.org/10.1609/aaai.v38i10.29054

Keywords:

ML: Life-Long and Continual Learning, CV: Language and Vision

Abstract

Recent large-scale pre-trained models such as CLIP have attracted great attention in vision-language tasks. However, when required to match image-text data collected in a streaming manner, a setting known as Continual Vision-Language Retrieval (CVRL), their performance remains limited due to catastrophic forgetting of previously learned knowledge. To handle this issue, advanced methods distill the affinity knowledge between images and texts from the old model to the new one to counter forgetting. Unfortunately, existing approaches neglect the impact of incorrect affinities, which hinders the balance between retaining old knowledge and acquiring new knowledge. Therefore, we propose a novel framework called Dynamic Knowledge Rectification (DKR) that simultaneously filters and rectifies incorrect knowledge. Specifically, we first filter the incorrect affinity knowledge calculated by the old model on the new data. Then, a knowledge rectification method is designed to rectify the incorrect affinities while preserving the correct ones. In particular, for new data that only the new model retrieves correctly, we rectify the affinities with the corresponding new affinities to protect them from negative transfer. Additionally, for data that neither the old nor the new model retrieves correctly, we introduce paired ground-truth labels to promote the acquisition of both old and new knowledge. Extensive experiments on several benchmark datasets demonstrate the effectiveness of DKR and its superiority over state-of-the-art methods.
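
The filtering and rectification steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that row i of each affinity matrix holds the similarity of image i to every text in the batch, that the paired text for image i sits at index i, and that "correctly retrieved" means the paired text has the highest score. The names `softmax`, `argmax`, and `rectify_targets` are hypothetical, and the paper's actual filtering criterion and loss may differ.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of affinity scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(row):
    """Index of the largest entry in a row."""
    return max(range(len(row)), key=row.__getitem__)


def rectify_targets(old_affinity, new_affinity):
    """Build one distillation target per image (hypothetical helper).

    Assumption: the ground-truth text for image i is at index i.
    Returns the targets and, for each row, which source was used.
    """
    targets, sources = [], []
    for i, (old_row, new_row) in enumerate(zip(old_affinity, new_affinity)):
        if argmax(old_row) == i:
            # Old model is correct: keep its affinity as the target.
            targets.append(softmax(old_row))
            sources.append("old")
        elif argmax(new_row) == i:
            # Only the new model is correct: use the new affinity instead.
            targets.append(softmax(new_row))
            sources.append("new")
        else:
            # Neither model is correct: fall back to the paired label.
            targets.append([1.0 if j == i else 0.0 for j in range(len(old_row))])
            sources.append("label")
    return targets, sources


# Toy batch of three image-text pairs (made-up scores).
old = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.7, 0.2, 0.1]]
new = [[0.5, 0.3, 0.2], [0.1, 0.9, 0.0], [0.6, 0.3, 0.1]]
targets, sources = rectify_targets(old, new)
print(sources)
```

In this toy batch the first image keeps the old affinity, the second takes the new affinity, and the third falls back to its one-hot paired label. The resulting targets would then feed a distillation loss, which is left out here because the abstract does not specify it.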
