Abstract:
Traditional deep learning-based fusion methods mainly employ convolutional or self-attention operations to model local or global dependencies, which often leads to the oversight of frequency-domain information. To address this deficiency, we introduce a unified frequency adversarial learning network, termed FreqGAN. Our method involves a frequency-compensated generator that employs the discrete wavelet transform to decompose encoded spatial features into multiple frequency bands. Through skip connections, the low- and high-frequency components are directed into the encoder and decoder, respectively, compensating for missing outline and detail information. Moreover, we construct a hybrid frequency aggregation module, which progressively optimizes activity levels across multiple scales and correlates the various frequency bands. Complementing our generative model, we devise dual frequency-constrained discriminators. These discriminators dynamically adjust the weights of each input frequency band, thereby obliging the generator to accurately reconstruct salient frequency information from the different modality images. Additionally, a frequency-supervised loss function is formulated to further safeguard against the loss of frequency information. Comprehensive experimental evaluations, encompassing a wide range of fusion tasks and downstream applications, show that FreqGAN consistently outperforms existing state-of-the-art alternatives. The source code will be released at: https://github.com/Zhishe-Wang/FreqGAN.
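As a rough illustration of the wavelet decomposition the generator relies on, the sketch below implements a one-level 2-D Haar DWT on a batch of feature maps, splitting them into one low-frequency band (coarse outline) and three high-frequency bands (edge and detail information). The function name haar_dwt2d and the choice of the Haar basis are assumptions for illustration; the paper's actual wavelet and implementation may differ.

```python
import torch

def haar_dwt2d(x: torch.Tensor):
    """One-level 2-D Haar DWT on a feature map of shape (B, C, H, W).

    Returns the low-frequency band LL and the three high-frequency
    bands (LH, HL, HH). H and W are assumed to be even.
    """
    # Split the map into the four polyphase components of each 2x2 block.
    x00 = x[:, :, 0::2, 0::2]  # even rows, even cols
    x01 = x[:, :, 0::2, 1::2]  # even rows, odd cols
    x10 = x[:, :, 1::2, 0::2]  # odd rows, even cols
    x11 = x[:, :, 1::2, 1::2]  # odd rows, odd cols
    # Orthonormal Haar combinations (each 1-D step scales by 1/sqrt(2)).
    ll = (x00 + x01 + x10 + x11) / 2   # low-pass: coarse structure
    lh = (-x00 - x01 + x10 + x11) / 2  # horizontal detail
    hl = (-x00 + x01 - x10 + x11) / 2  # vertical detail
    hh = (x00 - x01 - x10 + x11) / 2   # diagonal detail
    return ll, (lh, hl, hh)

# Usage: decompose encoded features, then (per the abstract) route the
# low band toward the encoder path and the high bands toward the decoder.
feats = torch.randn(2, 64, 128, 128)
ll, (lh, hl, hh) = haar_dwt2d(feats)
print(ll.shape, hh.shape)  # torch.Size([2, 64, 64, 64]) for each band
```

Each band has half the spatial resolution of the input, which is what makes the bands natural candidates for multi-scale skip connections.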
Published in: IEEE Transactions on Circuits and Systems for Video Technology (Volume 35, Issue 1, January 2025)