Cited By
- Liu Y, Zhang B, Wang C, Yan G, Zhou K, Li Z, Zhang L (2025). Vision-language representation learning with breadth and depth attention pre-training. Knowledge-Based Systems, 112941. doi:10.1016/j.knosys.2024.112941. Online publication date: Jan-2025.
- Lee D, Lee W, Serra E, Spezzano F (2024). Learning Prompt-Level Quality Variance for Cost-Effective Text-to-Image Generation. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, pp. 3847-3851. doi:10.1145/3627673.3679954. Online publication date: 21-Oct-2024.
- Lu S, Liu M, Yin L, Yin Z, Liu X, Zheng W (2023). The multi-modal fusion in visual question answering: a review of attention mechanisms. PeerJ Computer Science 9, e1400. doi:10.7717/peerj-cs.1400. Online publication date: 30-May-2023.