Cited By
- Liu J, Chen S, He X, Guo L, Zhu X, Wang W, Tang J (2025). VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset. IEEE Transactions on Pattern Analysis and Machine Intelligence 47(2), pp. 708-724. DOI: 10.1109/TPAMI.2024.3479776. Online publication date: Feb-2025.
- Yuan M, Jia G, Bao B (2024). GPT-Based Knowledge Guiding Network for Commonsense Video Captioning. IEEE Transactions on Multimedia 26, pp. 5147-5158. DOI: 10.1109/TMM.2023.3330070. Online publication date: 2024.
- Kainulainen J, Guo Z, Laaksonen J (2024). Diffusion-Based Multimodal Video Captioning. Computer Vision – ACCV 2024, pp. 148-165. DOI: 10.1007/978-981-96-0908-6_9. Online publication date: 7-Dec-2024.