
Complete 3D Relationships Extraction Modality Alignment Network for 3D Dense Captioning


Abstract:

3D dense captioning aims to semantically describe each object detected in a 3D scene, which plays a significant role in 3D scene understanding. Previous works lack a complete definition of 3D spatial relationships and directly integrate the visual and language modalities, thus ignoring the discrepancies between the two modalities. To address these issues, we propose a novel complete 3D relationships extraction modality alignment network, which consists of three steps: 3D object detection, complete 3D relationships extraction, and modality alignment caption. To comprehensively capture the 3D spatial relationship features, we define a complete set of 3D spatial relationships, including the local spatial relationship between objects and the global spatial relationship between each object and the entire scene. To this end, we propose a complete 3D relationships extraction module based on message passing and self-attention to mine multi-scale spatial relationship features and inspect the transformation to obtain features in different views. In addition, we propose the modality alignment caption module to fuse multi-scale relationship features and generate descriptions. This module bridges the semantic gap from the visual space to the language space using the prior information in the word embedding, and helps generate improved descriptions for the 3D scene. Extensive experiments demonstrate that the proposed model outperforms the state-of-the-art methods on the ScanRefer and Nr3D datasets.
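The abstract names two ingredients for relationship extraction: message passing for local object-to-object relations and self-attention for global object-to-scene relations. The following is a minimal, purely illustrative sketch of those two generic techniques, not the authors' implementation. All function names (`local_message_passing`, `global_self_attention`, `relation_features`), the k-nearest-neighbour grouping, the mean aggregation, and the absence of learned weights are assumptions made here for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def local_message_passing(features, centers, k=2):
    """Local relations (assumed form): mix each object's feature with
    the features of its k nearest neighbours by box-centre distance."""
    updated = []
    for i, (f_i, c_i) in enumerate(zip(features, centers)):
        # Rank the other objects by Euclidean distance between centres.
        others = sorted(
            (j for j in range(len(features)) if j != i),
            key=lambda j: math.dist(c_i, centers[j]),
        )
        group = [f_i] + [features[j] for j in others[:k]]
        # Mean aggregation of the node's own feature and neighbour messages.
        updated.append([sum(col) / len(group) for col in zip(*group)])
    return updated


def global_self_attention(features):
    """Global relations (assumed form): every object attends to all
    objects in the scene with scaled dot-product attention.
    Queries, keys, and values are the raw features (no learned projections)."""
    scale = math.sqrt(len(features[0]))
    out = []
    for q in features:
        weights = softmax([dot(q, key) / scale for key in features])
        out.append(
            [sum(w * v[d] for w, v in zip(weights, features)) for d in range(len(q))]
        )
    return out


def relation_features(features, centers, k=2):
    """Concatenate the local and global relation features per object."""
    local = local_message_passing(features, centers, k)
    global_ = global_self_attention(features)
    return [loc + glob for loc, glob in zip(local, global_)]


if __name__ == "__main__":
    # Three hypothetical detected objects: 2-D features and 3-D box centres.
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    cents = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0]]
    for row in relation_features(feats, cents, k=1):
        print(row)
```

A real system would replace the mean aggregation and raw dot products with learned layers and would operate on detector outputs; the sketch only shows how local and global context can be computed separately and then combined per object.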
Published in: IEEE Transactions on Visualization and Computer Graphics ( Volume: 30, Issue: 8, August 2024)
Page(s): 4867 - 4880
Date of Publication: 23 May 2023

PubMed ID: 37220037
