Cited By
- Cui X, Sun Q, Wang M, Li L, Zhou W, Li H. LayoutEnc: Leveraging Enhanced Layout Representations for Transformer-based Complex Scene Synthesis. ACM Transactions on Multimedia Computing, Communications, and Applications. 10.1145/3716389
Dense captioning (DC), which provides a comprehensive context understanding of images by describing all salient visual groundings in an image, facilitates multimodal understanding and learning. As an extension of image captioning, DC is developed to ...
The world around us is composed of images that often need to be translated into words. This translation can take place in parts, converting regions of the image into textual descriptions, which is also known as dense captioning. By doing so, the ...
Self-attention based Transformer has been successfully introduced in the encoder-decoder framework of image captioning, which is superior in modeling the inner relations of inputs, i.e., image regions or semantic words. However, ...
Association for Computing Machinery
New York, NY, United States