
Deep Attentive Multimodal Network Representation Learning for Social Media Images

Published: 16 June 2021

Abstract

The analysis of social networks, such as the socially connected Internet of Things, has shown the deep influence of intelligent information processing technology on industrial systems for Smart Cities. The goal of social media representation learning is to learn dense, low-dimensional, and continuous representations for the multimodal data within social networks, facilitating many real-world applications. Since social media images are usually accompanied by rich metadata (e.g., textual descriptions, tags, groups, and submitting users), modeling the image alone cannot capture the comprehensive information carried by social media images. In this work, we treat the image and its textual description as multimodal content and transform the remaining metadata into links between contents (e.g., two images marked by the same tag or submitted by the same user). Based on the multimodal content and social links, we propose a Deep Attentive Multimodal Graph Embedding model, named DAMGE, for more effective social image representation learning. We conduct extensive experiments on both small- and large-scale datasets, and the results confirm the superiority of the proposed model on the tasks of social image classification and link prediction.
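
To make the setup above concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of how a social-image graph might be assembled from metadata and how an attention-weighted neighbor aggregation could produce node embeddings. All names, feature dimensions, and the toy `posts` data are illustrative assumptions; DAMGE learns its attentive multimodal fusion and graph embedding jointly rather than using the fixed dot-product weighting shown here.

```python
# Hypothetical sketch: build links between social images from shared tags/users,
# fuse image and text features per node, and aggregate neighbors with a simple
# similarity-based attention. Toy data and dimensions are assumptions.
import numpy as np
from itertools import combinations
from collections import defaultdict

rng = np.random.default_rng(0)

# Each post carries an image feature, a text feature, and metadata (tags, user).
posts = [
    {"id": 0, "img": rng.random(4), "txt": rng.random(4), "tags": {"beach"}, "user": "u1"},
    {"id": 1, "img": rng.random(4), "txt": rng.random(4), "tags": {"beach", "sunset"}, "user": "u2"},
    {"id": 2, "img": rng.random(4), "txt": rng.random(4), "tags": {"city"}, "user": "u1"},
]

# Link two posts if they share a tag or were submitted by the same user.
edges = set()
for a, b in combinations(posts, 2):
    if a["tags"] & b["tags"] or a["user"] == b["user"]:
        edges.add((a["id"], b["id"]))

# Fuse image and text features per node (concatenation as a stand-in for
# learned attentive multimodal fusion).
feats = {p["id"]: np.concatenate([p["img"], p["txt"]]) for p in posts}

# Adjacency lists from the undirected edge set.
neighbors = defaultdict(set)
for i, j in edges:
    neighbors[i].add(j)
    neighbors[j].add(i)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(node_id):
    """One attentive aggregation step: weight each neighbor by its similarity
    to the center node, then add the weighted neighbor features (GAT-style,
    greatly simplified)."""
    h = feats[node_id]
    nbrs = sorted(neighbors[node_id])
    if not nbrs:
        return h
    scores = np.array([h @ feats[n] for n in nbrs])
    alpha = softmax(scores)
    return h + sum(a * feats[n] for a, n in zip(alpha, nbrs))

embeddings = {p["id"]: attend(p["id"]) for p in posts}
print({k: v.round(2) for k, v in embeddings.items()})
```

The graph construction from shared tags and users follows the idea described in the abstract; in the paper the per-neighbor attention weights and the multimodal fusion are learned end-to-end rather than fixed as in this sketch.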



Published In

ACM Transactions on Internet Technology, Volume 21, Issue 3
August 2021, 522 pages
ISSN: 1533-5399
EISSN: 1557-6051
DOI: 10.1145/3468071
Editor: Ling Liu
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 June 2021
Accepted: 01 August 2020
Revised: 01 July 2020
Received: 01 May 2020
Published in TOIT Volume 21, Issue 3


Author Tags

  1. Social image
  2. graph convolutional network
  3. multimodal
  4. attention network
  5. representation learning

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • National Natural Science Foundation of China
  • Natural Science Foundation of Guangdong Province, China
  • Guangdong Provincial Key R&D Plan


Cited By

  • (2024) Evaluation of the impact of security perception on the structural changes of MSEs through system dynamics. Heliyon 10(21), e39085. https://doi.org/10.1016/j.heliyon.2024.e39085. Online publication date: Nov-2024.
  • (2023) DRAKE: Deep Pair-Wise Relation Alignment for Knowledge-Enhanced Multimodal Scene Graph Generation in Social Media Posts. IEEE Transactions on Circuits and Systems for Video Technology 33(7), 3199-3213. https://doi.org/10.1109/TCSVT.2022.3231437. Online publication date: 1-Jul-2023.
  • (2023) Explicit time embedding based cascade attention network for information popularity prediction. Information Processing and Management 60(3). https://doi.org/10.1016/j.ipm.2023.103278. Online publication date: 1-May-2023.
  • (2023) Multimodal learning of social image representation. Digital Image Enhancement and Reconstruction, 139-150. https://doi.org/10.1016/B978-0-32-398370-9.00013-5. Online publication date: 2023.
