User-centric multimodal feature extraction for personalized retrieval of tumblr posts

Ohtomo, Kazuma; Harakawa, Ryosuke; Ogawa, Takahiro; Haseyama, Miki; Iwahashi, Masahiro

doi:10.1007/s11042-021-11634-0

Published: 11 November 2021

Volume 81, pages 2979–3003 (2022)

Multimedia Tools and Applications

Kazuma Ohtomo (ORCID: orcid.org/0000-0003-3217-3662)¹, Ryosuke Harakawa¹, Takahiro Ogawa², Miki Haseyama² & Masahiro Iwahashi¹

Abstract

Tumblr is one of the most popular micro-blogging services worldwide; on it, users share posts consisting of text and images. This paper proposes a user-centric method of multimodal feature extraction for the personalized retrieval of Tumblr posts. To implement personalized retrieval, we formulate each user's preferences as a triplet loss, using Likes as metadata together with the text- and image-related features of posts. Furthermore, we develop a personalized multivariational autoencoder (PMVAE) by introducing a triplet loss into the multivariational autoencoder (MVAE), which is among the most effective methods of multimodal feature extraction. Previously proposed variants of MVAE can project multiple kinds of features into a single latent feature space. However, because these latent features do not reflect each user's preferences, the retrieval performance of the previous methods is limited. In contrast, our PMVAE can extract the relationships between the text- and image-related features of posts while considering class-related information that represents whether a user prefers a given post. As a result, we obtain user-centric multimodal features that separate the posts a user prefers from the posts the user does not prefer in the latent feature space. Because these user-centric multimodal features have high discriminating power, the personalized retrieval of the posts each user desires becomes feasible when they are used in retrieval algorithms such as k-nearest neighbors (k-NN) and Annoy, a technique for approximate nearest neighbor search. We conducted experiments with 10 users and 150,947 pieces of content to verify the retrieval performance of k-NN and Annoy. The results show that our PMVAE improves normalized discounted cumulative gain (nDCG) compared with existing methods; nDCG reaches 0.253 when using term frequency-inverse document frequency (TF-IDF)-based text features and our end-to-end image features.
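The abstract combines three technical pieces: a triplet loss driven by Likes, nearest-neighbor retrieval in the learned latent space, and nDCG as the evaluation metric. The Python sketch below illustrates only those three pieces under stated assumptions; it is not the authors' PMVAE. It omits the MVAE encoders, decoders, and reconstruction/KL terms entirely and uses exact k-NN in place of Annoy. The function names `triplet_loss`, `knn_retrieve`, `dcg`, and `ndcg_at_k`, the Euclidean distance, the margin of 1.0, and binary Like-based relevance are hypothetical choices made for illustration, not details taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two latent feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Triplet margin loss max(0, d(a, p) - d(a, n) + margin).

    Hypothetical setup: `anchor` and `positive` are latent features of two
    posts the user Liked; `negative` is a post the user did not Like.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def knn_retrieve(query: Vector, database: List[Vector], k: int) -> List[int]:
    """Exact k-NN: indices of the k database vectors closest to `query`."""
    order = sorted(range(len(database)), key=lambda i: euclidean(query, database[i]))
    return order[:k]


def dcg(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain with the common 1 / log2(rank + 1) discount."""
    # enumerate starts at 0, so rank 1 is divided by log2(2) = 1.
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked: Sequence[float], all_relevances: Sequence[float], k: int) -> float:
    """nDCG@k: DCG of the retrieved ranking divided by the ideal DCG.

    `ranked` holds relevance labels of the retrieved items in rank order;
    `all_relevances` holds labels for every candidate and defines the ideal ranking.
    """
    ideal = dcg(sorted(all_relevances, reverse=True), k)
    return dcg(ranked, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Toy 2-D latent features; values are made up, not taken from the paper.
    liked = [(0.0, 0.1), (0.2, 0.0)]
    not_liked = [(3.0, 3.0), (2.5, 3.2)]
    print(triplet_loss(liked[0], liked[1], not_liked[0]))  # 0.0: margin already satisfied

    database = liked + not_liked
    relevance = [1, 1, 0, 0]  # 1 = Liked by the user (assumed binary relevance)
    top = knn_retrieve((0.1, 0.1), database, k=2)
    print(top, ndcg_at_k([relevance[i] for i in top], relevance, k=2))  # [0, 1] 1.0
```

With a triplet term of this form, the loss becomes zero once every Liked anchor is closer to another Liked post than to a non-Liked post by at least the margin, which is the kind of separation the abstract attributes to user-centric features. For a large post collection, an approximate index such as Annoy would take the place of the brute-force `knn_retrieve`.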

Similar content being viewed by others

A Deep Approach for Multi-modal User Attribute Modeling

Multimodal data fusion framework based on autoencoders for top-N recommender systems (Article, 30 March 2019)

Learning a Semantic Space for Modeling Images, Tags and Feelings in Cross-Media Search

Acknowledgements

We thank Saad Anis, PhD, from Edanz Group (https://en-author-services.edanzgroup.com/) for editing a draft of this manuscript. This work was partly supported by JSPS KAKENHI Grant Number JP21K17861 and the MIC/SCOPE #181601001.

Author information

Authors and Affiliations

Department of Electrical, Electronics and Information Engineering, Nagaoka University of Technology, Nagaoka, Japan
Kazuma Ohtomo, Ryosuke Harakawa & Masahiro Iwahashi
Faculty of Information Science and Technology, Hokkaido University, Sapporo, Japan
Takahiro Ogawa & Miki Haseyama

Corresponding author

Correspondence to Kazuma Ohtomo.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ohtomo, K., Harakawa, R., Ogawa, T. et al. User-centric multimodal feature extraction for personalized retrieval of tumblr posts. Multimed Tools Appl 81, 2979–3003 (2022). https://doi.org/10.1007/s11042-021-11634-0

Received: 30 June 2020
Revised: 26 July 2021
Accepted: 27 September 2021
Published: 11 November 2021
Issue Date: January 2022
DOI: https://doi.org/10.1007/s11042-021-11634-0
