Abstract
Tumblr is one of the most popular micro-blogging services worldwide; on it, users share posts consisting of text and images. This paper proposes a user-centric method of multimodal feature extraction for the personalized retrieval of Tumblr posts. To implement personalized retrieval, we formulate each user's preferences as a triplet loss, using Likes as metadata together with the text- and image-related features of posts. Furthermore, we develop a personalized multimodal variational autoencoder (PMVAE) by introducing a triplet loss into the multimodal variational autoencoder (MVAE), which is among the most effective methods of multimodal feature extraction. Previously proposed variants of MVAE can project multiple kinds of features into a single latent feature space. However, because these latent features do not reflect each user's preferences, the retrieval performance of the previous methods is limited. In contrast, our PMVAE extracts the relationships between the text- and image-related features of posts while considering class-related information that represents whether a user prefers a given post. As a result, it obtains user-centric multimodal features that separate posts a user prefers from posts the user does not prefer in the latent feature space. Because these user-centric multimodal features have high discriminating power, using them in retrieval algorithms such as k-nearest neighbors (k-NN) and Annoy, a technique for approximate nearest neighbor search, makes it feasible to retrieve the posts each user desires. We conducted experiments with 10 users and 150,947 pieces of content to verify the performance of k-NN and Annoy. The results show that our PMVAE increases normalized discounted cumulative gain (nDCG) compared with existing methods; nDCG is 0.253 when term frequency-inverse document frequency (TF-IDF)-based text features are combined with our end-to-end image features.
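The abstract names three computational pieces: a triplet loss built from a user's Likes, nearest-neighbor retrieval in the learned latent space, and nDCG as the evaluation metric. The following is a minimal Python sketch of those pieces only, under stated assumptions rather than the authors' implementation: it uses the common hinge-form triplet loss with squared Euclidean distance and a hypothetical margin of 1.0, assumes the anchor and positive are both Liked posts and the negative is a non-Liked post, uses an exact brute-force k-NN scan where Annoy would use an approximate index, and scores relevance as binary (Liked = 1). The PMVAE encoder and the way the triplet term is weighted against the MVAE objective are not described in the abstract and are omitted; all post IDs and latent vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two latent feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Hinge-form triplet loss (assumed form; margin is a hypothetical value).

    Assumption: anchor and positive are latent vectors of posts the user Liked,
    negative is a post the user did not Like. The loss is zero once the negative
    is at least `margin` farther from the anchor than the positive is.
    """
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def knn_retrieve(query: Vector, items: List[Tuple[str, Vector]], k: int) -> List[str]:
    """Exact k-NN: return the IDs of the k items closest to the query.

    An approximate index such as Annoy would stand in for this full scan.
    """
    ranked = sorted(items, key=lambda item: sq_dist(query, item[1]))
    return [item_id for item_id, _ in ranked[:k]]


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k: the item at 1-based rank r is discounted by log2(r + 1)."""
    return sum(rel / math.log2(r + 1)
               for r, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """nDCG@k: DCG@k divided by the DCG@k of the ideal (sorted) ranking.

    `relevances` should cover the whole ranked list so the ideal ordering
    can see every relevant item; returns 0.0 when nothing is relevant.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical 2-D latent vectors and Likes for a single user.
    posts = [("p1", [0.0, 0.1]), ("p2", [0.2, 0.0]),
             ("p3", [1.0, 1.0]), ("p4", [0.9, 1.2])]
    liked = {"p1", "p2"}
    query = [0.0, 0.0]

    ranking = knn_retrieve(query, posts, k=len(posts))
    rels = [1.0 if pid in liked else 0.0 for pid in ranking]
    print(ranking)                      # ['p1', 'p2', 'p3', 'p4']
    print(ndcg_at_k(rels, k=2))         # 1.0
    print(triplet_loss(query, posts[0][1], posts[2][1]))  # 0.0
```

In this toy example the Liked posts already sit closest to the query, so the triplet loss is zero and nDCG@2 is 1.0. Swapping the exact scan for an approximate index such as Annoy trades some ranking accuracy for speed, which matters for collections on the scale of the 150,947 pieces of content used in the experiments.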
Acknowledgements
We thank Saad Anis, PhD, from Edanz Group (https://en-author-services.edanzgroup.com/) for editing a draft of this manuscript. This work was partly supported by JSPS KAKENHI Grant Number JP21K17861 and by MIC/SCOPE #181601001.
Cite this article
Ohtomo, K., Harakawa, R., Ogawa, T. et al. User-centric multimodal feature extraction for personalized retrieval of tumblr posts. Multimed Tools Appl 81, 2979–3003 (2022). https://doi.org/10.1007/s11042-021-11634-0