DOI: 10.1145/3474085.3481538
research-article

Inferring the Importance of Product Appearance with Semi-supervised Multi-modal Enhancement: A Step Towards the Screenless Retailing

Published: 17 October 2021

Abstract

Nowadays, almost all online orders are placed through screened devices such as mobile phones, tablets, and computers. With the rapid development of the Internet of Things (IoT) and smart appliances, more and more screenless smart devices, e.g., smart speakers and smart refrigerators, are appearing in our daily lives. They open up new means of interaction and may provide an excellent opportunity to reach new customers and increase sales. However, not all items are suitable for screenless shopping, since the appearance of some items plays an important role in consumer decision making. Typical examples include clothes, dolls, bags, and shoes. In this paper, we aim to infer the significance of each item's appearance in consumer decision making and to identify the group of items that are suitable for screenless shopping. Specifically, we formulate the problem as a classification task that predicts whether an item's appearance has a significant impact on people's purchase behavior. To solve this problem, we extract multi-modal features from three different views and collect a set of necessary labels via crowdsourcing. We then propose an iterative semi-supervised learning framework with a carefully designed multi-modal enhancement module. Experimental results verify the effectiveness of the proposed method.
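
The abstract does not spell out the framework, so the following is only a rough sketch of what an iterative semi-supervised loop over multi-view features can look like in general. It uses plain self-training (pseudo-labeling) with a nearest-centroid classifier on concatenated views; it is not the authors' multi-modal enhancement module. The function names, the confidence threshold, and the number of rounds are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_centroids(X, y):
    """Return one mean feature vector per class (0 and 1).
    Assumes both classes occur in y."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict_proba(centroids, X):
    """Softmax over negative squared distances to each class centroid."""
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -dist
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def self_train(views_l, y_l, views_u, rounds=5, threshold=0.9):
    """Generic self-training over concatenated views (illustrative only).

    views_l / views_u: lists of arrays, one per view, for labeled and
    unlabeled items. rounds and threshold are hypothetical defaults.
    """
    X_l = np.concatenate(views_l, axis=1)
    X_u = np.concatenate(views_u, axis=1)
    y_l = y_l.copy()
    for _ in range(rounds):
        if len(X_u) == 0:
            break
        centroids = fit_centroids(X_l, y_l)
        proba = predict_proba(centroids, X_u)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break
        # Move confidently predicted items into the labeled pool.
        X_l = np.concatenate([X_l, X_u[confident]])
        y_l = np.concatenate([y_l, proba[confident].argmax(axis=1)])
        X_u = X_u[~confident]
    return fit_centroids(X_l, y_l)
```

Here label 1 could stand for "appearance matters for the purchase decision" and label 0 for "suitable for screenless shopping"; the small labeled pool plays the role of the crowdsourced labels mentioned in the abstract.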

    Published In

    MM '21: Proceedings of the 29th ACM International Conference on Multimedia
    October 2021
    5796 pages
    ISBN:9781450386517
    DOI:10.1145/3474085

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. multi-modal enhancement
    2. new shopping interaction
    3. screenless retail
    4. semi-supervised learning

    Qualifiers

    • Research-article

    Funding Sources

    • The Fundamental Research Funds of Shandong University

    Conference

    MM '21: ACM Multimedia Conference
    October 20 - 24, 2021
    Virtual Event, China

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
