research-article

Inferring the Importance of Product Appearance with Semi-supervised Multi-modal Enhancement: A Step Towards the Screenless Retailing

Authors: Dong-Dong Chen, Zhihua Zhou

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

Pages 1120 - 1128

https://doi.org/10.1145/3474085.3481538

Published: 17 October 2021

Abstract

Nowadays, almost all online orders are placed through devices with screens, such as mobile phones, tablets, and computers. With the rapid development of the Internet of Things (IoT) and smart appliances, more and more screenless smart devices, e.g., smart speakers and smart refrigerators, are appearing in our daily lives. They open up new means of interaction and may provide an excellent opportunity to reach new customers and increase sales. However, not all items are suitable for screenless shopping, since the appearance of some items plays an important role in consumer decision making. Typical examples include clothes, dolls, bags, and shoes. In this paper, we aim to infer the significance of each item's appearance in consumer decision making and identify the group of items that are suitable for screenless shopping. Specifically, we formulate the problem as a classification task that predicts whether an item's appearance has a significant impact on people's purchase behavior. To solve this problem, we extract multi-modal features from three different views and collect a set of necessary labels via crowdsourcing. We then propose an iterative semi-supervised learning framework with a carefully designed multi-modal enhancement module. Experimental results verify the effectiveness of the proposed method.
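
The abstract gives no implementation details, so the following is only a rough sketch of what an iterative semi-supervised loop over three feature views might look like. It is not the authors' framework and does not include their multi-modal enhancement module. The nearest-centroid scorer, the averaging of view scores, the synthetic data, and all names (`centroid_scores`, `fused_scores`, `iterative_pseudo_label`, `rounds`, `per_round`) are assumptions made for illustration.

```python
import numpy as np


def centroid_scores(X_lab, y_lab, X):
    """Score in [0, 1] for class 1 from a nearest-centroid rule on one view."""
    c0 = X_lab[y_lab == 0].mean(axis=0)
    c1 = X_lab[y_lab == 1].mean(axis=0)
    d0 = np.linalg.norm(X - c0, axis=1)
    d1 = np.linalg.norm(X - c1, axis=1)
    return d0 / (d0 + d1 + 1e-12)  # far from c0, near c1 -> close to 1


def fused_scores(views, y, labeled, rows):
    """Average the per-view scores for the requested rows (simple late fusion)."""
    return np.mean(
        [centroid_scores(V[labeled], y[labeled], V[rows]) for V in views], axis=0
    )


def iterative_pseudo_label(views, y, labeled, rounds=5, per_round=20):
    """Grow the labeled set with confident pseudo-labels, then predict all items."""
    y, labeled = y.copy(), labeled.copy()
    for _ in range(rounds):
        unlabeled = np.flatnonzero(~labeled)
        if unlabeled.size == 0:
            break
        scores = fused_scores(views, y, labeled, unlabeled)
        # Promote the most confident predictions to pseudo-labels.
        pick = np.argsort(-np.abs(scores - 0.5))[:per_round]
        y[unlabeled[pick]] = (scores[pick] > 0.5).astype(int)
        labeled[unlabeled[pick]] = True
    all_rows = np.arange(len(y))
    return (fused_scores(views, y, labeled, all_rows) > 0.5).astype(int)


# Synthetic stand-in data (hypothetical): 200 items, three 5-dimensional views.
# Label 1 means "appearance matters", label 0 means "suitable for screenless".
rng = np.random.default_rng(0)
n = 200
y_true = rng.integers(0, 2, size=n)
views = [2.0 * y_true[:, None] + rng.normal(size=(n, 5)) for _ in range(3)]

# Only five items per class start out labeled (standing in for crowdsourced labels).
labeled = np.zeros(n, dtype=bool)
for c in (0, 1):
    labeled[np.flatnonzero(y_true == c)[:5]] = True
y = np.where(labeled, y_true, -1)  # -1 marks an unlabeled item

pred = iterative_pseudo_label(views, y, labeled)
accuracy = float((pred == y_true).mean())
```

Averaging the view scores is the simplest possible fusion rule and is used here only to keep the sketch short; a real system would likely use stronger per-view models and a more careful criterion for accepting pseudo-labels.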

Index Terms

1. Information systems
  1. Information systems applications
    1. Multimedia information systems

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

October 2021

5796 pages

ISBN: 9781450386517

DOI: 10.1145/3474085

General Chairs:
Heng Tao Shen, University of Electronic Science and Technology of China, China
Yueting Zhuang, Zhejiang University, China
John R. Smith, IBM, USA

Program Chairs:
Yang Yang, University of Electronic Science and Technology of China, China
Pablo Cesar, CWI & TU Delft, The Netherlands
Florian Metze, FACEBOOK, Inc., USA
Balakrishnan Prabhakaran, University of Texas at Dallas, USA

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

The Fundamental Research Funds of Shandong University

Conference

MM '21

Sponsor:

SIGMM

MM '21: ACM Multimedia Conference

October 20 - 24, 2021

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
