Product Bundle Identification using Semi-Supervised Learning

Authors:

Asnat Greenstein-Messica,

Bracha Shapira

SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 791 - 800

https://doi.org/10.1145/3397271.3401128

Published: 25 July 2020

Abstract

Many sellers on e-commerce platforms offer buyers product bundles, which package together two or more different items. The identification of such bundles is a necessary step to support a variety of related services, from recommendation to dynamic pricing. In this work, we present a comprehensive study of bundle identification on a large e-commerce website. Our analysis of bundle listings compared to non-bundle listings reveals several key differentiating characteristics, spanning the listing's title, image, and attributes. Building on this analysis, we experiment with a multi-modal classifier that uses these characteristics as features. Our analysis also shows that the bundle indicator entered by sellers tends to be highly noisy and carries only a weak signal. The bundle identification task therefore faces the challenge of having a small set of manually-labeled clean examples and a larger set of noisy-labeled examples, in conjunction with class imbalance due to the relative scarcity of bundles.
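One way to make the "noisy indicator" observation concrete is to compare a seller-entered flag against a small set of clean manual labels. The sketch below is only an illustration under stated assumptions: the function name `indicator_quality` and the toy label lists are hypothetical and do not come from the paper's data.

```python
def indicator_quality(clean_labels, seller_flags):
    """Precision and recall of a noisy seller flag against clean manual labels."""
    pairs = list(zip(clean_labels, seller_flags))
    tp = sum(1 for y, f in pairs if y == 1 and f == 1)  # flagged and truly a bundle
    fp = sum(1 for y, f in pairs if y == 0 and f == 1)  # flagged but not a bundle
    fn = sum(1 for y, f in pairs if y == 1 and f == 0)  # bundle the seller did not flag
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy, made-up labels (1 = bundle). Bundles are the minority class.
clean = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
flags = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
precision, recall = indicator_quality(clean, flags)
```

Low precision together with low recall on such a comparison is one possible reading of "carries only a weak signal"; the abstract itself does not state how the noise was measured.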

Our experiments with basic supervised classifiers, using the manually-labeled and/or the noisy-labeled data for training, demonstrate only moderate performance. We therefore turn to a semi-supervised approach and propose GREED, a self-training ensemble-based algorithm with greedy model selection. Our evaluation over two different meta-categories shows superior performance of semi-supervised approaches for the bundle identification task, with GREED outperforming several semi-supervised alternatives. The combination of textual, image, and some metadata features is shown to yield the best performance, reaching an AUC of 0.89 and 0.92 for the two meta-categories, respectively.
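For readers unfamiliar with self-training, the following is a minimal generic sketch, not GREED itself: the abstract does not specify the ensemble or the greedy model selection step, so neither is reproduced here. It assumes a single numeric feature, a toy nearest-centroid scorer, and made-up data; all function names (`auc`, `fit_centroids`, `score`, `self_train`) are hypothetical.

```python
def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroids(xs, ys):
    """Toy 1-D 'classifier': the mean feature value of each class."""
    means = {}
    for c in (0, 1):
        vals = [x for x, y in zip(xs, ys) if y == c]
        means[c] = sum(vals) / len(vals)
    return means


def score(means, x):
    """Positive when x is closer to the positive centroid than to the negative one."""
    return abs(x - means[0]) - abs(x - means[1])


def self_train(xs, ys, unlabeled, rounds=3, per_round=2):
    """Generic self-training: repeatedly pseudo-label the most confident points."""
    xs, ys, pool = list(xs), list(ys), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        means = fit_centroids(xs, ys)
        # Confidence is taken to be the absolute score (an assumption of this sketch).
        pool.sort(key=lambda x: abs(score(means, x)), reverse=True)
        chosen, pool = pool[:per_round], pool[per_round:]
        for x in chosen:
            xs.append(x)
            ys.append(1 if score(means, x) > 0 else 0)
    return fit_centroids(xs, ys)


# Toy, made-up data: a few clean labels and a larger unlabeled pool.
labeled_x, labeled_y = [0.1, 0.3, 0.9], [0, 0, 1]
unlabeled_x = [0.0, 0.2, 0.8, 1.0, 0.85, 0.15]
test_x, test_y = [0.05, 0.25, 0.4, 0.95, 0.7], [0, 0, 0, 1, 1]

model = self_train(labeled_x, labeled_y, unlabeled_x)
test_auc = auc(test_y, [score(model, x) for x in test_x])
```

AUC is used here because it is the metric reported in the abstract and it is insensitive to the decision threshold, which matters when the positive class is scarce. The toy data is deliberately well separated, so the resulting score says nothing about real listings.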

Index Terms

Product Bundle Identification using Semi-Supervised Learning
1. Information systems
  1. World Wide Web
    1. Web applications
      1. Electronic commerce
        Online shopping
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Machine learning theory
      1. Semi-supervised learning

Published In

SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2020

2548 pages

ISBN:9781450380164

DOI:10.1145/3397271

General Chairs:
Jimmy Huang (York University, Canada)
Yi Chang (Jilin University, China)
Xueqi Cheng (Chinese Academy of Sciences, China)

Program Chairs:
Jaap Kamps (University of Amsterdam, Netherlands)
Vanessa Murdock (Amazon, U.S.A.)
Ji-Rong Wen (Renmin University of China, China)
Yiqun Liu (Tsinghua University, China)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SIGIR '20

Sponsor:

SIGIR

SIGIR '20: The 43rd International ACM SIGIR conference on research and development in Information Retrieval

July 25 - 30, 2020

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
