research-article

Product Bundle Identification using Semi-Supervised Learning

Published: 25 July 2020

Abstract

Many sellers on e-commerce platforms offer buyers product bundles, which package together two or more different items. Identifying such bundles is a necessary step in supporting a variety of related services, from recommendation to dynamic pricing. In this work, we present a comprehensive study of bundle identification on a large e-commerce website. Our analysis of bundle versus non-bundle listings reveals several key differentiating characteristics, spanning the listing's title, image, and attributes. We then experiment with a multi-modal classifier that uses these characteristics as features. Our analysis also shows that the bundle indicator entered by sellers tends to be highly noisy and carries only a weak signal. The bundle identification task therefore faces the challenge of having a small set of manually labeled clean examples and a larger set of noisily labeled examples, combined with class imbalance due to the relative scarcity of bundles.
Our experiments with basic supervised classifiers, trained on the manually labeled data, the noisily labeled data, or both, demonstrate only moderate performance. We therefore turn to a semi-supervised approach and propose GREED, a self-training, ensemble-based algorithm with greedy model selection. Our evaluation over two different meta-categories shows superior performance of semi-supervised approaches for the bundle identification task, with GREED outperforming several semi-supervised alternatives. The combination of textual, image, and some metadata features is shown to yield the best performance, reaching an AUC of 0.89 and 0.92 for the two meta-categories, respectively.
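
To make the terms self-training and AUC concrete, the following is a minimal sketch of a generic self-training loop together with an AUC computation. It is not GREED: the abstract does not describe the ensemble or the greedy model selection, so neither is reproduced here. The nearest-centroid base model, the confidence threshold, and all function names are assumptions chosen only for illustration.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def fit(X, y):
    """Nearest-centroid model (an assumed stand-in for a real classifier).

    Labels: 0 = non-bundle, 1 = bundle. Both classes must be present.
    """
    return {c: centroid([x for x, lab in zip(X, y) if lab == c]) for c in (0, 1)}


def bundle_score(model, x):
    """Score in [0, 1]; higher means x is closer to the bundle centroid."""
    d0 = math.dist(x, model[0])
    d1 = math.dist(x, model[1])
    if d0 + d1 == 0:
        return 0.5
    return d0 / (d0 + d1)


def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=10):
    """Generic self-training: pseudo-label confident unlabeled points, refit, repeat."""
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    model = fit(X, y)
    for _ in range(max_rounds):
        confident, rest = [], []
        for x in pool:
            s = bundle_score(model, x)
            if s >= threshold:
                confident.append((x, 1))   # confidently a bundle
            elif s <= 1 - threshold:
                confident.append((x, 0))   # confidently a non-bundle
            else:
                rest.append(x)             # stays unlabeled for now
        if not confident:
            break                          # nothing new to learn from
        for x, lab in confident:
            X.append(x)
            y.append(lab)
        pool = rest
        model = fit(X, y)
    return model


def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In this sketch, `self_train` would be called with a small clean labeled set and a larger unlabeled pool, and `auc` would be applied to `bundle_score` values on held-out labeled listings. Because AUC depends only on how positives rank against negatives, it is a common choice when one class, such as bundles here, is scarce.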

Published In

SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2020
2548 pages
ISBN: 9781450380164
DOI: 10.1145/3397271

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. electronic commerce
  2. ensemble learning
  3. product bundling
  4. self-training
  5. semi-supervised learning

Qualifiers

  • Research-article

Conference

SIGIR '20

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
