DOI: 10.1145/3514221.3526124
research-article

Automated Category Tree Construction in E-Commerce

Published: 11 June 2022

Abstract

Category trees play a central role in many web applications, enabling browsing-style information access. Building trees that reflect users' dynamic interests is, however, a challenging task that is carried out manually by taxonomists. This manual construction leads to outdated trees, as it is hard to keep track of market trends. While taxonomists can identify candidate categories, i.e., sets of items with a shared label, most such categories cannot simultaneously exist in the tree, as platforms set a bound on the number of categories an item may belong to. To address this setting, we formalize the problem of constructing a tree where the categories are maximally similar to desirable candidate categories while satisfying combinatorial requirements and provide a model that captures practical considerations.
In previous work, we proved inapproximability bounds for this model. Nevertheless, in this work we provide two heuristic algorithms, and demonstrate their effectiveness over datasets from real-life e-commerce platforms, far exceeding the worst-case bounds. We also identify a natural special case, for which we devise a solution with tight approximation guarantees. Moreover, we explain how our approach facilitates continual updates, maintaining consistency with an existing tree. Finally, we propose to include in the input candidate categories derived from the result sets of recent search queries, to reflect dynamic user interests and trends.
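
To make the problem statement concrete, the following toy sketch scores a set of chosen categories against candidate categories and checks a per-item membership bound. It is an illustration only, not the paper's model or algorithms: the use of Jaccard similarity, the flat (non-hierarchical) view of categories, and the names `jaccard`, `within_bound`, and `coverage_score` are all assumptions introduced here.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two item sets (assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def within_bound(categories, bound):
    """True if no item belongs to more than `bound` of the given categories.

    Simplification: categories are treated as a flat collection, so
    ancestor/descendant relations in a tree are ignored.
    """
    counts = Counter(item for cat in categories for item in set(cat))
    return all(count <= bound for count in counts.values())


def coverage_score(candidates, categories):
    """Sum, over candidates, of the best similarity to any chosen category."""
    return sum(
        max((jaccard(cand, cat) for cat in categories), default=0.0)
        for cand in candidates
    )


# Hypothetical toy data: items are single letters.
candidates = [{"a", "b", "c"}, {"b", "c", "d"}, {"c", "e"}]
chosen = [{"a", "b", "c"}, {"c", "e"}]

print(within_bound(candidates, 2))         # all three candidates share item "c"
print(within_bound(chosen, 2))
print(coverage_score(candidates, chosen))
```

In this toy instance, keeping all three candidates would place item `c` in three categories, so under a bound of two at least one candidate has to be dropped or altered, and the score measures how closely the remaining categories still match the candidates.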

Cited By

  • (2024) Automated Category Tree Construction: Hardness Bounds and Algorithms. ACM Transactions on Database Systems 49:3 (1-32). DOI: 10.1145/3664283. Online publication date: 13-Jul-2024
  • (2023) Automated E-Commerce Price Comparison Website using PHP, XAMPP, MongoDB, Django, and Web Scrapping. 2023 International Conference on Computer Communication and Informatics (ICCCI) (1-6). DOI: 10.1109/ICCCI56745.2023.10128573. Online publication date: 23-Jan-2023

Published In

SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
June 2022
2597 pages
ISBN:9781450392495
DOI:10.1145/3514221

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. category tree construction
  2. e-commerce

Conference

SIGMOD/PODS '22

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
