DOI: 10.1145/3556089.3556149
research-article

E-Commerce Product Matching at Internet Scale

Published: 30 November 2022

ABSTRACT

In e-commerce, product matching is a fundamental problem underlying use cases such as (1) competitive pricing of products, (2) deduplication of products in a catalog, (3) grouping items from various merchants, and (4) recommending products. The requirement is to match a product accurately against a catalog spread across tens of thousands of taxonomy nodes and millions of items. For product matching to be usable across these use cases, its results must be accurate, with minimal margin for error. This paper proposes a combination of deep learning models integrated into a scalable architecture to achieve the required results. We approach the problem from the ground up in five stages: (1) identifying attributes per taxonomy node, (2) classifying products, (3) enriching attributes through NER (text) and image feature extraction, (4) searching against multiple indices and filtering results on mandatory attributes, and (5) re-ranking to improve the relevance of the shortlisted results. We define product data quality and measure it at every stage to improve the overall performance of product matching. This approach has yielded accurate product matching results at scale while minimising false positives.
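
As a rough illustration of stages (4) and (5), the following Python sketch retrieves candidates from several indices, filters them on mandatory attributes, and re-ranks the shortlist. It is not the paper's implementation: the `Product` type, the `MANDATORY` table, the in-memory lists standing in for search indices, and the token-overlap scorer (a placeholder for a learned re-ranking model) are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    # Hypothetical record type; field names are assumptions for this sketch.
    product_id: str
    title: str
    taxonomy_node: str
    attributes: dict = field(default_factory=dict)


# Hypothetical output of stage 1: mandatory attributes per taxonomy node.
MANDATORY = {"laptops": ["brand", "ram_gb"]}


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase title tokens (stand-in for a learned scorer)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def search(query: Product, indices: list) -> list:
    """Stage 4a: union of same-node candidates from all indices, deduplicated by id."""
    seen = {}
    for index in indices:
        for cand in index:
            if cand.taxonomy_node == query.taxonomy_node:
                seen.setdefault(cand.product_id, cand)
    return list(seen.values())


def passes_mandatory(query: Product, cand: Product) -> bool:
    """Stage 4b: every mandatory attribute must be present on the query and equal."""
    required = MANDATORY.get(query.taxonomy_node, [])
    return all(
        attr in query.attributes
        and query.attributes[attr] == cand.attributes.get(attr)
        for attr in required
    )


def match(query: Product, indices: list, top_k: int = 3) -> list:
    """Stages 4 and 5: search, filter on mandatory attributes, then re-rank."""
    shortlisted = [c for c in search(query, indices) if passes_mandatory(query, c)]
    ranked = sorted(
        shortlisted,
        key=lambda c: token_overlap(query.title, c.title),
        reverse=True,
    )
    return ranked[:top_k]


# Toy data: two lists stand in for a text index and an image index.
query = Product("q", "Acme Laptop 14 inch 16GB", "laptops", {"brand": "acme", "ram_gb": 16})
p1 = Product("p1", "Acme Laptop 14 16GB RAM", "laptops", {"brand": "acme", "ram_gb": 16})
p2 = Product("p2", "Acme Laptop 14 8GB RAM", "laptops", {"brand": "acme", "ram_gb": 8})
p3 = Product("p3", "Acme 16GB Notebook", "laptops", {"brand": "acme", "ram_gb": 16})
text_index, image_index = [p1, p2], [p1, p3]

print([p.product_id for p in match(query, [text_index, image_index])])  # ['p1', 'p3']
```

In this sketch a candidate is rejected whenever a mandatory attribute is missing or differs, which trades recall for fewer false positives; a production system would make that trade-off per taxonomy node.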

    • Published in

      ACM Other conferences
      ICEME '22: Proceedings of the 2022 13th International Conference on E-business, Management and Economics
      July 2022
      691 pages
      ISBN:9781450396394
      DOI:10.1145/3556089

      Copyright © 2022 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited