Invited talk
DOI: 10.1145/3077136.3096469

Structuring the Unstructured: From Startup to Making Sense of eBay's Huge eCommerce Inventory

Published: 07 August 2017

Abstract

Electronic commerce has continued to gain popularity in recent years. On eBay, one of the largest online marketplaces in the world, millions of new listings (items) are submitted by a variety of sellers every day. This yields a rich, diverse inventory characterized by a particularly long tail. In addition, many items in the inventory lack basic structured information, such as product identifiers, brand, category, and other properties, because sellers tend to enter only unstructured information, namely a title and a description. Such an inventory therefore requires a number of large-scale solutions to help organize the data and gain business insights. In 2016, eBay acquired SalesPredict to help structure its unstructured data. In this presentation, we will share the story of a research startup from its inception to its acquisition and integration as eBay's data science team. We will review the many research and engineering challenges a startup faces, as well as the principal challenges the eBay data science organization deals with today. These include the identification of duplicate, similar, and related products; the extraction of name-value attributes from item titles and descriptions; the matching of items entered by sellers to catalog products; the ranking of item titles by their likelihood of serving as "good" product titles; and the creation of "browse node" pages to address complex search queries from potential buyers. We will describe how the eBay data science team approaches these challenges and present some of the solutions already launched to production. These solutions rely on large-scale machine learning, information retrieval, and natural language processing techniques, and should therefore be of interest to the SIGIR audience at large.

Published In

SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
August 2017
1476 pages
ISBN: 9781450350228
DOI: 10.1145/3077136

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

SIGIR '17 paper acceptance rate: 78 of 362 submissions (22%)
Overall acceptance rate: 792 of 3,983 submissions (20%)
