DOI: 10.1145/2505515.2505721

On segmentation of eCommerce queries

Published: 27 October 2013

Abstract

In this paper, we present QSEGMENT, a real-life query segmentation system for eCommerce queries. QSEGMENT uses frequency data from the query log, which we call buyers' data, and frequency data from product titles, which we call sellers' data. We exploit the taxonomical structure of the marketplace to build domain-specific frequency models. With this approach, QSEGMENT outperforms previously described baselines for query segmentation. We also perform a large-scale evaluation using an unsupervised IR metric that we refer to as user-intent-score. We discuss the overall architecture of QSEGMENT as well as various use cases and interesting observations around segmenting eCommerce queries.
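
To make the idea of frequency-based segmentation concrete, here is a minimal sketch. It is not the QSEGMENT implementation, whose scoring function and architecture are not given in the abstract. The function name `segment`, the toy count tables, the `buyer_weight` blending parameter, and the length-weighted score are all assumptions chosen for illustration. The sketch blends n-gram counts from a query log (buyers' data) with counts from product titles (sellers' data) and uses dynamic programming to pick the highest-scoring split. It does not model the category-specific frequency models mentioned above.

```python
# Hypothetical toy counts. A real system would mine these from a query log
# (buyers' data) and from product titles (sellers' data).
BUYER_COUNTS = {"ipod nano": 900, "new ipod": 100, "nano 8gb": 50}
SELLER_COUNTS = {"ipod nano": 1200, "ipod nano 8gb": 300}


def segment(query, buyer_counts, seller_counts, buyer_weight=0.5, max_len=4):
    """Split a query into segments that maximize a blended frequency score.

    Assumed scoring (illustrative only): a multi-word segment of length k
    with blended frequency f contributes k**k * f; single words contribute 0;
    multi-word segments never seen in either source are not allowed.
    """
    tokens = query.lower().split()
    n = len(tokens)
    # best[i] holds (score, segments) for the best segmentation of tokens[:i].
    best = [(0.0, [])] + [None] * n
    for end in range(1, n + 1):
        candidates = []
        for start in range(max(0, end - max_len), end):
            phrase = " ".join(tokens[start:end])
            length = end - start
            if length == 1:
                score = 0.0
            else:
                freq = (buyer_weight * buyer_counts.get(phrase, 0)
                        + (1 - buyer_weight) * seller_counts.get(phrase, 0))
                if freq == 0:
                    continue  # unseen multi-word phrase: skip this split
                score = (length ** length) * freq
            prev_score, prev_segments = best[start]
            candidates.append((prev_score + score, prev_segments + [phrase]))
        # The single-word candidate always exists, so candidates is non-empty.
        best[end] = max(candidates, key=lambda c: c[0])
    return best[n][1]


if __name__ == "__main__":
    print(segment("new ipod nano 8gb case", BUYER_COUNTS, SELLER_COUNTS))
    # -> ['new', 'ipod nano', '8gb', 'case']
```

With these toy counts, "ipod nano" wins over the longer "ipod nano 8gb" because its blended frequency is much higher. Changing `buyer_weight` shifts how much the query log counts relative to the product titles.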

Published In

CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
October 2013
2612 pages
ISBN: 9781450322638
DOI: 10.1145/2505515
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. ecommerce search
  2. query rewrite
  3. query segmentation
  4. query suggestion

Qualifiers

  • Research-article

Conference

CIKM '13: 22nd ACM International Conference on Information and Knowledge Management
October 27 - November 1, 2013
San Francisco, California, USA

Acceptance Rates

CIKM '13 Paper Acceptance Rate 143 of 848 submissions, 17%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Cited By

  • (2022) Access Beyond Borders: Linked Open Data Applications on Cultural Heritage. New Review of Information Networking 27:2 (71-90). DOI: 10.1080/13614576.2022.2145334. Online publication date: 17-Nov-2022
  • (2021) Towards adaptive structured Dirichlet smoothing model for digital resource objects. Multimedia Tools and Applications 80:8 (12175-12194). DOI: 10.1007/s11042-020-10305-w. Online publication date: 1-Mar-2021
  • (2021) Distant Supervision for E-commerce Query Segmentation via Attention Network. Intelligent Processing Practices and Tools for E-Commerce Data, Information, and Knowledge (3-19). DOI: 10.1007/978-3-030-78303-7_1. Online publication date: 27-May-2021
  • (2020) Opportunities and challenges in enhancing access to metadata of cultural heritage collections: a survey. Artificial Intelligence Review 53:5 (3621-3646). DOI: 10.1007/s10462-019-09773-w. Online publication date: 1-Jun-2020
  • (2017) Product review summarization through question retrieval and diversification. Information Retrieval 20:6 (575-605). DOI: 10.1007/s10791-017-9311-0. Online publication date: 1-Dec-2017
  • (2016) Retrieving Non-Redundant Questions to Summarize a Product Review. Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval (385-394). DOI: 10.1145/2911451.2911544. Online publication date: 7-Jul-2016
  • (2015) Information Retrieval with Verbose Queries. Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (1121-1124). DOI: 10.1145/2766462.2767877. Online publication date: 9-Aug-2015
