DOI: 10.1145/2783258.2783415

Extreme States Distribution Decomposition Method for Search Engine Online Evaluation

Published: 10 August 2015

Abstract

Nowadays, the development of most leading web services is driven by online experiments that qualify and quantify the steady stream of their updates. The challenging problem is to define an appropriate online metric of user behavior, the so-called Overall Evaluation Criterion (OEC), that is both interpretable and sensitive. The state-of-the-art approach is to choose a type of entity to observe in the behavior data, to define a key metric for these observations, and to estimate the average value of this metric over the observations in each of the system versions. A significant disadvantage of an OEC obtained in this way is that the average value of the key metric does not necessarily change even when its distribution changes significantly, because the difference between the mean values of the key metric over the two variants of the system does not necessarily reflect the character of the change in the distribution.
We develop a novel method of quantifying the change in the distribution of the key metric that (1) is interpretable and (2) is based on an analysis of the two distributions as a whole, and is therefore sensitive to more of the ways in which the two distributions may actually differ. We provide a thorough theoretical analysis of our approach and show experimentally that, other things being equal, it produces a more sensitive OEC than the average.
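The failure mode the abstract describes, namely that two metric distributions can differ substantially while their means coincide, can be made concrete with a small sketch. This is an illustrative example, not the paper's decomposition method: the synthetic per-user samples and the use of a two-sample Kolmogorov-Smirnov test as the whole-distribution comparison are our own assumptions.

```python
# Two hypothetical key-metric samples with identical means but different
# spread. A mean-based OEC (here, Welch's t-test on the averages) sees no
# change; a test over the whole distributions (two-sample KS) clearly does.
import numpy as np
from scipy import stats

control = np.linspace(-1.0, 1.0, 1000)    # narrow distribution, mean 0
treatment = np.linspace(-3.0, 3.0, 1000)  # wide distribution, same mean 0

# Comparing averages: the t-test finds nothing (means are identical).
t_p = stats.ttest_ind(control, treatment, equal_var=False).pvalue

# Comparing distributions as a whole: the KS test flags a large change.
ks_p = stats.ks_2samp(control, treatment).pvalue

print(f"t-test p-value:  {t_p:.3f}")   # near 1.0: no detectable mean shift
print(f"KS-test p-value: {ks_p:.2e}")  # near 0: distributions clearly differ
```

Any test statistic computed only from sample means inherits this blindness; sensitivity to variance, skew, or tail changes requires comparing the distributions themselves, which is the motivation for the method proposed in the paper.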

Supplementary Material

MP4 File (p845.mp4)



Published In

cover image ACM Conferences
KDD '15: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2015
2378 pages
ISBN:9781450336642
DOI:10.1145/2783258

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. a/b test
  2. distribution decomposition
  3. effect variable

Qualifiers

  • Research-article

Conference

KDD '15

Acceptance Rates

KDD '15 paper acceptance rate: 160 of 819 submissions (20%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)


Cited By

  • (2019) Effective Online Evaluation for Web Search. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1399-1400. DOI: 10.1145/3331184.3331378. Online publication date: 18 Jul 2019.
  • (2019) New Performance Index "Attractiveness Factor" for Evaluating Websites via Obtaining Transition of Users' Interests. Data Science and Engineering, 5(1):48-64. DOI: 10.1007/s41019-019-00112-1. Online publication date: 21 Nov 2019.
  • (2018) Consistent Transformation of Ratio Metrics for Efficient Online Controlled Experiments. Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 55-63. DOI: 10.1145/3159652.3159699. Online publication date: 2 Feb 2018.
  • (2017) Using the Delay in a Treatment Effect to Improve Sensitivity and Preserve Directionality of Engagement Metrics in A/B Experiments. Proceedings of the 26th International Conference on World Wide Web, pp. 1301-1310. DOI: 10.1145/3038912.3052664. Online publication date: 3 Apr 2017.
  • (2017) Periodicity in User Engagement with a Search Engine and Its Application to Online Controlled Experiments. ACM Transactions on the Web, 11(2):1-35. DOI: 10.1145/2856822. Online publication date: 14 Apr 2017.
  • (2016) Online Evaluation for Information Retrieval. Foundations and Trends in Information Retrieval, 10(1):1-117. DOI: 10.1561/1500000051. Online publication date: 1 Jun 2016.
  • (2016) Boosted Decision Tree Regression Adjustment for Variance Reduction in Online Controlled Experiments. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 235-244. DOI: 10.1145/2939672.2939688. Online publication date: 13 Aug 2016.
