poster

Temporal multi-hierarchy smoothing for estimating rates of rare events

Authors:

Deepak Agarwal

KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 1361 - 1369

https://doi.org/10.1145/2020408.2020610

Published: 21 August 2011

Abstract

We consider the problem of estimating rates of rare events obtained through interactions among several categorical variables that are heavy-tailed and hierarchical. In our previous work, we proposed a scalable log-linear model called LMMH (Log-Linear Models for Multiple Hierarchies) that combats data sparsity at granular levels through small-sample-size corrections that borrow strength from rate estimates at coarser resolutions. This paper extends our previous work in two directions. First, we model excess heterogeneity by fitting local LMMH models to relatively homogeneous subsets of the data. To ensure scalable computation, these subsets are induced through a decision tree; we call this Treed-LMMH. Second, the Treed-LMMH method is coupled with a temporal smoothing procedure based on a fast Kalman-filter-style algorithm. We show that simultaneously performing hierarchical and temporal smoothing leads to significant improvement in predictive accuracy. Our methods are illustrated on a large-scale computational advertising dataset consisting of billions of observations and hundreds of millions of attribute combinations (cells).
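The two smoothing ideas named in the abstract can be illustrated with a minimal Python sketch. This is not the LMMH or Treed-LMMH algorithm, whose details are not given on this page. It assumes a simple gamma-Poisson-style shrinkage of a sparse cell toward its parent (coarser) rate, followed by a scalar local-level Kalman filter over time. The names `shrink_rate`, `local_level_filter`, `prior_strength`, `obs_var`, and `process_var`, and all numbers, are hypothetical.

```python
def shrink_rate(events, trials, parent_rate, prior_strength):
    """Shrink a sparse cell's raw rate toward its parent's rate.

    Assumption (not from the paper): the prior on the cell rate is centred on
    `parent_rate` and is worth `prior_strength` pseudo-trials, so the result
    is a weighted average of the raw rate and the parent rate.
    """
    return (events + prior_strength * parent_rate) / (trials + prior_strength)


def local_level_filter(observations, obs_var, process_var, init_mean, init_var):
    """Scalar Kalman filter for a random-walk (local level) state.

    `observations` may contain None for time steps with no data; those steps
    only run the predict step. Returns the filtered mean at every step.
    """
    mean, var = init_mean, init_var
    filtered = []
    for y in observations:
        # Predict: a random walk keeps the mean and inflates the variance.
        var = var + process_var
        if y is not None:
            # Update: move the mean toward the observation by the Kalman gain.
            gain = var / (var + obs_var)
            mean = mean + gain * (y - mean)
            var = (1.0 - gain) * var
        filtered.append(mean)
    return filtered


# Toy usage with made-up data: (events, trials) per day for one sparse cell.
parent_rate = 0.002
daily_counts = [(0, 150), (1, 200), (0, 120), (2, 180)]

# Step 1: hierarchical smoothing, day by day, toward the parent rate.
shrunk = [shrink_rate(e, n, parent_rate, prior_strength=1000.0)
          for e, n in daily_counts]

# Step 2: temporal smoothing of the shrunken daily rates.
smoothed = local_level_filter(shrunk, obs_var=1e-6, process_var=1e-7,
                              init_mean=parent_rate, init_var=1e-6)
```

With few trials the shrunken rate stays close to `parent_rate`; as trials grow it approaches the raw ratio `events / trials`. Running the two steps in sequence is a simplification chosen for this sketch, whereas the abstract describes performing hierarchical and temporal smoothing simultaneously.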

Index Terms

Temporal multi-hierarchy smoothing for estimating rates of rare events
1. Computing methodologies
  1. Modeling and simulation
    1. Simulation theory
      1. Systems theory
2. Mathematics of computing
  1. Information theory

Information & Contributors

Information

Published In

KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining

August 2011

1446 pages

ISBN:9781450308137

DOI:10.1145/2020408

General Chair: Chid Apte (IBM Research)
Program Chairs: Joydeep Ghosh (UT Austin), Padhraic Smyth (UC Irvine)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 August 2011

Qualifiers

Poster

Conference

KDD '11: The 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 21 - 24, 2011

San Diego, California, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 4
Total Downloads: 350
Downloads (Last 12 months): 2
Downloads (Last 6 weeks): 1

Reflects downloads up to 01 Mar 2025

Citations

Cited By

Gharibshah Z, Zhu X (2021). User Response Prediction in Online Advertising. ACM Computing Surveys 54:3 (1-43). DOI: 10.1145/3446662. Online publication date: 8-May-2021
https://dl.acm.org/doi/10.1145/3446662
Bhamidipati N, Kant R, Mishra S, Lim E, Winslett M, Sanderson M, Fu A, Sun J, Culpepper S, Lo E, Ho J, Donato D, Agrawal R, Zheng Y, Castillo C, Sun A, Tseng V, Li C (2017). A Large Scale Prediction Engine for App Install Clicks and Conversions. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (167-175). DOI: 10.1145/3132847.3132868. Online publication date: 6-Nov-2017
https://dl.acm.org/doi/10.1145/3132847.3132868
Kalyanakrishnan S, Singh D, Kant R, Li J, Wang X, Garofalakis M, Soboroff I, Suel T, Wang M (2014). On Building Decision Trees from Large-scale Data in Applications of On-line Advertising. Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management (669-678). DOI: 10.1145/2661829.2662044. Online publication date: 3-Nov-2014
https://dl.acm.org/doi/10.1145/2661829.2662044
Chapelle O, Manavoglu E, Rosales R (2014). Simple and Scalable Response Prediction for Display Advertising. ACM Transactions on Intelligent Systems and Technology 5:4 (1-34). DOI: 10.1145/2532128. Online publication date: 29-Dec-2014
https://dl.acm.org/doi/10.1145/2532128
