DOI: 10.1145/2020408.2020610
poster

Temporal multi-hierarchy smoothing for estimating rates of rare events

Published: 21 August 2011

Abstract

We consider the problem of estimating rates of rare events obtained through interactions among several categorical variables that are heavy-tailed and hierarchical. In our previous work, we proposed a scalable log-linear model called LMMH (Log-Linear Models for Multiple Hierarchies) that combats data sparsity at granular levels through small-sample-size corrections that borrow strength from rate estimates at coarser resolutions. This paper extends that work in two directions. First, we model excess heterogeneity by fitting local LMMH models to relatively homogeneous subsets of the data. To ensure scalable computation, these subsets are induced through a decision tree; we call this approach Treed-LMMH. Second, the Treed-LMMH method is coupled with a temporal smoothing procedure based on a fast Kalman-filter-style algorithm. We show that simultaneously performing hierarchical and temporal smoothing leads to significant improvement in predictive accuracy. Our methods are illustrated on a large-scale computational advertising dataset consisting of billions of observations and hundreds of millions of attribute combinations (cells).
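
The two ideas named in the abstract, borrowing strength from a coarser level and smoothing over time, can be illustrated with a minimal sketch. This is not LMMH, Treed-LMMH, or the paper's algorithm: it substitutes a textbook gamma-Poisson shrinkage of a cell's rate toward its parent's rate, followed by a scalar local-level Kalman filter. All names and parameters (`shrunk_rate`, `local_level_filter`, `prior_strength`, `obs_var`, `drift_var`) and the example counts are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def shrunk_rate(events: float, trials: float,
                parent_rate: float, prior_strength: float) -> float:
    """Gamma-Poisson posterior mean for a sparse cell.

    The raw rate events/trials is pulled toward parent_rate. prior_strength
    is an assumed pseudo-count of trials credited to the coarser level; with
    few trials the estimate stays close to the parent's rate.
    """
    return (events + prior_strength * parent_rate) / (trials + prior_strength)


def local_level_filter(observations: Sequence[float], obs_var: float,
                       drift_var: float, init_mean: float,
                       init_var: float) -> List[float]:
    """Scalar Kalman filter for a random-walk (local-level) state."""
    mean, var = init_mean, init_var
    filtered = []
    for y in observations:
        var += drift_var               # predict: the rate may drift between periods
        gain = var / (var + obs_var)   # Kalman gain, between 0 and 1
        mean += gain * (y - mean)      # update: move toward the new observation
        var *= 1.0 - gain              # posterior variance shrinks after the update
        filtered.append(mean)
    return filtered


# Hypothetical data: a coarse parent cell and three days of a sparse child cell.
parent_rate = 50 / 100_000
daily_counts = [(0, 200), (1, 250), (0, 180)]  # (events, trials) per day

per_day = [shrunk_rate(e, n, parent_rate, prior_strength=1000.0)
           for e, n in daily_counts]
smoothed = local_level_filter(per_day, obs_var=1e-7, drift_var=1e-8,
                              init_mean=parent_rate, init_var=1e-6)
```

In this sketch the hierarchical step runs first and the temporal filter is applied to its output; the abstract describes the two kinds of smoothing as performed simultaneously, so the ordering here is a simplification.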

Cited By

  • (2021) User Response Prediction in Online Advertising. ACM Computing Surveys 54:3 (1-43). DOI: 10.1145/3446662. Online publication date: 8-May-2021
  • (2017) A Large Scale Prediction Engine for App Install Clicks and Conversions. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (167-175). DOI: 10.1145/3132847.3132868. Online publication date: 6-Nov-2017
  • (2014) On Building Decision Trees from Large-scale Data in Applications of On-line Advertising. Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management (669-678). DOI: 10.1145/2661829.2662044. Online publication date: 3-Nov-2014
  • (2014) Simple and Scalable Response Prediction for Display Advertising. ACM Transactions on Intelligent Systems and Technology 5:4 (1-34). DOI: 10.1145/2532128. Online publication date: 29-Dec-2014

      Published In

      KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2011
      1446 pages
      ISBN:9781450308137
      DOI:10.1145/2020408

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. computational advertising
      2. count data
      3. decision trees
      4. display advertising
      5. kalman filtering
      6. multi-hierarchy smoother

      Acceptance Rates

      Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
