DOI: 10.1145/2501040.2501982
research-article

Audience segment expansion using distributed in-database k-means clustering

Published: 11 August 2013

ABSTRACT

Online display advertisers make extensive use of user segments to cluster users into targetable groups. When a segment is smaller than a campaign budget requires, probabilistic modeling is needed to expand it. This process is termed look-alike modeling. Given the multitude of data providers and online data sources, there are thousands of segments for each targetable consumer, extracted from billions of online (and even offline) actions performed by millions of users. Most advertisers, marketers and publishers have to use large-scale distributed infrastructures to create thousands of user segments on a daily basis. Developing accurate data mining models efficiently within such platforms is a challenging task. The volume and variety of data can be a significant bottleneck for non-disk-resident algorithms, since the running time for training and scoring hundreds of segments with millions of targetable users is non-trivial.

In this paper, we present a novel k-means based distributed in-database algorithm for look-alike modeling, implemented within the nPario database system. We demonstrate the utility of the algorithm: it is accurate, invariant to the size and skew of the targetable audience (very few positive examples), and linearly dependent on the capacity and number of nodes in the distributed environment. To the best of our knowledge, this is the first commercially deployed distributed look-alike modeling implementation to solve this problem. We compare the performance of our algorithm with other distributed and non-distributed look-alike modeling techniques, and report the results over a multi-core environment.
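The abstract does not spell out the algorithm, so the following is only a minimal single-machine sketch of one common way to use k-means for look-alike modeling, not the distributed in-database method the paper presents. It assumes that users are clustered on numeric features, that each cluster is scored by the fraction of its members already in the seed segment, and that non-seed users from the densest clusters are added until the target size is reached. All names (`kmeans`, `expand_segment`, `features`, `seed_ids`) and the toy data are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd-style k-means; returns (centroids, assignment)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        assignment = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids, [nearest(p, centroids) for p in points]


def expand_segment(features, seed_ids, k, target_size):
    """Return non-seed user ids to add so the segment reaches target_size.

    Assumption: a user's look-alike score is the share of seed users
    in that user's cluster.
    """
    user_ids = sorted(features)
    points = [features[u] for u in user_ids]
    _, assignment = kmeans(points, k)

    total = [0] * k
    positives = [0] * k
    for u, c in zip(user_ids, assignment):
        total[c] += 1
        if u in seed_ids:
            positives[c] += 1
    density = [positives[c] / total[c] if total[c] else 0.0
               for c in range(k)]

    # Rank candidates by cluster density, breaking ties by user id.
    candidates = [(density[c], u)
                  for u, c in zip(user_ids, assignment)
                  if u not in seed_ids]
    candidates.sort(key=lambda t: (-t[0], t[1]))
    needed = max(0, target_size - len(seed_ids))
    return [u for _, u in candidates[:needed]]


# Toy data: two well-separated groups of users in a 2-D feature space.
features = {
    "a1": (0.0, 0.0), "a2": (0.5, 0.2), "a3": (0.1, 0.6),
    "a4": (0.4, 0.4), "a5": (0.2, 0.1),
    "b1": (10.0, 10.0), "b2": (10.3, 9.8), "b3": (9.7, 10.2),
    "b4": (10.1, 10.4), "b5": (9.9, 9.6),
}
expanded = expand_segment(features, {"a1", "a2"}, k=2, target_size=4)
```

With the seed users both in the first group, the expansion draws only from that group's remaining members. Because the score depends on the seed share within a cluster rather than on the absolute number of positives, a sketch like this can still rank candidates when positive examples are scarce.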

      Published in

      ADKDD '13: Proceedings of the Seventh International Workshop on Data Mining for Online Advertising
      August 2013
      49 pages
      ISBN: 9781450323239
      DOI: 10.1145/2501040

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 12 of 21 submissions, 57%
