research-article

Audience segment expansion using distributed in-database k-means clustering

Authors:
Archana Ramesh (nPario Inc., Redmond, WA)
Ankur Teredesai (University of Washington)
Ashish Bindra (nPario Inc., Redmond, WA)
Sreenivasulu Pokuri (nPario Inc., Redmond, WA)
Krishna Uppala (nPario Inc., Redmond, WA)

ADKDD '13: Proceedings of the Seventh International Workshop on Data Mining for Online Advertising, August 2013, Article No. 5, Pages 1–9. https://doi.org/10.1145/2501040.2501982

Published: 11 August 2013

ABSTRACT

Online display advertisers rely heavily on user segments to group users into targetable audiences. When a segment is smaller than a campaign budget requires, probabilistic modeling is used to expand it, a process termed look-alike modeling. Given the multitude of data providers and online data sources, there are thousands of segments for each targetable consumer, extracted from billions of online (and even offline) actions performed by millions of users. Most advertisers, marketers and publishers must use large-scale distributed infrastructures to create thousands of user segments daily. Developing accurate data mining models efficiently within such platforms is challenging. The volume and variety of data can be a significant bottleneck for non-disk-resident algorithms, since the time needed to train and score hundreds of segments over millions of targetable users is non-trivial.

In this paper, we present a novel k-means-based distributed in-database algorithm for look-alike modeling, implemented within the nPario database system. We demonstrate the utility of the algorithm: it is accurate, invariant to the size and skew of the targetable audience (very few positive examples), and linearly dependent on the capacity and number of nodes in the distributed environment. To the best of our knowledge, this is the first commercially deployed distributed look-alike modeling implementation to address this problem. We compare the performance of our algorithm with other distributed and non-distributed look-alike modeling techniques, and report the results over a multi-core environment.
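
The abstract does not spell out the algorithm, so the following is only a minimal, generic sketch of the idea it names: run k-means over the known segment members, with each data partition contributing partial sums that are merged into new centroids, then rank candidate users by distance to the nearest centroid. All names (`partial_sums`, `kmeans`, `expand_segment`), the two-way partitioning, and the distance-based ranking are assumptions made for illustration and are not taken from the paper or the nPario system.

```python
import math
import random


def nearest(point, centroids):
    """Return (index, squared distance) of the centroid closest to point."""
    best_i, best_d = 0, math.inf
    for i, c in enumerate(centroids):
        d = sum((p - q) ** 2 for p, q in zip(point, c))
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d


def partial_sums(partition, centroids):
    """Per-partition step: accumulate vector sums and counts per cluster."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in partition:
        i, _ = nearest(point, centroids)
        counts[i] += 1
        for j, v in enumerate(point):
            sums[i][j] += v
    return sums, counts


def kmeans(partitions, k, iterations=20, seed=0):
    """k-means where each partition only reports sums and counts."""
    rng = random.Random(seed)
    # Hypothetical shortcut: initial centroids are sampled from all points.
    all_points = [p for part in partitions for p in part]
    centroids = [list(p) for p in rng.sample(all_points, k)]
    dim = len(centroids[0])
    for _ in range(iterations):
        total_sums = [[0.0] * dim for _ in range(k)]
        total_counts = [0] * k
        for part in partitions:  # each call could run on a separate node
            sums, counts = partial_sums(part, centroids)
            for i in range(k):
                total_counts[i] += counts[i]
                for j in range(dim):
                    total_sums[i][j] += sums[i][j]
        for i in range(k):
            if total_counts[i]:  # keep the old centroid if a cluster is empty
                centroids[i] = [s / total_counts[i] for s in total_sums[i]]
    return centroids


def expand_segment(seed_users, candidates, k, target_size):
    """Return ids of the target_size candidates closest to a seed centroid.

    seed_users: feature vectors of known segment members.
    candidates: (user_id, feature_vector) pairs outside the segment.
    """
    # Assumption: two partitions stand in for two database nodes.
    partitions = [seed_users[0::2], seed_users[1::2]]
    centroids = kmeans(partitions, k)
    ranked = sorted(candidates, key=lambda uc: nearest(uc[1], centroids)[1])
    return [user_id for user_id, _ in ranked[:target_size]]
```

Because each partition returns only per-cluster sums and counts, the merge step stays small regardless of partition size, which is the usual motivation for keeping this kind of computation next to the data.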

Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining

Published in
ADKDD '13: Proceedings of the Seventh International Workshop on Data Mining for Online Advertising
August 2013
49 pages
ISBN:9781450323239
DOI:10.1145/2501040
Conference Chairs:
Esin Saka (Microsoft Corporation)
Dou Shen (Baidu Inc.)
Bin Gao (Microsoft Research Asia)
Jun Yan (Microsoft Research Asia)
Ying Li (Concurix Corporation)
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 12 of 21 submissions, 57%