Interactive Audience Expansion On Large Scale Online Visitor Data

Authors:

Gromit Yeuk-Yin Chan,

Cláudio T. Silva,

Juliana Freire

KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining

Pages 2621 - 2631

https://doi.org/10.1145/3447548.3467179

Published: 14 August 2021

Abstract

Online marketing platforms often store the behavior of millions of website visitors as a large sparse matrix, with rows as visitors and columns as behaviors. These platforms allow marketers to conduct Audience Expansion, a technique for identifying new audiences whose behavior is similar to that of the original target audiences. In this paper, we propose a method for efficient, interactive Audience Expansion over data from millions of visitors. Unlike other methods, which perform significant computation each time inputs are given, our approach responds interactively when a marketer inputs the target audiences and similarity measures. The idea is to apply a data summarization technique to the large visitor matrix to obtain a small set of summaries representing the similarities in the matrix. We propose efficient algorithms to compute the data summaries in a distributed computing environment (i.e., Spark) and to conduct the expansion using the summaries. Our experiments show that our approach (1) provides 10 times more accurate and 27 times faster Audience Expansion results on real datasets and (2) achieves a 98% speed-up compared to straightforward data summarization implementations. We also present an interface for applying the algorithm in real-world scenarios.
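To make the summarize-then-expand idea concrete, the following is a minimal toy sketch in Python. It is not the algorithm from the paper: the summary used here (grouping visitors with identical behavior sets), the Jaccard similarity measure, the threshold, and all function and variable names (`summarize`, `expand`, `visitors`) are assumptions chosen purely for illustration. It runs on a single machine rather than on Spark.

```python
from collections import defaultdict


def summarize(visitors):
    """Toy summary (an assumption): group visitors with identical behavior sets."""
    groups = defaultdict(list)
    for visitor_id, behaviors in visitors.items():
        groups[frozenset(behaviors)].append(visitor_id)
    return dict(groups)


def jaccard(a, b):
    """Jaccard similarity of two sets; an assumed similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expand(summaries, visitors, targets, threshold):
    """Return non-target visitors whose summary resembles the targets' behaviors."""
    target_ids = set(targets)
    # Combined behavior profile of the target audience.
    profile = set()
    for t in target_ids:
        profile |= set(visitors[t])
    expanded = []
    # Compare against each summary once, not against every visitor row.
    for behaviors, members in summaries.items():
        if jaccard(set(behaviors), profile) >= threshold:
            expanded.extend(m for m in members if m not in target_ids)
    return sorted(expanded)


# Hypothetical visitor data: visitor id -> set of observed behaviors.
visitors = {
    "v1": {"viewed_shoes", "added_to_cart"},
    "v2": {"viewed_shoes", "added_to_cart"},
    "v3": {"viewed_shoes"},
    "v4": {"read_blog"},
}
summaries = summarize(visitors)
result = expand(summaries, visitors, targets=["v1"], threshold=0.5)
print(result)
```

The point of the sketch is only that the summaries are computed once up front, so each expansion query compares the target profile against a small number of summaries instead of scanning the full visitor matrix.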

Index Terms

Interactive Audience Expansion On Large Scale Online Visitor Data
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel algorithms

Published In

KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining

August 2021

4259 pages

ISBN: 9781450383325

DOI: 10.1145/3447548

General Chairs: Feida Zhu (Singapore Management University), Beng Chin Ooi (National University of Singapore), Chunyan Miao (Nanyang Technological University)

Program Chairs: Haixun Wang, Iryna Skrypnyk, Wynne Hsu, Sanjay Chawla

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

NSF (National Science Foundation)

Conference

KDD '21

KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 14 - 18, 2021

Virtual Event, Singapore

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
