DOI: 10.1145/3447548.3467179

Interactive Audience Expansion On Large Scale Online Visitor Data

Published: 14 August 2021


Online marketing platforms often store the behavior of millions of website visitors as a large sparse matrix, with rows as visitors and columns as behaviors. These platforms allow marketers to conduct Audience Expansion, a technique for identifying new audiences whose behavior is similar to that of the original target audiences. In this paper, we propose a method that achieves interactive Audience Expansion efficiently on data from millions of visitors. Unlike other methods that perform significant computation upon each input, our approach provides interactive responses when a marketer inputs the target audiences and similarity measures. The idea is to apply a data summarization technique to the large visitor matrix to obtain a small set of summaries that represent the similarities in the matrix. We propose efficient algorithms to compute the data summaries in a distributed computing environment (i.e., Spark) and to conduct the expansion using the summaries. Our experiments show that our approach (1) provides 10 times more accurate and 27 times faster Audience Expansion results on real datasets and (2) achieves a 98% speed-up compared to straightforward data summarization implementations. We also present an interface for applying the algorithm in real-world scenarios.
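To make the summarize-then-expand idea concrete, the following is a minimal single-machine sketch and not the algorithm proposed in the paper. It assumes a toy summarization that groups visitors with identical behavior sets, and it assumes Jaccard similarity as the similarity measure; the function names (`summarize`, `jaccard`, `expand`), the sample visitors, and the threshold are all hypothetical.

```python
from collections import defaultdict


def summarize(visitors):
    """Group visitors that share an identical behavior set into one summary.

    Assumption: this exact-match grouping is a toy stand-in for a real
    summarization step; it is not the algorithm proposed in the paper.
    """
    summaries = defaultdict(list)
    for visitor_id, behaviors in visitors.items():
        summaries[frozenset(behaviors)].append(visitor_id)
    return dict(summaries)


def jaccard(a, b):
    """Jaccard similarity between two sets of behaviors."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expand(summaries, visitors, targets, threshold):
    """Return non-target visitors whose summary resembles the target profile."""
    target_ids = set(targets)
    # The target profile is the union of the target visitors' behaviors.
    profile = set()
    for visitor_id in target_ids:
        profile |= visitors[visitor_id]
    expanded = []
    # One similarity computation per summary, not per visitor.
    for behaviors, members in summaries.items():
        if jaccard(behaviors, profile) >= threshold:
            expanded.extend(m for m in members if m not in target_ids)
    return sorted(expanded)


# Hypothetical sparse visitor data: each visitor maps to its non-zero columns.
visitors = {
    "v1": {"view_shoes", "add_to_cart"},
    "v2": {"view_shoes", "add_to_cart"},
    "v3": {"view_shoes"},
    "v4": {"read_blog"},
    "v5": {"view_shoes", "add_to_cart", "purchase"},
}
summaries = summarize(visitors)
print(expand(summaries, visitors, ["v1"], threshold=0.6))  # ['v2', 'v5']
```

The point of the sketch is that the summaries are computed once, ahead of time, so each query compares the target profile against a small number of summaries instead of every visitor row.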



Cited By

  • (2024) Transforming Location Retrieval at Airbnb: A Journey from Heuristics to Reinforcement Learning. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 4454-4461. DOI: 10.1145/3627673.3680089. Online publication date: 21-Oct-2024
  • (2024) No Two Users Are Alike: Generating Audiences with Neural Clustering for Temporal Point Processes. Doklady Mathematics 108:S2, S511-S528. DOI: 10.1134/S1064562423701661. Online publication date: 25-Mar-2024




    Information & Contributors


    Published In

    KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
    August 2021
    4259 pages
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]



    Association for Computing Machinery

    New York, NY, United States




    Author Tags

    1. interactive audience expansion
    2. look-alike modeling


    • Research-article



    KDD '21

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%



    Other Metrics

    Bibliometrics & Citations


    Article Metrics

    • Downloads (Last 12 months): 121
    • Downloads (Last 6 weeks): 16
    Reflects downloads up to 17 Feb 2025
