ABSTRACT
Online marketing platforms often store the behavior of millions of website visitors as a large sparse matrix, with rows representing visitors and columns representing behaviors. These platforms let marketers conduct Audience Expansion, a technique for identifying new audiences whose behavior is similar to that of the original target audiences. In this paper, we propose a method for efficient, interactive Audience Expansion over data from millions of visitors. Unlike other methods, which perform significant computation each time inputs are given, our approach responds interactively when a marketer inputs the target audiences and similarity measures. The idea is to apply a data summarization technique to the large visitor matrix to obtain a small set of summaries that represent the similarities in the matrix. We propose efficient algorithms that compute the data summaries in a distributed computing environment (i.e., Spark) and conduct the expansion using the summaries. Our experiments show that our approach (1) provides Audience Expansion results on real datasets that are 10 times more accurate and 27 times faster and (2) achieves a 98% speed-up compared to straightforward data summarization implementations. We also present an interface for applying the algorithm in real-world scenarios.
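The summarize-then-expand idea described in the abstract can be illustrated with a small single-machine sketch. This is not the paper's algorithm: it assumes a deliberately simple summarization (grouping visitors with identical behavior sets) and Jaccard similarity as the similarity measure. The function names `summarize`, `jaccard`, and `expand`, the toy visitor data, and the threshold values are all hypothetical.

```python
from collections import defaultdict


def summarize(visitors):
    """Group visitors with identical behavior sets into one summary.

    Assumption: a real system would use a more compact summarization;
    exact grouping is only the simplest possible stand-in.
    """
    groups = defaultdict(list)
    for visitor_id, behaviors in visitors.items():
        groups[frozenset(behaviors)].append(visitor_id)
    return [(behaviors, sorted(ids)) for behaviors, ids in groups.items()]


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expand(summaries, visitors, targets, threshold):
    """Return non-target visitors whose summary resembles the target profile.

    Only the summaries are scanned at query time, not every visitor row.
    """
    target_set = set(targets)
    profile = set()
    for t in targets:
        profile |= set(visitors[t])
    expanded = []
    for behaviors, ids in summaries:
        if jaccard(profile, behaviors) >= threshold:
            expanded.extend(v for v in ids if v not in target_set)
    return sorted(expanded)


# Hypothetical toy data: visitor id -> set of observed behaviors.
visitors = {
    "v1": {"home", "pricing"},
    "v2": {"home", "pricing"},
    "v3": {"home", "pricing", "signup"},
    "v4": {"blog"},
    "v5": {"blog"},
}

summaries = summarize(visitors)                      # computed once, offline
audience = expand(summaries, visitors, ["v1"], 0.6)  # cheap per-query step
print(audience)
```

In this sketch the expensive pass over all visitors happens once in `summarize`, and each query touches only the summaries, which is the property that makes an interactive response plausible.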