DOI: 10.1145/3447548.3467179
research-article
Public Access

Interactive Audience Expansion On Large Scale Online Visitor Data

Published: 14 August 2021

ABSTRACT

Online marketing platforms often store the behavior of millions of website visitors as a large sparse matrix, with rows as visitors and columns as behaviors. These platforms allow marketers to conduct Audience Expansion, a technique for identifying new audiences whose behavior is similar to that of the original target audiences. In this paper, we propose a method for efficient, interactive Audience Expansion over data on millions of visitors. Unlike other methods, which perform significant computation each time they receive input, our approach responds interactively when a marketer specifies the target audiences and similarity measures. The idea is to apply a data summarization technique to the large visitor matrix to obtain a small set of summaries that represent the similarities in the matrix. We propose efficient algorithms to compute the data summaries in a distributed computing environment (i.e., Spark) and to conduct the expansion using the summaries. Our experiments show that our approach (1) provides Audience Expansion results that are 10 times more accurate and 27 times faster on real datasets and (2) achieves a 98% speed-up compared to straightforward data summarization implementations. We also present an interface for applying the algorithm in real-world scenarios.
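The core idea in the abstract, answering an expansion query against a compact summary of the visitor matrix instead of against every visitor row, can be illustrated with a toy sketch. This is not the authors' algorithm. It assumes that a "summary" is simply a group of visitors with identical behavior sets (a lossless, exact-duplicate grouping), that the marketer's similarity measure is Jaccard similarity with a fixed threshold, and that everything runs on a single machine rather than in Spark. The function names `summarize`, `jaccard`, and `expand` are hypothetical.

```python
from collections import defaultdict


def summarize(rows):
    """Hypothetical summarization: group visitors whose behavior sets are identical.

    rows: dict mapping visitor id -> iterable of behavior ids (one sparse matrix row).
    Returns a dict mapping frozenset(behaviors) -> list of visitor ids.
    """
    summaries = defaultdict(list)
    for visitor, behaviors in rows.items():
        summaries[frozenset(behaviors)].append(visitor)
    return dict(summaries)


def jaccard(a, b):
    """Jaccard similarity of two behavior sets (0.0 when both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def expand(summaries, rows, targets, threshold):
    """Return visitors outside `targets` whose behavior set has Jaccard
    similarity >= threshold with at least one target visitor's behavior set.

    Similarity is evaluated once per summary rather than once per visitor,
    so visitors that share a behavior set are scored together.
    """
    target_sets = {frozenset(rows[t]) for t in targets}
    expanded = set()
    for behaviors, visitors in summaries.items():
        if any(jaccard(behaviors, t) >= threshold for t in target_sets):
            expanded.update(visitors)
    return expanded - set(targets)


if __name__ == "__main__":
    rows = {
        "v1": {"home", "pricing", "signup"},
        "v2": {"home", "pricing", "signup"},
        "v3": {"home", "pricing"},
        "v4": {"blog"},
    }
    summaries = summarize(rows)
    print(len(summaries))  # 3 summaries for 4 visitors
    print(sorted(expand(summaries, rows, ["v1"], 0.6)))  # ['v2', 'v3']
```

In this sketch the per-query cost depends on the number of summaries rather than the number of visitors, which is the property that makes interactive responses plausible. A real system would need lossy summaries (many visitors rarely share an exact behavior set) and a distributed implementation, both of which the paper addresses with its own algorithms.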


Published in

KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
August 2021
4259 pages
ISBN: 9781450383325
DOI: 10.1145/3447548

Copyright © 2021 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance rates: overall acceptance rate of 1,133 out of 8,635 submissions (13%)
