DOI: 10.1145/3038912.3052711
research-article

Probabilistic Visitor Stitching on Cross-Device Web Logs

Published: 03 April 2017

Abstract

Personalization, the customization of experiences, interfaces, and content to individual users, has catalyzed user growth and engagement for many web services. A critical prerequisite to personalization is establishing user identity. However, users access web services from a variety of devices, including mobile phones, appliances, and smart watches, in both anonymous and logged-in sessions, which poses a significant obstacle to user identification. The resulting entity resolution task of establishing user identity across devices and sessions is commonly referred to as "visitor stitching." We introduce a general, probabilistic approach to visitor stitching using features and attributes commonly contained in web logs. Using web logs from two real-world corporate websites, we motivate the need for probabilistic models by quantifying the difficulties posed by noise, ambiguity, and missing information in deployment. Next, we introduce our approach using probabilistic soft logic (PSL), a statistical relational learning framework capable of capturing similarities across many sessions and enforcing transitivity. We present a detailed description of model features and design choices relevant to the visitor stitching problem. Finally, we evaluate our PSL model on binary classification performance for two real-world visitor stitching datasets. Our model demonstrates significantly better performance than several state-of-the-art classifiers, and we show how this advantage results from collective reasoning across sessions.
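As a rough illustration of what "enforcing transitivity" can mean in a soft-logic setting, the sketch below scores a single transitivity rule using the Lukasiewicz relaxation on which PSL is generally based. It is not the authors' model: the rule, the function names, and the example truth values are all assumptions made for illustration.

```python
# Hypothetical sketch: penalty for violating the soft transitivity rule
#   SameUser(A, B) AND SameUser(B, C) -> SameUser(A, C)
# Truth values are soft, in [0, 1]. All names here are illustrative only.


def lukasiewicz_and(a: float, b: float) -> float:
    """Lukasiewicz t-norm: soft conjunction of two truth values."""
    return max(0.0, a + b - 1.0)


def distance_to_satisfaction(body: float, head: float) -> float:
    """Hinge penalty for the rule body -> head; zero when head >= body."""
    return max(0.0, body - head)


def transitivity_penalty(same_ab: float, same_bc: float, same_ac: float) -> float:
    """How strongly three pairwise link scores violate transitivity."""
    return distance_to_satisfaction(lukasiewicz_and(same_ab, same_bc), same_ac)


if __name__ == "__main__":
    # Sessions A-B and B-C look strongly linked, but A-C is scored low:
    # the rule is violated and incurs a positive penalty.
    print(transitivity_penalty(0.9, 0.8, 0.2))
    # Raising the A-C score removes the penalty.
    print(transitivity_penalty(0.9, 0.8, 0.9))
```

In a collective model of this kind, penalties like this one are weighted and summed over all session triples, so evidence that links two pairs of sessions can pull the score of the third pair upward instead of each pair being classified in isolation.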



Published In

WWW '17: Proceedings of the 26th International Conference on World Wide Web
April 2017
1678 pages
ISBN: 9781450349130

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland


Author Tags

  1. cross-device users
  2. personalization
  3. visitor stitching

Qualifiers

  • Research-article

Conference

WWW '17
Sponsor:
  • IW3C2

Acceptance Rates

WWW '17 Paper Acceptance Rate 164 of 966 submissions, 17%;
Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Cited By

  • (2023) Privacy Aware Experiments without Cookies. Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, 10.1145/3539597.3573036 (1144-1147). Online publication date: 27-Feb-2023
  • (2022) Users' mental models of cross-device search under controlled and autonomous motivations. Aslib Journal of Information Management, 10.1108/AJIM-02-2022-0057, 75:1 (68-89). Online publication date: 31-May-2022
  • (2021) From Closing Triangles to Higher-Order Motif Closures for Better Unsupervised Online Link Prediction. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 10.1145/3459637.3481920 (4085-4093). Online publication date: 26-Oct-2021
  • (2020) Entity Resolution in Dynamic Heterogeneous Networks. Companion Proceedings of the Web Conference 2020, 10.1145/3366424.3391264 (662-668). Online publication date: 20-Apr-2020
  • (2020) Siamese Neural Networks for User Identity Linkage Through Web Browsing. IEEE Transactions on Neural Networks and Learning Systems, 10.1109/TNNLS.2019.2929575, 31:8 (2741-2751). Online publication date: Aug-2020
  • (2020) node2bits: Compact Time- and Attribute-Aware Node Representations for User Stitching. Machine Learning and Knowledge Discovery in Databases, 10.1007/978-3-030-46150-8_29 (483-506). Online publication date: 30-Apr-2020
  • (2019) Exploratory study of cross-device search tasks. Information Processing and Management: an International Journal, 10.1016/j.ipm.2019.102073, 56:6. Online publication date: 1-Nov-2019
  • (2018) Scaling up Inference in MLNs with Spark. 2018 IEEE International Conference on Big Data (Big Data), 10.1109/BigData.2018.8622607 (118-125). Online publication date: Dec-2018
  • (2018) Learning and Multi-Objective Optimization for Automatic Identity Linkage. 2018 IEEE International Conference on Big Data (Big Data), 10.1109/BigData.2018.8622581 (4926-4931). Online publication date: Dec-2018
  • (2018) Using Information in Access Logs for Large Scale Identity Linkage. 2018 IEEE International Conference on Big Data (Big Data), 10.1109/BigData.2018.8622460 (2906-2911). Online publication date: Dec-2018
