Probabilistic Deduplication of Anonymous Web Traffic

Published: 18 May 2015

Abstract

Cookies and login-based authentication often provide incomplete data for stitching website visitors across multiple sources, which makes probabilistic deduplication necessary. We address this challenge by formulating the problem as a binary classification task over pairs of anonymous visitors. We compute visitor proximity vectors by converting very high-cardinality categorical variables, such as IP addresses, product search keywords, and URLs, into continuous numeric variables using the Jaccard coefficient for each attribute. Our method achieves about 90% AUC and F-scores in identifying whether two cookies map to the same visitor, and it provides insight into the relative importance of the features available in Web analytics for the deduplication process.
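As a rough illustration of the proximity-vector idea in the abstract, the sketch below computes one Jaccard coefficient per attribute for a pair of cookies. It is a minimal sketch, not the authors' implementation: the attribute names, the sample values, the helper names `jaccard` and `proximity_vector`, and the choice to return 0.0 when both sets are empty are all assumptions made for the example.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |a & b| / |a | b|.

    Returning 0.0 when both sets are empty is an assumption of this sketch;
    the abstract does not say how that case is handled.
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def proximity_vector(
    v1: Dict[str, Set[str]],
    v2: Dict[str, Set[str]],
    attributes: List[str],
) -> List[float]:
    """One Jaccard score per attribute, in the order given by `attributes`."""
    return [jaccard(v1.get(attr, set()), v2.get(attr, set())) for attr in attributes]


# Hypothetical attribute names and observations for two anonymous cookies.
ATTRIBUTES = ["ip_addresses", "search_keywords", "urls"]

cookie_a = {
    "ip_addresses": {"203.0.113.7", "198.51.100.2"},
    "search_keywords": {"running shoes", "trail socks"},
    "urls": {"/home", "/shoes", "/cart"},
}
cookie_b = {
    "ip_addresses": {"203.0.113.7"},
    "search_keywords": {"running shoes", "rain jacket"},
    "urls": {"/home", "/shoes", "/checkout"},
}

vector = proximity_vector(cookie_a, cookie_b, ATTRIBUTES)
print(vector)
```

Each pair of cookies yields one such numeric vector. Together with a label saying whether the two cookies belong to the same visitor, it could serve as a training row for a binary classifier; the abstract does not name the classifier used, so none is assumed here.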

      Published In

      WWW '15 Companion: Proceedings of the 24th International Conference on World Wide Web
      May 2015
      1602 pages
      ISBN: 9781450334730
      DOI: 10.1145/2740908
      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Sponsors

      • IW3C2: International World Wide Web Conference Committee

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. binary classification
      2. visitor deduplication
      3. web analytics

      Qualifiers

      • Other

      Conference

      WWW '15
      Sponsor:
      • IW3C2

      Acceptance Rates

      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

      Cited By

      • (2024) On online experimentation without device identifiers. Proceedings of the 41st International Conference on Machine Learning, pp. 44394-44412. DOI: 10.5555/3692070.3693877. Online publication date: 21-Jul-2024.
      • (2020) Entity Resolution in Dynamic Heterogeneous Networks. Companion Proceedings of the Web Conference 2020, pp. 662-668. DOI: 10.1145/3366424.3391264. Online publication date: 20-Apr-2020.
      • (2019) node2bits: Compact Time- and Attribute-Aware Node Representations for User Stitching. Machine Learning and Knowledge Discovery in Databases, pp. 483-506. DOI: 10.1007/978-3-030-46150-8_29. Online publication date: 16-Sep-2019.
      • (2017) Probabilistic Visitor Stitching on Cross-Device Web Logs. Proceedings of the 26th International Conference on World Wide Web, pp. 1581-1589. DOI: 10.1145/3038912.3052711. Online publication date: 3-Apr-2017.
      • (2015) Improving Marketing Interactions by Mining Sequences. Proceedings, Part I, of the 16th International Conference on Web Information Systems Engineering: WISE 2015, Volume 9418, pp. 277-292. DOI: 10.1007/978-3-319-26190-4_19. Online publication date: 1-Nov-2015.
