DOI: 10.1145/3485983.3494859
research-article

User profiling by network observers

Published: 03 December 2021

Abstract

Targeted online advertising is a multi-billion dollar business built on the ability to profile a wide range of users and deliver targeted ads to them. Because of the privacy erosion associated with this business, researchers are trying to understand how profiling works, and anti-tracking applications are becoming popular among users. Both research and privacy-enhancing apps, however, target ad-networks or over-the-top providers that have unrestricted access to users' online activity. There seems to be little interest in potential profiling activities by "network observers" such as ISPs or VPN providers. On the one hand, this may be explained by the pervasiveness of TLS, which secures connections end-to-end. On the other hand, TLS does leak some information, and it is not clear what an eavesdropper can learn about a user even though her traffic is encrypted.
In this paper, we show that a network observer can build accurate user profiles notwithstanding the limited visibility due to TLS. In particular, we introduce a technique based on representation learning algorithms that can build profiles using only the hostnames of URLs requested by users. To evaluate the accuracy of the profiles built with our technique, we set up an experiment in which we serve personalized ads to more than one thousand real users over a period of one month. We compare the click-through rate of ads served by our system with that of ads served by ad-networks. We empirically show that the quality of profiles that a network observer could build is comparable to the quality of profiles available to ad-networks and over-the-top providers. This is particularly worrisome because current anti-tracking mechanisms cannot counter profiling activities by network observers, whereas effective mechanisms like Tor incur a performance and usability penalty.
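To make the idea of hostname-only profiling concrete, the following is a minimal sketch and not the model used in the paper. It assumes the observer sees only the sequence of hostnames each user contacts, and it swaps the learned embeddings for sparse positive-PMI co-occurrence vectors, a simpler count-based stand-in chosen so that the example needs only the standard library. All hostnames, function names, and the window size are hypothetical.

```python
import math
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def hostname(url):
    """Return the lowercase hostname of a URL (the only part assumed visible)."""
    return (urlsplit(url).hostname or "").lower()


def ppmi_vectors(sessions, window=2):
    """Build sparse positive-PMI vectors from per-user hostname sequences."""
    pair_counts = defaultdict(Counter)
    for hosts in sessions:
        for i, h in enumerate(hosts):
            lo, hi = max(0, i - window), min(len(hosts), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[h][hosts[j]] += 1
    row_totals = {h: sum(ctx.values()) for h, ctx in pair_counts.items()}
    total = sum(row_totals.values())
    vectors = {}
    for h, ctx in pair_counts.items():
        vec = {}
        for c, n in ctx.items():
            pmi = math.log(n * total / (row_totals[h] * row_totals[c]))
            if pmi > 0:  # keep only positive associations
                vec[c] = pmi
        vectors[h] = vec
    return vectors


def profile(hosts, vectors):
    """Represent a user as the sum of the vectors of the hostnames contacted."""
    prof = Counter()
    for h in hosts:
        for c, w in vectors.get(h, {}).items():
            prof[c] += w
    return prof


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical browsing sessions, reduced to hostnames only.
sessions = [
    ["sports.example", "scores.example", "tickets.example"],
    ["scores.example", "sports.example", "tickets.example"],
    ["recipes.example", "cooking.example", "groceries.example"],
    ["cooking.example", "recipes.example", "groceries.example"],
]
vectors = ppmi_vectors(sessions)

# Profile a new user from two requests and compare it with two candidate hosts.
urls = ["https://sports.example/live", "https://scores.example/today?x=1"]
user = profile([hostname(u) for u in urls], vectors)
for candidate in ("tickets.example", "groceries.example"):
    print(candidate, round(cosine(user, vectors[candidate]), 3))
```

In this toy setup the user's profile is closer to the hostname that co-occurs with sports-related hosts than to the unrelated one, which is the kind of signal an observer could use to pick an ad category. A real deployment would need far more traffic and a learned embedding model to be meaningful.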

Supplementary Material

MP4 File (S6-1-3494859-presentation - Roberto González Sánchez.mp4)
Presentation Video

Information & Contributors

Information

Published In

CoNEXT '21: Proceedings of the 17th International Conference on emerging Networking EXperiments and Technologies
December 2021
507 pages
ISBN:9781450390989
DOI:10.1145/3485983
General Chairs: Georg Carle, Jörg Ott
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. advertising
  2. privacy
  3. user profiling

Qualifiers

  • Research-article

Conference

CoNEXT '21
Acceptance Rates

Overall Acceptance Rate 198 of 789 submissions, 25%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 41
  • Downloads (Last 6 weeks): 4
Reflects downloads up to 07 Mar 2025

Citations

Cited By

  • (2024) Cross-Network Embeddings Transfer for Traffic Analysis. IEEE Transactions on Network and Service Management 21:3 (2686-2699). DOI: 10.1109/TNSM.2023.3329442. Online publication date: Jun-2024
  • (2023) Going Incognito in the Metaverse: Achieving Theoretically Optimal Privacy-Usability Tradeoffs in VR. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (1-16). DOI: 10.1145/3586183.3606754. Online publication date: 29-Oct-2023
  • (2022) Towards a systematic multi-modal representation learning for network data. Proceedings of the 21st ACM Workshop on Hot Topics in Networks (181-187). DOI: 10.1145/3563766.3564108. Online publication date: 14-Nov-2022
  • (2022) Prediphant: Short Term Heavy User Prediction. 2022 13th International Symposium on Communication Systems, Networks and Digital Signal Processing (CSNDSP) (704-709). DOI: 10.1109/CSNDSP54353.2022.9907909. Online publication date: 20-Jul-2022
