DOI: 10.1145/3127540.3127567

Tracking You through DNS Traffic: Linking User Sessions by Clustering with Dirichlet Mixture Model

Published: 21 November 2017

Abstract

The Domain Name System (DNS) does not encrypt domain names such as "bank.us" and "dentalcare.com", and the names a device queries generally reflect the specific network services it uses. DNS-based behavioral analysis is therefore attractive for applications such as forensic investigation and online advertising. Traditionally, a user could be trivially and uniquely identified by a device's IP address when that address was static (e.g., on a desktop or a laptop). As wireless and mobile devices become deeply ingrained in our lives and dynamic IP address assignment (e.g., via DHCP) becomes widespread, it is almost impossible to identify a unique user by a single IP address. In this paper, we propose a new tracking method that identifies individual users by the way they query DNS, regardless of dynamically changing IP addresses and device types. The method rests on two observations. First, even though a user's IP address may change between sessions, the user's query patterns can remain stable across those sessions. Second, the domain names looked up in a session differ from user to user according to personal behavior. Specifically, we propose the constrained Dirichlet multinomial mixture (CDMM) clustering model, which clusters the DNS queries of different sessions into groups, each considered to be generated by a unique user. Unlike traditional supervised and unsupervised models, our model requires neither labeled user information, which is very hard to obtain in real networks, nor a specified number of clusters; it also enforces a maximum number of sessions per cluster, which fits the DNS tracking problem well. Experimental results on DNS queries collected from real networks show that our method achieves high clustering accuracy and outperforms existing methods.
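The abstract does not spell out the inference procedure, so the following is only a rough sketch of the general idea, not the authors' CDMM algorithm. It assumes a collapsed Gibbs sampler for a plain Dirichlet multinomial mixture and adds a hard cap on the number of sessions per cluster. The function name `cluster_sessions`, the hyperparameters `alpha` and `beta`, and the domains `mail.example` and `news.example` are all hypothetical. As a simplification, `max_clusters` is an upper bound that the caller supplies (clusters may end up empty), whereas the abstract states that the number of clusters need not be specified.

```python
import math
import random
from collections import Counter


def cluster_sessions(sessions, max_clusters, max_size,
                     alpha=0.1, beta=0.1, iters=30, seed=0):
    """Cluster sessions (lists of domain names); return one label per session."""
    n, K = len(sessions), max_clusters
    if K * max_size < n:
        raise ValueError("max_clusters * max_size must cover all sessions")
    rng = random.Random(seed)
    V = len({d for s in sessions for d in s})   # number of distinct domains
    bags = [Counter(s) for s in sessions]       # per-session domain counts
    size = [0] * K                              # sessions per cluster
    total = [0] * K                             # queries per cluster
    words = [Counter() for _ in range(K)]       # per-cluster domain counts
    labels = [i % K for i in range(n)]          # round-robin start respects the cap

    def move(i, k, sign):
        # Add (sign=+1) or remove (sign=-1) session i from cluster k.
        size[k] += sign
        total[k] += sign * len(sessions[i])
        for d, c in bags[i].items():
            words[k][d] += sign * c

    for i, k in enumerate(labels):
        move(i, k, +1)

    def log_weight(i, k):
        # Unnormalized log-probability of placing session i in cluster k.
        lw = math.log(size[k] + alpha)
        for d, c in bags[i].items():
            for j in range(c):
                lw += math.log(words[k][d] + beta + j)
        for j in range(len(sessions[i])):
            lw -= math.log(total[k] + V * beta + j)
        return lw

    for _ in range(iters):
        for i in range(n):
            move(i, labels[i], -1)
            # Size constraint: only clusters below the cap are candidates.
            open_k = [k for k in range(K) if size[k] < max_size]
            lws = [log_weight(i, k) for k in open_k]
            top = max(lws)
            probs = [math.exp(lw - top) for lw in lws]
            labels[i] = rng.choices(open_k, weights=probs)[0]
            move(i, labels[i], +1)
    return labels


# Toy input: four sessions, at most two sessions per inferred user.
sessions = [
    ["bank.us", "mail.example", "bank.us"],
    ["dentalcare.com", "news.example"],
    ["bank.us", "mail.example"],
    ["news.example", "dentalcare.com", "news.example"],
]
labels = cluster_sessions(sessions, max_clusters=4, max_size=2)
```

Each entry of `labels` is the cluster index assigned to the corresponding session. Because the sampler is randomized, the grouping can vary with `seed`; the only property guaranteed here is that no cluster ever holds more than `max_size` sessions.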

      Published In

      MSWiM '17: Proceedings of the 20th ACM International Conference on Modelling, Analysis and Simulation of Wireless and Mobile Systems
      November 2017
      340 pages
      ISBN:9781450351621
      DOI:10.1145/3127540

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. clustering
      2. Dirichlet mixture model
      3. DNS behavior tracking

      Qualifiers

      • Research-article

      Conference

      MSWiM '17

      Acceptance Rates

      MSWiM '17 Paper Acceptance Rate 29 of 142 submissions, 20%;
      Overall Acceptance Rate 398 of 1,577 submissions, 25%

      Article Metrics

      • Downloads (Last 12 months): 72
      • Downloads (Last 6 weeks): 11
      Reflects downloads up to 19 Feb 2025

      Cited By

      • (2022) Hide and Seek: Revisiting DNS-based User Tracking. 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P), 188-205. DOI: 10.1109/EuroSP53844.2022.00020. Online publication date: Jun-2022
      • (2021) Evaluating Public DNS Services in the Wake of Increasing Centralization of DNS. 2021 IFIP Networking Conference (IFIP Networking), 1-9. DOI: 10.23919/IFIPNetworking52078.2021.9472831. Online publication date: 21-Jun-2021
      • (2021) Measuring DNS over TLS from the Edge: Adoption, Reliability, and Response Times. Passive and Active Measurement, 192-209. DOI: 10.1007/978-3-030-72582-2_12. Online publication date: 30-Mar-2021
      • (2020) How Many Users Behind A Local Recursive DNS Server? Estimated by Delta-Time Cluster Model. 2020 IEEE 22nd International Conference on High Performance Computing and Communications; IEEE 18th International Conference on Smart City; IEEE 6th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), 465-474. DOI: 10.1109/HPCC-SmartCity-DSS50907.2020.00057. Online publication date: Dec-2020
      • (2018) Associating Drives Based on Their Artifact and Metadata Distributions. Digital Forensics and Cyber Crime, 165-182. DOI: 10.1007/978-3-030-05487-8_9. Online publication date: 30-Dec-2018
