TPII: tracking personally identifiable information via user behaviors in HTTP traffic

  • Research Article
  • Frontiers of Computer Science

Abstract

Mobile applications commonly and actively upload non-critical personally identifiable information (PII) from users’ devices to the cloud, where application service providers (ASPs) use it to deliver precise, personalized recommendation services. Meanwhile, Internet service providers (ISPs) and local network providers also have a strong need to collect PII for finer-grained traffic control and security services. However, locating PII accurately in massive volumes of network traffic is like looking for a needle in a haystack. In this paper, we address this challenge by presenting an efficient and lightweight approach, namely TPII, which locates and tracks PII at the HTTP layer reconstructed from raw network traffic. The approach collects only three features from HTTP fields as users’ behaviors and then builds a tree-based decision model to mine PII efficiently and accurately. Without any prior knowledge, TPII can identify any type of PII from any mobile application, which gives it broad applicability. We evaluate the proposed approach on a real dataset collected from a campus network with more than 13k users. The experimental results show that the precision and recall of TPII are 91.72% and 94.51%, respectively, and that a parallel implementation of TPII can mine and label 213 million records within one hour, coming close to supporting 1 Gbps wire-speed inspection in practice. Our approach provides network service providers with a practical way to collect PII for better services.
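
As a rough sanity check on the figures above, the following Python sketch converts the reported throughput into records per second and combines the reported precision and recall into an F1 score. The F1 score and the per-record byte budget are derived here for illustration only and are not reported in the abstract; the byte budget further assumes a fully utilized 1 Gbps link, which is an assumption rather than a measured value. All function and constant names are hypothetical.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# Derived values (F1, bytes per record) are illustrative, not reported results.

PRECISION = 0.9172          # reported precision
RECALL = 0.9451             # reported recall
RECORDS_PER_HOUR = 213e6    # reported records mined and labelled in one hour
LINK_BITS_PER_SECOND = 1e9  # 1 Gbps; full utilization is an assumption


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def records_per_second(records_per_hour: float) -> float:
    """Convert an hourly record count into a per-second rate."""
    return records_per_hour / 3600


def bytes_per_record(link_bps: float, rate: float) -> float:
    """Average traffic budget per record if the link were saturated."""
    return (link_bps / 8) / rate


if __name__ == "__main__":
    rate = records_per_second(RECORDS_PER_HOUR)
    print(f"F1 score:           {f1_score(PRECISION, RECALL):.4f}")
    print(f"Records per second: {rate:,.0f}")
    print(f"Bytes per record:   {bytes_per_record(LINK_BITS_PER_SECOND, rate):,.0f}")
```

Under these assumptions, 213 million records per hour corresponds to roughly 59 thousand records per second.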

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61672101, U1636119, 61866038, 61962059) and the 2018 College Students’ Innovation and Entrepreneurship Training Program (D2018127). The authors declare that they have no conflicts of interest regarding the publication of this paper.

Author information

Corresponding author

Correspondence to Tian Song.

Additional information

Yi Liu is a PhD candidate in Computer Science at Beijing Institute of Technology, China. He received an MS degree from Xidian University, China in 2010. He now works at the network information center of Yan’an University, China. His research interests include network information security, network traffic analysis, and network privacy protection.

Tian Song is an associate professor of Computer Science at Beijing Institute of Technology, China. He obtained his PhD degree from Tsinghua University, China in 2008. His research interests include network content security, next generation internet, and computer architecture.

Lejian Liao is a professor in the School of Computer Sciences, Beijing Institute of Technology, China. He received his MS degree in 1988 and his PhD degree in 1994, both from the Institute of Computing Technology, Chinese Academy of Sciences, China. His current academic interests include Web intelligence, semantic computing, ontology engineering, and constraint-based technologies. He has published more than 100 academic papers as first author or co-author.

About this article

Cite this article

Liu, Y., Song, T. & Liao, L. TPII: tracking personally identifiable information via user behaviors in HTTP traffic. Front. Comput. Sci. 14, 143801 (2020). https://doi.org/10.1007/s11704-018-7451-z

  • DOI: https://doi.org/10.1007/s11704-018-7451-z
