ABSTRACT
It is well known that online services use various cookies to track users through their online service identifiers (IDs); in other words, when users access online services, they leave various "fingerprints" behind in cyberspace. As users roam the physical world while accessing online services via mobile devices, they also leave a series of "footprints", i.e., hints about their physical locations. This poses a potent new threat to user privacy: one can potentially correlate the "fingerprints" left in cyberspace with the "footprints" left in the physical world to infer private physical-world information, such as a user's frequent locations or mobility trajectories. We refer to this problem as user physical world privacy leakage via user cyberspace privacy leakage. In this paper we address the following fundamental question: what kind, and how much, of a user's physical world privacy might be leaked if one could get hold of such diverse network datasets, even without any physical location information? To investigate this question in depth, we use network data collected over one month via a deep packet inspection (DPI) system at the routers of one of the largest Internet operators in Shanghai, China. We decompose the fundamental question into three problems: i) linkage of various online user IDs belonging to the same person via mobility pattern mining; ii) physical location classification via aggregate user mobility patterns over time; and iii) tracking user physical mobility. By developing novel and effective methods for solving each of these problems, we demonstrate that user physical world privacy leakage via user cyberspace privacy leakage is not hypothetical, but poses a real and potent threat to user privacy.
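To make problem i) concrete, the following is a minimal sketch of one plausible way to link online IDs through shared mobility patterns. It is not the method developed in the paper: the record layout `(user_id, attachment_point, timestamp)`, the one-hour time bins, the Jaccard similarity score, and the 0.5 threshold are all assumptions chosen for illustration, and the sample records are made up.

```python
from collections import defaultdict
from itertools import combinations


def footprint_sets(records, bin_seconds=3600):
    """Map each online ID to the set of (attachment_point, time_bin) cells it appears in.

    `records` is an iterable of (user_id, attachment_point, timestamp) tuples;
    this layout is hypothetical, not the paper's data schema.
    """
    cells = defaultdict(set)
    for user_id, attachment_point, timestamp in records:
        cells[user_id].add((attachment_point, timestamp // bin_seconds))
    return cells


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_ids(records, threshold=0.5, bin_seconds=3600):
    """Return (id_a, id_b, score) for ID pairs whose footprints overlap strongly."""
    cells = footprint_sets(records, bin_seconds)
    pairs = []
    for id_a, id_b in combinations(sorted(cells), 2):
        score = jaccard(cells[id_a], cells[id_b])
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs


# Made-up records: two IDs seen at the same attachment points in the same hours.
records = [
    ("weibo:alice", "ap1", 0),
    ("weibo:alice", "ap2", 4000),
    ("mail:a.l", "ap1", 100),
    ("mail:a.l", "ap2", 4100),
    ("weibo:bob", "ap3", 50),
]
linked = link_ids(records)
print(linked)
```

Comparing every pair of IDs is quadratic in the number of IDs, so a dataset at operator scale would likely need a blocking or indexing step first; the sketch leaves that out for brevity.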
From Fingerprint to Footprint: Revealing Physical World Privacy Leakage by Cyberspace Cookie Logs