ABSTRACT
Linking accounts of the same user across datasets -- even when personally identifying information is removed or unavailable -- is an important open problem studied in many contexts. Beyond its many practical applications (such as cross-domain analysis, recommendation, and link prediction), understanding this problem more generally informs us about the privacy implications of data disclosure. Previous work has typically addressed this question either by using different portions of the same dataset or by observing the same behavior across thematically similar domains. In contrast, the general cross-domain case, where users have different profiles independently generated from a common but unknown pattern, raises new challenges, including difficulties in validation, and remains under-explored.
In this paper, we address the reconciliation problem for location-based datasets and introduce a robust method for this general setting. Location datasets are a particularly fruitful domain to study: such records are frequently produced by users in an increasing number of applications and are highly sensitive, especially when linked to other datasets. Our main contribution is a generic and self-tunable algorithm that leverages any pair of sporadic location-based datasets to determine the most likely matching between the users they contain. Under very general assumptions about the patterns of mobile users, we show that the maximum weight matching we compute is provably correct. Although true cross-domain datasets are a rarity, our experimental evaluation uses two entirely new data collections, including one we crawled, on an unprecedented scale. Our method outperforms naive rules and prior heuristics. Because it combines both sparse and dense properties of location-based data and accounts for the probabilistic dynamics of observation, it can be shown to be robust even when the data becomes sparse.
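To make the idea of a maximum weight matching between two user populations concrete, the following is a minimal sketch, not the algorithm of this paper. It assumes a hypothetical similarity score (the number of shared place and time-bin pairs) and finds the best one-to-one assignment by brute force, which is only feasible for toy inputs. The names `score`, `max_weight_matching`, `domain_a`, and `domain_b` are illustrative assumptions, as is the toy data.

```python
from itertools import permutations


def score(visits_a, visits_b):
    """Hypothetical similarity: count of shared (place, time_bin) pairs.

    This is an illustrative stand-in, not the scoring used in the paper.
    """
    return len(set(visits_a) & set(visits_b))


def max_weight_matching(users_a, users_b):
    """Return the one-to-one assignment of users_a to users_b with the
    largest total score, found by exhaustive search (toy sizes only)."""
    ids_a = sorted(users_a)
    ids_b = sorted(users_b)
    if len(ids_a) > len(ids_b):
        raise ValueError("users_a must not be larger than users_b")

    best_total, best = -1, {}
    # Try every way of assigning distinct users of B to the users of A.
    for perm in permutations(ids_b, len(ids_a)):
        total = sum(score(users_a[u], users_b[v]) for u, v in zip(ids_a, perm))
        if total > best_total:
            best_total, best = total, dict(zip(ids_a, perm))
    return best, best_total


# Toy data (made up): each user is a list of (place, hour) observations.
domain_a = {
    "a1": [("cafe", 9), ("office", 10), ("gym", 18)],
    "a2": [("park", 8), ("office", 10), ("bar", 21)],
}
domain_b = {
    "b1": [("park", 8), ("bar", 21), ("mall", 15)],
    "b2": [("cafe", 9), ("gym", 18), ("mall", 15)],
}

matching, total = max_weight_matching(domain_a, domain_b)
print(matching, total)
```

For realistic dataset sizes, exhaustive search would have to be replaced by a polynomial-time assignment solver; the sketch only illustrates what quantity is being maximized.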
Index Terms
- Linking Users Across Domains with Location Data: Theory and Validation