DOI: 10.1145/2872427.2883002
research-article

Linking Users Across Domains with Location Data: Theory and Validation

Published: 11 April 2016

ABSTRACT

Linking accounts of the same user across datasets, even when personally identifying information is removed or unavailable, is an important open problem studied in many contexts. Beyond its many practical applications (such as cross-domain analysis, recommendation, and link prediction), understanding this problem more generally informs us about the privacy implications of data disclosure. Previous work has typically addressed this question using either different portions of the same dataset or observations of the same behavior across thematically similar domains. In contrast, the general cross-domain case, where users have different profiles independently generated from a common but unknown pattern, raises new challenges, including difficulties in validation, and remains under-explored.

In this paper, we address the reconciliation problem for location-based datasets and introduce a robust method for this general setting. Location datasets are a particularly fruitful domain to study: such records are frequently produced by users in a growing number of applications and are highly sensitive, especially when linked to other datasets. Our main contribution is a generic, self-tunable algorithm that takes any pair of sporadic location-based datasets and determines the most likely matching between the users they contain. While making only very general assumptions about the patterns of mobile users, we show that the maximum weight matching we compute is provably correct. Although true cross-domain datasets are rare, our experimental evaluation uses two entirely new data collections, including one we crawled, on an unprecedented scale. The method we design outperforms naive rules and prior heuristics. Because it combines both sparse and dense properties of location-based data and accounts for the probabilistic dynamics of observation, it can be shown to remain robust even when the data become sparse.
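The abstract frames user linking as computing a maximum weight matching between the users of two location datasets. The abstract does not specify the authors' scoring model, so the sketch below is only a minimal illustration of the generic matching step under stated assumptions: each user is a set of hypothetical (spatial cell, time bin) observations, a pair of users is scored by a stand-in co-occurrence count, and a one-to-one matching that maximizes the total score is found with the Hungarian (Kuhn-Munkres) algorithm. The names `cooccurrence_score`, `hungarian_min_cost`, `link_users`, and the toy data are illustrative assumptions, not the paper's method.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical observation format (an assumption): (spatial cell id, time-bin index).
Obs = Tuple[str, int]


def cooccurrence_score(a: Set[Obs], b: Set[Obs]) -> int:
    """Illustrative stand-in score: number of shared (cell, time-bin) observations."""
    return len(a & b)


def hungarian_min_cost(cost: List[List[float]]) -> List[int]:
    """Hungarian (Kuhn-Munkres) algorithm on a square cost matrix, O(n^3).

    Returns assign, where assign[i] is the column matched to row i so that
    the total cost is minimal.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)  # row potentials (1-indexed; index 0 is a sentinel)
    v = [0.0] * (n + 1)  # column potentials
    p = [0] * (n + 1)    # p[j] = row currently matched to column j (0 = free)
    way = [0] * (n + 1)  # predecessor column on the alternating path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:  # grow the alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):  # shift potentials by the smallest slack
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip matched/unmatched edges along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [0] * n
    for j in range(1, n + 1):
        assign[p[j] - 1] = j - 1
    return assign


def link_users(left: Dict[str, Set[Obs]], right: Dict[str, Set[Obs]]) -> Dict[str, str]:
    """One-to-one maximum-weight matching of left users to right users.

    Pairs with a zero score are reported as unmatched.
    """
    lids, rids = sorted(left), sorted(right)
    n = max(len(lids), len(rids))
    # Square score matrix; padding rows/columns act as "no match" with score 0.
    score = [[0] * n for _ in range(n)]
    for i, a in enumerate(lids):
        for j, b in enumerate(rids):
            score[i][j] = cooccurrence_score(left[a], right[b])
    # Maximizing the total score is the same as minimizing the negated score.
    assign = hungarian_min_cost([[-s for s in row] for row in score])
    return {
        lids[i]: rids[j]
        for i, j in enumerate(assign)
        if i < len(lids) and j < len(rids) and score[i][j] > 0
    }


if __name__ == "__main__":
    # Toy data invented for illustration: check-ins vs. call-record locations.
    checkins = {"alice": {("c1", 1), ("c2", 5), ("c3", 9)}, "bob": {("c4", 2), ("c1", 7)}}
    calls = {"u17": {("c4", 2), ("c1", 7), ("c2", 5)}, "u42": {("c1", 1), ("c3", 9)}}
    print(link_users(checkins, calls))  # {'alice': 'u42', 'bob': 'u17'}
```

In the toy run, a greedy rule that linked each user to its single best-scoring partner could produce conflicts, whereas the matching enforces one-to-one links and maximizes the total score (2 + 2 here). The raw co-occurrence count does not account for sparse or probabilistic observation, which the abstract names as a key property of the authors' method, so this sketch illustrates only the matching step, not their scoring.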

Published in: WWW '16: Proceedings of the 25th International Conference on World Wide Web, April 2016, 1482 pages, ISBN: 9781450341431

Copyright © 2016 held by the International World Wide Web Conference Committee (IW3C2)

Publisher: International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland

Acceptance rates: WWW '16 paper acceptance rate 115 of 727 submissions (16%); overall acceptance rate 1,899 of 8,196 submissions (23%).
