Abstract
Identity verification is essential in our mission to identify potential terrorists and criminals. It is not a trivial task because terrorists reportedly assume multiple identities using either fraudulent or legitimate means. A national identification card and biometric technologies have been proposed as solutions to the identity problem. However, several studies show that these approaches cannot adequately address so complex a problem. We aim to develop data mining alternatives that can match identities referring to the same individual. Existing identity matching techniques based on data mining rely primarily on personal identity features. In this research, we propose a new identity matching technique that considers both personal identity features and social identity features. We define two groups of social identity features: social activities and social relations. The proposed technique is built upon a probabilistic relational model that uses a relational database structure to extract social identity features. Experiments show that the social activity features significantly improve matching performance, while the social relation features effectively reduce false-positive and false-negative decisions.
Cite this article
Li, J., Wang, G.A. & Chen, H. Identity matching using personal and social identity features. Inf Syst Front 13, 101–113 (2011). https://doi.org/10.1007/s10796-010-9270-0
DOI: https://doi.org/10.1007/s10796-010-9270-0