Identity matching using personal and social identity features

Li, Jiexun; Wang, G. Alan; Chen, Hsinchun

doi:10.1007/s10796-010-9270-0

Published: 29 September 2010

Volume 13, pages 101–113, (2011)

Information Systems Frontiers

Jiexun Li¹,
G. Alan Wang² &
Hsinchun Chen³

Abstract

Identity verification is essential in our mission to identify potential terrorists and criminals. It is not a trivial task because terrorists reportedly assume multiple identities using either fraudulent or legitimate means. A national identification card and biometrics technologies have been proposed as solutions to the identity problem. However, several studies show their inability to tackle the complex problem. We aim to develop data mining alternatives that can match identities referring to the same individual. Existing identity matching techniques based on data mining primarily rely on personal identity features. In this research, we propose a new identity matching technique that considers both personal identity features and social identity features. We define two groups of social identity features including social activities and social relations. The proposed technique is built upon a probabilistic relational model that utilizes a relational database structure to extract social identity features. Experiments show that the social activity features significantly improve the matching performance while the social relation features effectively reduce false positive and false negative decisions.

Author information

Authors and Affiliations

Jiexun Li: College of Information Science and Technology, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104, USA
G. Alan Wang: Pamplin College of Business, Virginia Tech, Blacksburg, VA 24060, USA
Hsinchun Chen: Department of MIS, Eller College of Management, University of Arizona, 1130 E Helen Street Room 430, Tucson, AZ 85721, USA

Corresponding author

Correspondence to Jiexun Li.

About this article

Cite this article

Li, J., Wang, G.A. & Chen, H. Identity matching using personal and social identity features. Inf Syst Front 13, 101–113 (2011). https://doi.org/10.1007/s10796-010-9270-0

Issue Date: March 2011
