A Dictionary-Based Approach to Fast and Accurate Name Matching in Large Law Enforcement Databases

Kursun, Olcay; Koufakou, Anna; Chen, Bing; Georgiopoulos, Michael; Reynolds, Kenneth M.; Eaglin, Ron

doi:10.1007/11760146_7


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3975)

Included in the following conference series:

International Conference on Intelligence and Security Informatics

1842 Accesses
1 Citations

Abstract

In the presence of dirty data, a standard query for specific information (e.g., a search for a name that has been misspelled or mistyped) fails to return all relevant records. This problem is of grave importance in homeland security, criminology, medical applications, GIS (geographic information systems) and other domains. Techniques such as Soundex, Phonix, n-grams and edit distance have been used to improve the matching rate in these name-matching applications. There is a pressing need for name-matching approaches that provide high accuracy while keeping the computational cost of achieving it reasonably low. In this paper, we present ANSWER, a name-matching approach that uses a prefix tree of the names available in the database. Creating and searching this name dictionary tree is fast and accurate; thus, ANSWER is superior to other techniques for retrieving fuzzy name matches in large databases.
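The abstract does not spell out the ANSWER algorithm itself. The following minimal Python sketch illustrates the general idea it builds on: a prefix tree (trie) of stored names, searched with an edit-distance (Levenshtein) bound. This is the standard trie-plus-Levenshtein search and not the authors' method. The names `TrieNode`, `build_trie`, `fuzzy_search` and the `max_dist` threshold are hypothetical, and names are assumed to be compared case-insensitively.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One node of the prefix tree; children are keyed by character."""

    __slots__ = ("children", "name")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.name: Optional[str] = None  # set when a stored name ends at this node


def build_trie(names: List[str]) -> TrieNode:
    """Insert every name (lower-cased, an assumption) into a prefix tree."""
    root = TrieNode()
    for name in names:
        node = root
        for ch in name.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.name = name.lower()
    return root


def fuzzy_search(root: TrieNode, query: str, max_dist: int) -> List[Tuple[str, int]]:
    """Return (name, distance) pairs within max_dist Levenshtein edits of query."""
    query = query.lower()
    results: List[Tuple[str, int]] = []
    # DP row for the empty trie prefix: distance to each prefix of the query.
    first_row = list(range(len(query) + 1))

    def walk(node: TrieNode, ch: str, prev_row: List[int]) -> None:
        # Extend the parent's DP row by one trie character (ch).
        row = [prev_row[0] + 1]
        for j in range(1, len(query) + 1):
            cost = 0 if query[j - 1] == ch else 1
            row.append(min(row[j - 1] + 1,           # skip a query character
                           prev_row[j] + 1,          # skip a trie character
                           prev_row[j - 1] + cost))  # match or substitute
        if node.name is not None and row[-1] <= max_dist:
            results.append((node.name, row[-1]))
        # Prune: row minima never decrease deeper in the tree, so if every
        # cell already exceeds the bound, no descendant name can match.
        if min(row) <= max_dist:
            for next_ch, child in node.children.items():
                walk(child, next_ch, row)

    for ch, child in root.children.items():
        walk(child, ch, first_row)
    # Closest matches first, ties broken alphabetically.
    return sorted(results, key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    trie = build_trie(["Johnson", "Jonson", "Johnston", "Smith", "Smyth", "Jon"])
    print(fuzzy_search(trie, "Jonsen", 1))  # [('jonson', 1)]
```

The trie helps because names that share a prefix share the dynamic-programming rows computed for that prefix. The pruning step also skips whole subtrees that cannot fall within `max_dist`. Without the trie, each query would need a separate edit-distance computation against every name in the database.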



Author information

Authors and Affiliations

Department of Engineering Technology, University of Central Florida, Orlando, FL, 32816
Olcay Kursun & Ron Eaglin
School of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL, 32816
Anna Koufakou, Bing Chen & Michael Georgiopoulos
Department of Criminal Justice and Legal Studies, University of Central Florida, Orlando, FL, 32816
Kenneth M. Reynolds


Editor information

Editors and Affiliations

School of Information and Computer Science, University of California, Irvine
Sharad Mehrotra
MIS Department, University of Arizona, 85721, Tucson, AZ, USA
Daniel D. Zeng
Department of Management Information Systems, Eller College of Management, The University of Arizona, 85721, AZ, USA
Hsinchun Chen
University of Texas at Dallas
Bhavani Thuraisingham
Chinese Academy of Sciences, 100190, Beijing, China
Fei-Yue Wang


About this paper

Cite this paper

Kursun, O., Koufakou, A., Chen, B., Georgiopoulos, M., Reynolds, K.M., Eaglin, R. (2006). A Dictionary-Based Approach to Fast and Accurate Name Matching in Large Law Enforcement Databases. In: Mehrotra, S., Zeng, D.D., Chen, H., Thuraisingham, B., Wang, FY. (eds) Intelligence and Security Informatics. ISI 2006. Lecture Notes in Computer Science, vol 3975. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11760146_7


DOI: https://doi.org/10.1007/11760146_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34478-0
Online ISBN: 978-3-540-34479-7
eBook Packages: Computer Science, Computer Science (R0)
