
A Dictionary-Based Approach to Fast and Accurate Name Matching in Large Law Enforcement Databases

  • Conference paper
Intelligence and Security Informatics (ISI 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3975)


Abstract

In the presence of dirty data, a standard query for specific information (e.g., a search for a name that is misspelled or mistyped) may not return all of the relevant records. This is an issue of grave importance in homeland security, criminology, medical applications, GIS (geographic information systems), and similar domains. Techniques such as Soundex, Phonix, n-grams, and edit distance have been used to improve the matching rate in these name-matching applications. There is a pressing need for name-matching approaches that provide high accuracy while keeping the computational cost of achieving it reasonably low. In this paper, we present ANSWER, a name-matching approach that uses a prefix tree of the names available in the database. Creating and searching this name dictionary tree is fast and accurate; thus, ANSWER is superior to other techniques for retrieving fuzzy name matches in large databases.
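The abstract does not spell out how ANSWER searches its name tree. As a rough illustration of the general idea of combining a prefix tree with an edit-distance bound, the Python sketch below uses a well-known technique: each trie node carries one row of the Levenshtein dynamic-programming table, and a branch is pruned once every entry of that row exceeds the allowed distance. The function names (`build_trie`, `fuzzy_search`), the plain Levenshtein distance, and the sample names are hypothetical choices for this example; this is not the authors' implementation.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One node of a prefix tree (trie) of names."""

    __slots__ = ("children", "word")

    def __init__(self) -> None:
        # Children keyed by the next character of the name.
        self.children: Dict[str, "TrieNode"] = {}
        # Set to the full name when a name ends at this node.
        self.word: Optional[str] = None


def build_trie(names: List[str]) -> TrieNode:
    """Insert every name so that names sharing a prefix share a path."""
    root = TrieNode()
    for name in names:
        node = root
        for ch in name:
            node = node.children.setdefault(ch, TrieNode())
        node.word = name
    return root


def fuzzy_search(root: TrieNode, query: str, max_dist: int) -> List[Tuple[str, int]]:
    """Return (name, distance) pairs with Levenshtein distance <= max_dist."""
    results: List[Tuple[str, int]] = []
    # Row for the empty prefix: distance from "" to query[:i] is i.
    first_row = list(range(len(query) + 1))
    for ch, child in root.children.items():
        _search(child, ch, query, first_row, max_dist, results)
    # Closest matches first, ties broken alphabetically.
    return sorted(results, key=lambda r: (r[1], r[0]))


def _search(node: TrieNode, ch: str, query: str, prev_row: List[int],
            max_dist: int, results: List[Tuple[str, int]]) -> None:
    # row[i] = edit distance between the prefix ending at this node and query[:i].
    row = [prev_row[0] + 1]
    for i in range(1, len(query) + 1):
        insert_cost = row[i - 1] + 1
        delete_cost = prev_row[i] + 1
        replace_cost = prev_row[i - 1] + (query[i - 1] != ch)
        row.append(min(insert_cost, delete_cost, replace_cost))

    # A complete name ends here and is close enough to the query.
    if node.word is not None and row[-1] <= max_dist:
        results.append((node.word, row[-1]))

    # Row minima never decrease along a path, so prune hopeless subtrees.
    if min(row) <= max_dist:
        for next_ch, child in node.children.items():
            _search(child, next_ch, query, row, max_dist, results)


if __name__ == "__main__":
    trie = build_trie(["JOHNSON", "JONSON", "JOHNSTON", "JOHANSEN", "SMITH"])
    print(fuzzy_search(trie, "JOHNSEN", 1))
    # [('JOHANSEN', 1), ('JOHNSON', 1)]
```

Because names with a common prefix share one path in the tree, the dynamic-programming row for that prefix is computed once for all of them, and an entire subtree is skipped as soon as no extension of its prefix can fall within `max_dist`.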




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kursun, O., Koufakou, A., Chen, B., Georgiopoulos, M., Reynolds, K.M., Eaglin, R. (2006). A Dictionary-Based Approach to Fast and Accurate Name Matching in Large Law Enforcement Databases. In: Mehrotra, S., Zeng, D.D., Chen, H., Thuraisingham, B., Wang, FY. (eds) Intelligence and Security Informatics. ISI 2006. Lecture Notes in Computer Science, vol 3975. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11760146_7


  • DOI: https://doi.org/10.1007/11760146_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34478-0

  • Online ISBN: 978-3-540-34479-7

  • eBook Packages: Computer Science, Computer Science (R0)
