
A Probabilistic Geocoding System Utilising a Parcel Based Address File

Chapter in: Data Mining

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3755)

Abstract

It is estimated that between 80% and 90% of governmental data collections contain address information. Geocoding – the process of assigning geographic coordinates to addresses – is becoming increasingly important in application areas that involve the analysis and mining of such data. In many cases, address records are captured and/or stored in a free-form or inconsistent manner. This fact complicates the task of accurately matching such addresses to spatially-annotated reference data. In this paper we describe a geocoding system that is based on a comprehensive high-quality geocoded national address database. It uses a learning address parser based on hidden Markov models to segment free-form addresses into components, and a rule-based matching engine to determine the best matches to the reference database.
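To make the segmentation step more concrete, here is a minimal sketch of how an HMM can label address tokens using Viterbi decoding. Everything in it is hypothetical and hand-set for illustration: the component labels (`number`, `street_name`, `street_type`, `locality`, `postcode`), the coarse token classes (`NU`, `ST`, `LO`, `UN`), the tiny look-up tables and all probabilities. The system described above uses a learning parser whose HMM parameters come from training data, so this sketch is not its implementation.

```python
import math

# Hypothetical component labels (HMM states); a toy set, not the paper's.
STATES = ["number", "street_name", "street_type", "locality", "postcode"]

# Hypothetical look-up tables that turn raw tokens into observation symbols.
STREET_TYPES = {"st", "street", "rd", "road", "ave", "avenue"}
LOCALITIES = {"acton", "canberra", "sydney"}

# Hand-set (hypothetical) start, transition and emission probabilities.
START = {"number": 0.8, "street_name": 0.2}
TRANS = {
    "number": {"number": 0.1, "street_name": 0.9},
    "street_name": {"street_name": 0.2, "street_type": 0.7, "locality": 0.1},
    "street_type": {"locality": 0.9, "postcode": 0.1},
    "locality": {"locality": 0.3, "postcode": 0.7},
    "postcode": {"postcode": 1.0},  # treated as the final component here
}
EMIT = {
    "number": {"NU": 1.0},
    "street_name": {"UN": 0.8, "LO": 0.1, "ST": 0.1},
    "street_type": {"ST": 1.0},
    "locality": {"LO": 0.7, "UN": 0.3},
    "postcode": {"NU": 1.0},
}


def log_p(p):
    """Log-probability, mapping zero to -inf so impossible paths drop out."""
    return math.log(p) if p > 0 else float("-inf")


def tag(token):
    """Map a raw token to a coarse observation symbol."""
    t = token.lower().strip(",.")
    if t.isdigit():
        return "NU"  # any number: the HMM decides house number vs postcode
    if t in STREET_TYPES:
        return "ST"
    if t in LOCALITIES:
        return "LO"
    return "UN"  # unknown word


def viterbi(obs):
    """Return the most likely state sequence for a list of observation symbols."""
    # score[s]: best log-probability of any state path ending in s so far
    score = {s: log_p(START.get(s, 0)) + log_p(EMIT[s].get(obs[0], 0)) for s in STATES}
    back = []  # back[i][s]: best predecessor of state s at position i + 1
    for o in obs[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + log_p(TRANS[p].get(s, 0)))
            new_score[s] = (score[prev] + log_p(TRANS[prev].get(s, 0))
                            + log_p(EMIT[s].get(o, 0)))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state to recover the full path
    path = [max(STATES, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def segment(address):
    """Split a free-form address into (token, component) pairs."""
    tokens = address.replace(",", " ").split()
    if not tokens:
        return []
    return list(zip(tokens, viterbi([tag(t) for t in tokens])))
```

With these toy parameters, `segment("42 Miller St, Acton 2601")` returns `[('42', 'number'), ('Miller', 'street_name'), ('St', 'street_type'), ('Acton', 'locality'), ('2601', 'postcode')]`. Both numbers receive the same observation symbol, and only their position in the sequence separates the house number from the postcode. In the same way, `Canberra` in `7 Canberra Ave Sydney 2000` is labelled a street name even though it also appears in the locality table. The segmented components would then be passed to a matching step against the reference database, which this sketch does not cover.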

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Christen, P., Willmore, A., Churches, T. (2006). A Probabilistic Geocoding System Utilising a Parcel Based Address File. In: Williams, G.J., Simoff, S.J. (eds) Data Mining. Lecture Notes in Computer Science, vol 3755. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11677437_11

  • DOI: https://doi.org/10.1007/11677437_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32547-5

  • Online ISBN: 978-3-540-32548-2

  • eBook Packages: Computer Science, Computer Science (R0)