Abstract
It is estimated that between 80% and 90% of governmental data collections contain address information. Geocoding – the process of assigning geographic coordinates to addresses – is becoming increasingly important in application areas that involve the analysis and mining of such data. In many cases, address records are captured and/or stored in a free-form or inconsistent manner, which complicates the task of accurately matching them to spatially annotated reference data. In this paper we describe a geocoding system that is based on a comprehensive, high-quality geocoded national address database. It uses a learning address parser based on hidden Markov models to segment free-form addresses into components, and a rule-based matching engine to determine the best matches to the reference database.
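To make the idea of HMM-based address segmentation concrete, the following is a minimal sketch and not the system described in the chapter. It assumes a toy model: the state names, the token tags (NUM, WORD, TYPE, PC), the street-type list and every probability are hand-set for illustration, whereas the abstract describes a learning parser whose parameters would come from training data. The helper names `tag`, `viterbi` and `segment` are hypothetical. The Viterbi algorithm picks the most likely sequence of address components for the observed token tags.

```python
import math

# Hypothetical address components (HMM states); illustrative only.
STATES = ["number", "street_name", "street_type", "locality", "postcode"]
TINY = 1e-6  # stand-in probability for events missing from the toy tables

# Hand-set toy probabilities (assumption: a real parser would learn these).
START = {"number": 0.8, "street_name": 0.2}
TRANS = {
    "number": {"street_name": 1.0},
    "street_name": {"street_name": 0.3, "street_type": 0.7},
    "street_type": {"locality": 1.0},
    "locality": {"locality": 0.3, "postcode": 0.7},
}
EMIT = {
    "number": {"NUM": 0.9, "PC": 0.1},
    "street_name": {"WORD": 0.9, "TYPE": 0.1},
    "street_type": {"TYPE": 0.95, "WORD": 0.05},
    "locality": {"WORD": 0.95, "TYPE": 0.05},
    "postcode": {"PC": 0.95, "NUM": 0.05},
}
STREET_TYPES = {"street", "st", "road", "rd", "avenue", "ave"}


def tag(token):
    """Map a raw token to a coarse observation symbol."""
    if token.isdigit():
        return "PC" if len(token) == 4 else "NUM"
    return "TYPE" if token in STREET_TYPES else "WORD"


def logp(table, key):
    """Log probability with a small floor for unseen events."""
    return math.log(table.get(key, TINY))


def viterbi(tags):
    """Return the most likely state sequence for a non-empty tag sequence."""
    # best[s] = (log probability of the best path ending in s, that path)
    best = {s: (logp(START, s) + logp(EMIT[s], tags[0]), [s]) for s in STATES}
    for obs in tags[1:]:
        step = {}
        for s in STATES:
            score, path = max(
                (best[p][0] + logp(TRANS.get(p, {}), s), best[p][1])
                for p in STATES
            )
            step[s] = (score + logp(EMIT[s], obs), path + [s])
        best = step
    return max(best.values())[1]


def segment(address):
    """Label each token of a free-form address with a component name."""
    tokens = address.lower().replace(",", " ").split()
    if not tokens:
        return []
    return list(zip(tokens, viterbi([tag(t) for t in tokens])))


for token, label in segment("42 Main Road, Canberra 2600"):
    print(f"{token:10s} {label}")
```

Segmenting into labelled components first means that a later matching step could compare like with like (street name against street name, postcode against postcode) rather than comparing whole address strings.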
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Christen, P., Willmore, A., Churches, T. (2006). A Probabilistic Geocoding System Utilising a Parcel Based Address File. In: Williams, G.J., Simoff, S.J. (eds) Data Mining. Lecture Notes in Computer Science, vol 3755. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11677437_11
DOI: https://doi.org/10.1007/11677437_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32547-5
Online ISBN: 978-3-540-32548-2
eBook Packages: Computer Science, Computer Science (R0)