DOI: 10.1145/2494188.2494193
research-article

Extraction of Address Data from Unstructured Text using Free Knowledge Resources

Published: 04 September 2013

Abstract

The Web contains many sites with unstructured textual information, which makes them a source of knowledge for a wide range of interests. Because semantic annotations are rarely used on Web sites, this knowledge cannot be harvested automatically without additional effort, so elaborate approaches to information extraction are required. In our work we address the challenge of identifying business address data on Web sites, since we see a need for this data in various applications. To this end, we have developed a hybrid approach that combines patterns with gazetteers obtained from freely available knowledge resources such as OpenStreetMap. Experimental evaluation on a corpus of heterogeneous Web sites shows high recall and precision. The approach can be adapted to identify addresses in the different formats used in various countries.
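The abstract does not spell out how patterns and gazetteers are combined, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a single address format ("<street> <house number>, <postcode> <city>") and uses two tiny hand-written sets, CITY_GAZETTEER and STREET_GAZETTEER, as hypothetical stand-ins for lists that could be derived from OpenStreetMap. A regular expression proposes candidates, and the city gazetteer filters out candidates that merely look like addresses.

```python
import re

# Hypothetical miniature gazetteers (illustrative entries only). In a real
# system these could be built from an OpenStreetMap extract.
CITY_GAZETTEER = {"chemnitz", "berlin", "graz"}
STREET_GAZETTEER = {"lindenallee", "bahnhofstrasse"}

# Assumed format: "<street> <house number>, <postcode> <city>".
# Other countries would need other patterns.
ADDRESS_PATTERN = re.compile(
    r"(?P<street>[A-ZÄÖÜ][\w\-.]*(?: [A-ZÄÖÜ][\w\-.]*)*)\s+"
    r"(?P<number>\d+[a-z]?)\s*,?\s+"
    r"(?P<postcode>\d{4,5})\s+"
    r"(?P<city>[A-ZÄÖÜ][\w\-]*)"
)


def extract_addresses(text):
    """Return pattern matches whose city is confirmed by the gazetteer."""
    results = []
    for match in ADDRESS_PATTERN.finditer(text):
        city = match.group("city")
        # Gazetteer step: drop candidates whose city is unknown.
        if city.lower() not in CITY_GAZETTEER:
            continue
        street = match.group("street")
        results.append({
            "street": street,
            "number": match.group("number"),
            "postcode": match.group("postcode"),
            "city": city,
            # A known street name is recorded as extra evidence, not required.
            "street_known": street.lower() in STREET_GAZETTEER,
        })
    return results


sample = "Contact: Lindenallee 5, 09111 Chemnitz. Room 12, 2013 Edition."
addresses = extract_addresses(sample)
```

In this sample both "Lindenallee 5, 09111 Chemnitz" and "Room 12, 2013 Edition" satisfy the pattern, but only the first survives the gazetteer check. This illustrates why a pattern alone tends to over-generate and why a lookup against known place names can help precision.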

Published In

i-Know '13: Proceedings of the 13th International Conference on Knowledge Management and Knowledge Technologies
September 2013
271 pages
ISBN: 9781450323000
DOI: 10.1145/2494188
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • Graz University of Technology

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Information extraction
  2. address extraction
  3. knowledge discovery

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

i-Know '13

Acceptance Rates

i-Know '13 Paper Acceptance Rate 27 of 87 submissions, 31%;
Overall Acceptance Rate 77 of 238 submissions, 32%

Article Metrics

  • Downloads (Last 12 months): 2
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 25 Feb 2025

Cited By

  • (2024) Quantifying Geospatial in the Common Crawl Corpus. Proceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems, pp. 585-588. DOI: 10.1145/3678717.3691286. Online publication date: 29-Oct-2024
  • (2022) Spatial, Temporal, and Semantic Crime Analysis Using Information Extraction From Online News. 2022 8th International Conference on Web Research (ICWR), pp. 40-46. DOI: 10.1109/ICWR54782.2022.9786256. Online publication date: 11-May-2022
  • (2022) A Text Structural Analysis Model for Address Extraction. Natural Language Processing and Information Systems, pp. 255-266. DOI: 10.1007/978-3-031-08473-7_23. Online publication date: 13-Jun-2022
  • (2021) Postal address extraction from the web: a comprehensive survey. Artificial Intelligence Review. DOI: 10.1007/s10462-021-09983-1. Online publication date: 14-Mar-2021
  • (2020) Towards a Social-media Driven Multi-Drone Tasking platform. 2020 International Conference on Unmanned Aircraft Systems (ICUAS), pp. 573-581. DOI: 10.1109/ICUAS48674.2020.9213846. Online publication date: Sep-2020
  • (2019) Constructing a Comprehensive Events Database from the Web. Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 229-238. DOI: 10.1145/3357384.3357986. Online publication date: 3-Nov-2019
  • (2019) A Spatiotemporal Semantic Search Engine For Cultural Events. 2019 5th International Conference on Web Research (ICWR), pp. 117-122. DOI: 10.1109/ICWR.2019.8765287. Online publication date: Apr-2019
  • (2018) Automatic Chinese Postal Address Block Location Using Proximity Descriptors and Cooperative Profit Random Forests. IEEE Transactions on Industrial Electronics 65:5, pp. 4401-4412. DOI: 10.1109/TIE.2017.2764866. Online publication date: May-2018
  • (2016) Geographical localization of web domains and organization addresses recognition by employing natural language processing, Pattern Matching and clustering. Engineering Applications of Artificial Intelligence 51:C, pp. 202-211. DOI: 10.1016/j.engappai.2016.01.011. Online publication date: 1-May-2016
  • (2014) Ge(o)Lo(cator). Proceedings of the 2014 9th International Workshop on Semantic and Social Media Adaptation and Personalization, pp. 60-65. DOI: 10.1109/SMAP.2014.27. Online publication date: 6-Nov-2014
