Enhancing POI search on maps via online address extraction and associated information segmentation

Chang, Chia-Hui; Chuang, Hsiu-Min; Huang, Chia-Yi; Su, Yueng-Sheng; Li, Shu-Ying

doi:10.1007/s10489-015-0707-5

Published: 15 October 2015

Applied Intelligence, volume 44, pages 539–556 (2016)

Abstract

With the popularity of wireless networks and mobile devices, mobile applications and services have grown rapidly, especially location-based services. However, most existing location-based services, such as Google Maps and Wikimapia, rely on crowd-sourcing or business-data providers to maintain their point-of-interest (POI) databases, an approach that is slow and leaves the databases incomplete. Because the most recently updated information can usually be found on the Web, current POI databases can be complemented by automatically extracting POIs and their descriptions from general webpages. In this study, we enhance location-based search on maps via online address extraction and associated information segmentation. Given a POI query that cannot be found on a map, we propose a method that exploits information from the Web by extracting the address from the search snippets returned for the query. We demonstrate the application of sequence labeling to Chinese postal-address extraction and compare its performance with and without Chinese word segmentation. We also present a novel algorithm for associated information segmentation that uses the document-object model (DOM) tree structure and is based on the farthest distinguishable ancestor (FDA) of each address. The FDA algorithm locates the associated information for each Chinese address and improves the F-measure from 0.811 to 0.964.
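To make the sequence-labeling formulation concrete, the following minimal sketch shows how address extraction without word segmentation can be cast as per-character tagging. It is an illustration only: the B/I/O tag set, the helper names bio_labels and extract_spans, and the sample snippet are assumptions, not the label scheme, features, or data used in the paper, and no trained model is involved.

```python
from typing import List, Tuple


def bio_labels(text: str, spans: List[Tuple[int, int]]) -> List[Tuple[str, str]]:
    """Pair every character of `text` with a B/I/O tag.

    `spans` lists (start, end) character offsets of annotated addresses,
    with `end` exclusive. The B/I/O tag set is an assumption made for
    illustration; the paper's actual label set is not given in the abstract.
    """
    tags = ["O"] * len(text)
    for start, end in spans:
        if not 0 <= start < end <= len(text):
            raise ValueError(f"invalid span: {(start, end)}")
        tags[start] = "B"                # first character of an address
        for i in range(start + 1, end):
            tags[i] = "I"                # remaining characters of the address
    return list(zip(text, tags))


def extract_spans(text: str, tags: List[str]) -> List[str]:
    """Recover address strings from a per-character tag sequence.

    This is the decoding step that would follow a sequence labeler's
    prediction. An "I" tag with no open span is ignored.
    """
    found: List[str] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:        # a new address starts right after one
                found.append(text[start:i])
            start = i
        elif tag == "O" and start is not None:
            found.append(text[start:i])  # close the open address
            start = None
    if start is not None:                # address runs to the end of the text
        found.append(text[start:])
    return found


if __name__ == "__main__":
    # Hypothetical search snippet containing one address.
    snippet = "地址：桃園市中壢區中大路300號 電話"
    address = "桃園市中壢區中大路300號"
    begin = snippet.index(address)
    labeled = bio_labels(snippet, [(begin, begin + len(address))])
    predicted = [tag for _, tag in labeled]
    print(extract_spans(snippet, predicted))
```

In a setup with Chinese word segmentation, the same encoding would be applied to segmented words instead of single characters; the tagged sequences would then serve as training data for a sequence labeler such as a conditional random field.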

Notes

http://searchengineland.com/study-43-percent-of-total-google-search-queries-have-local-intent-135428
http://www.gvo.com.tw/
If an address is scattered over multiple nodes, these nodes will be combined into one terminal node.
A ratio of 1:10 means that the quantity of testing data is 10 times that of the training data.
(PowerPOI), https://play.google.com/store/apps/details?id=com.widmlab.powerpoi

Acknowledgments

This work is partially sponsored by the Ministry of Science and Technology, Taiwan under grant 103-2221-E-008-.

Author information

Authors and Affiliations

Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
Chia-Hui Chang, Hsiu-Min Chuang, Chia-Yi Huang, Yueng-Sheng Su & Shu-Ying Li

Corresponding author

Correspondence to Chia-Hui Chang.

About this article

Cite this article

Chang, CH., Chuang, HM., Huang, CY. et al. Enhancing POI search on maps via online address extraction and associated information segmentation. Appl Intell 44, 539–556 (2016). https://doi.org/10.1007/s10489-015-0707-5

Published: 15 October 2015
Issue Date: April 2016
DOI: https://doi.org/10.1007/s10489-015-0707-5
