Efficient sampling methods for characterizing POIs on maps based on road networks

Zhou, Ziting; Zhao, Pengpeng; Sheng, Victor S.; Xu, Jiajie; Li, Zhixu; Wu, Jian; Cui, Zhiming

doi:10.1007/s11704-016-6146-6

Research Article
Published: 11 May 2018

Volume 12, pages 582–592 (2018)

Frontiers of Computer Science

Ziting Zhou¹, Pengpeng Zhao¹, Victor S. Sheng²,³, Jiajie Xu¹, Zhixu Li¹, Jian Wu¹ & Zhiming Cui¹

Abstract

With the rapid development of location-based services, a particularly important aspect of start-up marketing research is to explore and characterize points of interest (PoIs) such as restaurants and hotels on maps. However, because PoI databases cannot be accessed directly, it is necessary to rely on existing APIs to query PoIs within a region and calculate PoI statistics. Unfortunately, public APIs generally impose a limit on the maximum number of queries. Therefore, we propose effective and efficient sampling methods based on road networks to sample PoIs on maps and provide unbiased estimators for calculating PoI statistics. The methods rest on the observation that, in general, the denser the road network in a region, the denser the distribution of PoIs within it. Experimental results show that, compared with state-of-the-art methods, our sampling methods improve the efficiency of aggregate statistical estimations.
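
As a rough illustration of how an unbiased estimator can correct for non-uniform sampling, the sketch below uses the standard Hansen-Hurwitz estimator. It is not the authors' method, which the abstract does not specify: the partition into regions, the use of road length as the sampling weight, the function names `sample_regions` and `hansen_hurwitz_total`, and the toy PoI counts are all assumptions made for this example.

```python
import random


def sample_regions(poi_counts, road_lengths, n, rng):
    """Draw n regions with replacement, with probability proportional to
    road length (a hypothetical proxy for PoI density).

    Returns (poi_count, selection_probability) pairs. Every region must have
    a positive road length; a region that can never be selected would make
    the estimate biased.
    """
    if any(length <= 0 for length in road_lengths):
        raise ValueError("every region needs a positive road length")
    total_length = sum(road_lengths)
    probs = [length / total_length for length in road_lengths]
    picked = rng.choices(range(len(poi_counts)), weights=probs, k=n)
    return [(poi_counts[i], probs[i]) for i in picked]


def hansen_hurwitz_total(samples):
    """Estimate a population total from (value, selection_probability) pairs
    drawn independently with replacement.

    Each value is divided by its selection probability, so regions that are
    sampled more often count proportionally less.
    """
    if not samples:
        raise ValueError("need at least one sample")
    return sum(value / prob for value, prob in samples) / len(samples)


# Toy data (made up for illustration): PoI counts and road lengths per region.
poi_counts = [12, 3, 40, 0, 25]
road_lengths = [5.0, 1.0, 14.0, 0.5, 9.5]

rng = random.Random(0)
samples = sample_regions(poi_counts, road_lengths, n=20000, rng=rng)
estimate = hansen_hurwitz_total(samples)
print("true total:", sum(poi_counts), "estimate:", round(estimate, 2))
```

In this sketch each sampled region stands in for one API query. Weighting by road length spends more queries where PoIs are expected to be dense, and dividing by the selection probability keeps the estimated total unbiased.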

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (NSFC) (Grant Nos. 61170020, 61402311, 61440053), and the US National Science Foundation (IIS-1115417).

Author information

Authors and Affiliations

School of Computer Science and Technology, Soochow University, Suzhou, 215006, China
Ziting Zhou, Pengpeng Zhao, Jiajie Xu, Zhixu Li, Jian Wu & Zhiming Cui
Department of Computer Science, University of Central Arkansas, Conway, AR, 72035, USA
Victor S. Sheng
Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science & Technology, Nanjing, 210044, China
Victor S. Sheng

Corresponding author

Correspondence to Pengpeng Zhao.

Additional information

Ziting Zhou is a master’s candidate in the Department of Computer Science and Technology at Soochow University, China. She received a bachelor’s degree in computer science from Soochow University in 2014. Her main research interests are spatial data processing, recommendation systems, and data mining.

Pengpeng Zhao is an associate professor in the Department of Computer Science and Technology at Soochow University, China. He received his PhD degree in computer science from Soochow University in 2008. His main research interests are data integration, spatial data processing, data mining, machine learning, and crowd-sourcing.

Victor S. Sheng is an assistant professor of computer science at the University of Central Arkansas, USA, and the founding director of the Data Analytics Lab (DAL). He received his master’s degree in computer science from the University of New Brunswick, Canada, in 2003, and his PhD degree in computer science from Western University, Ontario, Canada, in 2007. His research interests include data mining, machine learning, and related applications. After obtaining his PhD, he was an associate research scientist and NSERC postdoctoral fellow in information systems at Stern Business School, New York University, USA. Dr. Sheng is a member of the IEEE and a lifetime member of ACM. He received the best paper award runner-up from KDD’08 and the best paper award from ICDM’11. He is a PC member for a number of international conferences and a reviewer for several international journals.

Jiajie Xu is an associate professor at the School of Computer Science and Technology, Soochow University, China. He received his PhD and master’s degrees from Swinburne University of Technology, Australia, and the University of Queensland, Australia, in 2011 and 2006, respectively. Before joining Soochow University in 2013, he worked as an assistant professor in the Institute of Software, Chinese Academy of Sciences, China. His research interests mainly include spatio-temporal database systems, big data analytics, and workflow systems.

Zhixu Li is an associate professor at the School of Computer Science and Technology at Soochow University, China. He received his PhD degree in computer science from the University of Queensland, Australia, in 2013, and his master’s and bachelor’s degrees from Renmin University of China in 2009 and 2006, respectively. Before joining Soochow University, he worked at King Abdullah University of Science and Technology, Saudi Arabia, as a postdoctoral fellow from 2013 to 2014. His research interests mainly include data cleaning, data integration, web mining, knowledge discovery text, spatial data processing, data mining, machine learning, and crowd-sourcing.

Jian Wu is an assistant professor in the Institute of Intelligent Information Processing and Application at Soochow University, China. He received the MS and PhD degrees in computer science from Soochow University in 2004 and 2012 respectively. His research interests include computer vision, image processing, and pattern recognition. He has published several articles in computer vision, data mining, image processing and pattern recognition. He is a PC member for several international conferences and a reviewer for several international journals.

Zhiming Cui is a professor in the Institute of Intelligent Information Processing and Application at Soochow University, China. He is an outstanding expert of Jiangsu Province (China). He has presided over four National Natural Science Foundation of China projects. He has published several articles in computer vision, data mining, image processing, and pattern recognition. His research interests include deep Web, computer vision, image processing, and pattern recognition.

Electronic supplementary material

Supplementary material, approximately 248 KB.

About this article

Cite this article

Zhou, Z., Zhao, P., Sheng, V.S. et al. Efficient sampling methods for characterizing POIs on maps based on road networks. Front. Comput. Sci. 12, 582–592 (2018). https://doi.org/10.1007/s11704-016-6146-6

Received: 12 March 2016
Accepted: 21 July 2016
Published: 11 May 2018
Issue Date: June 2018
DOI: https://doi.org/10.1007/s11704-016-6146-6
