Empirical Study of POI-Oriented Focused Crawler

  • Conference paper
Semantic Web and Web Science

Part of the book series: Springer Proceedings in Complexity ((SPCOM))

Abstract

A focused crawler is the core of a focused search engine, and POI-oriented user needs are a new kind of focus target that previous studies have not addressed well. In this paper, we design and implement a POI-oriented focused crawler. The proposed crawler uses classifiers to judge page relevance, and it considers both the current page's relevance and URL link information when assigning priorities to URLs. Experiments were conducted on four sites with two classification algorithms, Naive Bayes (NB) and Support Vector Machines (SVMs). The results show that the focused crawler with the NB classifier obtains an average harvest rate of 95.97%, higher than that of the crawler with SVMs by 45.53%, whereas the crawler with SVMs attains higher recall.
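
The prioritization idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword-based `link_score`, the linear weighting `alpha`, the `Frontier` class, and the harvest-rate definition (relevant pages fetched divided by all pages fetched) are assumptions chosen for illustration only. The page relevance value would come from a trained classifier such as NB or an SVM, which is left out here.

```python
import heapq
from typing import List, Sequence, Set, Tuple


def link_score(url: str, anchor_text: str, keywords: Sequence[str]) -> float:
    """Hypothetical link feature: share of topic keywords found in URL or anchor text."""
    if not keywords:
        return 0.0
    text = (url + " " + anchor_text).lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)


def url_priority(page_relevance: float, link: float, alpha: float = 0.5) -> float:
    """Assumed linear mix of the parent page's relevance and the link score."""
    return alpha * page_relevance + (1.0 - alpha) * link


class Frontier:
    """Priority queue of unvisited URLs; the highest priority is popped first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, str]] = []
        self._seen: Set[str] = set()

    def push(self, url: str, priority: float) -> None:
        if url in self._seen:  # enqueue each URL at most once
            return
        self._seen.add(url)
        # heapq is a min-heap, so store the negated priority
        heapq.heappush(self._heap, (-priority, url))

    def pop(self) -> Tuple[str, float]:
        neg_priority, url = heapq.heappop(self._heap)
        return url, -neg_priority

    def __len__(self) -> int:
        return len(self._heap)


def harvest_rate(relevant_flags: Sequence[bool]) -> float:
    """Fraction of fetched pages judged relevant (assumed definition)."""
    if not relevant_flags:
        return 0.0
    return sum(relevant_flags) / len(relevant_flags)
```

In a crawl loop, each outgoing link of a fetched page would be scored with `url_priority` and pushed to the `Frontier`, and the next page to fetch would be taken with `pop`. Raising `alpha` trusts the parent page's classification more, while lowering it trusts the link text more.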


Acknowledgments

This research is supported by project 61073119 under the National Natural Science Foundation of China and project BK2010547 under the Jiangsu Natural Science Foundation of China.

Author information

Corresponding author

Correspondence to Xin Fan.

Copyright information

© 2013 Springer Science+Business Media New York

About this paper

Cite this paper

Fan, X., Zhou, Js., Cheng, Cy., Zhou, Yc., Yin, D. (2013). Empirical Study of POI-Oriented Focused Crawler. In: Li, J., Qi, G., Zhao, D., Nejdl, W., Zheng, HT. (eds) Semantic Web and Web Science. Springer Proceedings in Complexity. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6880-6_25

  • DOI: https://doi.org/10.1007/978-1-4614-6880-6_25

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-6879-0

  • Online ISBN: 978-1-4614-6880-6

  • eBook Packages: Computer Science, Computer Science (R0)
