Learnable Focused Crawling Based on Ontology

  • Conference paper
Information Retrieval Technology (AIRS 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4993)

Abstract

Focused crawling was proposed to selectively seek out pages that are relevant to a predefined set of topics. Because an ontology is a well-formed knowledge representation, ontology-based focused crawling approaches have attracted research interest. However, these approaches rely on manually predefined concept weights to calculate the relevance scores of web pages, which makes it difficult to obtain optimal concept weights and to maintain a stable harvest rate during the crawling process. To address this issue, we propose a learnable focused crawling approach based on ontology. An ANN (artificial neural network) is constructed using a domain-specific ontology and applied to the classification of web pages. Experimental results show that our approach outperforms the breadth-first search crawling approach, the simple keyword-based crawling approach, and the focused crawling approach using only the domain-specific ontology.
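
The contrast drawn in the abstract, between hand-set concept weights and weights learned from labelled pages, can be illustrated with a minimal sketch. The abstract does not specify the network architecture, the input features, or the ontology, so everything below is an assumption made for illustration: the concept list CONCEPTS is hypothetical, the features are simple normalised term frequencies, and a single sigmoid unit stands in for the ANN. The harvest_rate helper uses the common definition of harvest rate as the fraction of crawled pages that are relevant.

```python
import math

# Hypothetical ontology concepts, chosen only for illustration.
CONCEPTS = ["cell", "gene", "protein", "disease"]


def concept_features(text):
    """Frequency of each concept term, normalised by page length."""
    words = text.lower().split()
    total = max(len(words), 1)
    return [words.count(c) / total for c in CONCEPTS]


def fixed_weight_score(features, weights):
    """Baseline: relevance as a weighted sum with manually set weights."""
    return sum(w * f for w, f in zip(weights, features))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=500, lr=1.0):
    """Learn concept weights with one sigmoid unit (an assumed stand-in
    for the ANN), using per-sample gradient updates on labelled pages."""
    weights = [0.0] * len(CONCEPTS)
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            z = bias + sum(w * f for w, f in zip(weights, features))
            err = label - sigmoid(z)
            weights = [w + lr * err * f for w, f in zip(weights, features)]
            bias += lr * err
    return weights, bias


def is_relevant(text, weights, bias):
    """Classify a page as relevant if the unit's output is at least 0.5."""
    feats = concept_features(text)
    z = bias + sum(w * f for w, f in zip(weights, feats))
    return sigmoid(z) >= 0.5


def harvest_rate(labels):
    """Fraction of crawled pages that are relevant (1) rather than not (0)."""
    return sum(labels) / len(labels) if labels else 0.0


# Toy labelled pages (invented for this sketch): 1 = relevant, 0 = not.
pages = [
    ("gene protein gene disease", 1),
    ("protein cell protein gene", 1),
    ("football match score today", 0),
    ("cheap flights hotel deals", 0),
]
samples = [(concept_features(text), label) for text, label in pages]
weights, bias = train(samples)
```

In a crawler, is_relevant would decide whether a fetched page counts toward the harvest rate and whether its outgoing links are worth following. The learned weights replace the manually tuned ones that fixed_weight_score would otherwise require.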

Editor information

Hang Li, Ting Liu, Wei-Ying Ma, Tetsuya Sakai, Kam-Fai Wong, Guodong Zhou

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zheng, HT., Kang, BY., Kim, HG. (2008). Learnable Focused Crawling Based on Ontology. In: Li, H., Liu, T., Ma, WY., Sakai, T., Wong, KF., Zhou, G. (eds) Information Retrieval Technology. AIRS 2008. Lecture Notes in Computer Science, vol 4993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68636-1_26

  • DOI: https://doi.org/10.1007/978-3-540-68636-1_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68633-0

  • Online ISBN: 978-3-540-68636-1

  • eBook Packages: Computer Science, Computer Science (R0)
