Abstract
Focused crawling was proposed to selectively seek out pages that are relevant to a predefined set of topics. Because an ontology is a well-formed knowledge representation, ontology-based focused crawling approaches have attracted research attention. However, these approaches rely on manually predefined concept weights to calculate the relevance scores of web pages, which makes it difficult to find the optimal concept weights and to maintain a stable harvest rate during the crawling process. To address this issue, we propose a learnable focused crawling approach based on ontology. An ANN (Artificial Neural Network) is constructed using a domain-specific ontology and applied to the classification of web pages. Experimental results show that our approach outperforms the breadth-first search crawling approach, the simple keyword-based crawling approach, and the focused crawling approach using only the domain-specific ontology.
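As a rough illustration of learning concept weights instead of fixing them by hand, the sketch below trains a single sigmoid unit on ontology-concept term frequencies and uses its output as a relevance score. It is not the authors' network: the concept list, the training pages, the function names, and the learning settings are all assumptions made for illustration. The harvest rate is taken here in its usual sense, the fraction of crawled pages judged relevant.

```python
import math

# Hypothetical ontology concepts (illustrative only, not taken from the paper).
CONCEPTS = ["cell", "protein", "gene"]


def concept_features(text):
    """Relative frequency of each concept term in the page text."""
    words = text.lower().split()
    total = max(len(words), 1)
    return [words.count(c) / total for c in CONCEPTS]


def predict(weights, bias, features):
    """Single sigmoid unit: maps concept features to a score in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=2000, lr=1.0):
    """Learn concept weights by stochastic gradient descent on log loss."""
    weights, bias = [0.0] * len(CONCEPTS), 0.0
    for _ in range(epochs):
        for text, label in samples:
            x = concept_features(text)
            err = predict(weights, bias, x) - label  # gradient w.r.t. z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def harvest_rate(scores, threshold=0.5):
    """Fraction of crawled pages whose score reaches the threshold."""
    if not scores:
        return 0.0
    return sum(s >= threshold for s in scores) / len(scores)


# Tiny hand-labelled training set (assumed data): 1 = relevant, 0 = not.
SAMPLES = [
    ("gene protein cell biology gene", 1),
    ("football match score goal", 0),
    ("protein folding in the cell", 1),
    ("stock market news today", 0),
]

weights, bias = train(SAMPLES)
relevant_score = predict(weights, bias, concept_features("the gene encodes a protein"))
irrelevant_score = predict(weights, bias, concept_features("weather forecast for tomorrow"))
```

In a crawler, a score like this would typically decide whether a fetched page is kept and how its outgoing links are prioritized. The point of the sketch is only that the weights are fitted from labelled pages rather than set manually.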
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zheng, HT., Kang, BY., Kim, HG. (2008). Learnable Focused Crawling Based on Ontology. In: Li, H., Liu, T., Ma, WY., Sakai, T., Wong, KF., Zhou, G. (eds) Information Retrieval Technology. AIRS 2008. Lecture Notes in Computer Science, vol 4993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68636-1_26
DOI: https://doi.org/10.1007/978-3-540-68636-1_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68633-0
Online ISBN: 978-3-540-68636-1
eBook Packages: Computer Science, Computer Science (R0)