DOI: 10.1145/1062745.1062933
Article

Focused crawling by exploiting anchor text using decision tree

Published: 10 May 2005

Abstract

Focused crawlers are considered a promising way to tackle the scalability problem of topic-oriented or personalized search engines. When designing a focused crawler, the strategy for prioritizing unvisited URLs is crucial. In this paper, we propose a method that applies a decision tree to the anchor texts of hyperlinks. We conducted experiments on real data sets from four Japanese universities and verified our approach.
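
To make the idea of prioritizing unvisited URLs by anchor text concrete, here is a minimal sketch. It is not the authors' implementation: the tree, its terms ("syllabus", "lecture"), the leaf scores, and the `Frontier` class are all hypothetical, and a real tree would be induced from labeled crawl data rather than written by hand.

```python
import heapq
from itertools import count

# Hypothetical hand-built tree (an assumption for illustration only).
# Internal node: (term, subtree_if_present, subtree_if_absent).
# Leaf: a priority score between 0 and 1.
TREE = ("syllabus",
        0.9,                    # anchor text contains "syllabus"
        ("lecture", 0.7, 0.1))  # otherwise test for "lecture"


def score(anchor_text, node=TREE):
    """Walk the tree and return the leaf score for this anchor text."""
    text = anchor_text.lower()
    while isinstance(node, tuple):
        term, if_present, if_absent = node
        node = if_present if term in text else if_absent
    return node


class Frontier:
    """Priority queue of unvisited URLs, highest anchor-text score first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._tie = count()  # insertion order breaks ties between equal scores

    def push(self, url, anchor_text):
        if url in self._seen:  # each URL is enqueued at most once
            return
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated.
        heapq.heappush(self._heap, (-score(anchor_text), next(self._tie), url))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

A crawler built on this sketch would call `push` for every hyperlink found on a fetched page and `pop` to choose the next page to fetch, so links whose anchor text the tree rates highly are visited first.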



Published In

WWW '05: Special interest tracks and posters of the 14th international conference on World Wide Web
May 2005
454 pages
ISBN:1595930515
DOI:10.1145/1062745

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. anchor text
  2. decision tree learning
  3. focused crawling
  4. shortest path

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Cited By

  • (2024) Weakly supervised learning for an effective focused web crawler. Engineering Applications of Artificial Intelligence 132:C. DOI: 10.1016/j.engappai.2024.107944. Online publication date: 18-Jul-2024
  • (2022) A supervised learning-based approach for focused web crawling for IoMT using global co-occurrence matrix. Expert Systems 40:4. DOI: 10.1111/exsy.12993. Online publication date: 28-Mar-2022
  • (2022) Amelioration of linguistic semantic classifier with sentiment classifier manacle for the focused web crawler. International Journal of Information Technology 15:2 (1137-1149). DOI: 10.1007/s41870-022-01139-w. Online publication date: 27-Dec-2022
  • (2022) An efficient focused crawler using LSTM-CNN based deep learning. International Journal of System Assurance Engineering and Management 14:1 (391-407). DOI: 10.1007/s13198-022-01808-w. Online publication date: 19-Dec-2022
  • (2019) An Improved Focused Web Crawler based on Hybrid Similarity. International Journal of Performability Engineering 15:10 (2645). DOI: 10.23940/ijpe.19.10.p10.26452656. Online publication date: 2019
  • (2019) Multilingual Focused Crawler System based on Web Content Extraction and Path Configuration. IOP Conference Series: Materials Science and Engineering 569 (052030). DOI: 10.1088/1757-899X/569/5/052030. Online publication date: 9-Aug-2019
  • (2017) Ontology-based approach for unsupervised and adaptive focused crawling. Proceedings of The International Workshop on Semantic Big Data (1-6). DOI: 10.1145/3066911.3066912. Online publication date: 19-May-2017
  • (2017) Predictive and evolutive cross-referencing for web textual sources. 2017 Computing Conference (1114-1122). DOI: 10.1109/SAI.2017.8252230. Online publication date: Jul-2017
  • (2017) Efficient Topical Focused Crawling Through Neighborhood Feature. New Generation Computing 36:2 (95-118). DOI: 10.1007/s00354-017-0029-8. Online publication date: 15-Dec-2017
  • (2017) A survey of Web crawlers for information retrieval. WIREs Data Mining and Knowledge Discovery 7:6. DOI: 10.1002/widm.1218. Online publication date: 7-Aug-2017
