
A Novel Shark-Search Algorithm for Theme Crawler

  • Conference paper
Web Information Systems and Mining (WISM 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7529)

Abstract

The shark-search algorithm is a classical content-based theme crawling algorithm. However, its crawling scope is limited, most notably by the viscousness phenomenon. To overcome this shortcoming, this paper proposes an improved shark-search algorithm that combines a URL-analysis algorithm with a host-control strategy and takes into account how frequently each host has been accessed. Experimental results show that the proposed algorithm overcomes the shortcomings of the original shark-search algorithm and improves the efficiency of a theme crawler.
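The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea. It implements the potential score of the original shark-search algorithm of Hersovici et al. (1998): an inherited score passed down from the parent page, plus a neighborhood score built from the anchor text and its surrounding context. On top of that it adds a hypothetical URL-token similarity term and a hypothetical host-frequency penalty as stand-ins for the paper's URL-analysis and host-control strategies. This assumes that viscousness means the crawl concentrates on a few hosts, which is what a host-control strategy suggests but the abstract does not state. All helper names (`tokenize`, `cosine_sim`, `shark_score`, `improved_score`, `HostAwareFrontier`), the weights `delta`, `beta`, `gamma`, `alpha`, `lam`, the use of raw term-frequency cosine similarity, and the form of the penalty are illustrative assumptions, not the authors' method.

```python
import heapq
import math
import re
from collections import Counter
from urllib.parse import urlparse


def tokenize(text):
    """Lowercase alphanumeric tokens; a stand-in for real preprocessing."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_sim(query, text):
    """Cosine similarity of raw term-frequency vectors (a simplification)."""
    q, t = Counter(tokenize(query)), Counter(tokenize(text))
    dot = sum(q[w] * t[w] for w in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in t.values()))
    return dot / norm if norm else 0.0


def shark_score(query, parent_text, parent_inherited, anchor, context,
                delta=0.5, beta=0.8, gamma=0.8):
    """Classical shark-search potential score of one child link.

    Returns (potential_score, inherited_score). The inherited score is
    reused later when this child's own children are scored.
    Weight values are illustrative assumptions.
    """
    parent_sim = cosine_sim(query, parent_text)
    # Inherited score: decayed relevance of the parent page, or, if the parent
    # is irrelevant, a decayed copy of the parent's own inherited score.
    inherited = delta * parent_sim if parent_sim > 0 else delta * parent_inherited
    anchor_score = cosine_sim(query, anchor)
    # The anchor context counts as fully relevant when the anchor itself matches.
    context_score = 1.0 if anchor_score > 0 else cosine_sim(query, context)
    neighborhood = beta * anchor_score + (1 - beta) * context_score
    return gamma * inherited + (1 - gamma) * neighborhood, inherited


def improved_score(query, url, base, host_visits, alpha=0.2, lam=0.5):
    """HYPOTHETICAL extension: URL-token relevance plus a host-frequency penalty.

    host_visits must be a Counter so unseen hosts count as 0 visits.
    """
    url_relevance = cosine_sim(query, url)  # URL is split on non-alphanumerics
    mixed = (1 - alpha) * base + alpha * url_relevance
    # Hosts that have already been fetched often get lower priority.
    return mixed / (1 + lam * host_visits[urlparse(url).netloc])


class HostAwareFrontier:
    """HYPOTHETICAL max-priority URL queue; host penalty applied at push time."""

    def __init__(self, query, alpha=0.2, lam=0.5):
        self.query, self.alpha, self.lam = query, alpha, lam
        self.host_visits = Counter()  # pages fetched so far, per host
        self.heap = []
        self.seen = set()

    def push(self, url, base_score):
        if url in self.seen:  # never queue the same URL twice
            return
        self.seen.add(url)
        score = improved_score(self.query, url, base_score, self.host_visits,
                               self.alpha, self.lam)
        heapq.heappush(self.heap, (-score, url))  # negate: heapq is a min-heap

    def pop(self):
        neg_score, url = heapq.heappop(self.heap)
        self.host_visits[urlparse(url).netloc] += 1  # record this fetch
        return url, -neg_score
```

In use, a crawler would pop the best URL, fetch it, call `shark_score` for each outgoing link using the fetched page as `parent_text`, and push each link with the resulting potential score. Applying the host penalty at push time keeps the queue a plain binary heap, but a URL queued before its host became busy keeps its older, higher score; a real crawler might instead re-score entries lazily when they are popped.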

This research is supported in part by the Youth Fund Project of Humanities and Social Sciences Research of the Chinese Ministry of Education (No. 12YJCZH201) and the National Natural Science Fund (No. 61103101).

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Luo, L., Wang, Rb., Huang, Xx., Chen, Zq. (2012). A Novel Shark-Search Algorithm for Theme Crawler. In: Wang, F.L., Lei, J., Gong, Z., Luo, X. (eds) Web Information Systems and Mining. WISM 2012. Lecture Notes in Computer Science, vol 7529. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33469-6_75

  • DOI: https://doi.org/10.1007/978-3-642-33469-6_75

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33468-9

  • Online ISBN: 978-3-642-33469-6

  • eBook Packages: Computer Science, Computer Science (R0)
