
A Selection Algorithm for Focused Crawlers Incorporating Semantic Metadata

Conference paper, Distributed Computing and Internet Technology (ICDCIT 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7753)


Abstract

The search results currently offered by the majority of search portals are horizontal by nature: these search engines aim to index as many web pages as possible and present search results based on those pages, so the results they return are often generalized. Focused crawlers were built to download only web pages relevant to a pre-specified topic. Searching over such pages is called vertical search, as it attempts to drill down on a single topic rather than exploring the plethora of other pages on the web that are related to the search query in one way or another. In this paper, we propose an algorithm that helps a focused crawler decide whether or not a web page should be downloaded. The proposed selection algorithm makes use of semantic properties of the content to arrive at a decision.
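
The abstract does not spell out the selection rule itself. As a purely illustrative sketch, and not the authors' algorithm, a "download or not" decision can be pictured as a threshold on a relevance score that blends the similarity of a page's body text to the topic with the similarity of its semantic metadata (here assumed to be the page's declared keywords). The function names, the bag-of-words cosine measure, the `meta_weight` blend and the `threshold` value are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two sparse term vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def should_download(page_text, page_keywords, topic_terms,
                    threshold=0.2, meta_weight=0.5):
    """Hypothetical selection rule (assumption, not the paper's method).

    Blends body-text similarity with metadata similarity and returns
    (decision, score), where decision is True if score >= threshold.
    """
    topic = term_vector(" ".join(topic_terms))
    body_score = cosine(term_vector(page_text), topic)
    meta_score = cosine(term_vector(" ".join(page_keywords)), topic)
    score = (1 - meta_weight) * body_score + meta_weight * meta_score
    return score >= threshold, score


if __name__ == "__main__":
    topic = ["focused", "crawler", "semantic", "web"]
    print(should_download("A focused crawler downloads web pages on a topic",
                          ["semantic web", "crawler"], topic))
    print(should_download("Recipes for baking bread at home", ["cooking"], topic))
```

Weighting the metadata score separately lets a page whose declared keywords match the topic be kept even when its body text is only loosely related; setting `meta_weight=0` reduces the sketch to a plain vector-space relevance filter.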

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wadwekar, S., Mukhopadhyay, D. (2013). A Selection Algorithm for Focused Crawlers Incorporating Semantic Metadata. In: Hota, C., Srimani, P.K. (eds) Distributed Computing and Internet Technology. ICDCIT 2013. Lecture Notes in Computer Science, vol 7753. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36071-8_45

  • DOI: https://doi.org/10.1007/978-3-642-36071-8_45

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36070-1

  • Online ISBN: 978-3-642-36071-8

  • eBook Packages: Computer Science, Computer Science (R0)
