
Extensible Web Crawler – Towards Multimedia Material Analysis

Conference paper
Multimedia Communications, Services and Security (MCSS 2011)

Abstract

Methods for monitoring Web page content are of increasing interest to law enforcement services searching for Web pages that contain symptoms of criminal activity. Such information can be hidden from indexing systems by embedding it in multimedia materials, and finding these materials is a major challenge of contemporary criminal analysis. This paper describes a concept for integrating a large-scale Web crawling system with multimedia material analysis algorithms. The Web crawling system, which processes a few hundred pages per second, provides a plugin mechanism. A plugin can analyze processed resources and detect references to multimedia materials. The references are passed to a component that implements an image or video analysis algorithm. Several approaches to the integration are described, and some exemplary implementation assumptions are presented.
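The plugin flow described in the abstract (crawler hands a processed page to a plugin, the plugin detects multimedia references, the references go to an analysis component) can be illustrated with a minimal sketch. It is not the authors' implementation: the `CrawlerPlugin` interface, the `on_page` hook, the `MultimediaReferencePlugin` and `drain_to_analyzer` names, the list of media extensions, and the use of a queue between plugin and analyzer are all hypothetical choices made for illustration. Only the Python standard library is used.

```python
from html.parser import HTMLParser
from queue import Queue
from typing import Callable, List, Protocol
from urllib.parse import urljoin

# Hypothetical extension list used to recognise links to media files.
MEDIA_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".mp4", ".avi", ".webm")


class CrawlerPlugin(Protocol):
    """Hypothetical plugin interface: the crawler calls on_page() per processed page."""

    def on_page(self, url: str, html: str) -> None: ...


class _MediaLinkParser(HTMLParser):
    """Collects absolute URLs of embedded images/videos and of links to media files."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.found: List[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        src = attrs.get("src") or ""
        href = attrs.get("href") or ""  # attribute may be present without a value
        if tag in ("img", "video", "source") and src:
            self.found.append(urljoin(self.base_url, src))
        elif tag == "a" and href.lower().endswith(MEDIA_EXTENSIONS):
            self.found.append(urljoin(self.base_url, href))


class MultimediaReferencePlugin:
    """Hypothetical plugin: detects media references and queues them for analysis."""

    def __init__(self, analysis_queue: "Queue[str]") -> None:
        self.analysis_queue = analysis_queue

    def on_page(self, url: str, html: str) -> None:
        parser = _MediaLinkParser(url)
        parser.feed(html)
        parser.close()
        for ref in parser.found:
            self.analysis_queue.put(ref)


def drain_to_analyzer(analysis_queue: "Queue[str]", analyze: Callable[[str], object]) -> list:
    """Passes every queued reference to an image/video analysis callable (hypothetical)."""
    results = []
    while not analysis_queue.empty():
        results.append(analyze(analysis_queue.get()))
    return results


if __name__ == "__main__":
    q: "Queue[str]" = Queue()
    plugin = MultimediaReferencePlugin(q)
    plugin.on_page(
        "http://example.com/a/",
        '<img src="pic.png"><a href="clip.MP4">video</a><a href="page.html">text</a>',
    )
    # Stand-in analyzer that only records which references it received.
    print(drain_to_analyzer(q, lambda ref: ("analyzed", ref)))
```

One reason to put a queue between the plugin and the analysis component, under these assumptions, is that image or video analysis is typically far slower than parsing a page, so decoupling the two keeps the crawler's throughput of a few hundred pages per second independent of the analyzer's speed.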




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Turek, W., Opalinski, A., Kisiel-Dorohinicki, M. (2011). Extensible Web Crawler – Towards Multimedia Material Analysis. In: Dziech, A., Czyżewski, A. (eds) Multimedia Communications, Services and Security. MCSS 2011. Communications in Computer and Information Science, vol 149. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21512-4_22


  • DOI: https://doi.org/10.1007/978-3-642-21512-4_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21511-7

  • Online ISBN: 978-3-642-21512-4

  • eBook Packages: Computer Science, Computer Science (R0)
