Extensible Web Crawler – Towards Multimedia Material Analysis

Turek, Wojciech; Opalinski, Andrzej; Kisiel-Dorohinicki, Marek

doi:10.1007/978-3-642-21512-4_22

Part of the book series: Communications in Computer and Information Science (CCIS, volume 149)

Included in the following conference series:

International Conference on Multimedia Communications, Services and Security

Abstract

Methods for monitoring Web page content are of increasing interest to law enforcement services, which search for Web pages containing symptoms of criminal activity. Such information can be hidden from indexing systems by embedding it in multimedia materials, and finding these materials is a major challenge in contemporary criminal analysis. This paper describes a concept for integrating a large-scale Web crawling system with multimedia analysis algorithms. The Web crawling system, which processes a few hundred pages per second, provides a mechanism for including plugins. A plugin can analyze processed resources and detect references to multimedia materials. The references are passed to a component that implements an algorithm for image or video analysis. Several approaches to the integration are described, and some exemplary implementation assumptions are presented.
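
The plugin mechanism outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `MediaReference` and `MediaReferencePlugin`, the `process` method, and the restriction to `<img>` and `<video>` tags are all assumptions made for illustration. The sketch only shows the general shape of the idea, in which a plugin scans a processed page, detects references to multimedia materials, and passes them to a separate analysis component.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, List, Tuple
from urllib.parse import urljoin


@dataclass
class MediaReference:
    """Hypothetical record describing one multimedia reference found on a page."""
    page_url: str
    media_url: str
    kind: str  # "image" or "video"


class _MediaTagParser(HTMLParser):
    """Collects the src attribute of <img> and <video> tags (an assumed, minimal rule)."""

    KINDS = {"img": "image", "video": "video"}

    def __init__(self) -> None:
        super().__init__()
        self.found: List[Tuple[str, str]] = []  # (kind, raw src value)

    def handle_starttag(self, tag, attrs):
        if tag in self.KINDS:
            src = dict(attrs).get("src")
            if src:
                self.found.append((self.KINDS[tag], src))


class MediaReferencePlugin:
    """Hypothetical crawler plugin: detects media references and forwards them.

    The analyzer callback stands in for the image or video analysis component;
    in a real deployment it might enqueue the reference for asynchronous work.
    """

    def __init__(self, analyzer: Callable[[MediaReference], None]) -> None:
        self.analyzer = analyzer

    def process(self, page_url: str, html: str) -> int:
        """Scan one crawled page and return the number of references forwarded."""
        parser = _MediaTagParser()
        parser.feed(html)
        for kind, src in parser.found:
            # Resolve relative links against the URL of the page they appear on.
            self.analyzer(MediaReference(page_url, urljoin(page_url, src), kind))
        return len(parser.found)


# Usage: a plain list stands in for the analysis component's input queue.
queue: List[MediaReference] = []
plugin = MediaReferencePlugin(queue.append)
count = plugin.process(
    "http://example.org/a/index.html",
    '<p><img src="face.jpg"><video src="/clips/v.mp4"></video></p>',
)
```

Passing the analysis component in as a callback keeps the plugin cheap, so that a slow image or video algorithm need not run inside the crawler's page-processing path; whether the original system decouples the two in this way is not stated in the abstract.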

Author information

Authors and Affiliations

AGH University of Science and Technology, Krakow, Poland
Wojciech Turek, Andrzej Opalinski & Marek Kisiel-Dorohinicki

Editor information

Editors and Affiliations

Department of Telecommunications, AGH University of Science and Technology, al. Mickiewicza 30, 30-059, Krakow, Poland
Andrzej Dziech
Multimedia Systems Department, Gdansk University of Technology, Narutowicza 11/22, 80-233, Gdansk, Poland
Andrzej Czyżewski

About this paper

Cite this paper

Turek, W., Opalinski, A., Kisiel-Dorohinicki, M. (2011). Extensible Web Crawler – Towards Multimedia Material Analysis. In: Dziech, A., Czyżewski, A. (eds) Multimedia Communications, Services and Security. MCSS 2011. Communications in Computer and Information Science, vol 149. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21512-4_22

DOI: https://doi.org/10.1007/978-3-642-21512-4_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21511-7
Online ISBN: 978-3-642-21512-4
eBook Packages: Computer Science, Computer Science (R0)
