Multi-lingual Detection of Web Terrorist Content

Part of the book series: Studies in Computational Intelligence (SCI, volume 135)

Abstract

The role of the Internet in the infrastructure of global terrorist organizations is increasing dramatically. Beyond propaganda, the Web is heavily used for practical training, fundraising, communication, and other purposes. Terrorism experts are interested in identifying who is behind the material posted on terrorist web sites and online forums and what links they have to active terror groups. The number of known terrorist sites is so large, and their URLs are so volatile, that continuous manual monitoring of their multilingual content is out of the question. Moreover, terrorist web sites and forums often try to conceal their real identity. This is why automated multi-lingual detection methods are so important in the cyber war against international terrorism. In this chapter, we describe a classification-based approach to multi-lingual detection and categorization of terrorist documents. The proposed approach builds upon a recently developed graph-based web document representation model combined with the popular C4.5 decision-tree classification algorithm. Two case studies are performed on collections of web documents in Arabic and English, respectively. The first case study demonstrates that documents downloaded from several known terrorist sites in Arabic can be reliably discriminated from the content of Arabic news reports using a compact set of filtering rules. In the second study, we induce an accurate classification model that can distinguish between the English content posted by two different Middle-Eastern terrorist organizations (Hamas in the Palestinian Authority and Hezbollah in Lebanon).
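
As a rough illustration of the idea summarized in the abstract, the following Python sketch represents each document as a set of directed word-adjacency edges (a heavily simplified stand-in for a graph-based document representation), treats the presence of an edge as a binary feature, and selects the test with the highest gain ratio, the splitting criterion used by C4.5. All function names, the toy documents, and the class labels are hypothetical and chosen only for illustration; the chapter's actual graph model, sub-graph extraction, and tree induction are not reproduced here.

```python
import math
from collections import Counter


def doc_to_edges(text):
    """Represent a document as a set of directed edges between adjacent words."""
    words = text.lower().split()
    return {(a, b) for a, b in zip(words, words[1:])}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(graphs, labels, edge):
    """Gain ratio of splitting on whether a document graph contains `edge`."""
    present = [y for g, y in zip(graphs, labels) if edge in g]
    absent = [y for g, y in zip(graphs, labels) if edge not in g]
    if not present or not absent:
        return 0.0  # the edge does not split the collection at all
    total = len(labels)
    remainder = sum(len(part) / total * entropy(part) for part in (present, absent))
    info_gain = entropy(labels) - remainder
    split_info = entropy(["in"] * len(present) + ["out"] * len(absent))
    return info_gain / split_info


def best_edge(graphs, labels):
    """Return the edge with the highest gain ratio (the root test of a tree)."""
    candidates = sorted(set().union(*graphs))
    return max(candidates, key=lambda e: gain_ratio(graphs, labels, e))


# Hypothetical toy collection: neutral placeholder text, two classes.
docs = [
    ("markets rose after the central bank report", "news"),
    ("the central bank report calmed markets", "news"),
    ("join our cause and spread the message", "target"),
    ("spread the message and join the forum", "target"),
]
graphs = [doc_to_edges(text) for text, _ in docs]
labels = [label for _, label in docs]

root = best_edge(graphs, labels)
print("root test: does the document graph contain edge", root)
```

A full decision tree would apply the same selection recursively to each branch; this sketch stops at the root test to keep the splitting criterion visible. Using word-pair edges rather than single words is what lets the features retain some of the document's structure.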


Editor information

Hsinchun Chen, Christopher C. Yang

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Last, M., Markov, A., Kandel, A. (2008). Multi-lingual Detection of Web Terrorist Content. In: Chen, H., Yang, C.C. (eds) Intelligence and Security Informatics. Studies in Computational Intelligence, vol 135. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69209-6_5

  • DOI: https://doi.org/10.1007/978-3-540-69209-6_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69207-2

  • Online ISBN: 978-3-540-69209-6

  • eBook Packages: Engineering, Engineering (R0)
