Multi-lingual Detection of Web Terrorist Content

Last, Mark; Markov, Alex; Kandel, Abraham

doi:10.1007/978-3-540-69209-6_5

Chapter

Part of the book series: Studies in Computational Intelligence (SCI, volume 135)

Abstract

The role of the Internet in the infrastructure of global terrorist organizations is increasing dramatically. Beyond propaganda, the WWW is heavily used for practical training, fundraising, communication, and other purposes. Terrorism experts are interested in identifying who is behind the material posted on terrorist web sites and online forums and what links they have to active terror groups. The number of known terrorist sites is so large, and their URLs are so volatile, that continuous manual monitoring of their multilingual content is out of the question. Moreover, terrorist web sites and forums often try to conceal their real identity. This is why automated multi-lingual detection methods are so important in the cyber war against international terror. In this chapter, we describe a classification-based approach to multi-lingual detection and categorization of terrorist documents. The proposed approach builds upon the recently developed graph-based web document representation model combined with the popular C4.5 decision-tree classification algorithm. Two case studies are performed on collections of web documents in Arabic and English, respectively. The first case study demonstrates that documents downloaded from several known terrorist sites in Arabic can be reliably discriminated from the content of Arabic news reports using a compact set of filtering rules. In the second study, we induce an accurate classification model that can distinguish between the English content posted by two different Middle-Eastern terrorist organizations (Hamas in the Palestinian Authority and Hezbollah in Lebanon).
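
The abstract names only the two ingredients of the approach: a graph-based document representation and the C4.5 decision-tree algorithm. As a rough illustration of how such ingredients can fit together, here is a minimal standard-library Python sketch. It assumes (1) a "simple" graph representation in the spirit of Schenker et al., where nodes are unique terms and directed edges link adjacent terms; (2) single edges shared by at least two training documents as a crude stand-in for frequent-subgraph features; and (3) a small binary-split tree that uses C4.5's gain-ratio criterion but omits its pruning and other refinements. All function names and the toy corpus are hypothetical; this is not the authors' implementation.

```python
import math
import re
from collections import Counter


def document_graph(text, window=1):
    # Assumed representation: nodes are unique lower-cased terms; an edge
    # (a, b) links a term to each term following it within `window` positions.
    # Python's word pattern is Unicode-aware, so Arabic and English tokenize alike.
    terms = re.findall(r"\w+", text.lower())
    edges = set()
    for i, a in enumerate(terms):
        for b in terms[i + 1:i + 1 + window]:
            if a != b:
                edges.add((a, b))
    return edges


def frequent_edges(graphs, min_support=2):
    # Crude stand-in for frequent-subgraph mining: keep single edges that
    # occur in at least `min_support` training graphs.
    counts = Counter(edge for g in graphs for edge in g)
    return sorted(edge for edge, c in counts.items() if c >= min_support)


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(graphs, labels, feature):
    # C4.5's split criterion for a Boolean feature: information gain / split info.
    present = [y for g, y in zip(graphs, labels) if feature in g]
    absent = [y for g, y in zip(graphs, labels) if feature not in g]
    if not present or not absent:
        return 0.0
    p = len(present) / len(labels)
    gain = entropy(labels) - p * entropy(present) - (1 - p) * entropy(absent)
    split_info = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return gain / split_info


def build_tree(graphs, labels, features):
    # Leaves are class labels; internal nodes are (feature, if_present, if_absent).
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: gain_ratio(graphs, labels, f))
    if gain_ratio(graphs, labels, best) <= 0:
        return majority
    rest = [f for f in features if f != best]
    yes = [i for i, g in enumerate(graphs) if best in g]
    no = [i for i, g in enumerate(graphs) if best not in g]
    return (best,
            build_tree([graphs[i] for i in yes], [labels[i] for i in yes], rest),
            build_tree([graphs[i] for i in no], [labels[i] for i in no], rest))


def classify(tree, graph):
    # Walk from the root, testing whether each node's edge occurs in the graph.
    while isinstance(tree, tuple):
        feature, if_present, if_absent = tree
        tree = if_present if feature in graph else if_absent
    return tree


# Hypothetical toy corpus (not data from the chapter).
train = [
    ("the home team won the match", "sports"),
    ("the away team lost the match", "sports"),
    ("heavy rain expected tomorrow", "weather"),
    ("light rain expected tonight", "weather"),
]
graphs = [document_graph(text) for text, _ in train]
labels = [label for _, label in train]
tree = build_tree(graphs, labels, frequent_edges(graphs))
print(classify(tree, document_graph("rain expected on monday")))  # weather
```

Because the tree is just nested tuples, each root-to-leaf path reads as a conjunction of "edge present/absent" tests, which is one way a compact set of filtering rules could be read off a C4.5-style model.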

References

Abbasi, A., Chen, H.: Applying Authorship Analysis to Arabic Web Content. In: Kantor, P., Muresan, G., Roberts, F., Zeng, D.D., Wang, F.-Y., Chen, H., Merkle, R.C. (eds.) ISI 2005. LNCS, vol. 3495, pp. 183–197. Springer, Heidelberg (2005)
Ahmed, A., James, C., David, F., William, O.: UCLIR: A Multilingual Information Retrieval tool. In: Proceedings of the Workshop on Multilingual Information Access and Natural Language Processing, pp. 89–96 (2002)
Aljlayl, M., Frieder, O.: Effective Arabic-English Cross-Language Information Retrieval via Machine-Readable Dictionaries and Machine Translation. In: Tenth International Conference on Information and Knowledge Management (2001)
Corera, G.: Web Wise Terror Network. BBC NEWS: 2004/10/06 (2004), http://news.bbc.co.uk/go/pr/fr/-/1/hi/world/3716908.stm
Corriere della Sera, September 24 (2004)
Debat, A.: Al Qaeda’s Web of Terror. ABC News, March 10 (2006)
Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco (2001)
Harding, B.: 29 charged in Madrid train bombings. New York Times (April 11, 2006)
Kuramochi, M., Karypis, G.: An Efficient Algorithm for Discovering Frequent Subgraphs. IEEE Transactions on Knowledge and Data Engineering 16(9), 1038–1051 (2004)
Larkey, L.S., Ballesteros, L., Connell, M.E.: Improving Stemming for Arabic Information Retrieval: Light Stemming and Co-occurrence Analysis. In: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2002. ACM Press, New York (2002)
Last, M., Markov, A., Kandel, A.: Multi-Lingual Detection of Terrorist Content on the Web. In: Chen, H., Wang, F.-Y., Yang, C.C., Zeng, D., Chau, M., Chang, K. (eds.) WISI 2006. LNCS, vol. 3917, pp. 16–30. Springer, Heidelberg (2006)
Lipton, E., Lichtblau, E.: Even Near Home, a New Front Is Opening in the Terror Battle. New York Times (September 23, 2004)
Lyall, S.: London Bombers Tied to Internet, Not Al Qaeda, Newspaper Says. New York Times (April 11, 2006)
Markov, A., Last, M.: A Simple, Structure-Sensitive Approach for Web Document Classification. In: Szczepaniak, P.S., Kacprzyk, J., Niewiadomski, A. (eds.) AWIC 2005. LNCS (LNAI), vol. 3528, pp. 293–298. Springer, Heidelberg (2005)
Markov, A., Last, M., Kandel, A.: Model-Based Classification of Web Documents Represented by Graphs. In: Proceedings of Web KDD 2006 Workshop on Knowledge Discovery on the Web at KDD 2006, pp. 31–38 (2006)
Mitchell, T.M.: Machine Learning. McGraw-Hill, Boston (1997)
Porter, M.: An algorithm for suffix stripping. Program 14(3), 130–137 (1980)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)
Reid, E., Qin, J., Zhou, Y., Lai, G., Sageman, M., Weimann, G., Chen, H.: Collecting and Analyzing the Presence of Terrorists on the Web: A Case Study of Jihad Websites. In: Kantor, P., Muresan, G., Roberts, F., Zeng, D.D., Wang, F.-Y., Chen, H., Merkle, R.C. (eds.) ISI 2005. LNCS, vol. 3495, pp. 402–411. Springer, Heidelberg (2005)
Ripplinger, B.: The Use of NLP Techniques in CLIR. In: The Workshop of Cross-Language Evaluation Forum on Cross-Language Information Retrieval and Evaluation (2000)
Salton, G., McGill, M.: Introduction to Modern Information Retrieval. McGraw-Hill, New York (1983)
Salton, G., Wong, A., Yang, C.S.: A Vector Space Model for Automatic Indexing. Communications of the ACM 18(11), 613–620 (1975)
Schenker, A., Bunke, H., Last, M., Kandel, A.: Graph-Theoretic Techniques for Web Content Mining. Series in Machine Perception and Artificial Intelligence, vol. 62. World Scientific, Singapore (2005)
Schenker, A., Last, M., Bunke, H., Kandel, A.: Classification of Web Documents Using Graph Matching. International Journal of Pattern Recognition and Artificial Intelligence, Special Issue on Graph Matching in Computer Vision and Pattern Recognition 18(3), 475–496 (2004)
Sebastiani, F.: Machine Learning in Automated Text Categorization. ACM Computing Surveys 34(1), 1–47 (1999/2002)
Talbot, D.: Terror’s Server. Technology Review (2005), http://www.technologyreview.com/articles/05/02/issue/feature_terror.asp
Yan, X., Han, J.: gSpan: Graph-Based Substructure Pattern Mining. In: IEEE International Conference on Data Mining (ICDM 2002) (2002)
Yang, Y., Slattery, S., Ghani, R.: A Study of Approaches to Hypertext Categorization. Journal of Intelligent Information Systems 18(2-3), 219–241 (2002)
Zhou, Y., Reid, E., Qin, J., Chen, H., Lai, G.: US Domestic Extremist Groups on the Web: Link and Content Analysis. IEEE Intelligent Systems, special issue on AI for Homeland Security 20(5), 44–51 (2005)

Author information

Authors and Affiliations

Department of Information Systems Engineering, Ben-Gurion University of the Negev, Israel
Mark Last & Alex Markov
Department of Computer Science and Engineering, University of South Florida, USA
Abraham Kandel

Editor information

Hsinchun Chen, Christopher C. Yang

About this chapter

Cite this chapter

Last, M., Markov, A., Kandel, A. (2008). Multi-lingual Detection of Web Terrorist Content. In: Chen, H., Yang, C.C. (eds) Intelligence and Security Informatics. Studies in Computational Intelligence, vol 135. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69209-6_5

DOI: https://doi.org/10.1007/978-3-540-69209-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69207-2
Online ISBN: 978-3-540-69209-6
eBook Packages: Engineering, Engineering (R0)
