Abstract
The role of the Internet in the infrastructure of global terrorist organizations is increasing dramatically. Beyond propaganda, the Web is heavily used for practical training, fundraising, communication, and other purposes. Terrorism experts are interested in identifying who is behind the material posted on terrorist web sites and online forums and what links they have to active terror groups. The number of known terrorist sites is so large, and their URLs are so volatile, that continuous manual monitoring of their multilingual content is out of the question. Moreover, terrorist web sites and forums often try to conceal their real identity. This is why automated multi-lingual detection methods are so important in the cyber war against international terrorism. In this chapter, we describe a classification-based approach to multi-lingual detection and categorization of terrorist documents. The proposed approach builds upon a recently developed graph-based web document representation model combined with the popular C4.5 decision-tree classification algorithm. Two case studies are performed on collections of web documents in Arabic and English, respectively. The first case study demonstrates that documents downloaded from several known terrorist sites in Arabic can be reliably discriminated from the content of Arabic news reports using a compact set of filtering rules. In the second study, we induce an accurate classification model that can distinguish between the English content posted by two different Middle-Eastern terrorist organizations (Hamas in the Palestinian Authority and Hezbollah in Lebanon).
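The abstract names the two ingredients of the approach without detailing them, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a document graph whose nodes are unique words and whose directed edges link adjacent words, and it assumes that the presence of an edge serves as a boolean feature. Candidate features are ranked with the gain ratio, the split criterion used by C4.5. The function names, toy documents, and class labels are hypothetical.

```python
import math
from collections import Counter


def word_graph(text):
    """Return the set of directed edges (w1, w2) between adjacent words."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def entropy(values):
    """Shannon entropy, in bits, of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain of a feature divided by its split information."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    gain = entropy(labels) - remainder
    split_info = entropy(feature_values)
    return gain / split_info if split_info > 0 else 0.0


def best_edge(docs, labels):
    """Pick the edge whose presence best separates the classes."""
    graphs = [word_graph(d) for d in docs]
    candidates = sorted(set().union(*graphs))  # sorted for a stable result
    return max(candidates,
               key=lambda e: gain_ratio([e in g for g in graphs], labels))


# Hypothetical toy corpus with two neutral classes, "A" and "B".
docs = [
    "market prices rose today",
    "market prices fell today",
    "join our cause now",
    "support our cause today",
]
labels = ["A", "A", "B", "B"]
best = best_edge(docs, labels)
```

A full C4.5 tree would apply this selection recursively to each resulting subset and then prune the tree; the sketch stops at the root split to keep the idea visible.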
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Last, M., Markov, A., Kandel, A. (2008). Multi-lingual Detection of Web Terrorist Content. In: Chen, H., Yang, C.C. (eds) Intelligence and Security Informatics. Studies in Computational Intelligence, vol 135. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69209-6_5
DOI: https://doi.org/10.1007/978-3-540-69209-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69207-2
Online ISBN: 978-3-540-69209-6
eBook Packages: Engineering, Engineering (R0)