Multi-lingual Detection of Terrorist Content on the Web

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3917)

Abstract

Since the web is increasingly used by terrorist organizations for propaganda, disinformation, and other purposes, the ability to automatically detect terrorist-related content in multiple languages can be extremely useful. In this paper we describe a new, classification-based approach to multi-lingual detection of terrorist documents. The proposed approach builds upon a recently developed graph-based web document representation model combined with the popular C4.5 decision-tree classification algorithm. The evaluation is performed on a collection of 648 Arabic-language web documents. The results demonstrate that documents downloaded from several known terrorist sites can be reliably discriminated from the content of Arabic news reports using a simple decision tree.
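
As a rough illustration of the decision-tree side of the approach, the sketch below computes the gain ratio, the split criterion C4.5 uses to choose which attribute to test at a node. It is not the authors' implementation: the abstract does not say how the graph-based representation is turned into attributes, so the sketch assumes, purely for illustration, that each document is described by binary flags recording whether a given subgraph occurs in its graph. The names `has_subgraph_a` and `has_subgraph_b` and the six toy labels are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a non-empty list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(feature, labels):
    """C4.5 split criterion: information gain divided by split information."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    partitions = {}
    for value, label in zip(feature, labels):
        partitions.setdefault(value, []).append(label)
    # Expected entropy of the labels after splitting on the feature.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder
    # Split information is the entropy of the feature's own value distribution.
    split_info = entropy(feature)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: 1 means the subgraph occurs in the document's graph.
labels = ["terror", "terror", "terror", "news", "news", "news"]
has_subgraph_a = [1, 1, 1, 0, 0, 0]  # separates the two classes perfectly
has_subgraph_b = [1, 0, 1, 0, 1, 0]  # only weakly related to the class

ratio_a = gain_ratio(has_subgraph_a, labels)
ratio_b = gain_ratio(has_subgraph_b, labels)
print(f"gain ratio A: {ratio_a:.3f}, gain ratio B: {ratio_b:.3f}")
```

Under these assumptions a tree learner would test the first flag at the root, since it has the higher gain ratio. A full C4.5 implementation also handles continuous attributes, missing values, and pruning, none of which is shown here.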



Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Last, M., Markov, A., Kandel, A. (2006). Multi-lingual Detection of Terrorist Content on the Web. In: Chen, H., Wang, FY., Yang, C.C., Zeng, D., Chau, M., Chang, K. (eds) Intelligence and Security Informatics. WISI 2006. Lecture Notes in Computer Science, vol 3917. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11734628_3

  • DOI: https://doi.org/10.1007/11734628_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33361-6

  • Online ISBN: 978-3-540-33362-3

  • eBook Packages: Computer Science, Computer Science (R0)
