Modified Frequency-Based Term Weighting Scheme for Accurate Dark Web Content Classification

  • Conference paper
Information Retrieval Technology (AIRS 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8870)

Abstract

Security informatics and computational intelligence play a vital role in detecting and classifying terrorism-related content on the web. Accurate web content classification using computational intelligence and security informatics increases the chances of early detection of potential terrorist activities. In this paper, we propose a modified frequency-based term weighting scheme for accurate Dark Web content classification. The proposed term weighting scheme is compared with common techniques used in text classification, namely Term Frequency (TF), Term Frequency-Inverse Document Frequency (TF-IDF), and Term Frequency-Relative Frequency (tf.rf), on a dataset selected from the Dark Web Portal Forum. The experimental results show that classification based on the proposed scheme outperforms classification based on the other term weighting techniques in accuracy and in the other evaluation measures.
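For orientation, the three baseline weightings named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it does not attempt the proposed modified scheme, which the abstract does not define. The formulas are assumptions: TF is taken as the raw count, IDF as log(N / df), and the second factor of tf.rf as the relevance frequency log2(2 + a / max(1, c)) commonly associated with that name, where a and c count the positive-class and negative-class documents containing the term. The function names, the toy corpus, and the labels are all hypothetical.

```python
import math
from collections import Counter

def term_frequency(doc_tokens):
    """Raw term counts for one tokenized document."""
    return Counter(doc_tokens)

def idf(term, corpus):
    """Inverse document frequency, log(N / df); 0.0 if the term is unseen."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df) if df else 0.0

def rf(term, corpus, labels, positive):
    """Assumed rf factor: log2(2 + a / max(1, c))."""
    # a: positive-class documents containing the term
    a = sum(1 for doc, y in zip(corpus, labels) if y == positive and term in doc)
    # c: negative-class documents containing the term
    c = sum(1 for doc, y in zip(corpus, labels) if y != positive and term in doc)
    return math.log2(2 + a / max(1, c))

def weight_document(doc_tokens, corpus, labels, positive):
    """Return TF, TF-IDF and tf.rf weights for every term in one document."""
    tf = term_frequency(doc_tokens)
    return {
        term: {
            "tf": count,
            "tf_idf": count * idf(term, corpus),
            "tf_rf": count * rf(term, corpus, labels, positive),
        }
        for term, count in tf.items()
    }

# Hypothetical toy corpus with placeholder tokens (not data from the paper).
corpus = [
    ["alpha", "beta", "alpha"],
    ["alpha", "gamma"],
    ["beta", "gamma", "gamma"],
    ["gamma", "delta"],
]
labels = ["pos", "pos", "neg", "neg"]

weights = weight_document(corpus[0], corpus, labels, positive="pos")
for term, w in weights.items():
    print(term, w)
```

In this toy corpus, "alpha" occurs only in positive-class documents, so its rf factor is larger than that of "beta", which occurs once in each class. Their IDF values are identical because both terms appear in two of the four documents, which illustrates why a supervised factor such as rf can separate terms that IDF cannot.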

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Sabbah, T., Selamat, A. (2014). Modified Frequency-Based Term Weighting Scheme for Accurate Dark Web Content Classification. In: Jaafar, A., et al. Information Retrieval Technology. AIRS 2014. Lecture Notes in Computer Science, vol 8870. Springer, Cham. https://doi.org/10.1007/978-3-319-12844-3_16

  • DOI: https://doi.org/10.1007/978-3-319-12844-3_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12843-6

  • Online ISBN: 978-3-319-12844-3

  • eBook Packages: Computer Science, Computer Science (R0)
