Content Based Fraudulent Website Detection Using Supervised Machine Learning Techniques

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 734)

Abstract

Fraudulent websites that pose as legitimate sources of information, goods, products and services are proliferating and have caused losses of billions of dollars. Because of the many undesirable impacts of Internet fraud and scams, several studies and approaches have focused on identifying fraudulent websites, yet none has managed to offer an efficient solution for suppressing these fraudulent activities. In this regard, this research proposes a fraudulent website detection model based on sentiment analysis of a website's textual content, natural language processing and supervised machine learning techniques. The proposed model consists of four primary phases: data acquisition, preprocessing, feature extraction and classification. A crawler is used to obtain data from the Internet, and the data are cleaned to remove non-discriminative noise and reshaped into the desired format. Meaningful and discriminative patterns are then extracted. Finally, the classification phase applies supervised machine learning techniques to construct the fraudulent website detection model. This research employs 10-fold stratified cross-validation to validate the performance of the proposed model. Experimental results show that the proposed model, with a cross-validated accuracy of 97.67% and a false positive rate (FPR) of 3.49%, achieved satisfactory results and served the aim of this research.
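The evaluation protocol named in the abstract (stratified k-fold cross-validation, reported as accuracy and FPR) can be illustrated with a minimal standard-library sketch. Everything specific here is an assumption made for illustration: the abstract does not name the classifier or the features, so a bag-of-words multinomial Naive Bayes stands in for the paper's sentiment-based features and supervised learners, the crawler phase is omitted, and the function names and the `"fraud"` label are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Preprocessing stand-in: lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def stratified_folds(labels, k):
    """Split sample indices into k folds, dealing each class out round-robin
    so every fold keeps roughly the overall class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    return folds


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model (assumed stand-in classifier)."""
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, y in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[y].update(tokens)
        vocab.update(tokens)
    return word_counts, Counter(labels), vocab


def predict_nb(model, doc):
    """Return the class with the highest Laplace-smoothed log-posterior."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y in class_counts:
        score = math.log(class_counts[y] / total)
        denom = sum(word_counts[y].values()) + len(vocab)
        for tok in tokenize(doc):
            if tok in vocab:  # tokens unseen in training are ignored
                score += math.log((word_counts[y][tok] + 1) / denom)
        if score > best_score:
            best, best_score = y, score
    return best


def cross_validate(docs, labels, k, positive="fraud"):
    """Pool predictions over k stratified folds; return (accuracy, FPR)."""
    outcomes = Counter()  # keyed by (is_actually_positive, predicted_positive)
    for fold in stratified_folds(labels, k):
        held_out = set(fold)
        train_docs = [d for i, d in enumerate(docs) if i not in held_out]
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        model = train_nb(train_docs, train_labels)
        for i in fold:
            pred = predict_nb(model, docs[i])
            outcomes[(labels[i] == positive, pred == positive)] += 1
    tp, fn = outcomes[(True, True)], outcomes[(True, False)]
    fp, tn = outcomes[(False, True)], outcomes[(False, False)]
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr
```

Calling `cross_validate(docs, labels, k=10)` on a labelled corpus would mirror the 10-fold setting described above. Here FPR is computed as FP / (FP + TN), the share of legitimate sites wrongly flagged as fraudulent, which is the usual definition; the abstract does not spell out its own.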

Acknowledgement

This work is supported by the Ministry of Higher Education (MOHE) and Research Management Centre (RMC) at the Universiti Teknologi Malaysia (UTM) under Fundamental Research Grant (FRGS) VOT R.J130000.7828.4F809.

Author information

Corresponding author

Correspondence to Anazida Zainal.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Maktabar, M., Zainal, A., Maarof, M.A., Kassim, M.N. (2018). Content Based Fraudulent Website Detection Using Supervised Machine Learning Techniques. In: Abraham, A., Muhuri, P., Muda, A., Gandhi, N. (eds) Hybrid Intelligent Systems. HIS 2017. Advances in Intelligent Systems and Computing, vol 734. Springer, Cham. https://doi.org/10.1007/978-3-319-76351-4_30

  • DOI: https://doi.org/10.1007/978-3-319-76351-4_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-76350-7

  • Online ISBN: 978-3-319-76351-4

  • eBook Packages: Engineering, Engineering (R0)
