Content Based Fraudulent Website Detection Using Supervised Machine Learning Techniques

Maktabar, Mahdi; Zainal, Anazida; Maarof, Mohd Aizaini; Kassim, Mohamad Nizam

doi:10.1007/978-3-319-76351-4_30

Conference paper
First Online: 16 March 2018

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 734)

Abstract

Fraudulent websites that pose as legitimate sources of information, goods, products and services are proliferating and have resulted in the loss of billions of dollars. Because of the undesirable impacts of Internet fraud and scams, several studies and approaches have focused on identifying fraudulent websites, yet none has managed to offer an efficient solution for suppressing these fraudulent activities. In this regard, this research proposes a fraudulent website detection model based on sentiment analysis of the textual content of a given website, natural language processing and supervised machine learning techniques. The proposed model consists of four primary phases: data acquisition, preprocessing, feature extraction and classification. A crawler is used to obtain data from the Internet, and the data are cleaned to remove non-discriminative noise and reshaped into the desired format. Meaningful and discriminative patterns are then extracted. Finally, the classification phase applies supervised machine learning techniques to construct the fraudulent website detection model. This research employs 10-fold stratified cross-validation to validate the performance of the proposed model. Experimental results show that the proposed model achieved satisfactory results, with a cross-validated accuracy of 97.67% and a false positive rate (FPR) of 3.49%, and served the aim of this research.
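
The evaluation protocol named in the abstract, 10-fold stratified cross-validation reported as accuracy and FPR, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not name the classifier or the exact features, so a bag-of-words Naive Bayes classifier stands in for both; the function names (`tokenize`, `stratified_folds`, `train`, `predict`, `cross_validate`) are hypothetical; the page texts are invented toy strings, not the authors' crawled data; and the fraudulent class is treated as the positive class when computing FPR.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Preprocessing stand-in: lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def stratified_folds(labels, k):
    """Deal the indices of each class round-robin into k folds,
    so every fold keeps roughly the overall class ratio."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def train(docs, labels):
    """Count words per class (multinomial Naive Bayes, an assumed stand-in)."""
    word_counts = {0: Counter(), 1: Counter()}
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, Counter(labels), vocab


def predict(model, doc):
    """Return 1 (fraudulent) or 0 (legitimate) using add-one smoothing.
    Assumes both classes occurred in the training split."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(class_counts[label] / total_docs)
        for token in tokenize(doc):
            if token in vocab:
                score += math.log((word_counts[label][token] + 1) / denom)
        scores[label] = score
    return 1 if scores[1] > scores[0] else 0


def cross_validate(docs, labels, k=10):
    """Pool the confusion counts over all k held-out folds."""
    tp = tn = fp = fn = 0
    for test_indices in stratified_folds(labels, k):
        held_out = set(test_indices)
        train_docs = [d for i, d in enumerate(docs) if i not in held_out]
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        model = train(train_docs, train_labels)
        for i in test_indices:
            predicted = predict(model, docs[i])
            if predicted == 1 and labels[i] == 1:
                tp += 1
            elif predicted == 0 and labels[i] == 0:
                tn += 1
            elif predicted == 1 and labels[i] == 0:
                fp += 1
            else:
                fn += 1
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    fpr = fp / (fp + tn)  # legitimate pages wrongly flagged as fraudulent
    return accuracy, fpr


# Invented toy pages: label 1 = fraudulent, label 0 = legitimate.
docs = [f"claim your guaranteed prize number {i} now" for i in range(10)]
docs += [f"store opening hours and returns policy page {i}" for i in range(10)]
labels = [1] * 10 + [0] * 10

accuracy, fpr = cross_validate(docs, labels, k=10)
print(f"accuracy={accuracy:.4f} FPR={fpr:.4f}")
```

Pooling the confusion counts across folds before computing the rates is one common convention; averaging per-fold rates is another, and the abstract does not say which one the authors used.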

Acknowledgement

This work is supported by the Ministry of Higher Education (MOHE) and Research Management Centre (RMC) at the Universiti Teknologi Malaysia (UTM) under Fundamental Research Grant (FRGS) VOT R.J130000.7828.4F809.

Author information

Authors and Affiliations

Faculty of Computing, Universiti Teknologi Malaysia, Johor, Malaysia
Mahdi Maktabar, Anazida Zainal & Mohd Aizaini Maarof
Cyber Security Responsive Services, CyberSecurity Malaysia, Seri Kembangan, Malaysia
Mohamad Nizam Kassim

Corresponding author

Correspondence to Anazida Zainal.

Editor information

Editors and Affiliations

Machine Intelligence Research Labs, Auburn, WA, USA
Ajith Abraham
Department of Computer Science, South Asian University, Chanakyapuri, Delhi, India
Pranab Kr. Muhuri
Faculty of Information and Communication Technology, Universiti Teknikal Malaysia Melaka, Durian Tunggal, Melaka, Malaysia
Azah Kamilah Muda
Machine Intelligence Research Labs, Auburn, WA, USA
Niketa Gandhi

Cite this paper

Maktabar, M., Zainal, A., Maarof, M.A., Kassim, M.N. (2018). Content Based Fraudulent Website Detection Using Supervised Machine Learning Techniques. In: Abraham, A., Muhuri, P., Muda, A., Gandhi, N. (eds) Hybrid Intelligent Systems. HIS 2017. Advances in Intelligent Systems and Computing, vol 734. Springer, Cham. https://doi.org/10.1007/978-3-319-76351-4_30

DOI: https://doi.org/10.1007/978-3-319-76351-4_30
Published: 16 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76350-7
Online ISBN: 978-3-319-76351-4
eBook Packages: Engineering, Engineering (R0)
