Abstract
With the rapid development of the Internet, gambling and porn websites are increasingly harmful to the health and development of young people. However, website classification methods based on text content and URLs do not perform satisfactorily on gambling and porn website detection, because the domain names of these sites change quickly. Meanwhile, visual-based website classification has achieved strong results in phishing website detection, which encourages us. Therefore, in this paper we introduce visual features to identify gambling and porn websites. First, we develop a website screenshot tool that saves the full content of a web page as an image. Second, effective features are selected with the bag-of-words (BoW) model to recognize screenshots of gambling and porn websites, and appropriate parameters are chosen to improve classification efficiency. Finally, experimental results on our collected gambling and porn website datasets demonstrate that the proposed method recognizes gambling and porn websites with satisfactory results.
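The BoW step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which local descriptor, vocabulary size, or classifier the method uses, so everything here is an assumption made for illustration: synthetic 2-D points stand in for descriptors extracted from a screenshot, plain k-means builds the visual vocabulary, and a nearest-mean classifier stands in for whatever classifier the paper actually trains. All function names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centers):
    """Index of the center closest to vec."""
    return min(range(len(centers)), key=lambda i: sq_dist(vec, centers[i]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the k centers act as the visual vocabulary."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the old center if a cluster goes empty
                centers[i] = [sum(col) / len(g) for col in zip(*g)]
    return centers


def bow_histogram(descriptors, vocab):
    """Normalized histogram of visual-word counts for one image."""
    hist = [0.0] * len(vocab)
    for d in descriptors:
        hist[nearest(d, vocab)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def fake_descriptors(label, rng, n=60):
    """Synthetic stand-in for descriptors of one screenshot (assumption)."""
    p_blob_a = 0.9 if label == "target" else 0.1
    out = []
    for _ in range(n):
        c = 0.0 if rng.random() < p_blob_a else 5.0
        out.append([c + rng.gauss(0, 0.5), c + rng.gauss(0, 0.5)])
    return out


def train(images, labels, k=2):
    """Build the vocabulary, then store one mean histogram per class."""
    all_desc = [d for img in images for d in img]
    vocab = kmeans(all_desc, k)
    hists = [bow_histogram(img, vocab) for img in images]
    means = {}
    for lab in set(labels):
        rows = [h for h, l in zip(hists, labels) if l == lab]
        means[lab] = [sum(col) / len(rows) for col in zip(*rows)]
    return vocab, means


def predict(img, vocab, means):
    """Assign the class whose mean histogram is closest (nearest-mean stand-in)."""
    h = bow_histogram(img, vocab)
    return min(means, key=lambda lab: sq_dist(h, means[lab]))


if __name__ == "__main__":
    rng = random.Random(1)
    labels = ["target"] * 10 + ["benign"] * 10
    images = [fake_descriptors(lab, rng) for lab in labels]
    vocab, means = train(images, labels)
    print(predict(fake_descriptors("target", rng), vocab, means))
```

In a real pipeline the synthetic descriptors would be replaced by local features computed on each full-page screenshot, and the vocabulary size would be one of the parameters tuned for classification efficiency.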
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 61602472, No. U1636217) and the National Key Research and Development Program of China (No. 2016YFB0801200).
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Li, L., Gou, G., Xiong, G., Cao, Z., Li, Z. (2018). Identifying Gambling and Porn Websites with Image Recognition. In: Zeng, B., Huang, Q., El Saddik, A., Li, H., Jiang, S., Fan, X. (eds) Advances in Multimedia Information Processing – PCM 2017. PCM 2017. Lecture Notes in Computer Science, vol. 10736. Springer, Cham. https://doi.org/10.1007/978-3-319-77383-4_48
DOI: https://doi.org/10.1007/978-3-319-77383-4_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77382-7
Online ISBN: 978-3-319-77383-4
eBook Packages: Computer Science, Computer Science (R0)