Abstract
The paper investigates web page classification algorithms for protection against inappropriate information on the Internet. It proposes an approach for combining classification algorithms that rely on different aspects of the source data and on different machine learning methods, and presents experimental results of applying this approach to website classification.
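The abstract does not say how the base algorithms are combined. As an illustration only, the sketch below assumes that each base classifier looks at one aspect of a page (here its URL and its visible text) and outputs a probability for each category, and that these outputs are merged by weighted soft voting. The categories, keywords, weights, and all function names (`keyword_classifier`, `combine`, `predict`) are hypothetical, and the toy keyword scorer merely stands in for the different machine learning methods the paper combines.

```python
from typing import Callable, Dict, Sequence

# A page is represented by its aspects, e.g. {"url": ..., "text": ...}.
Page = Dict[str, str]
Probs = Dict[str, float]
Classifier = Callable[[Page], Probs]

# Hypothetical category keywords, for illustration only.
KEYWORDS: Dict[str, Sequence[str]] = {
    "gambling": ["casino", "poker", "bet"],
    "news": ["news", "politics", "report"],
}


def keyword_classifier(aspect: str, keywords: Dict[str, Sequence[str]]) -> Classifier:
    """Toy base classifier (hypothetical) that looks at a single aspect of the page."""

    def classify(page: Page) -> Probs:
        content = page.get(aspect, "").lower()
        # Add-one smoothing keeps every category probability non-zero.
        scores = {c: 1.0 + sum(content.count(k) for k in kws) for c, kws in keywords.items()}
        total = sum(scores.values())
        return {c: s / total for c, s in scores.items()}

    return classify


def combine(classifiers: Sequence[Classifier], weights: Sequence[float], page: Page) -> Probs:
    """Weighted soft voting: weighted mean of the base classifiers' probability vectors."""
    combined: Dict[str, float] = {}
    for clf, w in zip(classifiers, weights):
        for c, p in clf(page).items():
            combined[c] = combined.get(c, 0.0) + w * p
    total_w = sum(weights)
    return {c: v / total_w for c, v in combined.items()}


def predict(probs: Probs) -> str:
    """Return the category with the highest combined probability."""
    return max(probs, key=lambda c: probs[c])


# One base classifier per aspect of the source data; the weights are assumed, not from the paper.
url_clf = keyword_classifier("url", KEYWORDS)
text_clf = keyword_classifier("text", KEYWORDS)
WEIGHTS = [0.4, 0.6]

page = {"url": "http://casino-bet.example", "text": "Play poker and casino games"}
print(predict(combine([url_clf, text_clf], WEIGHTS, page)))  # gambling
```

Weighted averaging is only one possible combination rule; training a meta-classifier on the base classifiers' outputs (stacking) is another common choice.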
Acknowledgements
This work was performed at SPIIRAS under RSF grant 18-11-00302.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Gaifulina, D., Chechulin, A. (2020). Development of the Complex Algorithm for Web Pages Classification to Detection Inappropriate Information on the Internet. In: Kotenko, I., Badica, C., Desnitsky, V., El Baz, D., Ivanovic, M. (eds) Intelligent Distributed Computing XIII. IDC 2019. Studies in Computational Intelligence, vol 868. Springer, Cham. https://doi.org/10.1007/978-3-030-32258-8_33
DOI: https://doi.org/10.1007/978-3-030-32258-8_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32257-1
Online ISBN: 978-3-030-32258-8
eBook Packages: Intelligent Technologies and Robotics (R0)