DOI: 10.1145/3491396.3506562
research-article

An Abused Webpage Detection Method Based on Screenshots Text Recognition

Published: 07 January 2022

ABSTRACT

With the rapid development of the Internet, webpages containing abused information such as pornography and gambling have emerged in an endless stream. These webpages use various methods to evade traditional detection, which seriously degrades the Internet environment. Accurately identifying these webpages is therefore becoming increasingly important. In response to this problem, this paper combines text recognition and text classification to propose an abused webpage detection method based on screenshots, which can efficiently detect and classify webpages by acquiring the webpage information that is actually visible to the user. A comparative experiment against the traditional web crawler method verifies the accuracy and the advantages of the proposed method. This work provides technical support for fighting illegal activities and purifying the Internet environment.
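
The three stages named in the abstract (screenshot, text recognition, text classification) can be sketched as a small pipeline. This is only an illustrative sketch, not the authors' implementation: the names `detect`, `Detection`, and `keyword_classifier` are hypothetical, the screenshot and recognition stages are passed in as plain callables, and the keyword counter is a toy stand-in for whatever trained classifier the paper actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set

# Hypothetical stage signatures; the abstract does not specify any of them.
Screenshotter = Callable[[str], bytes]  # URL -> rendered screenshot bytes
Recognizer = Callable[[bytes], str]     # screenshot bytes -> visible text
Classifier = Callable[[str], str]       # visible text -> category label


@dataclass
class Detection:
    url: str
    text: str
    label: str


def detect(url: str, shoot: Screenshotter, recognize: Recognizer,
           classify: Classifier) -> Detection:
    """Run screenshot -> text recognition -> text classification."""
    image = shoot(url)
    text = recognize(image)
    return Detection(url=url, text=text, label=classify(text))


def keyword_classifier(keywords: Dict[str, Set[str]]) -> Classifier:
    """Toy stand-in for a trained text classifier (an assumption).

    The label whose keyword set matches the most tokens wins;
    text with no matches is labelled "normal".
    """
    def classify(text: str) -> str:
        tokens = text.lower().split()
        best_label, best_hits = "normal", 0
        for label, words in keywords.items():
            hits = sum(1 for token in tokens if token in words)
            if hits > best_hits:
                best_label, best_hits = label, hits
        return best_label
    return classify
```

In practice `shoot` would wrap a headless browser and `recognize` an OCR engine. Keeping them as injected callables means the classification step only ever sees the text a user would see on the rendered page, which is the point the abstract makes against crawler-based detection.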

            Published in

            ACM ICEA '21: Proceedings of the 2021 ACM International Conference on Intelligent Computing and its Emerging Applications
            December 2021
            241 pages
            ISBN: 9781450391603
            DOI: 10.1145/3491396

            Copyright © 2021 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Qualifiers

            • research-article
            • Research
            • Refereed limited
