DOI: 10.1145/3373477.3373497

Deep URL: design of adult URL classifier using deep neural network

Published: 15 January 2020

Abstract

Many people now rely on the internet for their information needs. The web offers a vast range of resources, but some content is not appropriate for all age groups, especially children under 18. The number of adult websites grows every day, which challenges existing approaches: content-based methods need the entire web page for classification, and blacklisting methods need frequent database updates. To overcome these issues, we propose a URL-based deep learning model that avoids unnecessary content downloads and handles the dynamic nature of the web. Because a URL is a sequence of characters, we propose a novel embedding method for effective URL representation. We also propose a Recurrent Convolutional Neural Network based approach that classifies adult websites by learning significant features derived only from URLs. We analyzed the performance of the proposed approach through experiments on the benchmark ODP dataset. The results show an accuracy of 87.6%, a significant improvement over existing approaches.
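The abstract does not describe the proposed embedding method. As a rough illustration of the step that usually comes before any character-level embedding, the sketch below maps a URL to a fixed-length sequence of integer indices. The alphabet, the padding and unknown indices, and the names `ALPHABET`, `CHAR_TO_INDEX`, and `encode_url` are assumptions made for this example, not details taken from the paper.

```python
import string
from typing import List

# Assumed vocabulary: lowercase letters, digits, and common URL punctuation.
# Index 0 is reserved for padding and index 1 for unknown characters.
ALPHABET = string.ascii_lowercase + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
PAD, UNK = 0, 1
CHAR_TO_INDEX = {ch: i + 2 for i, ch in enumerate(ALPHABET)}


def encode_url(url: str, max_len: int = 32) -> List[int]:
    """Map a URL to a fixed-length list of character indices.

    The URL is lowercased, truncated to max_len characters, and
    right-padded with PAD so every output has the same length.
    """
    indices = [CHAR_TO_INDEX.get(ch, UNK) for ch in url.lower()[:max_len]]
    return indices + [PAD] * (max_len - len(indices))


if __name__ == "__main__":
    # Hypothetical URL, used only to show the output shape.
    print(encode_url("example.com/a", max_len=16))
```

A sequence like this could then be fed to an embedding layer followed by convolutional and recurrent layers. Whether the paper's model works on characters, n-grams, or word tokens is not stated in the abstract.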


Cited By

  • (2024) AI-UNet: Attention Information-based deep URL Network for adult webpage classification. Neural Computing and Applications. DOI: 10.1007/s00521-024-10408-7. Online publication date: 7-Dec-2024
  • (2022) A Technology Exploration towards Trustable and Safe Use of Social Media for Vulnerable Women based on Islam and Arab Culture. Proceedings of the 2022 ACM Conference on Information Technology for Social Good, pp. 138-145. DOI: 10.1145/3524458.3547259. Online publication date: 7-Sep-2022
  • (2021) Robust Detection of Malicious URLs with Self-Paced Wide & Deep Learning. IEEE Transactions on Dependable and Secure Computing, pp. 1-1. DOI: 10.1109/TDSC.2021.3121388. Online publication date: 2021

    Published In

    AISS '19: Proceedings of the 1st International Conference on Advanced Information Science and System
    November 2019
    253 pages
    ISBN:9781450372916
    DOI:10.1145/3373477
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. URL representation
    2. adult website filtering
    3. deep URL classifier
    4. word embedding

    Qualifiers

    • Research-article

    Conference

    AISS 2019

    Acceptance Rates

    AISS '19 Paper Acceptance Rate 41 of 95 submissions, 43%;
    Overall Acceptance Rate 41 of 95 submissions, 43%
