POSTER: A PU Learning based System for Potential Malicious URL Detection

Authors:

Zhi-Hua Zhou

CCS '17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security

Pages 2599 - 2601

https://doi.org/10.1145/3133956.3138825

Published: 30 October 2017

Abstract

This paper describes a system based on PU learning (Positive and Unlabeled learning) for detecting potential URL attacks. Previous machine-learning solutions for this task mainly formalize it as a supervised learning problem. In some scenarios, however, the available data contain only a handful of known attack URLs alongside a large number of unlabeled instances, which makes supervised learning paradigms infeasible. In this work, we formalize this setting as a PU learning problem and solve it by combining two strategies: a two-stage strategy and a cost-sensitive strategy. Experimental results show that the developed system can effectively find potential URL attacks. The system can either be deployed to assist an existing system or be used to help cyber-security engineers discover potential attack modes, so that they can improve the existing system with significantly less effort.
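
As an illustration of how the two strategies named in the abstract might be combined, the following Python sketch trains one model per strategy and averages their scores on the unlabeled set. It is not the authors' implementation: the synthetic 2-D features, the centroid-distance rule for picking reliable negatives, the logistic-regression learner, the `pos_cost` and `keep` values, and the score averaging are all assumptions made for this example.

```python
import math
import random

def predict(theta, x):
    """Sigmoid score in [0, 1]; theta[-1] is the bias term."""
    z = theta[-1] + sum(t * v for t, v in zip(theta, x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def train_weighted_logreg(X, y, w, epochs=300, lr=0.1):
    """Logistic regression fitted by batch gradient descent with per-sample weights."""
    dim = len(X[0])
    theta = [0.0] * (dim + 1)
    total = sum(w)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for xi, yi, wi in zip(X, y, w):
            err = wi * (predict(theta, xi) - yi)
            for j in range(dim):
                grad[j] += err * xi[j]
            grad[dim] += err
        theta = [t - lr * g / total for t, g in zip(theta, grad)]
    return theta

def two_stage_model(P, U, keep=0.5):
    """Stage 1: pick 'reliable negatives' from U; stage 2: train P vs. those."""
    centroid = [sum(col) / len(P) for col in zip(*P)]
    # Assumption: unlabeled points farthest from the positive centroid are benign.
    ranked = sorted(U, key=lambda x: math.dist(x, centroid), reverse=True)
    reliable_neg = ranked[: int(len(U) * keep)]
    X = P + reliable_neg
    y = [1] * len(P) + [0] * len(reliable_neg)
    return train_weighted_logreg(X, y, [1.0] * len(X))

def cost_sensitive_model(P, U, pos_cost=10.0):
    """Treat all of U as negative, but penalize errors on known positives more."""
    X = P + U
    y = [1] * len(P) + [0] * len(U)
    w = [pos_cost] * len(P) + [1.0] * len(U)
    return train_weighted_logreg(X, y, w)

def pu_scores(P, U):
    """Average the two models' scores for every unlabeled instance."""
    m1 = two_stage_model(P, U)
    m2 = cost_sensitive_model(P, U)
    return [(predict(m1, x) + predict(m2, x)) / 2 for x in U]

def cloud(center, n):
    """Synthetic stand-in for URL feature vectors (hypothetical 2-D features)."""
    return [[random.gauss(center, 0.5), random.gauss(center, 0.5)] for _ in range(n)]

random.seed(0)
P = cloud(2.0, 10)                  # known attack URLs (positives)
U = cloud(0.0, 40) + cloud(2.0, 5)  # unlabeled: 40 benign, then 5 hidden attacks
scores = pu_scores(P, U)
ranked = sorted(range(len(U)), key=lambda i: scores[i], reverse=True)
print("most suspicious unlabeled indices:", ranked[:5])
```

In a real deployment the feature vectors would be extracted from URLs rather than sampled, and the highest-scoring unlabeled instances would be the candidates handed to engineers for review.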

Index Terms

1. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
    1. Intrusion detection systems

Published In

CCS '17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security

October 2017

2682 pages

ISBN: 9781450349468

DOI: 10.1145/3133956

General Chair: Bhavani Thuraisingham (The University of Texas at Dallas, USA)

Program Chairs: David Evans (University of Virginia), Tal Malkin (Columbia University), Dongyan Xu (Purdue University)

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Poster

Funding Sources

NSFC
Collaborative Innovation Center of Novel Software Technology and Industrialization

Conference

CCS '17

Sponsor:

SIGSAC

CCS '17: 2017 ACM SIGSAC Conference on Computer and Communications Security

October 30 - November 3, 2017

Dallas, Texas, USA

Acceptance Rates

CCS '17 Paper Acceptance Rate: 151 of 836 submissions, 18%

Overall Acceptance Rate: 1,261 of 6,999 submissions, 18%
