Labelled Vulnerability Dataset on Android Source Code (LVDAndro) to Develop AI-Based Code Vulnerability Detection Models

Janaka Senanayake; Harsha Kalutarage; Mhd Omar Al-Kadri; Luca Piras; Andrei Petrovski

Topics: Machine Learning and AI Security; Secure Software Development Methodologies; Security and Privacy in Mobile Systems; Software Security

In Proceedings of the 20th International Conference on Security and Cryptography (SECRYPT) - Volume 1, pages 659-666, 2023, Rome, Italy

Authors: Janaka Senanayake ¹˒² ; Harsha Kalutarage ¹ ; Mhd Omar Al-Kadri ³ ; Luca Piras ⁴ and Andrei Petrovski ¹

Affiliations: ¹ School of Computing, Robert Gordon University, Aberdeen, U.K. ; ² Faculty of Science, University of Kelaniya, Kelaniya, Sri Lanka ; ³ School of Computing and Digital Technology, Birmingham City University, Birmingham, U.K. ; ⁴ Department of Computer Science, Middlesex University, London, U.K.

Keyword(s): Android Application Security, Code Vulnerability, Labelled Dataset, Artificial Intelligence, Auto Machine Learning.

Abstract: Ensuring the security of Android applications is a vital and intricate task that requires careful consideration during development. Unfortunately, many apps are published without sufficient security measures, possibly because vulnerabilities are not identified early. One possible solution is to employ machine learning models trained on a labelled dataset, but currently available datasets are suboptimal. This study creates a series of datasets of Android source code vulnerabilities, named LVDAndro, labelled based on the Common Weakness Enumeration (CWE). Three datasets were generated through app scanning by varying the number of apps and their sources. LVDAndro includes over 2,000,000 unique code samples, obtained by scanning over 15,000 apps. As a proof of concept, AutoML was then applied to each dataset to evaluate the applicability of LVDAndro for detecting vulnerable source code using machine learning. The AutoML model trained on the dataset achieved an accuracy of 94% and an F1-Score of 0.94 in binary classification, and an accuracy of 94% and an F1-Score of 0.93 in CWE-based multi-class classification. The LVDAndro dataset is publicly available and continues to expand as more apps are scanned and added regularly. The LVDAndro GitHub repository also includes the source code for dataset generation and model training.
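
To make the two evaluation settings in the abstract concrete, the following is a minimal Python sketch of how CWE-labelled code samples could map to binary and multi-class targets, and how accuracy and F1-Score are computed for the binary case. The sample records, the `(code, cwe)` layout, the helper names, and the predictions are all hypothetical illustrations; they are not taken from the LVDAndro dataset, its schema, or the authors' AutoML pipeline.

```python
# Hypothetical toy records: (code line, CWE label or None when not vulnerable).
# This layout is an assumption for illustration, not the LVDAndro schema.
samples = [
    ('Log.d("auth", password);', "CWE-532"),
    ('Cipher.getInstance("AES/ECB/PKCS5Padding");', "CWE-327"),
    ("int total = a + b;", None),
    ("String name = user.getName();", None),
]


def binary_label(cwe):
    """Return 1 if the sample carries any CWE label, else 0."""
    return 0 if cwe is None else 1


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_binary(y_true, y_pred, positive=1):
    """F1-Score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Binary target: vulnerable vs. non-vulnerable.
y_true = [binary_label(cwe) for _, cwe in samples]
# Multi-class target: the CWE identifier itself, plus a non-vulnerable class.
y_true_multi = [cwe or "non-vulnerable" for _, cwe in samples]

# Made-up predictions from an imaginary classifier (misses the second sample).
y_pred = [1, 0, 0, 0]

acc = accuracy(y_true, y_pred)
f1 = f1_binary(y_true, y_pred)
print(f"accuracy={acc:.2f} f1={f1:.2f}")
```

In this toy case one of the two vulnerable samples is missed, so accuracy and F1-Score diverge; this is why the abstract reports both figures rather than accuracy alone.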

CC BY-NC-ND 4.0

Paper citation in several formats:

Senanayake, J., Kalutarage, H., Al-Kadri, M. O., Piras, L. and Petrovski, A. (2023). Labelled Vulnerability Dataset on Android Source Code (LVDAndro) to Develop AI-Based Code Vulnerability Detection Models. In Proceedings of the 20th International Conference on Security and Cryptography - SECRYPT; ISBN 978-989-758-666-8; ISSN 2184-7711, SciTePress, pages 659-666. DOI: 10.5220/0012060400003555

@conference{secrypt23,
author={Janaka Senanayake and Harsha Kalutarage and Mhd Omar Al{-}Kadri and Luca Piras and Andrei Petrovski},
title={Labelled Vulnerability Dataset on Android Source Code (LVDAndro) to Develop AI-Based Code Vulnerability Detection Models},
booktitle={Proceedings of the 20th International Conference on Security and Cryptography - SECRYPT},
year={2023},
pages={659-666},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012060400003555},
isbn={978-989-758-666-8},
issn={2184-7711},
}

TY - CONF

JO - Proceedings of the 20th International Conference on Security and Cryptography - SECRYPT
TI - Labelled Vulnerability Dataset on Android Source Code (LVDAndro) to Develop AI-Based Code Vulnerability Detection Models
SN - 978-989-758-666-8
IS - 2184-7711
AU - Senanayake, J.
AU - Kalutarage, H.
AU - Al-Kadri, M. O.
AU - Piras, L.
AU - Petrovski, A.
PY - 2023
SP - 659
EP - 666
DO - 10.5220/0012060400003555
PB - SciTePress