Abstract:
Acquiring labelled data for machine learning tasks, such as software performance prediction, remains resource-intensive. This study extends our previous work by introducing a batch-mode deep active learning approach tailored to regression on graph-structured data. Our framework converts source code into flow-augmented abstract syntax tree (FA-AST) graphs and then uses both supervised and unsupervised graph embeddings. In contrast to single-instance querying, the batch-mode paradigm adaptively selects batches of unlabelled samples for labelling. We employ a range of base kernels, kernel transformations, and selection methods, informed by both Bayesian and non-Bayesian strategies, to improve the sample efficiency of neural network regression. Our experimental evaluation on multiple real-world software performance datasets shows that the batch-mode deep active learning approach achieves robust performance with a reduced labelling budget. The methodology scales effectively to larger datasets and requires minimal changes to existing neural network architectures.
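To illustrate what a kernel-based batch selection step can look like, the sketch below combines a base kernel with a greedy, non-Bayesian selection rule. It assumes that each program has already been mapped to a fixed-length graph embedding, uses a linear kernel on those embeddings, and applies farthest-point ("MaxDist"-style) greedy selection in the distance induced by the kernel. The names `linear_kernel`, `kernel_distance`, and `select_batch_maxdist` are hypothetical, and this particular kernel and selection rule are assumptions chosen for illustration, not necessarily the ones used in the study.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, y: Vector) -> float:
    # Hypothetical base kernel: inner product of two graph embeddings.
    return sum(a * b for a, b in zip(x, y))


def kernel_distance(kernel: Kernel, x: Vector, y: Vector) -> float:
    # Distance induced by a kernel: sqrt(k(x,x) + k(y,y) - 2 k(x,y)).
    # Clamp at zero to guard against tiny negative values from rounding.
    d2 = kernel(x, x) + kernel(y, y) - 2.0 * kernel(x, y)
    return math.sqrt(max(d2, 0.0))


def select_batch_maxdist(
    pool: List[Vector],
    labelled: List[Vector],
    batch_size: int,
    kernel: Kernel = linear_kernel,
) -> List[int]:
    """Greedily pick `batch_size` pool indices, each time taking the pool
    point farthest from everything already labelled or selected.
    Assumption: a MaxDist-style rule, shown as one possible selection method."""
    if batch_size > len(pool):
        raise ValueError("batch_size exceeds the number of pool samples")

    # Distance from each pool point to its nearest labelled point
    # (infinite when nothing is labelled yet, so the first pick is index 0).
    min_dist = [math.inf] * len(pool)
    for j, x in enumerate(pool):
        for z in labelled:
            min_dist[j] = min(min_dist[j], kernel_distance(kernel, x, z))

    selected: List[int] = []
    chosen = set()
    for _ in range(batch_size):
        # Choose the unselected candidate with the largest nearest-neighbour distance.
        best = max(
            (j for j in range(len(pool)) if j not in chosen),
            key=lambda j: min_dist[j],
        )
        selected.append(best)
        chosen.add(best)
        # Update nearest-neighbour distances with the newly selected point.
        for j, x in enumerate(pool):
            min_dist[j] = min(min_dist[j], kernel_distance(kernel, x, pool[best]))
    return selected


if __name__ == "__main__":
    pool = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
    labelled = [[0.0, 0.0]]
    print(select_batch_maxdist(pool, labelled, batch_size=2))  # [2, 3]
```

Because every new pick updates the distances of the remaining candidates, the selected batch spreads out across the embedding space instead of clustering around a single uncertain region, which is the usual motivation for batch-mode selection over repeated single-instance querying. Swapping `linear_kernel` for another base kernel, or applying a transformation to it, changes the geometry in which "diversity" is measured without touching the selection loop.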
Published in: 2023 IEEE International Conference on Big Data (BigData)
Date of Conference: 15-18 December 2023
Date Added to IEEE Xplore: 22 January 2024