Abstract:
Website classification is crucial for tasks such as malicious website detection and information management. Current methods typically focus on effective feature extraction and algorithm selection to create balanced website datasets, which often leads to decreased performance due to data imbalance. In this study, we propose an intelligent website classification method (WebPromptM2) based on prompt-based learning with multimodal features. We design a prompt template that incorporates the textual and visual elements of the website, thereby facilitating a multimodal representation of the website. We then leverage domain-specific expertise to establish mapping relationships between website categories and a label word set. Finally, we fine-tune the masked pre-trained language model (PLM) and map the prediction results to the categories. We find that our method increases the recognition accuracy of tail classes and achieves superior performance on both long-tail and short-tail datasets.
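The prompt-and-mapping pipeline described above can be illustrated with a minimal sketch. Everything named here is an assumption for illustration and is not taken from the paper: the template wording, the representation of visual elements as a short text description, the `VERBALIZER` label words, the mean aggregation over label words, and the helper names `build_prompt` and `map_to_category`. The masked PLM itself is not included; its scores for the mask position are passed in as a plain dictionary.

```python
from typing import Dict, List

# Hypothetical label word set: category -> label words.
# These words are illustrative only, not the paper's mapping.
VERBALIZER: Dict[str, List[str]] = {
    "gambling": ["casino", "betting"],
    "news": ["news", "headline"],
    "shopping": ["shop", "store"],
}


def build_prompt(page_text: str, visual_desc: str, mask_token: str = "[MASK]") -> str:
    """Fill an assumed template with textual and visual content of a site.

    The visual element is assumed to be available as a text description.
    """
    return f"{page_text} {visual_desc} This website is about {mask_token}."


def map_to_category(mask_scores: Dict[str, float],
                    verbalizer: Dict[str, List[str]]) -> str:
    """Map mask-position word scores to a category.

    Each category is scored by the mean score of its label words
    (an assumed aggregation); words missing from mask_scores count as 0.
    """
    def category_score(words: List[str]) -> float:
        return sum(mask_scores.get(w, 0.0) for w in words) / len(words)

    return max(verbalizer, key=lambda c: category_score(verbalizer[c]))


if __name__ == "__main__":
    prompt = build_prompt("Play slots now", "image of a roulette wheel")
    print(prompt)
    # Scores a masked PLM might assign at the mask position (made-up values).
    scores = {"casino": 0.4, "betting": 0.2, "news": 0.1, "store": 0.3}
    print(map_to_category(scores, VERBALIZER))
```

In a real setup, the score dictionary would come from the fine-tuned masked language model's output distribution at the mask token, restricted to the label words.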
Published in: 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD)
Date of Conference: 08-10 May 2024
Date Added to IEEE Xplore: 10 July 2024