Abstract:
Pretrained language models (PLMs) have made significant progress on various NLP tasks recently. However, PLMs encounter challenges when it comes to domain-specific tasks such as legal AI. These tasks often involve intricate expertise, expensive data annotation, and limited training data availability. To tackle this problem, we propose a human-oriented artificial–natural parallel system for organized intelligence (HANOI)-Legal based on the parallel learning (PL) framework. First, by regarding the description in PL as the pretraining process based on a large-scale corpus, we set up an artificial system based on a PLM. Second, to adapt the PLM to legal tasks with limited resources, we propose UniPrompt as a prescription. UniPrompt serves as a unified prompt-based training framework, enabling the utilization of diverse open datasets for these tasks. Third, we labeled a small amount of task-specific legal data through distributed autonomous operations (DAO-II) for further fine-tuning. By combining a scalable unified-task-format reformulation with a unified-prompt-based training pipeline, HANOI-Legal leverages the linguistic capabilities PLMs acquire from a variety of open datasets to generate task-specific models. Our experiments on two legal-domain tasks show that HANOI-Legal achieved excellent performance in low-resource scenarios compared to the state-of-the-art prompt-based approach.
Published in: IEEE Transactions on Computational Social Systems (Volume: 11, Issue: 2, April 2024)
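The unified-task-format reformulation mentioned in the abstract can be illustrated with a minimal sketch. The task names, templates, and verbalized labels below are hypothetical examples, not the paper's actual implementation: the idea is simply that heterogeneous legal tasks are mapped into one shared prompt format so a single PLM can be trained across them.

```python
# Hypothetical sketch of a unified prompt-based task reformulation.
# Two assumed legal tasks (charge prediction and element recognition)
# are rewritten into a shared cloze-style template with a [MASK] slot.

def to_unified_prompt(task, text, label=None):
    """Map a task-specific example into a shared prompt format.

    If a label is given (training time), it fills the [MASK] slot;
    otherwise (inference time) the slot is left for the PLM to predict.
    """
    templates = {
        # Hypothetical task: predict the criminal charge from case facts.
        "charge_prediction": "Case facts: {text} The charge is [MASK].",
        # Hypothetical task: decide whether a legal element is satisfied.
        "element_recognition": "Statement: {text} This element holds: [MASK].",
    }
    prompt = templates[task].format(text=text)
    return prompt.replace("[MASK]", label) if label is not None else prompt

# Training example with a verbalized label:
print(to_unified_prompt("charge_prediction",
                        "The defendant took goods without paying.",
                        "theft"))
# Inference example, [MASK] left for the model:
print(to_unified_prompt("element_recognition",
                        "The act was intentional."))
```

Under this sketch, every open dataset is converted into the same text-to-text shape, which is what lets a single prompt-based training pipeline consume all of them.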