Abstract:
Information extraction is a key corner-stone in the digitization of office data which requires the conversion of unstructured to structured data. However, in the actual a...Show MoreMetadata
Abstract:
Information extraction is a key corner-stone in the digitization of office data which requires the conversion of unstructured to structured data. However, in the actual application to business cases, there is a big deadlock to adapt common extraction systems to domain-specific documents due to the limitation of preparation of training data. To overcome this issue, we introduce a model, which employs pre-trained language models with a customized CNN layer for domain adaptation. The model is validated on three Japanese domain-specific and two benchmark machine reading comprehension data sets (SQuADs). Experimental results confirm that our model achieves promising results which are applicable for actual business scenarios.
Date of Conference: 18-22 July 2021
Date Added to IEEE Xplore: 21 September 2021
ISBN Information: