Authors: Ingo Glaser 1 ; Shabnam Sadegharmaki 1 ; Basil Komboz 2 and Florian Matthes 1

Affiliations: 1 Chair of Software Engineering for Business Information Systems, Technical University of Munich, Boltzmannstrasse 3, 85748 Garching bei München, Germany ; 2 Allianz SE, Munich, Germany

Keyword(s): Data Scarcity, Natural Language Processing, Text Classification, Legal Text Analytics.

Abstract: Legal document analysis is an important research area. Classifying clauses or sentences enables valuable insights, such as the extraction of rights and obligations. However, datasets consisting of contracts or other legal documents are rare, particularly for the German language. The high cost of manually labeling data, especially for text classification, has motivated many studies that propose alternative methods to overcome the lack of labeled data. This paper examines the effects of text data augmentation on the quality of classification tasks. While a large number of techniques exist, this work examines a selected subset that includes semi-supervised learning methods and thesaurus-based data augmentation. We show not only that thesaurus-based data augmentation and text augmentation with synonyms and hypernyms can improve classification results, but also that the effect of such methods depends on the underlying data structure.
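
To make the idea of thesaurus-based augmentation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a tiny hand-written English thesaurus (`TOY_THESAURUS`), whitespace tokenization, and a replacement probability (`replace_prob`); all of these names and values are hypothetical and chosen only for illustration. The paper itself targets German legal text and also considers hypernyms, which this sketch does not cover.

```python
import random

# Hypothetical toy thesaurus for illustration only; a real setup would
# use a proper lexical resource for the target language.
TOY_THESAURUS = {
    "contract": ["agreement"],
    "tenant": ["lessee", "renter"],
    "pay": ["remit"],
}


def augment(sentence, thesaurus, rng, replace_prob=0.5):
    """Return a copy of `sentence` with some words swapped for synonyms."""
    out = []
    for token in sentence.split():
        candidates = thesaurus.get(token.lower())
        # Replace a token only if the thesaurus knows it and the coin flip succeeds.
        if candidates and rng.random() < replace_prob:
            out.append(rng.choice(candidates))
        else:
            out.append(token)
    return " ".join(out)


def augment_dataset(samples, thesaurus, n_variants=2, seed=0):
    """Extend (text, label) pairs with synonym-replaced variants.

    Each variant keeps the label of the sentence it was derived from.
    """
    rng = random.Random(seed)
    augmented = list(samples)
    for text, label in samples:
        for _ in range(n_variants):
            variant = augment(text, thesaurus, rng)
            if variant != text:  # skip variants where nothing was replaced
                augmented.append((variant, label))
    return augmented
```

Calling `augment_dataset([("the tenant shall pay rent", "obligation")], TOY_THESAURUS)` returns the original pair plus any variants that differ from it, each carrying the label `"obligation"`. Keeping the label fixed relies on the assumption that a synonym swap does not change the class of the sentence, which should be checked for the data at hand.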

CC BY-NC-ND 4.0

Paper citation in several formats:
Glaser, I.; Sadegharmaki, S.; Komboz, B. and Matthes, F. (2021). Data Scarcity: Methods to Improve the Quality of Text Classification. In Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods - ICPRAM; ISBN 978-989-758-486-2; ISSN 2184-4313, SciTePress, pages 556-564. DOI: 10.5220/0010268005560564

@conference{icpram21,
author={Ingo Glaser and Shabnam Sadegharmaki and Basil Komboz and Florian Matthes},
title={Data Scarcity: Methods to Improve the Quality of Text Classification},
booktitle={Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods - ICPRAM},
year={2021},
pages={556-564},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010268005560564},
isbn={978-989-758-486-2},
issn={2184-4313},
}

TY - CONF

JO - Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods - ICPRAM
TI - Data Scarcity: Methods to Improve the Quality of Text Classification
SN - 978-989-758-486-2
IS - 2184-4313
AU - Glaser, I.
AU - Sadegharmaki, S.
AU - Komboz, B.
AU - Matthes, F.
PY - 2021
SP - 556
EP - 564
DO - 10.5220/0010268005560564
PB - SciTePress