ETIP: a lengthy nested NER problem for Chinese insurance policy analysis

Sun, Lin; Zhang, Kai; Sun, Yuxuan; Weng, Fangsheng; Zhang, Jianwei

doi:10.1007/s10044-020-00885-6

Industrial and commercial application
Published: 27 April 2020

Volume 23, pages 1755–1765, (2020)
Pattern Analysis and Applications

Lin Sun (ORCID: 0000-0003-2923-3281)¹, Kai Zhang¹, Yuxuan Sun¹, Fangsheng Weng¹ & Jianwei Zhang¹

Abstract

Applying AI techniques to contract analysis can significantly reduce human effort. This paper introduces a lengthy nested NER problem: element tagging on insurance policy (ETIP). Compared with conventional NER, ETIP deals not only with entity types whose length varies from a short phrase to a long sentence, but also with phrase or clause entities that can be nested. We present a novel hybrid framework that combines deep learning with a heuristic filtering method to recognize these lengthy nested elements. First, a convolutional neural network is constructed to obtain good initial candidates, namely sliding windows with high softmax probability. Then, a concatenation operator on adjacent candidate segments is introduced to create phrase, clause, or sentence candidates. We design an effective voting strategy to resolve classification conflicts among the concatenated candidates and present a theoretical proof of F1-score optimization. For the experiments, we collected a large Chinese insurance contract dataset to test the performance of the proposed method. An extensive set of experiments investigates how sliding window candidates work within our filtering and voting strategy. The optimal parameters are determined by statistical analysis of the experimental data. The results show promising performance of our method on the ETIP problem.
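
The pipeline outlined in the abstract (score sliding windows, keep the confident ones, concatenate adjacent candidates, vote on the label) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the keyword-based `toy_classify` stands in for the convolutional neural network, the probability-weighted vote is only one plausible reading of the voting strategy, and the names `filter_windows`, `merge_and_vote`, the labels, the window size, and the threshold are all hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# (start, end, label, probability); end is exclusive.
Candidate = Tuple[int, int, str, float]
Classifier = Callable[[Sequence[str]], Tuple[str, float]]


def filter_windows(tokens: Sequence[str], classify: Classifier,
                   size: int = 3, threshold: float = 0.7) -> List[Candidate]:
    """Score every sliding window and keep confident, non-background ones."""
    if not tokens:
        return []
    kept = []
    last_start = max(len(tokens) - size, 0)
    for start in range(last_start + 1):
        end = min(start + size, len(tokens))
        label, prob = classify(tokens[start:end])
        if label != "O" and prob >= threshold:
            kept.append((start, end, label, prob))
    return kept


def _vote(group: List[Candidate], end: int) -> Tuple[int, int, str]:
    """Assumed rule: the label with the largest summed probability wins."""
    votes = Counter()
    for _, _, label, prob in group:
        votes[label] += prob
    return (group[0][0], end, votes.most_common(1)[0][0])


def merge_and_vote(candidates: List[Candidate]) -> List[Tuple[int, int, str]]:
    """Concatenate touching or overlapping windows, then vote on one label."""
    merged, group, group_end = [], [], -1
    for cand in sorted(candidates):
        if group and cand[0] > group_end:  # gap found: close the current span
            merged.append(_vote(group, group_end))
            group = []
        group.append(cand)
        group_end = cand[1] if len(group) == 1 else max(group_end, cand[1])
    if group:
        merged.append(_vote(group, group_end))
    return merged


def toy_classify(window: Sequence[str]) -> Tuple[str, float]:
    """Hypothetical stand-in for the CNN: keyword lookup with fixed scores."""
    if "premium" in window:
        return ("PAYMENT", 0.9)
    if "insured" in window:
        return ("COVERAGE", 0.8)
    return ("O", 0.99)


if __name__ == "__main__":
    tokens = ["the", "insured", "pays", "premium",
              "annually", "to", "the", "company"]
    print(merge_and_vote(filter_windows(tokens, toy_classify)))
```

In this toy run, one window is labeled COVERAGE and three overlapping windows are labeled PAYMENT, so the concatenated span takes the PAYMENT label. A real system would replace `toy_classify` with the trained network's softmax output and tune the window size and threshold on data.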

Acknowledgements

Kai Zhang and Yuxuan Sun contributed equally to this work. This work was supported by the National Natural Science Foundation of China (No. 61902346) and the National Innovation and Entrepreneurship Training Program for College Students (Nos. 201913021001 and 201913021002).

Author information

Authors and Affiliations

Department of Computer Science, Zhejiang University City College, 51 HuZhou Street, Hangzhou, China
Lin Sun, Kai Zhang, Yuxuan Sun, Fangsheng Weng & Jianwei Zhang

Corresponding authors

Correspondence to Lin Sun or Jianwei Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sun, L., Zhang, K., Sun, Y. et al. ETIP: a lengthy nested NER problem for Chinese insurance policy analysis. Pattern Anal Applic 23, 1755–1765 (2020). https://doi.org/10.1007/s10044-020-00885-6

Received: 18 April 2019
Accepted: 31 March 2020
Published: 27 April 2020
Issue Date: November 2020
DOI: https://doi.org/10.1007/s10044-020-00885-6
