ETIP: a lengthy nested NER problem for Chinese insurance policy analysis

  • Industrial and commercial application
Pattern Analysis and Applications

Abstract

AI techniques can significantly reduce the manual effort of contract analysis. This paper presents element tagging on insurance policy (ETIP), a lengthy nested named entity recognition (NER) problem. Compared with conventional NER, ETIP must handle entities of different types whose lengths range from a short phrase to a long sentence, as well as phrase or clause entities that may be nested. We present a novel hybrid framework that combines deep learning with heuristic filtering to recognize these lengthy nested elements. First, a convolutional neural network is constructed to select good initial candidates: sliding windows with high softmax probability. Then, a concatenation operator on adjacent candidate segments is introduced to create phrase, clause, or sentence candidates. We design an effective voting strategy to resolve classification conflicts among the concatenated candidates and present a theoretical proof of F1-score optimization. For the experiments, we collected a large Chinese insurance contract dataset to test the performance of the proposed method. An extensive set of experiments investigates how sliding window candidates work within our filtering and voting strategy. The optimal parameters are determined by statistical analysis of the experimental data. The results show promising performance of our method on the ETIP problem.
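The filter, concatenate, and vote steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the window format, the filtering threshold, the adjacency rule, or the voting rule, so all of these are assumptions made here for illustration. In particular, the sketch assumes windows are `(start, end, probabilities)` tuples with an exclusive end, that windows are adjacent when they touch or overlap, and that voting sums softmax probabilities per label. The label names `coverage` and `exclusion` are hypothetical, and the window scores are made-up inputs standing in for CNN output.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (start, end, label -> softmax probability); end is exclusive (assumed format)
Window = Tuple[int, int, Dict[str, float]]


def filter_windows(windows: List[Window], threshold: float) -> List[Window]:
    """Keep windows whose most probable label reaches the threshold."""
    return [w for w in windows if max(w[2].values()) >= threshold]


def concatenate_adjacent(windows: List[Window]) -> List[List[Window]]:
    """Group windows that touch or overlap into candidate segments."""
    groups: List[List[Window]] = []
    reach = -1  # furthest end position covered by the current group
    for w in sorted(windows, key=lambda w: (w[0], w[1])):
        if groups and w[0] <= reach:
            groups[-1].append(w)
            reach = max(reach, w[1])
        else:
            groups.append([w])
            reach = w[1]
    return groups


def vote(group: List[Window]) -> Tuple[int, int, str]:
    """Resolve label conflicts by summing softmax probabilities per label
    (an assumed voting rule, not necessarily the one used in the paper)."""
    totals: Dict[str, float] = defaultdict(float)
    for _, _, probs in group:
        for label, p in probs.items():
            totals[label] += p
    label = max(totals, key=totals.get)
    start = min(w[0] for w in group)
    end = max(w[1] for w in group)
    return start, end, label


# Made-up window scores standing in for CNN softmax output.
windows: List[Window] = [
    (0, 4, {"coverage": 0.9, "exclusion": 0.1}),
    (4, 8, {"coverage": 0.6, "exclusion": 0.4}),
    (8, 12, {"coverage": 0.5, "exclusion": 0.5}),   # below threshold, dropped
    (20, 24, {"coverage": 0.2, "exclusion": 0.8}),
]

kept = filter_windows(windows, threshold=0.6)
elements = [vote(group) for group in concatenate_adjacent(kept)]
print(elements)
```

In this toy input, the first two windows touch and are merged into one candidate, while the low-confidence third window is filtered out and the last window stands alone. The sketch produces only flat segments; recognizing nested elements, which is central to ETIP, would need additional logic that the abstract does not describe.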

Acknowledgements

Kai Zhang and Yuxuan Sun contributed equally to this work. This work was supported by the National Natural Science Foundation of China (No. 61902346) and the National Innovation and Entrepreneurship Training Program for College Students (Nos. 201913021001 and 201913021002).

Author information

Corresponding authors

Correspondence to Lin Sun or Jianwei Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sun, L., Zhang, K., Sun, Y. et al. ETIP: a lengthy nested NER problem for Chinese insurance policy analysis. Pattern Anal Applic 23, 1755–1765 (2020). https://doi.org/10.1007/s10044-020-00885-6

  • DOI: https://doi.org/10.1007/s10044-020-00885-6