Automatic extraction of associated fact elements from civil cases based on a deep contextualized embeddings approach: KGCEE

Dong, Hongsong; Yang, Fengbao; Wang, Xiaoxia; Sun, Yufeng

doi:10.1007/s00500-021-05971-3

Application of soft computing
Published: 29 June 2021

Volume 25, pages 11817–11836, (2021)
Soft Computing

Hongsong Dong¹,², Fengbao Yang¹ (ORCID: orcid.org/0000-0002-9087-5796), Xiaoxia Wang¹ & Yufeng Sun¹

Abstract

Automatic fact element extraction aims to extract the relevant facts from a case in order to assist judges in intelligent decision-making for civil disputes. Existing methods for extraction tasks in the legal field rely mainly on context-free word embeddings, which capture the semantics of the text poorly and in turn degrade extraction performance. This paper therefore proposes a method based on deep contextualized embeddings, the knowledge-guided civil case fact elements extraction (KGCEE) model, to automatically extract fact elements in the civil case domain. The approach builds mainly on RoBERTa and adds several techniques that make the model more powerful. First, the model is retrained on civil domain data so that its parameters are initialized with more domain-sensitive weights for the downstream task. Second, extraction is transformed into a sentence-pair task, and label information is leveraged to incorporate additional data, which improves the generalization ability of the model. Third, part-of-speech information is injected into the word embeddings at the input of KGCEE to better capture semantic and syntactic information and thereby obtain better text representations. Finally, KGCEE is evaluated on civil domain data covering marriage and family, labor disputes, and loan contracts, originally from Chinese AI and Law (CAIL). The experimental results show that KGCEE outperforms both methods based on context-free word embeddings and other traditional transformer-based methods.
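
The abstract does not spell out how extraction is recast as a sentence-pair task. As a minimal sketch of one common formulation, and not necessarily the authors' procedure, each case sentence can be paired with a textual description of every candidate label, with a binary target saying whether that label applies. The label codes, the descriptions, and the function name `build_sentence_pairs` below are hypothetical and serve only as illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical label inventory: codes and descriptions are illustrative only,
# not the label set used in the paper or in CAIL.
LABEL_DESCRIPTIONS: Dict[str, str] = {
    "LN1": "a loan agreement exists between the parties",
    "LN2": "the borrower failed to repay on time",
    "LN3": "the loan carries an agreed interest rate",
}


def build_sentence_pairs(
    sentence: str,
    gold_labels: Set[str],
    label_descriptions: Dict[str, str],
) -> List[Tuple[str, str, int]]:
    """Pair one case sentence with every label description.

    Each pair gets target 1 if the label applies to the sentence and 0
    otherwise, so multi-label tagging becomes binary pair classification.
    """
    unknown = gold_labels - set(label_descriptions)
    if unknown:
        raise ValueError(f"labels without a description: {sorted(unknown)}")
    return [
        (sentence, description, int(label in gold_labels))
        for label, description in label_descriptions.items()
    ]


pairs = build_sentence_pairs(
    "The defendant borrowed 50,000 yuan and did not repay it.",
    {"LN1", "LN2"},
    LABEL_DESCRIPTIONS,
)
for text_a, text_b, target in pairs:
    print(target, "|", text_a, "|", text_b)
```

Under this assumed formulation, every annotated sentence yields one training pair per label, so the label descriptions themselves become additional input text for a pair classifier such as a RoBERTa-style encoder.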

Data availability

All data included in this study are available upon request.

Acknowledgements

The authors are grateful to the editors and the anonymous reviewers for their insightful comments and suggestions, which have improved the quality of the paper immensely. This work is supported by the National Key R&D Program of China under Grant No. 2018YFC0830800.

Author information

Authors and Affiliations

School of Information and Communication Engineering, North University of China, Taiyuan, China
Hongsong Dong, Fengbao Yang, Xiaoxia Wang & Yufeng Sun
Department of Computer Science, Luliang University, Luliang, China
Hongsong Dong

Corresponding author

Correspondence to Fengbao Yang.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Dong, H., Yang, F., Wang, X. et al. Automatic extraction of associated fact elements from civil cases based on a deep contextualized embeddings approach: KGCEE. Soft Comput 25, 11817–11836 (2021). https://doi.org/10.1007/s00500-021-05971-3

Accepted: 10 June 2021
Published: 29 June 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s00500-021-05971-3
