Abstract
The revolution in hardware technology has made it possible to obtain high-definition data through highly sophisticated algorithms. Deep learning has emerged and is now widely used in many fields, and the judicial area is no exception. As the carrier of litigation activities, judgment documents record the proceedings and outcomes of the people’s courts, and their quality directly affects the fairness and credibility of the law. Interpretability is therefore an indispensable dimension for measuring the quality of judgment documents. Unfortunately, because of various uncontrollable factors in the process, such as data transmission and storage, the data set used for training is usually of poor quality. In addition, because the distribution of case data is severely imbalanced, data augmentation is essential to generate data for low-frequency cases. Based on the existing data set and the application scenarios, we explore data quality issues along four dimensions. We then systematically investigate their impact on the data set and compare the four dimensions to determine which one damages the data set the most.
Acknowledgment
The work is supported in part by the National Key Research and Development Program of China (2016YFC0800805) and the National Natural Science Foundation of China (61832009, 61932012).
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liu, J., Wang, D., Wang, Z., Chen, Z. (2020). Data Quality for Deep Learning of Judgment Documents: An Empirical Study. In: Wang, X., Lisi, F., Xiao, G., Botoeva, E. (eds) Semantic Technology. JIST 2019. Communications in Computer and Information Science, vol 1157. Springer, Singapore. https://doi.org/10.1007/978-981-15-3412-6_5
DOI: https://doi.org/10.1007/978-981-15-3412-6_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-3411-9
Online ISBN: 978-981-15-3412-6
eBook Packages: Computer Science, Computer Science (R0)