Abstract
With the continuous progress of society and growing public awareness of the law, the number of cases accepted by the people’s courts of China has remained high in recent years. As the imbalance of “many cases but few people” grows more severe, ensuring judicial justice while improving trial efficiency has become an urgent problem. Under the new litigation system, courts organize debates around controversial issues, which are the core of the conflict between the parties to a case. Similar controversial issues play an important role in identifying similar cases, and similar cases in turn help improve judges’ trial efficiency, promote similar judgments for similar cases, and ensure the uniform application of law. However, the identification of controversial issues is affected not only by the uncertainty of law and facts but also by the judge’s discretion and by factors outside the case. It is therefore difficult to standardize the format of controversial issues, and inappropriate to judge similarity by whether the controversial issues are identical. Moreover, controversial issues data follows a power-law distribution and its types are hard to enumerate exhaustively, which further increases the difficulty of manual annotation. Machine learning is an appropriate way to identify similar groups in large volumes of controversial issues data. In this paper, a semi-supervised short text clustering algorithm is proposed to identify homogeneous groups among controversial issues. The algorithm constructs a graph model to discover closely connected term groups, which serve as the clustering topics, and assigns controversial issues to categories according to these topic term groups. In addition, it incorporates prior legal knowledge to improve performance. The algorithm can capture semantic similarity among controversial issues, automatically induce the topic term groups of their categories, flexibly adjust the number of categories, and quickly produce clustering results, thereby supporting the identification and retrieval of similar cases.
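The general idea of clustering short texts through term groups found in a co-occurrence graph can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify how the graph is built or how term groups are detected, so the sketch assumes document-level co-occurrence counts, a hypothetical `min_count` threshold, connected components as the "closely connected term groups", and prior knowledge supplied as hypothetical `seeds` (sets of terms forced into the same group). The English tokens stand in for segmented Chinese terms.

```python
from collections import Counter
from itertools import combinations


def build_graph(docs, min_count=2):
    """Link two terms if they co-occur in at least min_count documents (assumed rule)."""
    counts = Counter()
    for terms in docs:
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    graph = {}
    for (a, b), c in counts.items():
        if c >= min_count:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def term_groups(graph, seeds=()):
    """Return connected components as topic term groups.

    seeds is a hypothetical form of prior knowledge: each seed is a
    collection of terms that are linked together before grouping.
    """
    graph = {term: set(nbrs) for term, nbrs in graph.items()}
    for seed in seeds:
        for term in seed:
            graph.setdefault(term, set())
        for a, b in combinations(seed, 2):
            graph[a].add(b)
            graph[b].add(a)
    groups, seen = [], set()
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        groups.append(component)
    return groups


def assign(doc, groups):
    """Assign a text to the group sharing the most terms, or None if no overlap."""
    if not groups:
        return None
    overlaps = [len(set(doc) & group) for group in groups]
    best = max(range(len(groups)), key=overlaps.__getitem__)
    return best if overlaps[best] > 0 else None


# Toy data: each controversial issue is already tokenized into terms.
docs = [
    ["loan", "interest", "repayment"],
    ["loan", "interest", "rate"],
    ["contract", "breach", "damages"],
    ["contract", "breach", "liability"],
    ["loan", "repayment", "interest"],
]
groups = term_groups(build_graph(docs))
seeded = term_groups(build_graph(docs), seeds=[("breach", "damages")])
```

On this toy data, `groups` contains two term groups, one around loans and one around contracts, and `assign(["loan", "rate"], groups)` selects the loan group. Passing the seed `("breach", "damages")` pulls the otherwise infrequent term `damages` into the contract group, which is one simple way prior knowledge can shape the topics; the number of groups changes with `min_count` rather than being fixed in advance.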
This work was partly supported by the National Key R&D Program of China under Grant 2020YFC0832400 and by the Key R&D Program of Sichuan Province under Grant 2021YFS0397.
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Yang, X., Weng, Y., Chen, B. (2023). Topic Term Clustering Based on Semi-supervised Co-occurrence Graph and Its Application in Chinese Judgement Documents. In: Hong, W., Weng, Y. (eds) Computer Science and Education. ICCSE 2022. Communications in Computer and Information Science, vol 1811. Springer, Singapore. https://doi.org/10.1007/978-981-99-2443-1_8
DOI: https://doi.org/10.1007/978-981-99-2443-1_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-2442-4
Online ISBN: 978-981-99-2443-1
eBook Packages: Computer Science, Computer Science (R0)