Topic Term Clustering Based on Semi-supervised Co-occurrence Graph and Its Application in Chinese Judgement Documents

Yang, Xin; Weng, Yang; Chen, Baogui

doi:10.1007/978-981-99-2443-1_8

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1811)

Included in the following conference series:

International Conference on Computer Science and Education

Abstract

With continued social progress and growing public awareness of the law, the number of cases accepted by the people’s courts of China has remained high in recent years. As the mismatch of “many cases but few people” grows more severe, ensuring judicial justice while improving trial efficiency has become an urgent problem. Under the new litigation system, courts organize debates around controversial issues, which are the core of the conflict between the parties in a case. Similar controversial issues play an important role in identifying similar cases, and similar cases in turn help improve judges’ trial efficiency, promote similar judgments for similar cases, and ensure the uniform application of law. However, the identification of controversial issues is affected not only by the uncertainty of law and facts but also by the judge’s discretion and by factors outside the case. It is therefore difficult to standardize the format of controversial issues and inappropriate to judge similarity by whether the controversial issues are consistent. Meanwhile, controversial issues data follows a power-law distribution and its types are hard to enumerate exhaustively, which makes manual annotation even more difficult. Given the huge amount of controversial issues data, machine learning is an appropriate way to identify similar groups. In this paper, a semi-supervised short text clustering algorithm is proposed to identify homogeneous groups in controversial issues. The algorithm constructs a graph model to discover closely connected term groups, which serve as the clustering topics, and controversial issues are classified according to these topic term groups. In addition, the algorithm incorporates prior legal knowledge to improve its performance. The algorithm can capture semantic similarity in controversial issues, automatically induce the topic term groups of controversial issue categories, flexibly adjust the number of categories, and quickly produce clustering results, thereby promoting the identification and retrieval of similar cases.
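The pipeline summarized in the abstract (a term co-occurrence graph, closely connected term groups used as topics, prior knowledge as supervision, and assignment of short texts to topics) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: connected components of a thresholded co-occurrence graph stand in for "closely connected term groups", and the `seed_groups` argument, the `min_count` threshold, and the English tokens are all hypothetical choices made only for illustration. Word segmentation of the Chinese text is assumed to have been done beforehand.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(docs):
    """Count how often each unordered term pair appears in the same short text."""
    pair_counts = Counter()
    for terms in docs:
        for a, b in combinations(sorted(set(terms)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def topic_term_groups(docs, seed_groups=(), min_count=2):
    """Return term groups as connected components of the thresholded graph.

    seed_groups is a hypothetical form of prior knowledge: each entry is a
    collection of terms that an expert says belong to the same topic.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Supervised part: link seed terms that are known to belong together.
    for group in seed_groups:
        members = list(group)
        for term in members[1:]:
            union(members[0], term)

    # Unsupervised part: link terms that co-occur at least min_count times.
    for (a, b), count in build_cooccurrence(docs).items():
        if count >= min_count:
            union(a, b)

    groups = {}
    for term in list(parent):
        groups.setdefault(find(term), set()).add(term)
    # Sort the groups so that their indices are deterministic.
    return sorted(groups.values(), key=lambda g: sorted(g))


def assign(terms, groups):
    """Return the index of the group sharing the most terms, or -1 if none."""
    overlaps = [len(set(terms) & g) for g in groups]
    best = max(range(len(groups)), key=overlaps.__getitem__, default=-1)
    return best if best >= 0 and overlaps[best] > 0 else -1


# Toy, pre-tokenized "controversial issues" (hypothetical data).
docs = [
    ["loan", "interest", "rate"],
    ["loan", "interest", "repayment"],
    ["loan", "repayment", "deadline"],
    ["contract", "breach", "damages"],
    ["contract", "breach", "liability"],
    ["damages", "liability", "amount"],
]
seed_groups = [{"damages", "liability"}]

groups = topic_term_groups(docs, seed_groups)
labels = [assign(terms, groups) for terms in docs]
```

In this toy data, "damages" and "liability" co-occur only once, so the threshold alone would not link them; the seed group is what turns them into a topic. Raising or lowering `min_count` changes how many groups emerge, which loosely mirrors the adjustable number of categories mentioned in the abstract.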

This work was partly supported by the National Key R&D Program of China under Grant 2020YFC0832400 and by the Key R&D Program of Sichuan Province under Grant 2021YFS0397.

Author information

Authors and Affiliations

College of Mathematics, Sichuan University, Chengdu, China
Xin Yang & Yang Weng
Data Management Department, Information Center of Supreme People’s Court of P.R.C., Beijing, China
Baogui Chen

Corresponding author

Correspondence to Baogui Chen.

Editor information

Editors and Affiliations

Xiamen University, Xiamen, China
Wenxing Hong
Sichuan University, Chengdu, China
Yang Weng

About this paper

Cite this paper

Yang, X., Weng, Y., Chen, B. (2023). Topic Term Clustering Based on Semi-supervised Co-occurrence Graph and Its Application in Chinese Judgement Documents. In: Hong, W., Weng, Y. (eds) Computer Science and Education. ICCSE 2022. Communications in Computer and Information Science, vol 1811. Springer, Singapore. https://doi.org/10.1007/978-981-99-2443-1_8

DOI: https://doi.org/10.1007/978-981-99-2443-1_8
Published: 14 May 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-2442-4
Online ISBN: 978-981-99-2443-1
eBook Packages: Computer Science, Computer Science (R0)
