Topic Term Clustering Based on Semi-supervised Co-occurrence Graph and Its Application in Chinese Judgement Documents

  • Conference paper
Computer Science and Education (ICCSE 2022)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1811)

Abstract

As society develops and public awareness of the law grows, the number of cases accepted by the people’s courts of China has remained high in recent years. With the tension of “many cases but few people” becoming increasingly acute, ensuring judicial fairness while improving trial efficiency has become an urgent problem. Under the new litigation system, courts organize debate around controversial issues, which are the core of the conflict between the parties to a case. Similar controversial issues play an important role in identifying similar cases, and similar cases in turn help judges try cases more efficiently, promote similar judgments for similar cases, and ensure the uniform application of law. However, the identification of controversial issues is affected not only by the uncertainty of law and facts but also by the judge’s discretion and by factors outside the case. It is therefore difficult to express controversial issues in a standard format, and inappropriate to judge similarity by whether the controversial issues are identical. Meanwhile, controversial-issue data follows a power-law distribution, and its categories are difficult to enumerate exhaustively, which makes manual annotation even harder. Given the huge volume of controversial-issue data, machine learning is an appropriate way to identify groups of similar issues. This paper proposes a semi-supervised short-text clustering algorithm to identify homogeneous groups among controversial issues. The algorithm constructs a graph model to discover closely connected term groups, uses them as clustering topics, and classifies controversial issues according to these topic term groups. It also incorporates prior legal knowledge to improve performance. The algorithm can capture semantic similarity among controversial issues, automatically induce the topic term groups of each category, flexibly adjust the number of categories, and produce clustering results quickly, thereby supporting the identification and retrieval of similar cases.
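The pipeline outlined in the abstract (term co-occurrence graph, closely connected term groups, assignment of short texts to groups, prior knowledge as extra links) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify how groups are found, so plain connected components over thresholded edges stand in for that step, and prior knowledge is modelled as hypothetical "seed" term sets that are forced into the same group. The function names, the `min_weight` threshold, and the toy English terms are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_graph(docs, min_weight=2):
    """Link two terms when they share at least min_weight documents."""
    counts = Counter()
    for terms in docs:
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), weight in counts.items():
        if weight >= min_weight:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def term_groups(graph, seed_groups=()):
    """Connected components; seed_groups (assumed prior knowledge) add must-link edges."""
    graph = defaultdict(set, {t: set(n) for t, n in graph.items()})
    for seeds in seed_groups:
        seeds = list(seeds)
        for a, b in zip(seeds, seeds[1:]):
            graph[a].add(b)
            graph[b].add(a)
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        groups.append(component)
    return groups


def assign(doc, groups):
    """Index of the group sharing the most terms with doc, or -1 if none overlaps."""
    overlaps = [len(set(doc) & g) for g in groups]
    best = max(range(len(groups)), key=overlaps.__getitem__, default=-1)
    return best if best >= 0 and overlaps[best] > 0 else -1


# Toy, already-tokenized "controversial issues" (hypothetical data).
docs = [
    ["loan", "interest", "repayment"],
    ["loan", "interest", "contract"],
    ["loan", "repayment"],
    ["divorce", "custody", "child"],
    ["divorce", "custody"],
    ["alimony", "child"],
]
graph = cooccurrence_graph(docs, min_weight=2)
unsupervised = term_groups(graph)
guided = term_groups(graph, seed_groups=[("divorce", "alimony")])
labels = [assign(d, guided) for d in docs]
```

In this toy run the rare term "alimony" never reaches the co-occurrence threshold, so the last document is left unassigned without seeds; the seed pair attaches it to the family-law group. Raising or lowering `min_weight` changes how many groups emerge, which loosely mirrors the adjustable number of categories mentioned above.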

This work was partly supported by the National Key R&D Program of China under Grant 2020YFC0832400 and by the Key R&D Program of Sichuan Province under Grant 2021YFS0397.

Author information

Corresponding author

Correspondence to Baogui Chen.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Yang, X., Weng, Y., Chen, B. (2023). Topic Term Clustering Based on Semi-supervised Co-occurrence Graph and Its Application in Chinese Judgement Documents. In: Hong, W., Weng, Y. (eds) Computer Science and Education. ICCSE 2022. Communications in Computer and Information Science, vol 1811. Springer, Singapore. https://doi.org/10.1007/978-981-99-2443-1_8

  • DOI: https://doi.org/10.1007/978-981-99-2443-1_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-2442-4

  • Online ISBN: 978-981-99-2443-1

  • eBook Packages: Computer Science, Computer Science (R0)
