Abstract
This paper aims to develop a Chinese text classification workflow in education situations, where a grade can swing due to subjective cognitive loads. This problem is often observed between the academic paper comments and their grades, leading to a challenge in Chinese texts. To analyze this problem, we in this paper introduce an effective Chinese text classifier by extending the popular seed words-based model into an effective workflow. We first made texts into vectors in the proposed method using Chinese preprocessing. We then exploited the bidirectional encoder representations from Transformers to integrate the contextualization features, then performed a hierarchical attention network for classification. In this study, we collected 4,310 review comment short-texts involving 140 universities in China. As these texts include noisy grades from experts, the proposed method yields seed words for each category, resulting in pseudo labels to weakly supervise the network training instead of the noisy labels. We finally evaluated the designed workflow on the real-world datasets and achieved a good performance in Chinese classification compared with the traditional models. This study provides insights into a real educational text case where a review grade can swing due to subjective cognitive loads and an available workflow to automatically grade these Chinese expert comment texts, facilitating the precise academic evaluation system.
Keywords
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Wang, Y., Sohn, S., Liu, S., et al.: A clinical text classification paradigm using weak supervision and deep representation. BMC Med. Inform. Decis. Mak. 19, 1 (2019)
Yu, M., Jiaming, S., Chao, Z., Jiawei, H.: Weakly-supervised neural text classification. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM 2018), pp. 983–992. Association for Computing Machinery, New York, NY, USA (2018)
Zhang, Y., Dai, H., Yun, Y., Liu, S., Lan, A., Shang, X.: Meta-knowledge dictionary learning on 1-bit response data for student knowledge diagnosis. Knowl. Based Syst. 205, 106290 (2020)
Zhang, Y., An, R., Liu, S., Cui, J., Shang, X., 2021. Predicting and understanding student learning performance using multi-source sparse attention convolutional neural networks. IEEE Trans. Big Data 1–1 (2021)
Liu, Q., Shen, S., Huang, Z., Chen, E., Zheng, Y.: A survey of knowledge tracing. arXiv preprint arXiv:2105.15106 (2021)
Yun, Y., Dai, H., Cao, R., Zhang, Y., Shang, X.: Self-paced graph memory network for student GPA prediction and abnormal student detection. In: Roll, I., McNamara, D., Sosnovsky, S., Luckin, R., Dimitrova, V. (eds.) AIED 2021. LNCS (LNAI), vol. 12749, pp. 417–421. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-78270-2_74
Dwivedi, P., Kant, V., Bharadwaj, K.K.: Learning path recommendation based on modified variable length genetic algorithm. Educ. Inform. Technol. 23(2), 819–836 (2017). https://doi.org/10.1007/s10639-017-9637-7
Minaee, S., Kalchbrenner, N., Cambria, E., Nikzad, N., Chenaghlu, M., Gao, J.: Deep learning–based text classification: a comprehensive review. ACM Comput. Surv. 54(3), 1–40 (2021)
Mohamed, D.A.R., Sakre, M.M.: A performance comparison between classification techniques with CRM application. SAI Intell. Syst. Conf. 2015, 112–119 (2015)
Kumar, G.K., Rani, D.M.: Paragraph summarization based on word frequency using NLP techniques. In: AIP Conference Proceedings, vol. 2317, p. 060001 (2021)
Anhar, R., Adji, T.B., Setiawan, N.A.: Question classification on question-answer system using bidirectional-LSTM. In: 2019 5th International Conference on Science and Technology (ICST), pp. 1–5 (2019)
En.wikipedia.org.: Support-vector machine – Wikipedia (2022). https://en.wikipedia.org/wiki/Support-vector-machine. Accessed 10 April 2022
Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016), pp. 785–794. Association for Computing Machinery, New York, NY, USA (2016)
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRR abs/1810.04805 (2018)
Yu, S., et al.: ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation (2021)
Shree, P.: The Journey of Open AI GPT models. Medium (2020). https://medium.com/walmartglobaltech/the-journey-of-open-ai-gpt-models-32d95b7b7fb2. Accessed 10 April 2022
Mass, Y., Roitman, H.: Ad-hoc document retrieval using weak-supervision with BERT and GPT2. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 4191–4197. Association for Computational Linguistics (2020)
Zhang, L., Ding, J., Xu, Y., Liu, Y., Zhou, S.: Weakly-supervised Text Classification Based on Keyword Graph (2021). https://doi.org/10.18653/v1/2021.emnlp-main.222
Wikimedia Foundation: Transformer (Machine Learning Model). Wikipedia (2022). Retrieved from 12 April 2022. https://en.wikipedia.org/wiki/Transformer(machine-learning-model)32d95b7b7fb2. Accessed 10 April 2022
Zhang, Y., Zhou, Y., Xiao, M., et al.: Comment text grading for Chinese graduate academic dissertation using attention convolutional neural networks. In: 2021 7th International Conference on Systems and Informatics (ICSAI), pp. 1–6. IEEE (2021)
PyPI: jieba (2022). https://pypi.org/project/jieba/. Accessed 11 April 2022
Welcome to Harvesttext's documentation: Welcome to HarvestText's documentation - HarvestText 0.8.1.6 documentation. (n.d.). Retrieved from 11 April 2022. https://harvesttext.readthedocs.io/en/latest/. Accessed 11 April 2022
Mekala, D., Shang, J.: Contextualized weak supervision for text classification. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 323–333. Association for Computational Linguistics (2020)
Huggingface.co.: ckiplab/bert-base-chinese · Hugging Face (2022). https://huggingface.co/ckiplab/bert-base-chinese. Accessed 11 April 2022
Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., Hovy, E.: Hierarchical attention networks for document classification. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1480–1489. Association for Computational Linguistics, San Diego, California (2016)
Analytics India Magazine: A complete tutorial on masked language modelling using BERT (2022). https://analyticsindiamag.com/a-complete-tutorial-on-masked-language-modelling-using-bert. Accessed 14 April 2022
Agarap, A.F.: Deep learning using rectified linear units (relu). arXiv preprint arXiv:1803.08375 (2018)
Diederik, K., Jimmy, B.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (2014)
Zhang, Y., Xiang, M., Yang, B.: Low-rank preserving embedding. Pattern Recogn. 70, 112–125 (2017)
Zhang, Y., Xiang, M., Yang, B.: Hierarchical sparse coding from a Bayesian perspective. Neurocomputing 272, 279–293 (2018)
Stopwords-Iso.: STOPWORDS-ZH/STOPWORDS-ZH.TXT at master · stopwords-ISO/stopwords-zh. GitHub (2020). Retrieved from 28 March 2022. https://github.com/stopwords-iso/stopwords-zh/blob/master/stopwords-zh.txt. Accessed 11 April 2022
Zhang, Y., Xiang, M., Yang, B.: Low-rank preserving embedding. Pattern Recogn. 70, 112–125 (2017). ISSN 0031-3203. https://doi.org/10.1016/j.patcog.2017.05.003
Zhang Y, et al.: Multi-needle detection in 3D ultrasound images using unsupervised order-graph regularized sparse dictionary learning. IEEE Trans. Med. Imaging 39(7), 2302–2315 (2020). https://doi.org/10.1109/TMI.2020.2968770. Epub 2020 Jan 22. PMID: 31985414; PMCID: PMC7370243
Zhang, Y., Dai, H., Yun, Y., Liu, S., Lan, S., Shang, X.: Meta-knowledge dictionary learning on 1-bit response data for student knowledge diagnosis. Knowl. Based Syst. 205, 106290 (2020). ISSN 0950-7051. https://doi.org/10.1016/j.knosys.2020.106290
Zhang, Y., An, R., Liu, S., Cui, J., Shang, X.: Predicting and understanding student learning performance using multi-source sparse attention convolutional neural networks. IEEE Trans. Big Data. https://doi.org/10.1109/TBDATA.2021.3125204
Liu, S., Zhang, Y., Shang, X., Zhang, Z.: ProTICS reveals prognostic impact of tumor infiltrating immune cells in different molecular subtypes. Brief Bioinform. 22(6), bbab164 (2021). https://doi.org/10.1093/bib/bbab164. PMID: 33963834
Acknowledgement
This study was funded in part by the National Natural Science Foundation of China (U1811262, 61802313, 61772426), the Key Research and Development Program of China (2020AAA0108500), the Reformation Research on Education and Teaching at Northwestern Polytechnical University (2021JGY31), the Higher Research Funding on International Talent cultivation at Northwestern Polytechnical University (GJGZZD202202), Research Topic at The Chinese Society of Academic Degrees and Graduate Education (2020ZA1008).
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, Y., Khan, M.S.I., Zhou, Y., Xiao, M., Shang, X. (2022). An Effective Chinese Text Classification Method with Contextualized Weak Supervision for Review Autograding. In: Huang, DS., Jo, KH., Jing, J., Premaratne, P., Bevilacqua, V., Hussain, A. (eds) Intelligent Computing Methodologies. ICIC 2022. Lecture Notes in Computer Science(), vol 13395. Springer, Cham. https://doi.org/10.1007/978-3-031-13832-4_15
Download citation
DOI: https://doi.org/10.1007/978-3-031-13832-4_15
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-13831-7
Online ISBN: 978-3-031-13832-4
eBook Packages: Computer ScienceComputer Science (R0)