Abstract
Character-level convolutional neural networks (CLCNNs) are commonly used as a versatile tool for classifying textual data. For natural language recognition, a sentence is decomposed into character units, each unit is converted into its character code (e.g., its Unicode value), and the codes are input into the CLCNN. Sentences can thus be treated like images. We have previously applied a CLCNN to verify whether a university’s diploma and/or curriculum policies are well written. In this study, we experimentally confirm the effectiveness of the CLCNN using tweet data. In particular, we focus on the effect of the number of units on performance using two types of data: a real, public tweet dataset on the reputation of a cell phone, and the NTCIR-13 MedWeb task, which consists of pseudo-tweet data and is a well-known test collection for multi-label problems. Experiments that varied the number of units in the fully connected layer gave results that agree with the theorem introduced in Amari’s book (Amari in Mathematical Science New Development of Information Geometry, For Senior & Graduate Courses. SAIENSU-SHA Co., 2014). Furthermore, for the NTCIR-13 MedWeb task, we analyze two kinds of experiments: the effect of kernel size and the effect of weight perturbation. The kernel-size results suggest the existence of an optimal kernel size for sentence comprehension. The results of perturbing the convolutional and pooling layers indicate a possible relationship between the number of degrees of freedom and the number of network parameters.
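The character-to-code conversion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `encode_sentence`, the fixed length `max_len`, and the zero padding value are assumptions made here for illustration only.

```python
from typing import List


def encode_sentence(sentence: str, max_len: int = 140, pad_value: int = 0) -> List[int]:
    """Convert a sentence into a fixed-length list of Unicode code points.

    max_len and pad_value are hypothetical choices; the source does not
    state the input length or padding scheme that the CLCNN uses.
    """
    # Decompose the sentence into character units and map each to its code.
    codes = [ord(ch) for ch in sentence[:max_len]]
    # Pad short sentences so every input has the same length, like an image row.
    return codes + [pad_value] * (max_len - len(codes))


if __name__ == "__main__":
    print(encode_sentence("I caught a cold", max_len=20))
```

A fixed-length integer sequence of this kind is what allows a sentence to be fed to convolutional layers in the same way as pixel data.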
Notes
The three policies are: diploma policy, which concerns graduation certification; curriculum policy, which concerns course contents and their organization; and admission policy, which concerns enrollment acceptance.
This figure is recreated based on Ref. [12].
Although the percentage of exact matches can be increased by also performing dropout after batch normalization, dropout is not used in this paper so that the effect of perturbations can be ascertained more accurately.
This table is reprinted from Ref. [15].
References
Amari S (2014) Mathematical Science New Development of Information Geometry, For Senior & Graduate Courses. SAIENSU-SHA Co., Ltd (in Japanese)
Belkin M, Hsu D, Ma S, Mandal S (2019) Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proc Natl Acad Sci 116(32):15849–15854
Hastie T, Montanari A, Rosset S, Tibshirani RJ (2019) Surprises in high-dimensional least squares interpolation. arXiv:1903.08560
Keskar NS, Nocedal J, Tang PTP, Mudigere D, Smelyanskiy M (2019) On large-batch training for deep learning: generalization gap and sharp minima. In: International Conference on Learning Representations
Kim Y (2014) Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp 1746–1751
Miyazaki K, Ida M (2012) Proposal and evaluation of the active course classification support system with exploitation-oriented learning, the 9th European workshop on reinforcement learning (EWRL-9), Sept. 9, 2011. Athens Royal Olympic Hotel, Lecture Notes in Computer Science 7188:333–344
Miyazaki K, Ida M, Yoshikane F, Nozawa T, Kita H (2005) Development of a course classification support system for the awarding of degrees using syllabus data. IPSJ J 46(3):782–791 (in Japanese)
Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv:1301.3781
Miyazaki K, Takahashi N, Mori R (2019). Research on Consistency between Diploma Policies and Nomenclature of Major Disciplines: Deep Learning Approach, Proc. of 2019 7th International Conference on Information and Education Technology (ICIET2019)
Miyazaki K, Ida M (2020) Construction of Consistency Judgment System of Diploma Policy and Curriculum Policy using Character-level CNN. Electronics and Communications in Japan 102(12):30–39
Miyazaki K (2020). Classification of Medical Data using Character-level CNN, The 3rd International Conference on Information Science and System, pp.43-47
Miyazaki K, Ida M (2021). Evaluation of Character-level CNN using NTCIR-13 MedWeb Task, 2021 Annual Conference on Electronics, Information and Systems Institute of Electrical Engineers of Japan (IEEJ), 6 pages (in Japanese)
Miyazaki K, Ida M (2021). Evaluation of Character-Level CNNs using the NTCIR-13 MedWeb Task, the 22nd International Symposium on Advanced Intelligent Systems (ISIS2021), 6 pages
Miyazaki K, Yamaguchi S, Mori R, Yoshikawa Y, Saito T, Suzuki T (2022). Proposal and evaluation of a course classification support system emphasizing communication with the sub-committees within the Committee of Validation and Examination for Degrees, Preliminary Soft-Proceedings 4th EAI International Conference on Artificial Intelligence for Communications and Networks, pp.122-129
Miyazaki K, Ida M (2023). Effectiveness of Character-level CNN and its Examination of Perturbation for Weights, 28th International Symposium on Artificial Life and Robotics (AROB 28th 2023), 5 pages
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017). Attention Is All You Need, Neural Information Processing Systems (NIPS)
Wakamiya S, Morita M, Kano Y, Ohkuma T, Aramaki E (2017). Overview of the NTCIR-13 MedWeb Task, In Proceedings of the 13th NTCIR Conference on Evaluation of Information Access Technologies (NTCIR-13), pp. 40-49
http://research.nii.ac.jp/ntcir/permission/ntcir-13/perm-en-MedWeb.html [accessed: 2023-12-21]
Yanaka H, Mineshima K (2022). Compositional Evaluation on Japanese Textual Entailment and Similarity (arXiv, data), Transactions of the Association for Computational Linguistics (TACL), Vol.10, pp.1266-1284
Yang Y, Zhang Y, Tar C, Baldridge J (2019). PAWS-X: A cross-lingual adversarial dataset for paraphrase identification, In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp.3687-3692
Zhang X, Zhao J, LeCun Y (2015). Character-level Convolutional Networks for Text Classification, arXiv:1509.01626
https://www.db.info.gifu-u.ac.jp/sentiment_analysis/ [accessed: 2023-12-21]
https://retty.me [accessed: 2023-12-21]
Additional information
This work was presented in part at the joint symposium of the 28th International Symposium on Artificial Life and Robotics, the 8th International Symposium on BioComplexity, and the 6th International Symposium on Swarm Behavior and Bio-Inspired Robotics (Beppu, Oita and Online, January 25-27, 2023).
About this article
Cite this article
Miyazaki, K., Ida, M. Performance evaluation of character-level CNNs using tweet data and analysis for weight perturbations. Artif Life Robotics 29, 266–273 (2024). https://doi.org/10.1007/s10015-024-00944-9
DOI: https://doi.org/10.1007/s10015-024-00944-9