Abstract:
Multimodal information fusion is gaining traction in Chinese Natural Language Processing (CNLP), particularly for phono-semantic compound comprehension and character identification. Existing research often overlooks how pixel size, scale, and stroke count affect character image processing, which can introduce noise. This paper addresses this gap by analyzing a dataset we prepared of Chinese characters with stroke counts from 1 to 64, rendered at six pixel resolutions (12, 16, 24, 35, 60, and 96), with up to 100 characters per stroke count. We identify processing thresholds for character images based on stroke count and resolution, a first in the field. Using Euclidean near-graphic similarity and ResNet50 image embedding similarity analyses, we establish thresholds such as 12 strokes for 16-pixel images and 26 strokes for 24-pixel images. These findings offer insights for improving the robustness of multimodal information fusion for Chinese character recognition in NLP.
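As a rough illustration of the pixel-level comparison the abstract mentions, the sketch below computes a Euclidean distance between two equally sized binary glyph bitmaps and maps it to a similarity in [0, 1]. The toy bitmaps, the function names, and the normalization are assumptions made for illustration only; the abstract does not specify how the paper defines its near-graphic similarity measure.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equally sized 2D bitmaps (lists of rows)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have the same shape")
    return math.sqrt(
        sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    )


def similarity(a, b):
    """Map the distance to [0, 1]; 1.0 means identical bitmaps.

    Assumption: pixels are 0 or 1, so the largest possible distance
    is sqrt(number of pixels). This normalization is hypothetical.
    """
    n_pixels = len(a) * len(a[0])
    max_dist = math.sqrt(n_pixels)
    return 1.0 - euclidean_distance(a, b) / max_dist


# Hypothetical 4x4 glyph bitmaps (not taken from the paper's dataset).
glyph_a = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
glyph_b = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 1, 1],  # differs from glyph_a in exactly one pixel
    [0, 1, 1, 0],
]

print(euclidean_distance(glyph_a, glyph_b))
print(similarity(glyph_a, glyph_b))
```

In a setting like the one described, the same comparison would presumably be repeated per resolution and per stroke count to see where distinct characters stop being distinguishable, but the exact procedure is not given in the abstract.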
Date of Conference: 04-06 August 2024
Date Added to IEEE Xplore: 10 September 2024