Abstract:
In the domain of historical Mongolian document image retrieval (HMDIR), word spotting poses an inherent challenge due to the frequent appearance of out-of-vocabulary (OOV) words. Existing methods have mainly focused on query-by-example (QBE), neglecting the query-by-string (QBS) approach. Meanwhile, the hierarchical structure of words makes Euclidean space a suboptimal choice for representing such complex structured data. To address these problems, we propose a novel method that leverages a shared hyperbolic space to effectively align text strings and word images. Specifically, we use the Pyramidal Histogram of Characters (PHOC) for text string embedding and a robust encoder-decoder architecture for word image embedding, then map both embeddings into the shared hyperbolic space. Moreover, we propose a new dataset of historical Mongolian documents called Geser, which includes 143,508 word images and a vocabulary of 10,951 words. Extensive experiments conducted on two datasets of historical Mongolian documents with an OOV partitioning scheme (Kanjur and Geser) demonstrate that our proposed method surpasses state-of-the-art methods and achieves outstanding results on Geser.
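The two ingredients named in the abstract, a PHOC text embedding and a mapping into hyperbolic space, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state which hyperbolic model, curvature, PHOC levels, or alphabet are used, so the Poincaré ball, the curvature parameter `c`, the levels `(1, 2, 3)`, and the function names `phoc`, `expmap0`, and `poincare_distance` are all assumptions made for illustration. The sketch also omits the encoder-decoder image branch and any learned projection that would normally precede the mapping.

```python
import math


def phoc(word, alphabet, levels=(1, 2, 3)):
    """Binary PHOC vector: one histogram over `alphabet` per region per level.

    Assumption: a character belongs to a region when at least half of its
    normalized span falls inside that region. Bigram bins are omitted.
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    vec = [0.0] * (len(alphabet) * sum(levels))
    n = len(word)
    offset = 0
    for level in levels:
        for region in range(level):
            r_lo, r_hi = region / level, (region + 1) / level
            for k, ch in enumerate(word):
                if ch not in index:
                    continue  # skip symbols outside the assumed alphabet
                c_lo, c_hi = k / n, (k + 1) / n
                overlap = min(c_hi, r_hi) - max(c_lo, r_lo)
                if overlap / (c_hi - c_lo) >= 0.5:
                    vec[offset + index[ch]] = 1.0
            offset += len(alphabet)
    return vec


def expmap0(v, c=1.0, eps=1e-5):
    """Exponential map at the origin of a Poincare ball with curvature -c.

    The result is clipped to stay strictly inside the ball, which keeps the
    distance below numerically stable for inputs with a large norm.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    sqrt_c = math.sqrt(c)
    radius = min(math.tanh(sqrt_c * norm) / sqrt_c, (1.0 - eps) / sqrt_c)
    return [radius * x / norm for x in v]


def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points inside the Poincare ball."""
    sq_norm = lambda u: sum(a * a for a in u)
    diff = sq_norm([a - b for a, b in zip(x, y)])
    denom = (1.0 - c * sq_norm(x)) * (1.0 - c * sq_norm(y))
    return math.acosh(1.0 + 2.0 * c * diff / denom) / math.sqrt(c)


# Usage with a placeholder two-letter alphabet (not the Mongolian script).
text_vec = phoc("ab", "ab", levels=(1, 2))
text_point = expmap0(text_vec)
```

In a query-by-string setting, a text query would be embedded this way and word images ranked by `poincare_distance` to it, assuming the image embeddings have been mapped into the same ball.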
Published in: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 14-19 April 2024
Date Added to IEEE Xplore: 18 March 2024