HENet: Hyperbolic-Based Encoder-Decoder Network for Word Spotting in Historical Mongolian Documents | IEEE Conference Publication | IEEE Xplore


Abstract:

In the domain of historical Mongolian document image retrieval (HMDIR), word spotting poses an inherent challenge due to the frequent appearance of out-of-vocabulary (OOV) words. Existing methods have mainly focused on query-by-example (QBE), neglecting the query-by-string (QBS) approach. Meanwhile, because words have a hierarchical structure, Euclidean space is not the optimal choice for representing such complex structured data. To address these problems, we propose a novel method that leverages a shared hyperbolic space to effectively align text strings and word images. Specifically, we use the Pyramidal Histogram of Characters (PHOC) for text string embedding and a robust encoder-decoder architecture for word image embedding, then map both embeddings into the shared hyperbolic space. Moreover, we propose a new dataset of historical Mongolian documents called Geser, which includes 143,508 word images and a vocabulary of 10,951 words. Extensive experiments conducted on two datasets of historical Mongolian documents (Kanjur and Geser) with an OOV partitioning scheme demonstrate that our proposed method surpasses state-of-the-art methods and achieves outstanding results on Geser.
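
As an illustration of the text branch described above, the following is a minimal sketch and not the authors' implementation. It builds a PHOC vector for a string and places it in a hyperbolic space. Several details are assumptions that the abstract does not specify: the Latin placeholder alphabet `ALPHABET`, the pyramid levels `LEVELS`, the choice of the Poincaré ball as the hyperbolic model, the curvature parameter `c`, and the helper names `phoc`, `exp_map_origin`, and `poincare_distance`. The image encoder-decoder is omitted.

```python
import math

# Assumption: a Latin placeholder alphabet; the paper's Mongolian character set is not given here.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
# Assumption: three pyramid levels; the paper's levels are not stated in the abstract.
LEVELS = (1, 2, 3)


def phoc(word, alphabet=ALPHABET, levels=LEVELS):
    """Binary PHOC vector: at each level the word is split into `level` regions,
    and a character is assigned to a region if at least half of the character's
    span falls inside that region."""
    n = len(word)
    vec = []
    for level in levels:
        for region in range(level):
            bits = [0] * len(alphabet)
            for k, ch in enumerate(word):
                # Integer arithmetic in units of 1/(n*level) avoids float rounding.
                lo = max(k * level, region * n)
                hi = min((k + 1) * level, (region + 1) * n)
                overlap = hi - lo
                if ch in alphabet and 2 * overlap >= level:
                    bits[alphabet.index(ch)] = 1
            vec.extend(bits)
    return vec


def exp_map_origin(v, c=1.0):
    """Exponential map at the origin of the Poincare ball with curvature -c.
    Maps a Euclidean (tangent) vector to a point inside the ball."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0] * len(v)
    scale = math.tanh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * x for x in v]


def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points of the Poincare ball."""
    diff2 = sum((a - b) ** 2 for a, b in zip(x, y))
    x2 = sum(a * a for a in x)
    y2 = sum(b * b for b in y)
    arg = 1.0 + 2.0 * c * diff2 / ((1.0 - c * x2) * (1.0 - c * y2))
    return math.acosh(arg) / math.sqrt(c)


if __name__ == "__main__":
    p = exp_map_origin(phoc("ab"))
    q = exp_map_origin(phoc("ba"))
    print(len(phoc("ab")), poincare_distance(p, q))
```

In a QBS setting of this kind, a query string would be embedded this way and word images ranked by their hyperbolic distance to it. Vectors with large norms map very close to the boundary of the ball, so a practical system would typically project or rescale the embedding first.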
Date of Conference: 14-19 April 2024
Date Added to IEEE Xplore: 18 March 2024
Conference Location: Seoul, Republic of Korea
