
Word Image Representation Based on Visual Embeddings and Spatial Constraints for Keyword Spotting on Historical Documents


Abstract:

This paper proposes a visual-embeddings approach to capturing semantic relatedness between visual words. Specifically, visual words are extracted from a word image collection under the Bag-of-Visual-Words framework, and a deep learning procedure then maps each visual word to an embedding vector in a semantic space. To integrate spatial constraints into the representation of word images, each word image is segmented into several equal-sized sub-regions along rows and columns. Each sub-region is then represented by the centroid of the embedding vectors of all visual words falling within it. In this way, a word image is converted into a fixed-length vector by concatenating the average embedding vectors of all its sub-regions, and similarity between word images can be measured by Euclidean distance. Experimental results demonstrate that the proposed representation outperforms Bag-of-Visual-Words, the visual language model, spatial pyramid matching, latent Dirichlet allocation, average visual word embeddings, and a recurrent neural network.
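A minimal sketch of the descriptor construction described in the abstract, assuming visual words have already been detected at keypoint locations and mapped to learned embedding vectors; the grid size, function names, and array layout below are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def word_image_descriptor(keypoints, visual_word_ids, embeddings,
                          image_size, grid=(2, 3)):
    """Fixed-length descriptor for one word image.

    keypoints       : (N, 2) array of (x, y) locations of detected visual words
    visual_word_ids : (N,) array of vocabulary indices for those keypoints
    embeddings      : (V, D) matrix of learned visual-word embedding vectors
    image_size      : (width, height) of the word image
    grid            : (rows, cols) spatial split into equal-sized sub-regions
    """
    rows, cols = grid
    width, height = image_size
    dim = embeddings.shape[1]
    descriptor = np.zeros((rows, cols, dim))

    for r in range(rows):
        for c in range(cols):
            # bounds of this sub-region (equal size along rows and columns)
            x0, x1 = c * width / cols, (c + 1) * width / cols
            y0, y1 = r * height / rows, (r + 1) * height / rows
            in_cell = ((keypoints[:, 0] >= x0) & (keypoints[:, 0] < x1) &
                       (keypoints[:, 1] >= y0) & (keypoints[:, 1] < y1))
            if in_cell.any():
                # centroid of the embeddings of all visual words in this sub-region
                descriptor[r, c] = embeddings[visual_word_ids[in_cell]].mean(axis=0)

    # concatenate the per-region averages into one fixed-length vector
    return descriptor.reshape(-1)

def distance(query_desc, candidate_desc):
    # Euclidean distance used to rank candidate word images for keyword spotting
    return np.linalg.norm(query_desc - candidate_desc)
```

Keyword spotting then reduces to computing the descriptor of the query word image and ranking all candidate word images by this Euclidean distance.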
Date of Conference: 20-24 August 2018
Date Added to IEEE Xplore: 29 November 2018
Print on Demand (PoD) ISSN: 1051-4651
Conference Location: Beijing, China
