Abstract:
In the Bag-of-Visual-Words (BoVW) framework, visual words are treated as independent of each other, which discards the spatial order between visual words and leaves the representation without semantic information. Inspired by word embeddings, this study applies a similar embedding procedure to a large number of visual words, so that each visual word is assigned a corresponding embedding vector. The embedding vector of a word image is then taken as the average of the embedding vectors of all visual words within that image. In addition, a Recurrent Neural Network (RNN) is used to encode each word image into an embedding, in the manner of an auto-encoder. The RNN embeddings and the visual word embeddings are complementary, so every word image is represented by combining the two. Experimental results show that the proposed representation outperforms traditional BoVW, spatial pyramid matching, and latent Dirichlet allocation.
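The averaging and combining steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `average_embedding` and `combine`, the toy embedding table, and the toy RNN vector are all hypothetical. The abstract does not say how the two embeddings are combined, so plain concatenation is assumed here.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def average_embedding(visual_words: Sequence[int],
                      table: Dict[int, Vector]) -> Vector:
    """Mean of the embedding vectors of the visual words in one word image."""
    if not visual_words:
        raise ValueError("word image contains no visual words")
    dim = len(table[visual_words[0]])
    total = [0.0] * dim
    for word_id in visual_words:
        for i, value in enumerate(table[word_id]):
            total[i] += value
    return [t / len(visual_words) for t in total]


def combine(visual_vec: Vector, rnn_vec: Vector) -> Vector:
    """Assumed combination rule: concatenate the two embeddings."""
    return list(visual_vec) + list(rnn_vec)


# Hypothetical toy data: three visual words with 2-dimensional embeddings.
table = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
word_image = [0, 1, 1, 2]      # visual word ids detected in one word image
rnn_vec = [0.5, -0.5, 0.25]    # stand-in for the RNN encoder output

visual_vec = average_embedding(word_image, table)
representation = combine(visual_vec, rnn_vec)
```

Here `visual_vec` has the dimensionality of the visual word embeddings, and `representation` has the summed dimensionality of both parts. Any other fusion rule, such as a weighted sum of equal-length vectors, would fit the abstract's description equally well.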
Date of Conference: 10-14 July 2017
Date Added to IEEE Xplore: 31 August 2017
ISBN Information:
Electronic ISSN: 1945-788X