Abstract
By bringing together the most prominent European institutions and archives in the field of Classical Latin and Greek epigraphy, the EAGLE project has collected the vast majority of the surviving Greco-Latin inscriptions into a single readily searchable database. Text-based search engines are typically used to retrieve information about ancient inscriptions (or about other artifacts). These systems require users to formulate a text query that contains information such as the place where the object was found or where it is currently located. Conversely, visual search systems can provide information to users (such as tourists and scholars) in a more intuitive and immediate way, using just an image as the query. In this article, we compare several approaches for visually recognizing ancient inscriptions. Our experiments, conducted on 17,155 photos related to 14,560 inscriptions, show that Bag of Words (BoW) and Vector of Locally Aggregated Descriptors (VLAD) are outperformed by both Fisher Vector (FV) and Convolutional Neural Network (CNN) features. More interestingly, combining FV and CNN features into a single image representation achieves very high effectiveness, correctly recognizing the query inscription in more than 90% of the cases. Our results suggest that combinations of FV and CNN can also be exploited to effectively perform visual retrieval of other types of cultural heritage objects, such as landmarks and monuments.
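The idea of merging FV and CNN features into one image representation can be illustrated with a minimal sketch. The abstract does not say how the two features are fused, so everything here is an assumption made for illustration: each feature is L2-normalized, the two are concatenated with a hypothetical weight `alpha`, and the query is matched to the database entry with the highest cosine similarity. The function names and the toy vectors are likewise hypothetical and are not taken from the article.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def combine(fv, cnn, alpha=0.5):
    """Weighted concatenation of L2-normalized FV and CNN features.

    alpha is a hypothetical mixing weight, not a value from the article.
    """
    fv_n = l2_normalize(fv)
    cnn_n = l2_normalize(cnn)
    return [alpha * x for x in fv_n] + [(1.0 - alpha) * x for x in cnn_n]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def recognize(query, database):
    """Return the label of the database entry most similar to the query."""
    return max(database, key=lambda label: cosine(query, database[label]))


# Toy data: two "inscriptions", each with a 3-d FV and a 2-d CNN feature.
database = {
    "inscription_A": combine([1.0, 0.0, 0.0], [0.0, 1.0]),
    "inscription_B": combine([0.0, 1.0, 0.0], [1.0, 0.0]),
}
query = combine([0.9, 0.1, 0.0], [0.1, 0.9])
best_match = recognize(query, database)
```

Real FV and CNN descriptors have thousands of dimensions and are extracted with dedicated libraries. The toy vectors above only show the normalize, concatenate, and compare flow.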
Index Terms
- Visual Recognition of Ancient Inscriptions Using Convolutional Neural Network and Fisher Vector