research-article

Visual Recognition of Ancient Inscriptions Using Convolutional Neural Network and Fisher Vector

Published: 19 December 2016

Abstract

By bringing together the most prominent European institutions and archives in the field of Classical Latin and Greek epigraphy, the EAGLE project has collected the vast majority of the surviving Greco-Latin inscriptions into a single, readily searchable database. Text-based search engines are typically used to retrieve information about ancient inscriptions (or about other artifacts). These systems require users to formulate a text query that contains information such as the place where the object was found or where it is currently located. Visual search systems, by contrast, can provide information to users (such as tourists and scholars) in a more intuitive and immediate way, using only an image as the query. In this article, we compare several approaches for visually recognizing ancient inscriptions. Our experiments, conducted on 17,155 photos related to 14,560 inscriptions, show that BoW and VLAD are outperformed by both Fisher Vector (FV) and Convolutional Neural Network (CNN) features. More interestingly, combining FV and CNN features into a single image representation achieves very high effectiveness, correctly recognizing the query inscription in more than 90% of the cases. Our results suggest that combinations of FV and CNN can also be exploited to effectively perform visual retrieval of other types of objects related to cultural heritage, such as landmarks and monuments.
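To make the idea of "combining FV and CNN features into a single image representation" concrete, the following is a minimal sketch of one plausible way to fuse two descriptors. The abstract does not state how the combination is performed, so the weighted concatenation of L2-normalized vectors, the weight `alpha`, and the function names `combine` and `recognize` are all assumptions made for illustration, not the authors' method. The toy vectors stand in for real FV and CNN descriptors.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors stay zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else [0.0 for _ in v]


def combine(fv: Sequence[float], cnn: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Hypothetical fusion: weighted concatenation of normalized descriptors.

    With sqrt weights, the dot product of two combined vectors equals
    alpha * cos(fv_1, fv_2) + (1 - alpha) * cos(cnn_1, cnn_2).
    """
    a, b = math.sqrt(alpha), math.sqrt(1.0 - alpha)
    return [a * x for x in l2_normalize(fv)] + [b * x for x in l2_normalize(cnn)]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def recognize(query: Sequence[float], database: Dict[str, List[float]]) -> str:
    """Return the label of the database entry most similar to the query."""
    return max(database, key=lambda label: dot(query, database[label]))


# Toy database: two inscriptions, each with a made-up FV and CNN descriptor.
database = {
    "inscription_A": combine([1.0, 0.0], [1.0, 0.0, 0.0]),
    "inscription_B": combine([0.0, 1.0], [0.0, 1.0, 0.0]),
}

# A query photo whose descriptors are close to those of inscription_A.
query = combine([0.9, 0.1], [0.8, 0.2, 0.0])
best_match = recognize(query, database)
```

In this sketch `alpha` controls how much each descriptor contributes to the similarity score; a value of 0.5 weights both equally, and in practice such a weight would have to be tuned on held-out data.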



          • Published in

            Journal on Computing and Cultural Heritage   Volume 9, Issue 4
            December 2016
            120 pages
            ISSN:1556-4673
            EISSN:1556-4711
            DOI:10.1145/2999570

            Copyright © 2016 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 19 December 2016
            • Accepted: 1 June 2016
            • Revised: 1 May 2016
            • Received: 1 February 2016


            Qualifiers

            • research-article
            • Research
            • Refereed
