ABSTRACT
We present an image search engine that supports similarity search over the roughly 100M images in the YFCC100M dataset and annotates query images. Similarity search is performed on YFCC100M-HNfc6, the set of deep features we extracted from the YFCC100M dataset, which we indexed with the MI-File index for efficient similarity searching. A metadata cleaning algorithm that combines visual and textual analysis selects from the YFCC100M dataset a relevant subset of images and their associated annotations, which serves as the training set for automatic textual annotation of submitted queries. The online image search and annotation system demonstrates the effectiveness of the deep features for assessing conceptual similarity among images, the effectiveness of the metadata cleaning algorithm in identifying a relevant training set for annotation, and the efficiency and accuracy of the MI-File similarity index in searching and annotating over a dataset of 100M images with very limited computing resources.
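The search-then-annotate idea summarized above can be illustrated with a minimal sketch. It is not the system's implementation: an exhaustive nearest-neighbour scan stands in for the MI-File index, the three-dimensional vectors stand in for the YFCC100M-HNfc6 deep features, and the tag-voting rule, the function names (`knn_search`, `annotate`), and the toy image ids and tags are all assumptions made for illustration.

```python
import math
from collections import Counter


def l2_normalize(vector):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_search(query, features, k):
    """Return the k (distance, image_id) pairs closest to the query.

    Exhaustive scan used purely for illustration; an approximate index
    such as MI-File would replace this step at the 100M-image scale.
    """
    q = l2_normalize(query)
    scored = [(euclidean(q, l2_normalize(f)), image_id)
              for image_id, f in features.items()]
    scored.sort()
    return scored[:k]


def annotate(query, features, tags, k=3, top_tags=2):
    """Suggest tags for the query by distance-weighted voting over its
    k nearest neighbours (a hypothetical voting rule, not the paper's)."""
    votes = Counter()
    for dist, image_id in knn_search(query, features, k):
        for tag in tags.get(image_id, ()):
            votes[tag] += 1.0 / (1.0 + dist)
    return [tag for tag, _ in votes.most_common(top_tags)]


# Toy stand-ins for deep features and cleaned annotations (hypothetical data).
features = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 1.0, 0.0],
    "img_d": [0.0, 0.0, 1.0],
}
tags = {
    "img_a": ["beach", "sea"],
    "img_b": ["beach"],
    "img_c": ["forest"],
    "img_d": ["city"],
}

query = [1.0, 0.05, 0.0]
neighbours = [image_id for _, image_id in knn_search(query, features, k=2)]
suggested = annotate(query, features, tags, k=2, top_tags=1)
```

In this sketch the quality of the suggested tags depends entirely on the tags attached to the retrieved neighbours, which is why a cleaned training subset matters for annotation.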
Index Terms
- Searching and annotating 100M Images with YFCC100M-HNfc6 and MI-File