Abstract
The Apple iPad is a portable tablet computer that offers users a general-purpose platform for consumer media, including games, books, and movies. Although the iPad is gaining popularity quickly, its application to content-based image retrieval and annotation is still in its infancy. This paper develops an interactive system for efficiently retrieving and annotating image objects on the iPad. The system consists of two components: a front-end GUI (graphical user interface) and a back-end retrieval model. For the first component, we implement an iPad-based GUI that gives users an efficient way to select query objects and to annotate them. For the second component, we propose an object-based image retrieval algorithm that combines a novel feature descriptor based on context-preserving bags-of-words (BoW) with a two-stage re-ranking technique to measure the similarity between the query image and each image in the database. The retrieval results are returned to and visualized on the iPad-based GUI, and annotations supplied by users can be propagated among the retrieved images. The front-end GUI and the back-end module communicate over a wireless network. Experiments on several benchmark datasets demonstrate the effectiveness of the proposed framework.
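The abstract does not specify the descriptor or the re-ranking stages, so the following is only a minimal sketch of the general retrieve-then-re-rank pattern, not the authors' algorithm. It assumes a plain (not context-preserving) BoW histogram over visual-word ids, cosine similarity for the initial ranking, and histogram intersection as a stand-in for the second stage. The function names, the `shortlist` parameter, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def bow_histogram(word_ids, vocab_size):
    """L2-normalised histogram of visual-word ids (plain BoW, an assumption)."""
    counts = Counter(word_ids)
    hist = [float(counts.get(i, 0)) for i in range(vocab_size)]
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


def cosine(a, b):
    """Cosine similarity of two histograms that are already L2-normalised."""
    return sum(x * y for x, y in zip(a, b))


def intersection(a, b):
    """Histogram intersection, used here as a stand-in second-stage score."""
    return sum(min(x, y) for x, y in zip(a, b))


def retrieve(query_words, database, vocab_size, shortlist=2):
    """Rank database images against a query in two stages.

    Stage 1 ranks every image by cosine similarity.
    Stage 2 re-orders only the top `shortlist` images by histogram intersection.
    """
    q = bow_histogram(query_words, vocab_size)
    hists = {name: bow_histogram(words, vocab_size)
             for name, words in database.items()}

    # Stage 1: coarse ranking over the whole database.
    stage1 = sorted(hists, key=lambda n: cosine(q, hists[n]), reverse=True)
    head, tail = stage1[:shortlist], stage1[shortlist:]

    # Stage 2: refine the order of the shortlist only.
    head = sorted(head, key=lambda n: intersection(q, hists[n]), reverse=True)
    return head + tail


# Toy data: each image is a list of quantised visual-word ids (hypothetical).
toy_database = {
    "a": [0, 1, 2],
    "b": [3, 4, 4],
    "c": [0, 0, 0, 1],
}
ranking = retrieve([0, 1, 2], toy_database, vocab_size=5)
print(ranking)
```

In a client-server arrangement like the one described, a routine of this kind would run in the back-end module, with the ranked list sent back to the GUI for display. Restricting the second stage to a shortlist keeps a more expensive similarity measure affordable on a large database.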
Acknowledgments
This work was supported by the National Science Foundation of China under Grants 61005018 and 91120005, by NPU-FFR-JC20120237, and by the Program for New Century Excellent Talents in University under Grant NCET-10-0079.
Cite this article
Han, J., Xu, M., Li, X. et al. Interactive object-based image retrieval and annotation on iPad. Multimed Tools Appl 72, 2275–2297 (2014). https://doi.org/10.1007/s11042-013-1509-6
DOI: https://doi.org/10.1007/s11042-013-1509-6