Interactive object-based image retrieval and annotation on iPad

Multimedia Tools and Applications

Abstract

The Apple iPad is a portable tablet computer that offers users a generic platform for consumer media including games, books, and movies. Although the iPad is rapidly gaining popularity, its application to content-based image retrieval and annotation is still in its infancy. This paper develops an interactive system for efficiently retrieving and annotating image objects on the iPad. The system consists of two main components: a front-end GUI (graphical user interface) and a back-end retrieval model. For the first component, we implement an iPad-based GUI that gives users an efficient way to select query objects and that facilitates annotation. For the second component, we propose an object-based image retrieval algorithm that combines a novel feature descriptor based on context-preserving bags-of-words (BoW) with a two-stage re-ranking technique to measure the similarity between the query image and each image in the database. The retrieval results are returned to and visualized on the iPad-based GUI, and annotations supplied by users can be propagated among the retrieved images. The front-end GUI and the back-end module communicate over a wireless network. Comprehensive experiments on several benchmark datasets demonstrate the effectiveness of the proposed framework.
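To make the general idea of BoW ranking followed by re-ranking concrete, here is a minimal Python sketch. It is not the algorithm proposed in the paper: the abstract does not specify the context-preserving descriptor or the two re-ranking stages, so the sketch assumes a plain visual-word histogram with cosine similarity for the coarse ranking, and uses a Jaccard overlap of adjacent visual-word pairs as a crude stand-in for a context-aware score. All function names, the toy database, and the `shortlist` parameter are hypothetical.

```python
import math
from collections import Counter


def bow_histogram(word_ids, vocab_size):
    """Build an L2-normalized histogram of visual-word ids."""
    counts = Counter(word_ids)
    hist = [float(counts.get(i, 0)) for i in range(vocab_size)]
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


def cosine(a, b):
    """Cosine similarity of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


def pair_set(word_ids):
    """Adjacent visual-word pairs: an assumed, simplified notion of context."""
    return set(zip(word_ids, word_ids[1:]))


def jaccard(a, b):
    """Jaccard overlap of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query_words, database, vocab_size, shortlist=2):
    """Rank database images against a query.

    Coarse step: order every image by BoW cosine similarity.
    Refinement step: re-order only the top `shortlist` images by the
    context score; the remaining images keep their coarse order.
    """
    q_hist = bow_histogram(query_words, vocab_size)
    coarse = sorted(
        database,
        key=lambda name: cosine(q_hist, bow_histogram(database[name], vocab_size)),
        reverse=True,
    )
    top, rest = coarse[:shortlist], coarse[shortlist:]
    q_pairs = pair_set(query_words)
    refined = sorted(
        top,
        key=lambda name: jaccard(q_pairs, pair_set(database[name])),
        reverse=True,
    )
    return refined + rest


# Toy data (hypothetical): each image is a sequence of visual-word ids.
DATABASE = {
    "b": [3, 2, 1, 0],  # same words as the query, different arrangement
    "a": [0, 1, 2, 3],  # same words, same arrangement
    "c": [4, 4, 4, 4],  # unrelated content
}

if __name__ == "__main__":
    print(retrieve([0, 1, 2, 3], DATABASE, vocab_size=5))
```

In this toy example, images "a" and "b" have identical histograms, so the coarse step cannot separate them. Only the refinement step, which looks at how words are arranged, moves "a" ahead of "b". This illustrates why a descriptor that preserves context can help where an orderless BoW cannot.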

Notes

  1. www.robots.ox.ac.uk/~vgg/data/oxbuildings/

  2. http://www.vis.uky.edu/~stewe/ukbench/

  3. http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2011/

Acknowledgments

This work was supported by the National Science Foundation of China under Grants 61005018 and 91120005, by NPU-FFR-JC20120237, and by the Program for New Century Excellent Talents in University under Grant NCET-10-0079.

Author information

Corresponding authors

Correspondence to Junwei Han or Tianming Liu.

About this article

Cite this article

Han, J., Xu, M., Li, X. et al. Interactive object-based image retrieval and annotation on iPad. Multimed Tools Appl 72, 2275–2297 (2014). https://doi.org/10.1007/s11042-013-1509-6

  • DOI: https://doi.org/10.1007/s11042-013-1509-6