Pointwise and pairwise clothing annotation: combining features from social media

Multimedia Tools and Applications

Abstract

In this paper, we present effective algorithms to automatically annotate clothes in photos from social media such as Facebook and Instagram. Clothing annotation can be informally stated as recognizing, as accurately as possible, the garment items that appear in a query photo. This task opens significant opportunities for recommender and e-commerce systems, such as capturing new fashion trends based on which clothes have been worn most recently. It also poses interesting challenges for existing vision and recognition algorithms, such as distinguishing between similar but different types of clothes, or identifying a clothing pattern even when it appears in different colors and shapes. We formulate the annotation task as a multi-label, multi-modal classification problem: (i) both image and textual content (i.e., tags about the image) are available for learning classifiers, (ii) the classifiers must recognize a set of labels (i.e., a set of garment items), and (iii) the decision on which labels to assign to the query photo comes from a set of instances used to build a function that separates the labels that should be assigned to the query photo from those that should not. Using this configuration, we propose two approaches: (i) a pointwise one, called MMCA, which receives a single image as input, and (ii) a pairwise one, called M3CA, a multi-instance classification approach that uses pairs of images to build the classifiers. We conducted a systematic evaluation of the proposed algorithms using everyday photos collected from two major fashion-related social media sites, pose.com and chictopia.com. Our results show that the proposed approaches improve accuracy by 20% to 30% over popular first-choice multi-label, multi-modal, and multi-instance algorithms.
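The multi-label, multi-modal formulation above can be made concrete with a small sketch. It is not MMCA or M3CA: it swaps in a plain nearest-neighbour label-transfer baseline, and the `Instance` type, the L1 visual distance, the Jaccard tag distance, the fusion weight `alpha`, and the voting `threshold` are all illustrative assumptions. It only shows the three ingredients the formulation lists: instances carrying both image features and tags, a set of garment labels per instance, and a decision rule built from training instances that separates labels to assign from labels to withhold.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical multi-modal instance: a photo's visual features plus its tags."""
    visual: list[float]                            # assumed precomputed image descriptor
    tags: set[str]                                 # textual tags attached to the photo
    labels: set[str] = field(default_factory=set)  # garment items (known for training photos)


def visual_distance(p: list[float], q: list[float]) -> float:
    # L1 distance between visual descriptors
    return sum(abs(a - b) for a, b in zip(p, q))


def tag_distance(a: set[str], b: set[str]) -> float:
    # Jaccard distance between tag sets (assumed choice); 0.0 when both are empty
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def annotate(query: Instance, training: list[Instance],
             k: int = 3, alpha: float = 0.5, threshold: float = 0.5) -> set[str]:
    """Nearest-neighbour label transfer (illustrative baseline, not MMCA/M3CA)."""
    if not training:
        return set()

    # Fuse both modalities into one distance; alpha weights the visual part.
    # In practice the two distances should be normalised to comparable scales.
    def fused(t: Instance) -> float:
        return (alpha * visual_distance(query.visual, t.visual)
                + (1 - alpha) * tag_distance(query.tags, t.tags))

    neighbours = sorted(training, key=fused)[:k]
    votes = Counter(label for n in neighbours for label in n.labels)
    # Decision rule: assign a label if at least `threshold` of the neighbours carry it
    return {label for label, c in votes.items() if c / len(neighbours) >= threshold}


# Toy usage with made-up data
training = [
    Instance([0.9, 0.1], {"summer", "beach"}, {"dress", "sandals"}),
    Instance([0.8, 0.2], {"summer"}, {"dress", "hat"}),
    Instance([0.1, 0.9], {"winter", "snow"}, {"coat", "boots"}),
]
query = Instance([0.85, 0.15], {"summer", "beach"})
print(annotate(query, training))               # {'dress'}
print(sorted(annotate(query, training, k=2)))  # ['dress', 'hat', 'sandals']
```

With all three neighbours only "dress" reaches the 50% vote threshold; restricting to the two closest photos lets the less frequent labels through as well, which shows how the decision rule trades precision against recall.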

Notes

  1. Hereafter, we refer to each \(f_{i}\) as the corresponding interval.

  2. Labels for which \(\hat {p}(l_{i}|\tilde {q})>0\).

  3. The L1 distance function measures the difference between two feature vectors by summing the absolute differences between their corresponding entries (one per keyword): \(L_1(P,Q) = \sum_{i=1}^{N} |p_{i} - q_{i}|\)

  4. Both datasets used in this paper, Chictopia and Pose, are available for download at: http://www.patreo.dcc.ufmg.br/downloads/fashion-datasets/

  5. The reported processing time includes only the time spent by the classification algorithm.
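The L1 distance in note 3 can be checked with a minimal Python sketch. The function name `l1_distance` and the four-keyword count vectors below are hypothetical, chosen only to illustrate the arithmetic; they are not taken from the paper's implementation.

```python
def l1_distance(p: list[float], q: list[float]) -> float:
    """L1(P, Q) = sum_i |p_i - q_i| over N aligned feature entries."""
    if len(p) != len(q):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


# Keyword-count vectors over a hypothetical 4-keyword vocabulary
P = [2, 0, 1, 3]
Q = [1, 1, 1, 0]
print(l1_distance(P, Q))  # |2-1| + |0-1| + |1-1| + |3-0| = 5
```

Raising on mismatched lengths is a deliberate choice: `zip` would otherwise silently truncate the longer vector and understate the distance.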

Acknowledgments

The authors would like to acknowledge grants from CNPq (grant 449638/2014-6), CAPES, Fundação de Apoio à Pesquisa do Estado de Minas Gerais (Fapemig, grant APQ-00768-14), PRPq/Universidade Federal de Minas Gerais, Finep, and InWeb (the Brazilian National Institute of Science and Technology for the Web).

Author information

Correspondence to Keiller Nogueira.

About this article

Cite this article

Nogueira, K., Veloso, A.A. & dos Santos, J.A. Pointwise and pairwise clothing annotation: combining features from social media. Multimed Tools Appl 75, 4083–4113 (2016). https://doi.org/10.1007/s11042-015-3087-2

  • DOI: https://doi.org/10.1007/s11042-015-3087-2
