Towards decrypting attractiveness via multi-modality cues

Published: 19 August 2013

Abstract

Decrypting the secret of beauty or attractiveness has been the pursuit of artists and philosophers for centuries. To date, computational models for attractiveness estimation have been actively explored in the computer vision and multimedia communities, yet with a focus mainly on facial features. In this article, we conduct a comprehensive study of female attractiveness conveyed by single or multiple modalities of cues, that is, face, dressing, and/or voice, and aim to discover how different modalities individually and collectively affect the human sense of beauty. To investigate the problem extensively, we collect the Multi-Modality Beauty (M2B) dataset, which is annotated with attractiveness levels converted from manual k-wise ratings and with semantic attributes of the different modalities. Inspired by the common consensus that middle-level attribute prediction can assist higher-level computer vision tasks, we manually label a rich set of attributes for each modality. A tri-layer Dual-supervised Feature-Attribute-Task (DFAT) network is then proposed to jointly learn the attribute model and the attractiveness model of single/multiple modalities. To remedy the possible loss of information caused by incomplete manual attributes, we further propose a novel Latent Dual-supervised Feature-Attribute-Task (LDFAT) network, in which latent attributes are combined with manual attributes to contribute to the final attractiveness estimation. Extensive experimental evaluations on the collected M2B dataset demonstrate the effectiveness of the proposed DFAT and LDFAT networks for female attractiveness prediction.
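The "dual-supervised" idea in the abstract — supervising both a middle attribute layer and a final attractiveness layer of a tri-layer network — can be illustrated with a toy NumPy sketch. This is not the authors' DFAT implementation; all shapes, weights, and the trade-off term lam are hypothetical stand-ins, and the combined loss (attribute cross-entropy plus attractiveness squared error) is only one plausible reading of dual supervision.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 5 samples, 8-dim multi-modal features, 3 manual attributes.
X = rng.normal(size=(5, 8))                         # fused face/dress/voice features
A = rng.integers(0, 2, size=(5, 3)).astype(float)   # manual attribute labels
y = rng.normal(size=5)                              # attractiveness scores from k-wise ratings

W1 = rng.normal(scale=0.1, size=(8, 3))  # feature layer -> attribute layer
W2 = rng.normal(scale=0.1, size=(3,))    # attribute layer -> task (attractiveness) layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass through the tri-layer structure: features -> attributes -> task.
A_hat = sigmoid(X @ W1)   # predicted attribute probabilities
y_hat = A_hat @ W2        # predicted attractiveness

# Dual supervision: a loss at the attribute layer (cross-entropy against the
# manual attributes) plus a loss at the task layer (squared error against the
# attractiveness labels), combined with a hypothetical weight lam.
eps = 1e-9
attr_loss = -np.mean(A * np.log(A_hat + eps) + (1 - A) * np.log(1 - A_hat + eps))
task_loss = np.mean((y_hat - y) ** 2)
lam = 0.5
total_loss = task_loss + lam * attr_loss
print(float(total_loss))
```

In this reading, gradients from both losses would flow into W1, so the attribute layer is shaped jointly by the manual attribute labels and by the downstream attractiveness task; the LDFAT variant would additionally append unsupervised latent units alongside A_hat.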



Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 9, Issue 4
August 2013
168 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/2501643

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 19 August 2013
      • Accepted: 1 February 2013
      • Revised: 1 December 2012
      • Received: 1 September 2012
Published in TOMM Volume 9, Issue 4


      Qualifiers

      • research-article
      • Research
      • Refereed
