DOI: 10.1145/2072572
J-HGBU '11: Proceedings of the 2011 joint ACM workshop on Human gesture and behavior understanding
ACM 2011 Proceeding
Publisher:
  • Association for Computing Machinery
  • New York
  • NY
  • United States
Conference:
MM '11: ACM Multimedia Conference Scottsdale Arizona USA 1 December 2011
ISBN:
978-1-4503-0998-1
Published:
01 December 2011

Abstract

MA3HO'11: The First International ACM Workshop on Multimedia access to 3D Human Objects (MA3HO'11) is held in November 2011 in Scottsdale, Arizona, USA, in conjunction with ACM Multimedia 2011.

The motivations behind this initiative are strong: 3D is becoming increasingly popular in a number of economically relevant application fields, including movies, graphic entertainment, security applications, and data archive storage, search and retrieval. 3D cinema, online gaming and virtual reality, surveillance and security, mechanical parts management, medical imaging, structural and molecular biology, cultural heritage asset reproduction, improved human-computer interaction, and natural and multimodal interactivity are just a few of the potential applications. Until a few years ago, 3D digital data was produced manually with CAD and 3D modeling software; nowadays laser scanners and computer vision technology make it possible to obtain high-resolution textured 3D models from real-world data at a very fast pace. Dynamic 3D models can be captured from moving targets as well. Low-cost devices like the Kinect 3D scanner make it possible to obtain low-resolution full 3D scans in real time at short distance. Computer vision solutions can quickly extract interest points from images, compute their geometrical relationships and perform approximate 3D reconstruction of observed objects or scenes. Smart tracking algorithms that observe a target from different viewpoints can reconstruct its 3D silhouette and provide a realistic avatar of the moving target. Pan-tilt-zoom cameras make it possible to capture high-resolution images of distant targets and potentially permit their 3D reconstruction from such a sequence. 3D object databases are rapidly emerging in many application fields, paving the way to large-scale 3D content-based retrieval over the Internet. Web3D is close at hand and will enable access to high-quality 3D materials. Professionals working with 3D data will soon be sharing, retrieving and reusing 3D content.

To restrict the scope of interest, this workshop focused on 3D human objects, intended both as parts of 3D human bodies and as 3D parts for humans, i.e., people silhouettes, head and torso models, arm and hand models, bodies, faces and face parts such as lips, handled objects and interacting humans, and eventually the 3D environment where humans act. We particularly envision the task of matching 3D models with 3D models or with 2D image data. In surveillance and security, for example, matching 2D face images with 3D face models makes it possible to exploit both appearance and structural information for target identification, overcoming the limitations of traditional 2D face matching. As 3D face databases become more widely available, 3D face matching is becoming an important topic of investigation for advanced security applications. New recognition and tracking applications will fully exploit 3D body behaviors. Real-time reconstruction of the 3D target body and face from multiple 2D views makes live 3D body modeling, identification and re-identification a new opportunity in surveillance, long-term tracking and forensic applications, easing the task of behavior analysis and recognition. Expression analysis and human-machine interaction with natural interfaces are fields where 3D can improve on the current state of the art. A growing number of benchmarks and datasets of 3D human objects have been made available by research projects. Examples are the TRICTRAC project, where a number of video clips were rendered in 3D; the Carnegie Mellon University Motion Capture Database, for human bodies and interactions (http://mocap.cs.cmu.edu/); the Multi-view 3D Human Pose Estimation benchmark at CVPR2009 (http://www.gavrila.net/Research/3-D_Human_Body_Tracking); and the 3D multiview object modeling for re-identification, by the EU project THIS (http://imagelab.ing.unimore.it/3dpes/).

The MA3HO workshop aims to take a leap forward in emerging research on multimedia access to 3D human objects, bringing together researchers in 3D graphics, 3D object recognition and retrieval, and multimedia, with attention to application fields where humans are highly significant, such as security, surveillance and biometrics, animation and entertainment, video retrieval, sports analytics, natural interaction, cultural heritage, augmented and virtual reality, and the World Wide Web. The main subjects addressed include:

  • 3D human objects reconstruction from 2D views

  • 3D pose estimation from 2D information

  • 2D to 3D human object matching

  • 3D human object categorization

  • 3D people identification and re-identification

  • 3D object/face similarity matching, indexing, and mining

  • Feature extraction for 3D model segmentation

  • Feature extraction for 3D motion detection and behavior classification

  • 3D shape descriptors

  • Retrieval with large distributed and heterogeneous 3D datasets and benchmarking

  • Semantics-driven 3D object retrieval and classification

  • 3D natural interfaces and search modalities

The workshop attracted 18 good-quality submissions, fairly distributed among different countries: China, Japan, Canada, USA, France, Italy and Germany. Many of the key topics of the workshop call were addressed. After careful review and evaluation, the MA3HO Technical Program Committee selected only 6 papers for oral presentation and 7 papers for poster presentation, in order to have a selective, high-quality event in the spirit of the ACM Multimedia conference.

SSPW'11: It is a pleasure and an honor to have organized the Third International Workshop on Social Signal Processing (SSPW'11), held on December 1, 2011, in Scottsdale, Arizona, USA in conjunction with ACM Multimedia 2011.

Machine analysis of human social behaviors and machine synthesis of human-like, socially aware interactions are of utmost importance for research on next-generation computing and multimedia, including ambient intelligence, smart environments/multimedia, and perceptual interfaces/multimedia. This field -- widely known as Social Signal Processing -- has witnessed a surge of interest in the past couple of years and is progressing rapidly, with new or pending applications in HCI, psychology, biomedicine, politics, and entertainment technology, among other domains. With these advances come new conceptual and methodological challenges. The SSPW'11 workshop is the third edition of the Social Signal Processing Workshop series, and it presents cutting-edge research and new challenges in automatic analysis and synthesis of social interactions and social signals in an interdisciplinary forum of computer and behavioral scientists.

The workshop series is the premier forum for presenting research in social signal processing and related topics. The workshop provides a rich forum for sharing and generating allied technologies: new ideas, new approaches, new techniques, and new evaluation. The workshop is organized under the auspices of the SSPNet, the FP7 European Network of Excellence on Social Signal Processing (EC FP7 grant agreement no. 231287), and continues the tradition of the previous SSPW workshops by maintaining the high standard set by its predecessors.

Main topics discussed during the SSPW workshop series include the following:

  • Social Intelligence, Social Cognition and Social Behavior Modeling

  • Facial behavior analysis and synthesis in social interactions

  • Expressive speech analysis and synthesis in social interactions

  • Human gesture and action recognition and synthesis in social interactions

  • Multimodal human behavior analysis and synthesis in social interactions

  • Perceptual, multimodal, and socially-aware user interfaces

  • Socially-adept Embodied Conversational Agents

  • Data Mining, Machine Learning, Information Retrieval, Artificial Intelligence in Social Contexts

  • Databases for training and testing

  • Socially-aware computing and applications (reality mining, implicit multimedia tagging, etc.)

The SSPW'11 workshop program includes a number of keynote talks and a poster session. We received 13 good-quality submissions for the workshop, each of which was assessed by at least two reviewers. The final SSPW'11 program consists of four keynote talks by Hatice Gunes (Queen Mary University of London, UK), Shri Narayanan (University of Southern California, USA), Matthias Mehl (University of Arizona, USA), and Louis-Philippe Morency (Institute for Creative Technologies, USC, USA), and a poster session with 4 papers. The keynote and poster presentations bring together related communities to share the latest findings and ideas and to pursue continuing and new collaborations in research on social signal processing.

SESSION: Keynote address 1
keynote
The sounds of social life: naturalistic (acoustic) observation sampling

This paper reviews a novel methodology called the Electronically Activated Recorder or EAR. The EAR is a portable audio recorder that periodically records snippets of ambient sounds from participants' momentary environments. In tracking moment-to-moment ...

SESSION: Keynote address 2
keynote
Behavioral signal processing for understanding (distressed) dyadic interactions: some recent developments

The expression and experience of human behavior manifestations are complex and are characterized by individual and contextual heterogeneity. Many domains rely on interpreting behavior -- especially those that are distressed and atypical -- through the ...

SESSION: Keynote address 3
keynote
Computational study of human communication dynamic

Face-to-face communication is a highly dynamic process where participants mutually exchange and interpret linguistic and gestural signals. Even when only one person speaks at a time, other participants exchange information continuously amongst ...

SESSION: Keynote address 4
keynote
A survey of perception and computation of human beauty

Perception of (facial or bodily) beauty has long been debated amongst philosophers, artists, psychologists and anthropologists. Ancient philosophers claimed that there is a timeless, aesthetic ideal concept of beauty based on proportions, symmetry, ...

POSTER SESSION: SSPW poster session
poster
Automatic recognition of coordination level in an imitation task

Automatic analysis of human-human degree of coordination bears challenging questions. In this paper, we propose to automatically predict the degree of coordination between dyadic partners performing an imitation task. A subjective evaluation of their ...

poster
Multimodal recognition of personality during short self-presentations

Personality plays an important role in the way people manage the images they convey in self-presentations and employment interviews, trying to affect the other's first impressions and increase effectiveness. This paper addresses the automatically ...

poster
Incorporating uncertainty in a layered HMM architecture for human activity recognition

In this study, conditioned HMM (CHMM), which inherits the structure from the latent-dynamic conditional random field (LDCRF) proposed by Morency et al. but is also based on a Bayesian network [1, 2]. Within the model a sequence of class labels is ...

SESSION: MA3HO Session 1
research-article
Person authentication using 3D human motion

This paper presents a novel approach to identify and/or verify persons by using three-dimensional dynamic and structural features extracted from human motion depicted on image streams. These features are extracted from body landmarks which are detected ...

research-article
Estimation and utilization of articulations in recovering non-rigid structure from motion using motion subspaces

Estimation of non-rigid structure from motion (NRSFM) has often been performed as a linear combination of basis shapes. However, when dealing with scenes containing human articulated motion (especially in presence of clothing), the number of basis ...

research-article
Human action recognition using multiple views: a comparative perspective on recent developments

This paper presents a review and comparative study of recent multi-view 2D and 3D approaches for human action recognition. The approaches are reviewed and categorized due to their nature. We report a comparison of the most promising methods using two ...

research-article
Fully automatic 3D facial expression recognition using a region-based approach

In this paper, we address the problem of automatic 3D facial expression recognition. Automatic 3D Facial Expression Recognition techniques are generally limited in that they require manual, precise landmark points. Here, we propose a framework capable ...

research-article
3DPeS: 3D people dataset for surveillance and forensics

The interest of the research community in creating reference datasets for performance analysis is always very high. Although new datasets collecting large amounts of video footage are spreading in surveillance and forensics, few benchmarks with ...

research-article
3D partial face matching using local shape descriptors

In this work, we propose and experiment an original solution to 3D face recognition that supports accurate face matching also in cases where just some parts of probe scans are available. In the proposed approach, distinguishing traits of the face are ...

SESSION: MA3HO Session 2
abstract
Multi-stage feature point detection for 3D human data

In this paper, we present an automatic approach to detect feature points on 3D human models. Instead of simultaneously detecting all feature points, as previous approaches do, our algorithm recursively detects feature points by using a multi-stage ...

abstract
Human motion classification and management based on mocap data analysis

Human motion understanding based on motion capture (mocap) data is investigated. Recent rapid developments and applications of mocap systems have resulted in a large corpus of mocap sequences, and an automated annotation technique that can classify ...

abstract
3D perceptual shape feature-based body parts classification and pose estimation

Human body motion and gesture analysis has been boosted by the latest developments of 3D cameras and the high demands of emerging applications. Body parts classification and pose estimation are essential for the human body tracking and motion ...

abstract
Landmark recognition and retrieval: from 2D to 3D

Existing landmark retrieval methods cannot provide a comprehensive solution by which users can view different angles of a landmark. In this paper, we propose a novel approach to reconstruct and retrieve 3D landmark models by direct 2D to 3D matching. In ...

abstract
The Florence 2D/3D hybrid face dataset

This article describes a new dataset under construction at the Media Integration and Communication Center and the University of Florence. The dataset consists of high-resolution 3D scans of human faces along with several video sequences of varying ...

Contributors
  • University of Modena and Reggio Emilia
  • Imperial College London
  • University of Lille
  • University of Florence
  • Massachusetts Institute of Technology
  • University of Glasgow