MA3HO'11: The First International ACM Workshop on Multimedia Access to 3D Human Objects (MA3HO'11) is held in November 2011 in Scottsdale, Arizona, USA, in conjunction with ACM Multimedia 2011.
The motivations behind this initiative are strong: 3D is becoming increasingly popular in a number of economically relevant fields of application, including movies, graphic entertainment, security applications, data archive storage, and search and retrieval. 3D cinema, online gaming and virtual reality, surveillance and security, mechanical parts management, medical imaging, structural and molecular biology, cultural heritage asset reproduction, improved human-computer interaction, and natural and multimodal interactivity are just a few of the potential applications. While until a few years ago 3D digital data was produced manually with CAD and 3D modeling software, nowadays laser scanners and computer vision technology make it possible to obtain high-resolution textured 3D models from real-world data at a very fast pace. Dynamic 3D models can be captured from moving targets as well. Low-cost devices like the Kinect 3D scanner make it possible to obtain low-resolution full 3D scans in real time at short distance. Computer vision solutions allow fast extraction of interest points in images, computation of their geometrical relationships, and approximate 3D reconstruction of the observed objects or scenes. Smart tracking algorithms that observe a target from different viewpoints make it possible to reconstruct its 3D silhouette and provide a realistic avatar of the moving target. Pan-Tilt-Zoom cameras make it possible to capture high-resolution images of distant targets and potentially permit their 3D reconstruction from such a sequence. 3D object databases are rapidly emerging in many application fields, paving the way to large-scale 3D content-based retrieval over the Internet. Web3D is close at hand and will enable access to high-quality 3D materials. Sharing, retrieving and reusing 3D content will soon be common practice among 3D data professionals.
To restrict the scope of interest, this workshop was focused on 3D human objects, intended both as parts of 3D human bodies and as 3D parts for humans, i.e., people silhouettes, head and torso models, arm and hand models, bodies, faces and face parts such as lips, objects handled by humans and interacting humans, and eventually the 3D environments where humans act. We particularly envision the task of matching 3D models with 3D models or 2D image data. In surveillance and security, for example, matching 2D face images with 3D face models makes it possible to exploit both appearance and structural information for target identification, thus overcoming the limitations of traditional 2D face matching. As 3D face databases become more and more available, 3D face matching is becoming an important topic of investigation for advanced security applications. New recognition and tracking applications will fully exploit 3D body behaviors. Real-time reconstruction of the 3D target body and face from multiple 2D views makes live 3D body modeling, identification and re-identification a new opportunity in long-term surveillance tracking and forensic applications, easing the task of behavior analysis and recognition. Expression analysis and human-machine interaction with natural interfaces are fields where 3D can improve on the current state of the art. A growing number of benchmarks and datasets of 3D human objects have been made available by research projects. Examples are the TRICTRAC project, where a number of video clips were rendered in 3D; the Carnegie Mellon University Motion Capture Database for human bodies and interactions (http://mocap.cs.cmu.edu/); the Multi-view 3D Human Pose Estimation benchmark at CVPR2009 (http://www.gavrila.net/Research/3-D_Human_Body_Tracking); and the 3D multiview object modeling for re-identification by the EU project THIS (http://imagelab.ing.unimore.it/3dpes/).
The MA3HO workshop aims at taking a leap forward in emerging research on multimedia access to 3D human objects, bringing together researchers in 3D graphics, 3D object recognition and retrieval, and multimedia, with attention to application fields where humans are highly significant, such as security, surveillance and biometrics, animation and entertainment, video retrieval, sports analytics, natural interaction, cultural heritage, augmented and virtual reality, and the World Wide Web. The main subjects addressed include, among others:
3D human objects reconstruction from 2D views
3D pose estimation from 2D information
2D to 3D human object matching
3D human object categorization
3D people identification and re-identification
3D object/face similarity matching, indexing, and mining
Feature extraction for 3D model segmentation
Feature extraction for 3D motion detection and behavior classification
3D shape descriptors
Retrieval with large distributed and heterogeneous 3D datasets and benchmarking
Semantics-driven 3D object retrieval and classification
3D natural interfaces and search modalities
The workshop attracted 18 good-quality submissions, fairly distributed among different countries: China, Japan, Canada, USA, France, Italy and Germany. Many of the key topics of the workshop call were addressed. After careful review and evaluation, the MA3HO Technical Program Committee selected only 6 papers for oral presentation and 7 papers for poster presentation, in order to have a selective, high-quality event in the spirit of the ACM MULTIMEDIA conference.
SSPW'11: It is a pleasure and an honor to have organized the Third International Workshop on Social Signal Processing (SSPW'11), held on December 1, 2011, in Scottsdale, Arizona, USA in conjunction with ACM Multimedia 2011.
Machine analysis of human social behaviors and machine synthesis of human-like socially-aware interactions are of utmost importance for research on next-generation computing and multimedia, including ambient intelligence, smart environments/multimedia, and perceptual interfaces/multimedia. This field -- widely known as Social Signal Processing -- has witnessed a surge of interest in the past couple of years and is progressing rapidly with new or pending applications in HCI, psychology, biomedicine, politics, and entertainment technology, among other domains. With these advances come new conceptual and methodological challenges. The SSPW'11 workshop is the third edition of the Social Signal Processing Workshop series, and it presents cutting-edge research and new challenges in automatic analysis and synthesis of social interactions and social signals in an interdisciplinary forum of computer and behavioral scientists.
The workshop series is the premier forum for presenting research in social signal processing and related topics. The workshop provides a rich forum for sharing and generating allied technologies: new ideas, new approaches, new techniques, and new evaluations. The workshop is organized under the auspices of the SSPNet, the FP7 European Network of Excellence on Social Signal Processing (EC FP7 grant agreement no. 231287), and continues the tradition of the previous SSPW workshops by maintaining the high standard set by its predecessors.
Main topics discussed during the SSPW workshop series include the following:
Social Intelligence, Social Cognition and Social Behavior Modeling
Facial behavior analysis and synthesis in social interactions
Expressive speech analysis and synthesis in social interactions
Human gesture and action recognition and synthesis in social interactions
Multimodal human behavior analysis and synthesis in social interactions
Perceptual, multimodal, and socially-aware user interfaces
Socially-adept Embodied Conversational Agents
Data Mining, Machine Learning, Information Retrieval, Artificial Intelligence in Social Contexts
Databases for training and testing
Socially-aware computing and applications (reality mining, implicit multimedia tagging, etc.)
The SSPW'11 workshop program includes a number of Keynote talks and a poster session. We received 13 good-quality submissions for the workshop. Each of these was assessed by no fewer than two reviewers. The final SSPW'11 program consists of four Keynote talks by Hatice Gunes (Queen Mary University of London, UK), Shri Narayanan (University of Southern California, USA), Matthias Mehl (University of Arizona, USA), and Louis-Philippe Morency (Institute for Creative Technologies, USC, USA), and a poster session with 4 papers. The Keynote and poster presentations bring together related communities to share the latest findings and ideas and pursue continuing and new collaborations in research on social signal processing.
Proceeding Downloads
The sounds of social life: naturalistic (acoustic) observation sampling
This paper reviews a novel methodology called the Electronically Activated Recorder or EAR. The EAR is a portable audio recorder that periodically records snippets of ambient sounds from participants' momentary environments. In tracking moment-to-moment ...
Behavioral signal processing for understanding (distressed) dyadic interactions: some recent developments
The expression and experience of human behavior manifestations are complex and are characterized by individual and contextual heterogeneity. Many domains rely on interpreting behavior -- especially those that are distressed and atypical -- through the ...
Computational study of human communication dynamic
Face-to-face communication is a highly dynamic process where participants mutually exchange and interpret linguistic and gestural signals. Even when only one person speaks at the time, other participants exchange information continuously amongst ...
A survey of perception and computation of human beauty
Perception of (facial or bodily) beauty has long been debated amongst philosophers, artists, psychologists and anthropologists. Ancient philosophers claimed that there is a timeless, aesthetic ideal concept of beauty based on proportions, symmetry, ...
Automatic recognition of coordination level in an imitation task
Automatic analysis of human-human degree of coordination bears challenging questions. In this paper, we propose to automatically predict the degree of coordination between dyadic partners performing an imitation task. A subjective evaluation of their ...
Multimodal recognition of personality during short self-presentations
Personality plays an important role in the way people manage the images they convey in self-presentations and employment interviews, trying to affect the other's first impressions and increase effectiveness. This paper addresses the automatically ...
Incorporating uncertainty in a layered HMM architecture for human activity recognition
In this study, conditioned HMM (CHMM), which inherit the structure from the latent-dynamic conditional random field (LDCRF) proposed by Morency et al. but is also based on a Bayesian network [1, 2]. Within the model a sequence of class labels is ...
Person authentication using 3D human motion
This paper presents a novel approach to identify and/or verify persons by using three-dimensional dynamic and structural features extracted from human motion depicted on image streams. These features are extracted from body landmarks which are detected ...
Estimation and utilization of articulations in recovering non-rigid structure from motion using motion subspaces
Estimation of non-rigid structure from motion (NRSFM) has often been performed as a linear combination of basis shapes. However, when dealing with scenes containing human articulated motion (especially in presence of clothing), the number of basis ...
Human action recognition using multiple views: a comparative perspective on recent developments
This paper presents a review and comparative study of recent multi-view 2D and 3D approaches for human action recognition. The approaches are reviewed and categorized due to their nature. We report a comparison of the most promising methods using two ...
Fully automatic 3D facial expression recognition using a region-based approach
In this paper, we address the problem of automatic 3D facial expression recognition. Automatic 3D Facial Expression Recognition techniques are generally limited in that they require manual, precise landmark points. Here, we propose a framework capable ...
3DPeS: 3D people dataset for surveillance and forensics
The interest of the research community in creating reference datasets for performance analysis is always very high. Although new datasets, collecting large amounts of video footage are spreading in surveillance and forensics, few bench-marks with ...
3D partial face matching using local shape descriptors
In this work, we propose and experiment an original solution to 3D face recognition that supports accurate face matching also in cases where just some parts of probe scans are available. In the proposed approach, distinguishing traits of the face are ...
Multi-stage feature point detection for 3D human data
In this paper, we present an automatic approach to detect feature points on 3D human models. Instead of simultaneously detecting all feature points, as previous approaches do, our algorithm recursively detects feature points by using a multi-stage ...
Human motion classification and management based on mocap data analysis
Human motion understanding based on motion capture (mocap) data is investigated. Recent rapid developments and applications of mocap systems have resulted in a large corpus of mocap sequences, and an automated annotation technique that can classify ...
3D perceptual shape feature-based body parts classification and pose estimation
Human body motion and gesture analysis has been boosted by the latest developments of 3D cameras and the high demands of emerging applications. Body parts classification and pose estimation are essential for the human body tracking and motion ...
Landmark recognition and retrieval: from 2D to 3D
Existing landmark retrieval methods cannot provide a comprehensive solution by which a user can view a landmark from different angles. In this paper, we propose a novel approach to reconstruct and retrieve 3D landmark models by direct 2D to 3D matching. In ...
The Florence 2D/3D hybrid face dataset
This article describes a new dataset under construction at the Media Integration and Communication Center and the University of Florence. The dataset consists of high-resolution 3D scans of human faces along with several video sequences of varying ...
Index Terms
- Proceedings of the 2011 joint ACM workshop on Human gesture and behavior understanding