It is our great pleasure to welcome you to the 1st International Workshop on Human-Centric Multimedia Analysis (HuMA20). The workshop is co-located with ACM Multimedia 2020 in Seattle, United States. It addresses a very timely topic, human-centric multimedia analysis, which is one of the fundamental problems of multimedia understanding. It is a very challenging problem that involves multiple tasks, such as face detection and recognition, human body pattern analysis, person re-identification, human action detection, person tracking, and human-object interaction. Today, multimedia sensing technologies and large-scale computing infrastructures are rapidly producing a wide variety of large-scale multi-modality data for human-centric analysis, which provides rich knowledge to help tackle these challenges. Researchers have strived to push the limits of human-centric multimedia analysis in a wide variety of applications, such as intelligent surveillance, retailing, fashion design, and services. Therefore, the purpose of this workshop is to: 1) bring together state-of-the-art research on human-centric multimedia analysis; 2) call for a coordinated effort to understand the opportunities and challenges emerging in human-centric multimedia analysis; 3) identify key tasks and evaluate state-of-the-art methods; 4) showcase innovative methodologies and ideas; 5) introduce interesting real-world human-centric multimedia analysis systems or applications; and 6) propose new real-world datasets and discuss future directions. We solicit original contributions in all fields of human-centric multimedia analysis that explore multi-modality data to help us understand human behavior and promote multimodal human-machine interaction. We believe the workshop will offer a timely collection of research updates to benefit researchers and practitioners working in the broad multimedia community.
The call for papers attracted 18 high-quality submissions from around the world, of which 10 were accepted (55.6%).
Proceeding Downloads
Human-Centric Object Interactions - A Fine-Grained Perspective from Egocentric Videos
This talk argues for a fine(r)-grained perspective on human-object interactions. Motivation: Observe a person chopping some parsley. Can you detect the moment at which the parsley was first chopped? Whether the parsley was chopped coarsely or ...
Sensing, Understanding and Synthesizing Humans in an Open World
Sensing, understanding and synthesizing humans in images and videos has been a long-pursued goal of computer vision and graphics, with extensive real-life applications. It is at the core of embodied intelligence. In this talk, I will discuss our work ...
Intra and Inter-modality Interactions for Audio-visual Event Detection
The presence of auditory and visual sensory streams enables human beings to obtain a profound understanding of a scene. While audio and visual signals are able to provide relevant information separately, the combination of both modalities offers more ...
Personalized User Modelling for Sleep Insight
Sleep is critical to leading a healthy lifestyle. Each day, most people go to sleep without any idea about how their night's rest is going to be. For an activity that humans spend around a third of their life doing, there is a surprising amount of ...
AI at the Disco: Low Sample Frequency Human Activity Recognition for Night Club Experiences
Human activity recognition (HAR) has grown in popularity as sensors have become more ubiquitous. Beyond standard health applications, there exists a need for embedded, low-cost, low-power, accurate activity sensing for entertainment experiences. We ...
Unseen Activity Recognition in Space and Time
Progress in video understanding has been astonishing in the past decade. Classifying, localizing, tracking and even segmenting actor instances at the pixel level is now commonplace, thanks to label-supervised machine learning. Yet, it is becoming ...
Towards Purely Unsupervised Disentanglement of Appearance and Shape for Person Images Generation
There has been a fair amount of research interest in exploring the disentanglement of appearance and shape from human images. Most existing endeavours pursue this goal by either using training images with annotations or regulating the training process ...
R-FENet: A Region-based Facial Expression Recognition Method Inspired by Semantic Information of Action Units
Facial expression recognition is a challenging problem in real-world scenarios owing to obstacles of illumination, occlusion, pose variations, and low-quality images. Recent works have paid attention to the concept of the region of interest (RoI) to ...
StarGAN-EgVA: Emotion Guided Continuous Affect Synthesis
Recent advancement of Generative Adversarial Network (GAN) based architectures has achieved impressive performance on static facial expression synthesis. Continuous affect synthesis, which has applications in generating videos and movies, is ...
Human-Object Interaction Detection: A Quick Survey and Examination of Methods
Human-object interaction detection is a relatively new task in the world of computer vision and visual semantic information extraction. With the goal of machines identifying interactions that humans perform on objects, there are many real-world use ...
Online Video Object Detection via Local and Mid-Range Feature Propagation
This work proposes a new Local and Mid-range feature Propagation (LMP) method for video object detection that captures feature correlations well and reduces redundant computation. Specifically, the proposed LMP model contains two modules with two ...
iWink: Exploring Eyelid Gestures on Mobile Devices
Although gaze has been widely studied for mobile interactions, eyelid-based gestures are relatively understudied and limited to a few basic gestures (e.g., blink). In this work, we propose a gesture grammar to construct both basic and compound eyelid ...
Commonsense Learning: An Indispensable Path towards Human-centric Multimedia
Learning commonsense knowledge and conducting commonsense reasoning are basic human abilities for making presumptions about the type and essence of ordinary situations in daily life, and they serve as very important goals in human-centric Artificial Intelligence ...