It is our great pleasure to welcome you to the 2nd Human-centric Multimedia Analysis Workshop (HUMA'21), which is co-located with ACM Multimedia 2021 in Chengdu, China. This workshop focuses on human-centric multimedia analysis, one of the fundamental problems in multimedia understanding. It is a very challenging problem that involves multiple tasks, such as face detection and recognition, human pose estimation, human action detection, and person tracking. Today, ubiquitous multimedia sensors and large-scale computing infrastructures are producing a wide variety of large-scale multimodal data for human-centric analysis, which provides rich knowledge for tackling these challenges. Researchers have strived to push the limits of human-centric multimedia analysis in various applications, such as intelligent surveillance, retailing, fashion design, and services. The purpose of this workshop is therefore to: 1) bring together state-of-the-art research on human-centric multimedia analysis; 2) call for a coordinated effort to understand the opportunities and challenges emerging in human-centric multimedia analysis; 3) identify key tasks and evaluate state-of-the-art methods; 4) showcase innovative methodologies and ideas; 5) introduce interesting real-world human-centric multimedia analysis systems or applications; and 6) propose new real-world datasets and discuss future directions. We solicit original contributions in all fields of human-centric multimedia analysis that explore multimodal data to understand human behavior. We believe this workshop can offer a timely collection of research updates that benefits researchers and practitioners across the broad multimedia community.
Proceeding Downloads
Modern Learning Methodologies for Co-Saliency Detection
Visual saliency computing aims to imitate the human visual attention mechanism to identify the most prominent or unique areas or objects from a visual scene. It is one of the basic low-level image processing techniques and can be applied to many ...
Learning Positional Priors for Pretraining 2D Pose Estimators
The target of 2D human pose estimation is to locate the keypoints of body parts from 2D images. State-of-the-art methods for pose estimation usually construct pixel-wise heatmaps from keypoints as labels for learning neural networks, which are usually ...
A Closer Look at Temporal Sentence Grounding in Videos: Dataset and Metric
Temporal Sentence Grounding in Videos (TSGV), i.e., grounding a natural language sentence that describes complex human activities in a long, untrimmed video sequence, has received unprecedented attention over the last few years. Although each newly ...
NLOS Imaging Assisted Navigation for BVI
Assistive navigation techniques support the activities of blind or visually impaired (BVI) people and improve their life quality. However, current navigation systems cannot detect hidden objects that may run out and become obstacles. In this paper, we ...
Using Feature Interaction among GPS Data for Road Intersection Detection
Road intersections play a vital role in road network construction, autonomous driving, and intelligent transportation systems. Most methods detect road intersections using only geometric features, without spatio-temporal features, leading to insufficient ...
Modeling 3D Objects: Implications for Neuroscience, Behavioral and Medical Studies: A Case Demo
We have designed, developed and adapted 3D objects (3DOs) within the interactive environment for in-lab neuroscience research of motor control and the mirror neuron system (MNS) (Figure 1b; 3D view: https://p3d.in/0B202). The modeled 3DOs are ...
Index Terms
- Proceedings of the 2nd International Workshop on Human-centric Multimedia Analysis