It is our great pleasure to welcome you to the 2018 ACM Multimedia Workshop on Audio-Visual Scene Understanding for Immersive Multimedia - AVSU 2018. Audio-visual data is the most familiar format of multimedia information acquired in our daily lives, yet audio and video processing have long been studied in separate research communities, ignoring the synergy available when they work together. Integrated audio-visual processing, building on leading research in each domain, has the potential to deliver significant advances in immersive multimedia production and reproduction. This workshop aims to provide a forum for exchanging ideas on scene understanding techniques developed in the audio and visual communities, and ultimately to unlock the creative potential of joint audio-visual signal processing to deliver a step change in a range of multimedia applications.
This workshop follows two successful UK-Korea Focal Point Workshops on Deep Audio-Visual Representation Learning for Multimedia Perception and Reproduction. The first workshop was held in the UK in conjunction with CVMP 2017; three demo systems and four talks were presented, and 40 people attended. The second workshop was held in South Korea in early 2018; 60 people attended, and seven talks were given by invited speakers, including the CTO of G'Audio Lab in the USA.
Proceeding Downloads
Multimodal Fusion Strategies: Human vs. Machine
A two-hour movie, or a short movie clip drawn from it, is intended to capture and present a meaningful (or significant) story in video to be recognized and understood by a human audience. What if we substitute the task of the human audience with that of an ...
An Audio-Visual Method for Room Boundary Estimation and Material Recognition
In applications such as virtual and augmented reality, a plausible and coherent audio-visual reproduction can be achieved by deeply understanding the reference scene acoustics. This requires knowledge of the scene geometry and related materials. In this ...
A Deep Learning-based Stress Detection Algorithm with Speech Signal
In this paper, we propose a deep learning-based psychological stress detection algorithm using speech signals. With increasing demands for communication between human and intelligent systems, automatic stress detection is becoming an interesting ...
Spatial Audio on the Web - Create, Compress, and Render
The recent surge of VR and AR has spawned an interest in spatial audio beyond its traditional delivery over loudspeakers in, e.g., home theater environments, to headphone delivery over, e.g., mobile devices. In this talk we'll discuss a web-based ...
Generation Method for Immersive Bullet-Time Video Using an Omnidirectional Camera in VR Platform
This paper proposes a method for generating immersive bullet-time video that continuously switches among images captured by multi-viewpoint omnidirectional cameras arranged around the subject. In ordinary bullet-time processing, it is possible to observe a ...
Audio-Visual Attention Networks for Emotion Recognition
We present spatiotemporal attention-based multimodal deep neural networks for dimensional emotion recognition in multimodal audio-visual video sequences. To learn the temporal attention that discriminatively focuses on emotionally salient parts within ...
Towards Realistic Immersive Audiovisual Simulations for Hearing Research: Capture, Virtual Scenes and Reproduction
Most current hearing research laboratories and hearing aid evaluation setups are not sufficient to simulate real-life situations and to evaluate future generations of hearing aids that might include gaze information and brain signals. Thus, new ...
Index Terms
- Proceedings of the 2018 Workshop on Audio-Visual Scene Understanding for Immersive Multimedia