DOI: 10.1145/2663204.2663236

Deep Multimodal Fusion: Combining Discrete Events and Continuous Signals

Published: 12 November 2014

Abstract

Multimodal datasets often combine continuous signals with series of discrete events. For instance, when studying human behaviour it is common to annotate actions performed by the participant alongside other modalities such as video recordings of the face or physiological signals. The events are nominal, infrequent and irregularly spaced in time, whereas the signals are numeric and typically sampled at short, fixed intervals. This fundamentally different nature complicates the analysis of the relations among these modalities, which are therefore often studied only after each modality has been summarised or reduced. This paper investigates a novel approach to modelling the relation between such modality types that bypasses the need to summarise each modality independently of the others. To that end, we introduce a deep learning model based on convolutional neural networks, adapted to process multiple modalities at different time resolutions, which we name deep multimodal fusion. Furthermore, we introduce and compare three alternative methods (convolution, training and pooling fusion) for integrating sequences of events with continuous signals within this model. We evaluate deep multimodal fusion on a game user dataset in which players' physiological signals are recorded in parallel with game events. Results suggest that the proposed architecture captures multimodal information appropriately, as it yields higher prediction accuracies than single-modality models. In addition, pooling fusion, based on a novel filter-pooling method, appears to be the most effective fusion approach for the investigated types of data.
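
To make the general idea concrete, below is a minimal sketch of event-modulated pooling over a convolutional feature map: a continuous signal is convolved with a learned filter, and a sparse binary event track recorded on the same timeline determines which positions contribute to each pooled value. This is an illustrative reconstruction under stated assumptions, not the paper's filter-pooling method; the binary event encoding, the event-gated averaging rule, and all names (`conv1d_valid`, `filter_pool`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Continuous modality: e.g. a physiological signal sampled at a fixed rate.
signal = rng.standard_normal(256).cumsum()          # 256 time steps
# Discrete modality: sparse nominal events on the same timeline,
# encoded here as a binary indicator track (1 where an event occurred).
events = np.zeros(256)
events[rng.choice(256, size=8, replace=False)] = 1.0

def conv1d_valid(x, w):
    """Plain 'valid' 1-D convolution (cross-correlation) of x with kernel w."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

# One convolutional filter applied to the continuous signal.
kernel = rng.standard_normal(9) / 3.0
feature_map = np.tanh(conv1d_valid(signal, kernel))  # length 248

# Align the event track with the 'valid' feature map.
events = events[: len(feature_map)]

def filter_pool(fmap, ev, window):
    """Event-modulated pooling (hypothetical rule for illustration):
    within each window, average only the positions marked by an event;
    fall back to a plain average when the window contains no event."""
    pooled = []
    for start in range(0, len(fmap) - window + 1, window):
        f, e = fmap[start:start + window], ev[start:start + window]
        pooled.append(np.dot(f, e) / e.sum() if e.sum() > 0 else f.mean())
    return np.array(pooled)

fused = filter_pool(feature_map, events, window=8)
print(fused.shape)   # (31,): a compact representation informed by both modalities
```

In the abstract's terms, this corresponds to fusing the two modalities at the pooling stage rather than at the input or during training; the exact pooling rule used in the paper is not reproduced here.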


Published In

ICMI '14: Proceedings of the 16th International Conference on Multimodal Interaction
November 2014
558 pages
ISBN: 9781450328852
DOI: 10.1145/2663204

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. auto-encoders
  2. behaviour
  3. convolutional neural networks
  4. deep learning
  5. multimodal fusion
  6. physiology
  7. pooling method
  8. sequence classification
  9. sequence fusion

Qualifiers

  • Research-article


Conference

ICMI '14

Acceptance Rates

ICMI '14 paper acceptance rate: 51 of 127 submissions, 40%
Overall acceptance rate: 453 of 1,080 submissions, 42%


Cited By

  • (2025) LRMM: Low rank multi-scale multi-modal fusion for person re-identification based on RGB-NI-TI. Expert Systems with Applications, 263:125716. DOI: 10.1016/j.eswa.2024.125716
  • (2024) IDDNet: a deep interactive dual-domain convolutional neural network with auxiliary modality for fast MRI reconstruction. JUSTC, 54(3):0302. DOI: 10.52396/JUSTC-2023-0169
  • (2024) A Hybrid System for Defect Detection on Rail Lines through the Fusion of Object and Context Information. Sensors, 24(4):1171. DOI: 10.3390/s24041171
  • (2024) Multi-modal Misinformation Detection: Approaches, Challenges and Opportunities. ACM Computing Surveys, 57(3):1-29. DOI: 10.1145/3697349
  • (2024) Recent Advances in Conventional and Deep Learning-Based Depth Completion: A Survey. IEEE Transactions on Neural Networks and Learning Systems, 35(3):3395-3415. DOI: 10.1109/TNNLS.2022.3201534
  • (2024) How Intermodal Interaction Affects the Performance of Deep Multimodal Fusion for Mixed-Type Time Series. 2024 International Joint Conference on Neural Networks (IJCNN), 1-8. DOI: 10.1109/IJCNN60899.2024.10650421
  • (2024) IBFNet: A Dual Auxiliary Branch Network for Multimodal Hidden Emotion Recognition. 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 4098-4103. DOI: 10.1109/BIBM62325.2024.10822163
  • (2023) AFSFusion: An Adjacent Feature Shuffle Combination Network for Infrared and Visible Image Fusion. Applied Sciences, 13(9):5640. DOI: 10.3390/app13095640
  • (2023) PATCH. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 7(3):1-24. DOI: 10.1145/3610885
  • (2023) Predicting Player Engagement in Tom Clancy's The Division 2: A Multimodal Approach via Pixels and Gamepad Actions. Proceedings of the 25th International Conference on Multimodal Interaction (ICMI '23), 488-497. DOI: 10.1145/3577190.3614203
