DOI: 10.1145/2663204.2663236

Deep Multimodal Fusion: Combining Discrete Events and Continuous Signals

Published: 12 November 2014

Abstract

Multimodal datasets often combine continuous signals with series of discrete events. For instance, when studying human behaviour it is common to annotate actions performed by the participant alongside other modalities such as video recordings of the face or physiological signals. The events are nominal, infrequent and irregularly spaced in time, whereas the signals are numeric and typically sampled at short, fixed intervals. This fundamentally different nature complicates the analysis of the relations among these modalities, which are therefore often studied only after each modality has been summarised or reduced. This paper investigates a novel approach to modelling the relation between such modality types that bypasses the need to summarise each modality independently of the others. To that end, we introduce a deep learning model based on convolutional neural networks, adapted to process multiple modalities at different time resolutions, which we name deep multimodal fusion. Furthermore, we introduce and compare three alternative methods (convolution, training and pooling fusion) for integrating sequences of events with continuous signals within this model. We evaluate deep multimodal fusion on a game user dataset in which players' physiological signals are recorded in parallel with game events. Results suggest that the proposed architecture captures multimodal information appropriately, as it yields higher prediction accuracies than single-modality models. In addition, pooling fusion, based on a novel filter-pooling method, appears to be the most effective fusion approach for the investigated types of data.
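
To make the general idea concrete, below is a minimal sketch of event-modulated pooling over a convolutional feature map: a continuous signal is convolved with a learned filter, and a sparse binary event track recorded on the same timeline determines which positions contribute to each pooled value. This is an illustrative reconstruction under stated assumptions, not the paper's filter-pooling method; the binary event encoding, the event-gated averaging rule, and all names (`conv1d_valid`, `filter_pool`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Continuous modality: e.g. a physiological signal sampled at a fixed rate.
signal = rng.standard_normal(256).cumsum()          # 256 time steps
# Discrete modality: sparse nominal events on the same timeline,
# encoded here as a binary indicator track (1 where an event occurred).
events = np.zeros(256)
events[rng.choice(256, size=8, replace=False)] = 1.0

def conv1d_valid(x, w):
    """Plain 'valid' 1-D convolution (cross-correlation) of x with kernel w."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

# One convolutional filter applied to the continuous signal.
kernel = rng.standard_normal(9) / 3.0
feature_map = np.tanh(conv1d_valid(signal, kernel))  # length 248

# Align the event track with the 'valid' feature map.
events = events[: len(feature_map)]

def filter_pool(fmap, ev, window):
    """Event-modulated pooling (hypothetical rule for illustration):
    within each window, average only the positions marked by an event;
    fall back to a plain average when the window contains no event."""
    pooled = []
    for start in range(0, len(fmap) - window + 1, window):
        f, e = fmap[start:start + window], ev[start:start + window]
        pooled.append(np.dot(f, e) / e.sum() if e.sum() > 0 else f.mean())
    return np.array(pooled)

fused = filter_pool(feature_map, events, window=8)
print(fused.shape)   # (31,): a compact representation informed by both modalities
```

In the abstract's terms, this corresponds to fusing the two modalities at the pooling stage rather than at the input or during training; the exact pooling rule used in the paper is not reproduced here.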


Published In

ICMI '14: Proceedings of the 16th International Conference on Multimodal Interaction
November 2014
558 pages
ISBN: 9781450328852
DOI: 10.1145/2663204

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. auto-encoders
  2. behaviour
  3. convolutional neural networks
  4. deep learning
  5. multimodal fusion
  6. physiology
  7. pooling method
  8. sequence classification
  9. sequence fusion

Qualifiers

  • Research-article


Conference

ICMI '14

Acceptance Rates

ICMI '14 paper acceptance rate: 51 of 127 submissions, 40%
Overall acceptance rate: 453 of 1,080 submissions, 42%


Cited By

  • (2025) LRMM: Low rank multi-scale multi-modal fusion for person re-identification based on RGB-NI-TI. Expert Systems with Applications, 263:125716. DOI: 10.1016/j.eswa.2024.125716
  • (2024) IDDNet: a deep interactive dual-domain convolutional neural network with auxiliary modality for fast MRI reconstruction. JUSTC, 54(3):0302. DOI: 10.52396/JUSTC-2023-0169
  • (2024) A Hybrid System for Defect Detection on Rail Lines through the Fusion of Object and Context Information. Sensors, 24(4):1171. DOI: 10.3390/s24041171
  • (2024) Multi-modal Misinformation Detection: Approaches, Challenges and Opportunities. ACM Computing Surveys, 57(3):1-29. DOI: 10.1145/3697349
  • (2024) Recent Advances in Conventional and Deep Learning-Based Depth Completion: A Survey. IEEE Transactions on Neural Networks and Learning Systems, 35(3):3395-3415. DOI: 10.1109/TNNLS.2022.3201534
  • (2024) How Intermodal Interaction Affects the Performance of Deep Multimodal Fusion for Mixed-Type Time Series. 2024 International Joint Conference on Neural Networks (IJCNN), 1-8. DOI: 10.1109/IJCNN60899.2024.10650421
  • (2024) IBFNet: A Dual Auxiliary Branch Network for Multimodal Hidden Emotion Recognition. 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 4098-4103. DOI: 10.1109/BIBM62325.2024.10822163
  • (2023) AFSFusion: An Adjacent Feature Shuffle Combination Network for Infrared and Visible Image Fusion. Applied Sciences, 13(9):5640. DOI: 10.3390/app13095640
  • (2023) PATCH. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 7(3):1-24. DOI: 10.1145/3610885
  • (2023) Predicting Player Engagement in Tom Clancy's The Division 2: A Multimodal Approach via Pixels and Gamepad Actions. Proceedings of the 25th International Conference on Multimodal Interaction (ICMI '23), 488-497. DOI: 10.1145/3577190.3614203
