An Extensible Architecture for Recognizing Sensory Effects in 360° Images

ABSTRACT
The use of 360° content with sensory effects can enhance user immersion. However, creating such effects is complex and time-consuming, as authors must annotate the spatial position (i.e., the "origin" of each effect) in 360° space. To tackle this multimedia authoring issue, this paper presents an extensible architecture that automatically recognizes sensory effects in 360° images. The architecture is based on a data-treatment strategy that divides the multimedia content into manageable parts, operates on each part independently, and then joins the responses. It can take advantage of diverse recognition solutions and adapt to an author-provided configuration. We also describe an implementation that provides three effect-recognition modules, including a neural network for locating effects in equirectangular projections and a computer vision algorithm for sun localization. The results offer insight into the effectiveness of the system and highlight areas for improvement.
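The divide/process/join strategy described above can be sketched in a few lines. This is an illustrative example, not the paper's implementation: all names (`EffectAnnotation`, `recognize`, `brightness_recognizer`, the tile grid layout, and the confidence threshold) are hypothetical, and the toy brightness recognizer merely stands in for the sun-localization module the paper describes.

```python
# Sketch (assumed, not the paper's code): split an equirectangular 360° image
# into a grid of tiles, run pluggable effect recognizers on each tile
# independently, then join the per-tile responses into spatial annotations.
from dataclasses import dataclass
from typing import List


@dataclass
class EffectAnnotation:
    effect: str        # e.g. "wind", "heat", "light"
    yaw_deg: float     # horizontal origin of the effect, 0..360
    pitch_deg: float   # vertical origin of the effect, -90..90
    confidence: float


def tile_to_sphere(col, row, n_cols, n_rows):
    """Map a tile's center in the equirectangular grid to yaw/pitch degrees."""
    yaw = (col + 0.5) / n_cols * 360.0
    pitch = 90.0 - (row + 0.5) / n_rows * 180.0
    return yaw, pitch


def recognize(tiles, recognizers, n_cols, n_rows, min_conf=0.5):
    """Divide: run every recognizer on every tile independently.
    Join: keep detections above an author-configured confidence threshold."""
    results: List[EffectAnnotation] = []
    for row in range(n_rows):
        for col in range(n_cols):
            for rec in recognizers:
                for effect, conf in rec(tiles[row][col]):
                    if conf >= min_conf:
                        yaw, pitch = tile_to_sphere(col, row, n_cols, n_rows)
                        results.append(EffectAnnotation(effect, yaw, pitch, conf))
    return results


def brightness_recognizer(tile):
    """Toy recognizer: flags 'light' where mean tile brightness is high."""
    mean = sum(tile) / len(tile)
    return [("light", mean / 255.0)] if mean > 200 else []


# 2x4 grid of fake single-pixel tiles; the bright tile sits at row 0, col 3.
tiles = [[[30], [40], [50], [250]],
         [[20], [25], [35], [45]]]
annotations = recognize(tiles, [brightness_recognizer], n_cols=4, n_rows=2)
print(annotations[0].effect, annotations[0].yaw_deg, annotations[0].pitch_deg)
# → light 315.0 45.0
```

Because each tile is processed independently, new recognizer modules can be plugged in without touching the split or join steps, which is the extensibility property the architecture claims.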