Abstract
The emergence of IoT and advanced multimedia information systems has led to a proliferation of video sensor data. Although diverse machine learning approaches are used to extract useful insights from these data, they face limitations when processing and accommodating large volumes of video data that are unlabeled and have previously unseen structures. This highlights the importance of self-structuring intelligence that can adapt to the nature of the data and learn from multi-modal, spatiotemporal and unstructured data. Building on these advances, we propose a recurrent self-structuring machine learning approach for video processing that uses a multi-stream hierarchical architecture of recurrent growing self-organizing maps (RGSOMs). We designed, implemented and evaluated this approach on a human activity recognition video dataset (the Weizmann dataset), achieving a state-of-the-art accuracy of 93.5% in the unsupervised domain. Spatial and temporal data from the videos were used as separate input feature streams, and RGSOMs self-structured the video data in these multiple streams for visual exploratory analysis and video classification. This study contributes to the existing literature by advancing self-adaptation techniques for video sensor data processing.
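The abstract names the architecture but not its update rules. As a rough illustration only, the sketch below shows one way a single recurrent growing SOM stream could be trained. It borrows the growth threshold GT = -D ln(SF) (D = input dimensionality, SF = spread factor) from the growing self-organizing map (GSOM) of Alahakoon, Halgamuge and Srinivasan (2000), and it adds recurrence through a simple exponentially averaged input context. The class `RecurrentGSOM`, the helper `train_stream`, the context weight `alpha`, the learning rates, the copy-based initialisation of new nodes and the synthetic feature vectors are all hypothetical assumptions, not the authors' RGSOM formulation. The hierarchical layers of the proposed architecture are omitted.

```python
import math
import random


class RecurrentGSOM:
    """Toy recurrent growing SOM for one feature stream (hypothetical sketch)."""

    def __init__(self, dim, spread_factor=0.5, alpha=0.5, lr=0.1, seed=0):
        if not 0.0 < spread_factor < 1.0:
            raise ValueError("spread_factor must lie in (0, 1)")
        self.dim = dim
        # GSOM growth threshold: GT = -D * ln(SF)
        self.growth_threshold = -dim * math.log(spread_factor)
        self.alpha = alpha  # assumed weight of the current frame in the context
        self.lr = lr
        rng = random.Random(seed)
        # Start from a 2x2 grid of randomly initialised nodes, as in GSOM
        self.nodes = {(x, y): [rng.random() for _ in range(dim)]
                      for x in (0, 1) for y in (0, 1)}
        self.error = {pos: 0.0 for pos in self.nodes}
        self.reset_context()

    def reset_context(self):
        self.context = [0.0] * self.dim

    @staticmethod
    def _neighbours(pos):
        x, y = pos
        return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

    def _bmu(self, v):
        # Best-matching unit: node with the smallest squared Euclidean distance
        return min(self.nodes,
                   key=lambda p: sum((w - vi) ** 2 for w, vi in zip(self.nodes[p], v)))

    def step(self, x):
        # Recurrence (assumption): exponentially averaged input context
        self.context = [self.alpha * xi + (1 - self.alpha) * ci
                        for xi, ci in zip(x, self.context)]
        v = self.context
        bmu = self._bmu(v)
        # Accumulate quantisation error at the winner
        self.error[bmu] += math.dist(self.nodes[bmu], v)
        # Move the winner (full rate) and its grid neighbours (half rate) toward v
        for pos in [bmu] + self._neighbours(bmu):
            if pos in self.nodes:
                rate = self.lr if pos == bmu else self.lr / 2
                w = self.nodes[pos]
                self.nodes[pos] = [wi + rate * (vi - wi) for wi, vi in zip(w, v)]
        if self.error[bmu] > self.growth_threshold:
            self._grow(bmu)
        return bmu

    def _grow(self, bmu):
        free = [p for p in self._neighbours(bmu) if p not in self.nodes]
        for pos in free:
            # Simplified initialisation: new nodes copy the winner's weights
            self.nodes[pos] = list(self.nodes[bmu])
            self.error[pos] = 0.0
        # Boundary winners grow and reset their error; interior winners drop to GT/2
        # (the original GSOM additionally raises the neighbours' errors)
        self.error[bmu] = 0.0 if free else self.growth_threshold / 2


def train_stream(sequences, dim, **kwargs):
    """Train one map on a list of per-video frame-feature sequences."""
    som = RecurrentGSOM(dim, **kwargs)
    for seq in sequences:
        som.reset_context()  # do not carry context across videos
        for frame in seq:
            som.step(frame)
    return som


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic stand-ins for per-frame spatial and temporal feature vectors
    spatial = [[[rng.random(), rng.random()] for _ in range(20)] for _ in range(5)]
    temporal = [[[rng.random(), rng.random()] for _ in range(20)] for _ in range(5)]
    spatial_map = train_stream(spatial, dim=2)
    temporal_map = train_stream(temporal, dim=2, seed=1)
    print(len(spatial_map.nodes), len(temporal_map.nodes))
```

Training one map per feature stream mirrors, in a simplified way, the multi-stream idea described in the abstract. Feeding the winning nodes of each stream into a higher-level map would be a further, hierarchical step that this sketch does not attempt.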





Acknowledgements
This work was supported by a La Trobe University Postgraduate Research Scholarship.
Cite this article
Nawaratne, R., Adikari, A., Alahakoon, D. et al. Recurrent Self-Structuring Machine Learning for Video Processing using Multi-Stream Hierarchical Growing Self-Organizing Maps. Multimed Tools Appl 79, 16299–16317 (2020). https://doi.org/10.1007/s11042-020-08886-7
DOI: https://doi.org/10.1007/s11042-020-08886-7