ABSTRACT
Yoga is gaining popularity for fitness and medical purposes because of its wide-ranging benefits. However, performing yoga poses incorrectly may have adverse effects. Computer vision methods that recognize different types of yoga poses can help reduce incorrect performance by providing suggestions and guidance. In this work, a dataset was created by combining two existing datasets. Because the resulting dataset is relatively small, transfer learning was used for the yoga pose classification task. The Vision Transformer model was selected, fine-tuned, and evaluated for classifying yoga poses, reaching 92.61% accuracy and a 92.62% F1 score. Compared with several typical CNN-based models of different scales (GoogLeNet, ResNet, Inception, DenseNet, ShuffleNet, MobileNet, and EfficientNet), the Vision Transformer performed best on every metric used in this work (accuracy, precision, recall, and F1 score), which suggests it can be applied to real yoga pose classification tasks. Because the Vision Transformer also has the largest number of parameters among the compared models, reducing its parameter count is one possible direction for future research.
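The four metrics named in the abstract can be computed from predicted and true labels as in the minimal sketch below. It assumes macro averaging over classes, which the abstract does not specify; the function name `macro_metrics` and the pose labels are hypothetical and used only for illustration.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Assumption: per-class scores are averaged with equal weight (macro);
    the source abstract does not state which averaging it uses.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but wrong
            fn[t] += 1          # t was the truth but missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels, not taken from the paper's dataset.
y_true = ["tree", "tree", "plank", "plank", "cobra", "cobra"]
y_pred = ["tree", "plank", "plank", "plank", "cobra", "tree"]
metrics = macro_metrics(y_true, y_pred)
print(metrics)
```

With macro averaging every pose class counts equally regardless of how many images it has, which is one reasonable choice when a combined dataset has uneven class sizes.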
Index Terms
- Yoga Pose Classification Based on Transfer Learning of Vision Transformer