Modified self-training based statistical models for image classification and speaker identification

International Journal of Speech Technology

Abstract

Building a high-precision statistical model requires ample supervised (labeled) training data. In certain domains, particularly applications involving image, speech and video data, large amounts of labeled data are difficult to acquire, while unlabeled data is plentiful. Self-training is a semi-supervised approach that uses this unlabeled data, together with a small amount of labeled data, to boost the efficiency of the model. In this work, we propose a variant of self-training that assigns soft labels to unlabeled examples rather than the hard labels used in conventional self-training. As our work focuses on image and speaker recognition tasks, a Gaussian Mixture Model (GMM) based Bayesian classifier is used as a wrapper in the self-training approach. Our experimental studies on the STL10, CIFAR10 and MIT (image recognition) and NIST (speaker recognition) benchmark datasets indicate that the proposed modified self-training approach offers enhanced efficiency over conventional self-training.
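The difference between hard and soft labeling can be illustrated with a small sketch. It is not the paper's implementation: the abstract does not give the algorithm, so the code below assumes one plausible reading in which each unlabeled example contributes to every class model in proportion to its posterior probability. To stay short it uses 1-D features and a single Gaussian per class (a one-component mixture) instead of a full GMM, and the hard variant omits the confidence threshold that conventional self-training often applies. All function names and the toy data are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_weighted(points, weights):
    """Weighted mean, variance (floored) and total weight of 1-D points."""
    total = sum(weights)
    mean = sum(w * x for x, w in zip(points, weights)) / total
    var = sum(w * (x - mean) ** 2 for x, w in zip(points, weights)) / total
    return mean, max(var, 1e-6), total

def posteriors(x, params):
    """Bayes rule: P(class | x) from (mean, var, prior) triples, one per class."""
    joint = [prior * gaussian_pdf(x, mean, var) for mean, var, prior in params]
    z = sum(joint)
    if z == 0.0:  # x is far from every class: fall back to a uniform guess
        return [1.0 / len(params)] * len(params)
    return [j / z for j in joint]

def one_hot(index, n):
    """Label vector with all weight on one class."""
    return [1.0 if c == index else 0.0 for c in range(n)]

def self_train(labeled, unlabeled, n_classes, n_iter=10, hard=False):
    """Self-training around a Gaussian Bayes classifier.

    labeled: list of (x, class_index); every class needs at least one example.
    hard=False keeps the full posterior as a soft label (assumed reading of the
    proposed variant); hard=True assigns each example to its most likely class.
    """
    labeled_x = [x for x, _ in labeled]
    labeled_w = [one_hot(y, n_classes) for _, y in labeled]
    points, weights = labeled_x, labeled_w  # round 0: labeled data only
    for _ in range(n_iter + 1):
        # Refit each class-conditional density and prior from weighted points.
        stats = [fit_weighted(points, [w[c] for w in weights]) for c in range(n_classes)]
        total = sum(s[2] for s in stats)
        params = [(mean, var, weight / total) for mean, var, weight in stats]
        # Relabel the unlabeled pool with the current model.
        pseudo = [posteriors(x, params) for x in unlabeled]
        if hard:
            pseudo = [one_hot(p.index(max(p)), n_classes) for p in pseudo]
        points = labeled_x + list(unlabeled)
        weights = labeled_w + pseudo
    return params

def classify(x, params):
    """Index of the class with the highest posterior probability."""
    post = posteriors(x, params)
    return post.index(max(post))

# Toy data: three labeled points per class and two unlabeled clusters.
labeled = [(-2.5, 0), (-2.0, 0), (-1.5, 0), (1.5, 1), (2.0, 1), (2.5, 1)]
unlabeled = [-3.0 + 0.05 * i for i in range(41)] + [1.0 + 0.05 * i for i in range(41)]
params = self_train(labeled, unlabeled, n_classes=2)
print(classify(-1.8, params), classify(2.2, params))
```

With soft labels, an ambiguous example near the class boundary influences both class models weakly instead of being committed to one class, which is the intuition usually given for why soft labeling may be less sensitive to early labeling mistakes. Passing `hard=True` switches the same loop to hard pseudo-labels for comparison.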

Author information

Corresponding author

Correspondence to Jyostna Devi Bodapati.

About this article

Cite this article

Bodapati, J.D. Modified self-training based statistical models for image classification and speaker identification. Int J Speech Technol 24, 1007–1015 (2021). https://doi.org/10.1007/s10772-021-09861-9

  • DOI: https://doi.org/10.1007/s10772-021-09861-9
