Conquering insufficient/imbalanced data learning for the Internet of Medical Things

  • S.I. : Neural Computing for IOT based Intelligent Healthcare Systems
Neural Computing and Applications

Abstract

This study presents a data augmentation technique that addresses the insufficient/imbalanced data problems that arise when the Internet of Medical Things (IoMT) or wireless sensor networks (WSNs) perform crowdsensing across diverse locations and under heterogeneous conditions. Under such conditions, the number of samples collected for each category can vary widely, producing skewed class distributions. In addition, pattern analysis on too few observed samples yields biased models. To address this, this work proposes synthetic minority oversampling generative adversarial networks (SMOGANs) for processing imbalanced data, in which under-represented classes are automatically expanded until all classes contain equal numbers of samples, thereby avoiding biased modeling. The SMOGAN consists of two modules: the synthetic minority oversampling technique (SMOTE) and a GAN. SMOTE initializes the system by roughly augmenting the number of insufficient/imbalanced samples. The GAN then enriches the feature diversity of the pseudoreal samples generated by SMOTE. Experiments were carried out on open datasets for evaluation. To assess data augmentation capability, only 4.00% of the real data were retained for the minority classes and then fed into different data augmentation methods for comparison. Analytical results showed that the proposed SMOGANs outperformed the baselines, achieving higher accuracy. These results indicate that the proposed SMOGAN can alleviate the data collection problems of insufficient/imbalanced datasets by enhancing both data quantity and quality.
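
A minimal sketch of the first (SMOTE) stage and of the 4.00% scarcity setting described above, written in plain Python with no third-party libraries. The function names `smote` and `keep_fraction`, the neighbour count `k=5`, and the two-class Gaussian toy data are hypothetical choices for illustration, not the paper's code or datasets. This variant draws seed points at random, and the GAN refinement stage is not shown.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Generate n_new synthetic samples by SMOTE-style interpolation.

    Each synthetic point is x + lam * (nn - x), where x is a randomly chosen
    minority sample, nn is one of its k nearest minority neighbours
    (Euclidean distance), and lam is drawn uniformly from [0, 1).
    """
    rng = rng or random.Random()
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the k nearest neighbours of x among the other minority samples.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nn = minority[rng.choice(neighbours)]
        lam = rng.random()
        synthetic.append([a + lam * (b - a) for a, b in zip(x, nn)])
    return synthetic


def keep_fraction(samples, fraction, rng):
    """Simulate scarcity by keeping only `fraction` of a class (at least 2 samples)."""
    n_keep = max(2, round(len(samples) * fraction))
    return rng.sample(samples, n_keep)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy data: two 2-D Gaussian classes of 500 samples each.
    majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
    full_minority = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(500)]
    minority = keep_fraction(full_minority, 0.04, rng)  # keep 4% -> 20 samples
    # Oversample the minority class until it matches the majority class size.
    synthetic = smote(minority, len(majority) - len(minority), k=5, rng=rng)
    balanced_minority = minority + synthetic
    print(len(majority), len(minority), len(balanced_minority))  # 500 20 500
```

Every synthetic point lies on a segment between a minority sample and one of its minority neighbours, so SMOTE adds quantity but stays close to the existing samples; this is one reason its output can lack diversity, which is the gap the abstract assigns to the GAN stage.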

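The abstract does not state SMOGAN's training objective. For orientation only, the first line below is the standard GAN minimax objective of Goodfellow et al. (2014). The second line is an assumed adaptation, not taken from the paper, in which the generator G receives a SMOTE-generated sample \tilde{x} instead of pure noise z and learns to refine it, while the discriminator D compares refined samples with the real minority samples (p_min and p_SMOTE are hypothetical notation for the real-minority and SMOTE-output distributions).

```latex
\begin{align}
\min_{G}\max_{D}\; V(D,G) &= \mathbb{E}_{x\sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z\sim p_{z}}\!\left[\log\bigl(1 - D(G(z))\bigr)\right] \\
\min_{G}\max_{D}\; V_{\mathrm{assumed}}(D,G) &= \mathbb{E}_{x\sim p_{\mathrm{min}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{\tilde{x}\sim p_{\mathrm{SMOTE}}}\!\left[\log\bigl(1 - D(G(\tilde{x}))\bigr)\right]
\end{align}
```
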
Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Acknowledgements

This work is supported in part by the Ministry of Science and Technology, Taiwan (107-2218-E-110-013-MY3) and by 2019 NVIDIA Data Science GPU Grants (Project: SMO-GANs for Extremely Imbalanced Data).

Author information

Corresponding author

Correspondence to Bo-Wei Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lan, ZC., Huang, GY., Li, YP. et al. Conquering insufficient/imbalanced data learning for the Internet of Medical Things. Neural Comput & Applic 35, 22949–22958 (2023). https://doi.org/10.1007/s00521-022-06897-z

  • DOI: https://doi.org/10.1007/s00521-022-06897-z
