Abstract
This study presents a data augmentation technique that addresses insufficient and imbalanced data. Such data arise during crowdsensing with the Internet of Medical Things (IoMT) or wireless sensor networks (WSNs) because sensing takes place at diverse locations and under heterogeneous conditions. The number of samples can then vary widely across categories, producing skewed class distributions. In addition, pattern analysis on too few observed samples yields biased models. To address this, this work proposes synthetic minority oversampling generative adversarial networks (SMOGANs) for imbalanced data. Minority classes are automatically expanded until every class contains the same number of samples, which avoids biased modeling. A SMOGAN consists of two modules: the synthetic minority oversampling technique (SMOTE) and a generative adversarial network (GAN). SMOTE initializes the system by roughly increasing the number of insufficient or imbalanced samples. The GAN then enriches the feature diversity of the pseudo-real samples that SMOTE produced. Experiments were carried out on open datasets. To assess augmentation capability, only 4.00% of the real data were retained as minority classes and then fed into different data augmentation methods for comparison. The analytical results showed that the proposed SMOGANs outperformed the baselines and achieved higher accuracy. These results indicate that SMOGANs can mitigate the data collection problems of insufficient or imbalanced datasets by improving both data quantity and data quality.
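The SMOTE initialization stage described in the abstract can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation. It assumes continuous feature vectors and Euclidean nearest neighbours, as in the original SMOTE algorithm. The function names `smote` and `balance_with_smote` and the parameters `k` and `seed` are hypothetical. The GAN stage that SMOGAN uses to diversify these samples is not shown.

```python
import numpy as np


def smote(minority: np.ndarray, n_new: int, k: int = 5, rng=None) -> np.ndarray:
    """Generate n_new synthetic rows by SMOTE-style interpolation (hypothetical helper).

    Each synthetic row lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours.
    """
    rng = np.random.default_rng(rng)  # accepts a seed, None, or an existing Generator
    n = len(minority)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class only.
    d = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample is never its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)  # seed sample for each new row
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return minority[base] + gap * (minority[partner] - minority[base])


def balance_with_smote(X: np.ndarray, y: np.ndarray, k: int = 5, seed: int = 0):
    """Oversample every class up to the size of the largest class (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for c, count in zip(classes, counts):
        if count < target:
            synth = smote(X[y == c], target - count, k=k, rng=rng)
            X_parts.append(synth)
            y_parts.append(np.full(len(synth), c))
    return np.vstack(X_parts), np.concatenate(y_parts)


if __name__ == "__main__":
    # Hypothetical toy data: 100 majority samples and only 4 minority samples.
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(0.0, 1.0, (100, 3)), rng.normal(3.0, 1.0, (4, 3))])
    y = np.array([0] * 100 + [1] * 4)
    Xb, yb = balance_with_smote(X, y)
    print(np.bincount(yb))
```

Running the example prints the class counts after balancing, with both classes at 100. Because every synthetic row is a convex combination of two real minority samples, SMOTE alone only fills the space between existing points. According to the abstract, SMOGAN adds the GAN stage to enrich feature diversity beyond this.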
Acknowledgements
This work is supported in part by the Ministry of Science and Technology, Taiwan (107-2218-E-110-013-MY3) and by 2019 NVIDIA Data Science GPU Grants (Project: SMO-GANs for Extremely Imbalanced Data).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Lan, ZC., Huang, GY., Li, YP. et al. Conquering insufficient/imbalanced data learning for the Internet of Medical Things. Neural Comput & Applic 35, 22949–22958 (2023). https://doi.org/10.1007/s00521-022-06897-z
DOI: https://doi.org/10.1007/s00521-022-06897-z