Abstract
Many existing works have studied learning on imbalanced data; however, handling high-dimensional imbalanced data remains very challenging. A key difficulty is that most learning models are biased toward the majority class, and their performance deteriorates in the presence of underrepresented data and severe class-distribution skew. One solution is to synthesize minority data to balance the class distribution, but this may increase overlap among classes, especially in the high-dimensional setting. To alleviate these challenges, in this paper we present a novel Rectified Encoder Network (REN) for high-dimensional imbalanced learning tasks. Our main contributions are: (1) to deal with high dimensionality, REN encodes high-dimensional imbalanced data into low-dimensional latent codes as a latent representation; (2) to obtain a discriminative representation, we introduce a Rectifier that matches the latent codes with our proposed Predefined Codes, disentangling the overlap among classes; (3) during rectification, we can efficiently identify and generate informative samples in the Predefined Latent Distribution to keep the class distribution balanced, so that minority classes are not neglected. Experimental results on several high-dimensional and image imbalanced data sets indicate that REN obtains good representation codes for classification, and visualizations illustrate why REN performs better in high-dimensional imbalanced learning.
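To make the pipeline described above concrete, here is a minimal toy sketch of the general idea: encode high-dimensional data into a low-dimensional latent space, pull each class's codes toward a class-specific target, and synthesize minority samples in the latent space to balance the classes. This is an illustrative assumption-laden stand-in, not the authors' implementation: PCA substitutes for the learned encoder, and the hand-picked class centres and the single rectification step only loosely mimic REN's Rectifier and Predefined Codes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy high-dimensional imbalanced data: 200 majority vs. 20 minority samples.
X_maj = rng.normal(0.0, 1.0, size=(200, 50))
X_min = rng.normal(2.0, 1.0, size=(20, 50))
X = np.vstack([X_maj, X_min])
y = np.array([0] * 200 + [1] * 20)

# Step 1: encode into a low-dimensional latent space.
# (PCA via SVD stands in for the learned encoder.)
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T  # 2-D latent codes

# Step 2: shift each class's codes toward a predefined class centre,
# loosely mimicking rectification toward "Predefined Codes".
targets = {0: np.array([-3.0, 0.0]), 1: np.array([3.0, 0.0])}
alpha = 0.5  # rectification strength (illustrative)
for c, t in targets.items():
    Z[y == c] += alpha * (t - Z[y == c].mean(axis=0))

# Step 3: synthesize minority codes around the minority centre so the
# latent class distribution becomes balanced.
n_new = int((y == 0).sum() - (y == 1).sum())
centre = Z[y == 1].mean(axis=0)
spread = Z[y == 1].std(axis=0)
Z_new = rng.normal(centre, spread, size=(n_new, 2))

Z_bal = np.vstack([Z, Z_new])
y_bal = np.concatenate([y, np.ones(n_new, dtype=int)])
print((y_bal == 0).sum(), (y_bal == 1).sum())  # equal class counts
```

Balancing in the latent space rather than the input space is the point of the sketch: with 2 latent dimensions instead of 50 input dimensions, synthesized minority samples are less likely to land in overlapping regions.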
Acknowledgment
This work was supported by the National Key R&D Program of China (Grant No. 2017YFC0804003), the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (Grant No. 2017ZT07X386), Shenzhen Peacock Plan (Grant No. KQTD2016112514355531), the Science and Technology Innovation Committee Foundation of Shenzhen (Grant Nos. ZDSYS201703031748284, JCYJ20180504165652917), the Program for University Key Laboratory of Guangdong Province (Grant No. 2017KSYS008), the ARC Future Fellowship ARC LP150100671, DP180100106, and National Natural Science Foundation of China (Grant Nos. 61603338, 61866010, 61703370).
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Zheng, T., Chen, W.J., Tsang, I., Yao, X. (2019). Rectified Encoder Network for High-Dimensional Imbalanced Learning. In: Nayak, A., Sharma, A. (eds) PRICAI 2019: Trends in Artificial Intelligence. PRICAI 2019. Lecture Notes in Computer Science, vol 11671. Springer, Cham. https://doi.org/10.1007/978-3-030-29911-8_53
DOI: https://doi.org/10.1007/978-3-030-29911-8_53
Print ISBN: 978-3-030-29910-1
Online ISBN: 978-3-030-29911-8