A novel rhinitis prediction method for class imbalance

https://doi.org/10.1016/j.bspc.2021.102821

Highlights

  • We propose a cascaded under-sampling ensemble learning method that constructs multiple batch classifiers, each composed of several base classifiers with different structures. Through batch-by-batch under-sampling, the class distribution of the training instances gradually becomes balanced.

  • The average accuracy, true positive rate, and G-mean of the proposed model were 90.71 %, 87.44 %, and 88.18 %, respectively. Compared to typical classifiers, the proposed model has higher accuracy and true positive rate and a lower missed diagnosis rate.

  • We calculate the importance of rhinitis features based on the node purity of the decision trees within a Random Forest, and study the correlation between rhinitis features and classifications.
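The purity-based importance mentioned in the last highlight can be illustrated with a single split. The sketch below is not the authors' computation: it assumes Gini impurity as the purity measure (this excerpt does not say which one is used), and the two feature names are hypothetical. A Random Forest would accumulate such decreases over every split that uses a feature and average them over its trees.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def impurity_decrease(values, labels, threshold):
    """Drop in Gini impurity when a node is split at `values <= threshold`."""
    left = [t for v, t in zip(values, labels) if v <= threshold]
    right = [t for v, t in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted_children


# Toy node with three positive and three negative instances.
labels = [1, 1, 1, -1, -1, -1]
nasal_itching = [1, 1, 1, 0, 0, 0]   # hypothetical feature that separates the classes
other_symptom = [1, 0, 1, 0, 1, 0]   # hypothetical feature that barely helps

dec_itching = impurity_decrease(nasal_itching, labels, 0.5)
dec_other = impurity_decrease(other_symptom, labels, 0.5)
print(dec_itching, dec_other)
```

The feature whose split leaves purer child nodes gets the larger decrease, which is what ranks it higher in an importance chart.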

Abstract

Rhinitis is a prevalent respiratory disease. Clinical rhinitis instances are multi-label and class-imbalanced, which makes them difficult for typical machine learning methods to classify accurately. We propose a cascaded under-sampling ensemble learning method (CUEL) that constructs multiple batch classifiers, each composed of a few base classifiers with different structures. Through batch-by-batch under-sampling, correctly classified majority-class instances are removed and the instances that are difficult to classify are kept, so that the classes gradually become balanced. We assign a different weight to each batch classifier to construct the final integrated classifier. Cross validation was performed on 2231 clinical rhinitis instances from Shanghai Tongji Hospital Affiliated to Tongji University. The experiment showed that the average accuracy, true positive rate, and G-mean of the CUEL model were 90.71 %, 87.44 %, and 88.18 %, respectively. Compared to typical classifiers, the CUEL model has higher accuracy and true positive rate, a lower missed diagnosis rate, and stronger generalization performance. It makes full use of all rhinitis instances and effectively reduces the prediction bias caused by class imbalance, so it can serve as a useful aid in the prevention and diagnosis of clinical rhinitis. In addition, we calculate the importance of rhinitis features based on the node purity of the decision trees within a Random Forest, and study the correlation between rhinitis features and classifications.
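The three reported metrics can be computed from a binary confusion matrix. The sketch below assumes the usual definition of G-mean as the geometric mean of the true positive rate and the true negative rate, and treats the missed diagnosis rate as 1 minus the true positive rate; the excerpt does not spell out either formula, and the labels and predictions are made-up toy values.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, FP, TN) for one binary label."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, true positive rate, true negative rate, G-mean and miss rate."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "tpr": tpr,
        "tnr": tnr,
        "g_mean": math.sqrt(tpr * tnr),   # assumed definition: sqrt(TPR * TNR)
        "miss_rate": 1.0 - tpr,           # share of positives that were missed
    }


# Toy example: 4 positive and 6 negative instances.
y_true = [1, 1, 1, 1, -1, -1, -1, -1, -1, -1]
y_pred = [1, 1, 1, -1, -1, -1, -1, -1, -1, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Unlike accuracy, G-mean drops sharply when either class is predicted poorly, which is why it is a common companion metric on imbalanced data.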

Introduction

Rhinitis is an inflammation of parts of the nasal cavity caused by external stimuli, and it has become widespread worldwide. It usually causes nasal congestion, a runny nose, and other symptoms, and can also affect the eyes and throat. These symptoms may seriously affect a patient's work and life [1]. Rhinitis is the fifth most common chronic disease in the United States and the most common chronic disease among children there [2]. Approximately one third of the population of India suffers from rhinitis of varying severity [3]. However, medical research on the accurate prediction and diagnosis of rhinitis is still insufficient [4].

With the development of artificial intelligence, machine learning has been widely used in biomedicine for clinical diagnosis, treatment selection, prognosis evaluation, and similar tasks [5]. However, because clinical instances are difficult to collect and subject to confidentiality constraints, many clinical data sets are small and class-imbalanced, which leads to poor generalization of machine learning classifiers [6]. In particular, class imbalance raises a model's false negative or false positive rate, causing missed diagnoses and misdiagnoses. We propose a cascaded under-sampling ensemble learning method (CUEL) that constructs multiple batch classifiers, each composed of several base classifiers with different structures, and assigns a different weight to each batch classifier. Through batch-by-batch under-sampling, the numbers of instances in the different classes gradually become balanced from batch to batch, which reduces the information loss incurred by traditional under-sampling classification.
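The cascade described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three toy base classifiers (`Centroid`, `Stump`, `OneNN`), the use of training G-mean as the batch weight, the stopping rule, and all function names are hypothetical choices made here, since this excerpt does not specify them. Labels are assumed to be +1 for the minority class and -1 for the majority class.

```python
import random


class Centroid:
    """Nearest-centroid rule; labels are +1 (minority) and -1 (majority)."""
    def fit(self, X, y):
        self.c = {}
        for lab in (-1, 1):
            rows = [x for x, t in zip(X, y) if t == lab]
            self.c[lab] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x):
        d = {lab: sum((a - b) ** 2 for a, b in zip(x, c)) for lab, c in self.c.items()}
        return 1 if d[1] <= d[-1] else -1


class Stump:
    """Single-feature threshold rule with the fewest training errors."""
    def fit(self, X, y):
        best = None
        for j in range(len(X[0])):
            for thr in sorted({x[j] for x in X}):
                for sign in (1, -1):
                    err = sum((sign if x[j] >= thr else -sign) != t for x, t in zip(X, y))
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        _, self.j, self.thr, self.sign = best
        return self

    def predict(self, x):
        return self.sign if x[self.j] >= self.thr else -self.sign


class OneNN:
    """1-nearest-neighbour rule."""
    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, x):
        dist = lambda p: sum((a - b) ** 2 for a, b in zip(x, p))
        return min(zip(self.X, self.y), key=lambda pair: dist(pair[0]))[1]


def vote(models, x):
    """Batch classifier: unweighted majority vote of its base classifiers."""
    return 1 if sum(m.predict(x) for m in models) > 0 else -1


def g_mean(models, X, y):
    """G-mean of one batch classifier on (X, y); both classes must be present."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == -1]
    tpr = sum(vote(models, x) == 1 for x in pos) / len(pos)
    tnr = sum(vote(models, x) == -1 for x in neg) / len(neg)
    return (tpr * tnr) ** 0.5


def train_cuel(X, y, max_batches=5, seed=0):
    """Return a list of (weight, base_classifiers), one entry per batch."""
    rng = random.Random(seed)
    minority = [x for x, t in zip(X, y) if t == 1]
    pool = [x for x, t in zip(X, y) if t == -1]   # majority instances still in play
    batches = []
    for _ in range(max_batches):
        if not pool:
            break
        # Under-sample the remaining majority pool down to the minority size.
        sample = rng.sample(pool, min(len(pool), len(minority)))
        Xb = minority + sample
        yb = [1] * len(minority) + [-1] * len(sample)
        models = [cls().fit(Xb, yb) for cls in (Centroid, Stump, OneNN)]
        batches.append((g_mean(models, X, y), models))   # assumed weighting scheme
        if len(pool) <= len(minority):
            break                                        # classes are balanced
        # Cascade step: keep only the majority instances this batch got wrong.
        pool = [x for x in pool if vote(models, x) != -1]
    return batches


def predict_cuel(batches, x):
    """Final classifier: weighted vote over all batch classifiers."""
    score = sum(w * vote(models, x) for w, models in batches)
    return 1 if score > 0 else -1


if __name__ == "__main__":
    rng = random.Random(1)
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    X += [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(20)]
    y = [-1] * 200 + [1] * 20
    model = train_cuel(X, y)
    print(len(model), [round(w, 3) for w, _ in model])
```

Each pass shrinks the majority pool to the instances the current batch still misclassifies, so later batches concentrate on hard cases while every minority instance is reused in each batch.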

Section snippets

Related work

Recently, many researchers have studied rhinitis diagnosis. Saidi et al. [7] proposed a method for detecting allergic rhinitis using an electronic nose with an array of chemical sensors. Some researchers have used far-infrared (FIR) therapy to treat allergic rhinitis [8]. Infante et al. [9] detected allergic rhinitis by analyzing cough sounds, using automatic segmentation of cough sounds and feature extraction. Shi et al. [10] used complex entropy clustering technology to analyze rhinitis prescription

Problem description

Multi-label classification means that each instance in a dataset is associated with multiple labels [21]. The rhinitis instances in this article cover a variety of rhinitis types, so classifying them is a multi-label problem.
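One common way to handle such data is to split the multi-label problem into one binary problem per label, with each sub-label taking a value in {-1, 1} as in the formal definition of this section. The sketch below illustrates that encoding only; the helper names and the four toy patients are hypothetical, and the three label names are taken from the diagnosis types listed for the dataset.

```python
LABELS = ["Allergic rhinitis", "Sinusitis", "Nasal Septum Deviation"]


def encode(label_sets, labels):
    """Turn per-instance label sets into an n x m matrix of +1 / -1 sub-labels."""
    return [[1 if lab in s else -1 for lab in labels] for s in label_sets]


def column(Y, j):
    """Sub-label vector of the j-th label: the target of one binary problem."""
    return [row[j] for row in Y]


def imbalance_ratio(col):
    """Size of the larger class divided by the size of the smaller class."""
    pos, neg = col.count(1), col.count(-1)
    small = min(pos, neg)
    return max(pos, neg) / small if small else float("inf")


# Four made-up patients, each with a set of diagnoses.
diagnoses = [
    {"Allergic rhinitis"},
    {"Allergic rhinitis", "Sinusitis"},
    {"Sinusitis"},
    {"Allergic rhinitis", "Nasal Septum Deviation"},
]
Y = encode(diagnoses, LABELS)
for j, name in enumerate(LABELS):
    print(name, column(Y, j), imbalance_ratio(column(Y, j)))
```

Because each column is an independent binary target, the degree of class imbalance can differ from label to label, so balancing has to be handled per label.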

For a data set with n instances, $D=\{(X_i,Y_i)\}_{i=1}^{n}$ with $Y_i=\{y_{i1},y_{i2},\ldots,y_{im}\}$, m is the number of labels and $y_{ij}$ represents the j-th sub-label of the i-th instance. Each sub-label is a two-class label, namely $y_{ij}\in\{-1,1\}$. Suppose the prediction function corresponding to the sub-label is $h(x,\theta_j)$

The datasets

In this study, we conduct classification experiments on 2231 clinical rhinitis instances from Shanghai Tongji Hospital Affiliated to Tongji University. The features of rhinitis are described in Table 2. The instances carry two types of labels: the diagnostic type of rhinitis, and two symptom attributes, severity and duration. Rhinitis diagnosis is a multi-label classification task with 10 labels, i.e. Allergic rhinitis, Sinusitis, Nasal Septum Deviation, Chronic

Feature importance ranking

This article also studies the feature importance for each class of rhinitis instances. Fig. 11 and Fig. 12 show the 9 most important features for the classification of allergic rhinitis and nasal septal deviation, respectively. According to the medical principles of rhinitis, rhinitis causes nose allergies, resulting in fewer nasal discharges, and whitening of the nose and nasal cavity. It is usually accompanied by nasal itching, yellow nasal discharge, high white blood cells, red and

Conclusion

This paper proposes a cascaded under-sampling ensemble learning (CUEL) method to classify rhinitis instances with class imbalance. Cross validation was performed on 2231 clinical rhinitis instances from Shanghai Tongji Hospital Affiliated to Tongji University. The experiment showed that the average accuracy, true positive rate, and G-mean of the CUEL model were 90.71 %, 87.44 %, and 88.18 %, respectively. Compared to typical classifiers (i.e. DecT, DNN, GBDT, SVM, Gcforest), the CUEL model has

CRediT authorship contribution statement

Meng Zhang: Conceptualization, Methodology, Software, Data curation, Writing - Original draft preparation. Jingdong Yang: Supervision, Validation, Writing - Reviewing and Editing, Submission. Shaoqing Yu: Supervision, Data curation, Medical Case Analysis.

Acknowledgements

We greatly appreciate the helpful comments of the anonymous reviewers, which have significantly improved the style and content of this paper. This research is funded by the National Natural Science Foundation of China (81973749, 8187040043).

Declaration of Competing Interest

The authors report no declarations of interest.

References (28)

  • T. Saidi et al.

    Detection of seasonal allergic rhinitis from exhaled breath VOCs using an electronic nose based on an array of chemical sensors

  • K. Hu et al.

    Clinical effects of Far-infrared therapy in patients with allergic rhinitis

  • C. Infante et al.

    Use of cough sounds for diagnosis and screening of pulmonary disease

  • C. Shi et al.

    Complex system entropy cluster based research on treating allergic rhinitis by traditional Chinese medicine
