A novel rhinitis prediction method for class imbalance

doi:10.1016/j.bspc.2021.102821

Biomedical Signal Processing and Control

Volume 69, August 2021, 102821

https://doi.org/10.1016/j.bspc.2021.102821

Highlights

• We propose a cascaded under-sampling ensemble learning method that constructs multiple batch classifiers, each composed of several base classifiers with different structures. Through batch-by-batch under-sampling, the imbalanced classes are gradually equalized.
• The average accuracy, true positive rate, and G-mean of the proposed model were 90.71 %, 87.44 %, and 88.18 %, respectively. Compared with typical classifiers, the proposed model has higher accuracy, a higher true positive rate, and a lower missed diagnosis rate.
• We calculate the importance of rhinitis features from the node purity of the decision trees inside a Random Forest and study the correlation between rhinitis features and classifications.

Abstract

Rhinitis is a prevalent respiratory disease. Clinical rhinitis instances are multi-label and class-imbalanced, which makes them difficult for typical machine learning methods to classify accurately. We propose a cascaded under-sampling ensemble learning method (CUEL) that constructs multiple batch classifiers, each composed of a few base classifiers with different structures. Through batch-by-batch under-sampling, correctly classified majority-class instances are removed and the samples that are difficult to classify are kept, so that the classes gradually become balanced. We assign a different weight to each batch classifier to construct the final integrated classifier. Cross validation was performed on 2231 clinical rhinitis instances from Shanghai Tongji Hospital Affiliated to Tongji University. The experiment showed that the average accuracy, true positive rate, and G-mean of the CUEL model were 90.71 %, 87.44 %, and 88.18 %, respectively. Compared with typical classifiers, the CUEL model has higher accuracy, a higher true positive rate, a lower missed diagnosis rate, and stronger generalization performance. It makes full use of all rhinitis instances and effectively reduces the prediction bias caused by class imbalance, so it can serve as a useful aid in the prevention and diagnosis of clinical rhinitis. In addition, we calculate the importance of rhinitis features from the node purity of the decision trees inside a Random Forest and study the correlation between rhinitis features and classifications.

Introduction

Rhinitis is an inflammation of parts of the nasal cavity caused by certain external stimuli, and it has become a widespread disease worldwide. It usually causes nasal congestion, a runny nose and other symptoms, and it also affects the eyes and throat of patients. Rhinitis symptoms may seriously affect one’s work and life [1]. Rhinitis is the fifth most common chronic disease in the United States and the most common pediatric chronic disease there [2]. Approximately one third of the population in India suffers from rhinitis to varying degrees [3]. However, medical research on the correct prediction and diagnosis of rhinitis is still insufficient [4].

With the development of artificial intelligence technology, machine learning has been widely used in biomedicine for clinical diagnosis, treatment selection, prognosis evaluation [5], and similar tasks. However, because clinical instances are generally difficult to collect and are subject to confidentiality and other constraints, many clinical data sets are small and class-imbalanced, which leads to poor generalization performance of machine learning classifiers [6]. In particular, class imbalance raises the false negative rate or false positive rate of the model, causing missed diagnosis and misdiagnosis. We propose a cascaded under-sampling ensemble learning method (CUEL) that constructs multiple batch classifiers, each composed of several base classifiers with different structures. We assign a different weight to each batch classifier. Through batch-by-batch under-sampling, the class sizes in each batch gradually become balanced, which reduces the information loss incurred by traditional under-sampling classification.
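The cascade described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the nearest-centroid base learner, the use of balanced-batch training accuracy as the batch weight, the stopping rule, and all function names (`fit_centroid`, `fit_cascade`, `predict_cascade`) are assumptions chosen for illustration, and each batch here holds a single toy learner rather than several base classifiers with different structures.

```python
import random
from statistics import mean


def fit_centroid(X, y):
    """Toy base learner (an assumption): nearest-class-centroid classifier."""
    c_pos = [mean(col) for col in zip(*(x for x, t in zip(X, y) if t == 1))]
    c_neg = [mean(col) for col in zip(*(x for x, t in zip(X, y) if t == -1))]

    def predict(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
        return 1 if d_pos <= d_neg else -1

    return predict


def fit_cascade(X, y, n_batches=3, seed=0):
    """Train up to n_batches classifiers on successively under-sampled data."""
    rng = random.Random(seed)
    minority = [x for x, t in zip(X, y) if t == 1]
    majority = [x for x, t in zip(X, y) if t == -1]
    batches = []
    for _ in range(n_batches):
        # Under-sample the majority class down to the minority size.
        k = min(len(minority), len(majority))
        sample = rng.sample(majority, k)
        Xb = minority + sample
        yb = [1] * len(minority) + [-1] * k
        clf = fit_centroid(Xb, yb)
        # Batch weight = accuracy on the balanced batch (assumed weighting rule).
        weight = mean(1.0 if clf(x) == t else 0.0 for x, t in zip(Xb, yb))
        batches.append((weight, clf))
        # Keep only the majority instances this batch still misclassifies.
        majority = [x for x in majority if clf(x) == 1]
        if not majority:
            break
    return batches


def predict_cascade(batches, x):
    """Weighted vote of all batch classifiers."""
    score = sum(w * clf(x) for w, clf in batches)
    return 1 if score >= 0 else -1


# Hypothetical 2-D toy data: 5 minority points, 23 majority points.
positives = [(2.0, 2.0), (2.2, 1.8), (1.8, 2.2), (2.1, 2.1), (1.9, 1.9)]
easy_negatives = [(0.2 * i, 0.2 * j) for i in range(-2, 3) for j in range(-2, 2)]
hard_negatives = [(1.3, 1.3), (1.5, 1.2), (1.2, 1.5)]
X = positives + easy_negatives + hard_negatives
y = [1] * len(positives) + [-1] * (len(easy_negatives) + len(hard_negatives))

batches = fit_cascade(X, y)
print(len(batches), predict_cascade(batches, (3.0, 3.0)),
      predict_cascade(batches, (-1.0, -1.0)))
```

Because each later batch sees only the majority instances that earlier batches got wrong, the majority pool shrinks toward the minority size without discarding the hard cases up front, which is the property the text attributes to batch-by-batch under-sampling.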

Related work

Recently, many scholars have worked on rhinitis diagnosis. Saidi et al. [7] proposed a method for detecting allergic rhinitis using an electronic nose with an array of chemical sensors. Some scholars have used far-infrared (FIR) therapy to treat allergic rhinitis [8]. Infante et al. [9] identified allergic rhinitis by analyzing cough sounds, including automatic segmentation of cough sounds and feature extraction. Shi et al. [10] used complex entropy clustering technology to analyze rhinitis prescription

Problem description

Multi-label classification means that each instance in the data set is associated with multiple labels [21]. The rhinitis instances in this article contain a variety of rhinitis types, so their classification is a multi-label problem.

For a data set with n instances $D = \{X_{i}, Y_{i}\}_{i = 1}^{n}$ with $Y_{i} = \{y_{i}^{(1)}, y_{i}^{(2)}, \dots, y_{i}^{(m)}\}$, m is the number of labels and $y_{i}^{(j)}$ represents the j-th sub-label of the i-th instance. Each sub-label is a binary label, namely $y_{i}^{(j)} \in \{-1, 1\}$. Suppose the prediction function corresponding to the sub-label is $h(x, \theta_{j})$
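As a small illustration of this notation, the Python sketch below turns per-instance label sets into one $\{-1, 1\}$ target vector per sub-label and reports how imbalanced each resulting binary problem is. The six toy instances are invented for the example, and the helper names `to_binary_problems` and `imbalance_ratio` are hypothetical; only the three diagnosis names are taken from the data set description.

```python
from collections import Counter


def to_binary_problems(Y, labels):
    """Map label sets Y_i to one {-1, +1} target vector per sub-label j."""
    return {lab: [1 if lab in y_i else -1 for y_i in Y] for lab in labels}


def imbalance_ratio(targets):
    """Number of negative instances per positive instance."""
    counts = Counter(targets)
    return counts[-1] / counts[1] if counts[1] else float("inf")


labels = ["Allergic rhinitis", "Sinusitis", "Nasal Septum Deviation"]

# Hypothetical annotations for six instances (not taken from the paper).
Y = [
    {"Allergic rhinitis"},
    {"Allergic rhinitis", "Sinusitis"},
    {"Allergic rhinitis"},
    {"Nasal Septum Deviation"},
    {"Allergic rhinitis", "Nasal Septum Deviation"},
    {"Allergic rhinitis"},
]

problems = to_binary_problems(Y, labels)
ratios = {lab: imbalance_ratio(t) for lab, t in problems.items()}
for lab in labels:
    print(lab, problems[lab], ratios[lab])
```

Each sub-label then has its own binary target vector and its own degree of imbalance, so a rare diagnosis can be heavily imbalanced even when a common one is not.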

The datasets

In this study, we conduct classification experiments on 2231 clinical rhinitis instances from Shanghai Tongji Hospital Affiliated to Tongji University. The features of rhinitis are described in Table 2. The rhinitis instances carry two kinds of labels: the diagnostic types of rhinitis and two symptom attributes, severity and duration. Rhinitis diagnosis is a multi-label classification problem with 10 labels, i.e. Allergic rhinitis, Sinusitis, Nasal Septum Deviation, Chronic

Feature importance ranking

This article also studies the feature importance for each class of rhinitis instances. Fig. 11 and Fig. 12 show the top 9 features, ranked by importance, for the classification of allergic rhinitis and nasal septal deviation, respectively. According to the medical principles of rhinitis, rhinitis causes nose allergies, resulting in fewer nasal discharges and whitening of the nose and nasal cavity. It is usually accompanied by nasal itching, yellow nasal discharge, high white blood cells, red and
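The idea of scoring a feature by node purity can be sketched in a few lines of Python. This excerpt does not state which impurity measure is used, so the sketch assumes Gini impurity and shows the weighted impurity decrease of a single split. In a Random Forest, a feature's importance is commonly obtained by accumulating such decreases over all nodes that split on that feature and averaging over the trees. The feature names and values below are invented toy data.

```python
def gini(labels):
    """Gini impurity of a node holding binary labels in {-1, +1}."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(1 for t in labels if t == 1) / n
    return 2 * p * (1 - p)


def impurity_decrease(values, labels, threshold):
    """Weighted Gini decrease obtained by splitting on values <= threshold."""
    left = [t for v, t in zip(values, labels) if v <= threshold]
    right = [t for v, t in zip(values, labels) if v > threshold]
    n = len(labels)
    return (gini(labels)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))


# Hypothetical toy data: 8 instances, label +1 = allergic rhinitis.
labels = [1, 1, 1, 1, -1, -1, -1, -1]
nasal_itching = [1, 1, 1, 0, 0, 0, 0, 1]   # partly informative feature
odd_visit_day = [0, 1, 0, 1, 0, 1, 0, 1]   # uninformative feature

gain_itching = impurity_decrease(nasal_itching, labels, 0.5)
gain_visit = impurity_decrease(odd_visit_day, labels, 0.5)
print(gain_itching, gain_visit)
```

A feature whose splits make the child nodes purer receives a larger decrease and therefore ranks higher, while a feature that leaves the class mix unchanged contributes nothing.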

Conclusion

This paper proposes a cascaded under-sampling ensemble learning (CUEL) method to classify rhinitis instances with class imbalance. Cross validation was performed on 2231 clinical rhinitis instances from Shanghai Tongji Hospital Affiliated to Tongji University. The experiment showed that the average accuracy, true positive rate, and G-mean of the CUEL model were 90.71 %, 87.44 %, and 88.18 %, respectively. Compared with typical classifiers (i.e. DecT, DNN, GBDT, SVM, Gcforest), the CUEL model has

CRediT authorship contribution statement

Meng Zhang: Conceptualization, Methodology, Software, Data curation, Writing - Original draft preparation. Jingdong Yang: Supervision, Validation, Writing - Reviewing and Editing, Submission. Shaoqing Yu: Supervision, Data curation, Medical Case Analysis.

Acknowledgements

We greatly appreciate the helpful comments of the anonymous reviewers, which have significantly improved the style and content of this paper. This research is funded by the National Natural Science Foundation of China (81973749, 8187040043).

Declaration of Competing Interest

The authors report no declarations of interest.

References (28)

T. De Graaf-in’t Veld et al., The relationships between nasal hyperreactivity, quality of life, and nasal symptoms in patients with perennial allergic rhinitis, J. Allergy Clin. Immunol. (1996)

J.A. Fornadley et al., Allergic rhinitis: clinical practice guideline. Committee on practice standards, American academy of otolaryngic allergy, Otolaryngol. Head Neck Surg. (1996)

G. Haixiang et al., Learning from class imbalanced data: review of methods and applications, Expert Syst. Appl. (2017)

B. Krawczyk et al., Cost-sensitive decision tree ensembles for effective imbalanced classification, Appl. Soft Comput. (2014)

V. López et al., Cost-sensitive linguistic fuzzy rule based classification systems under the MapReduce framework for imbalanced big data, Fuzzy Sets Syst. (2015)

Z. Sun et al., A novel ensemble method for classifying imbalanced data, Pattern Recognit. (2015)

R. Kumar et al., Impact of domestic air pollution from cooking fuel on respiratory allergies in children in India, Asian Pac. J. Allergy Immunol. (2008)

A. Shah et al., Allergic rhinitis and co-morbid asthma: perspective from India-ARIA Asia-Pacific workshop report, Asian Pac. J. Allergy Immunol. (2009)

I. Yoo et al., Data mining in healthcare and biomedicine: a survey of the literature, J. Med. Syst. (2012)

T. Liu et al., A hybrid machine learning approach to cerebral stroke prediction based on imbalanced medical dataset, Artif. Intell. Med. (2019)

T. Saidi et al., Detection of seasonal allergic rhinitis from exhaled breath VOCs using an electronic nose based on an array of chemical sensors

K. Hu et al., Clinical effects of far-infrared therapy in patients with allergic rhinitis

C. Infante et al., Use of cough sounds for diagnosis and screening of pulmonary disease

C. Shi et al., Complex system entropy cluster based research on treating allergic rhinitis by traditional Chinese medicine
