A novel rhinitis prediction method for class imbalance
Introduction
Rhinitis is an inflammation of parts of the nasal cavity triggered by external stimuli, and it has become a widespread disease worldwide. It usually causes nasal congestion, a runny nose and other symptoms, and can also affect the patient's eyes and throat. These symptoms may seriously affect a patient's work and daily life [1]. Rhinitis is the fifth most common chronic disease in the United States and the most common chronic disease among children there [2]. Approximately one third of the population in India suffers from rhinitis to varying degrees [3]. However, medical research on the accurate prediction and diagnosis of rhinitis remains insufficient [4].
With the development of artificial intelligence, machine learning has been widely used in biomedicine for clinical diagnosis, treatment selection, prognosis evaluation and related tasks [5]. However, because clinical instances are difficult to collect and are subject to confidentiality constraints, many clinical data sets are small and class-imbalanced, which leads to poor generalization of machine learning classifiers [6]. In particular, class imbalance tends to raise the model's false negative rate or false positive rate, causing misdiagnosis and missed diagnosis. We propose a cascaded under-sampling ensemble learning method (CUEL) that constructs multiple batch classifiers, each composed of several base classifiers with different structures, and assigns a different weight to each batch classifier. Through batch-by-batch under-sampling, the numbers of samples of the different classes in each batch are gradually equalized, which reduces the information loss incurred by traditional under-sampling classification.
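The batch-by-batch idea can be illustrated with a minimal sketch. It is not the authors' implementation: the linear shrinking schedule, the function names (`cascade_sizes`, `cascade_batches`, `weighted_vote`), the 0.5 decision threshold, and the example weights are all assumptions made for illustration, since this excerpt does not specify them.

```python
import random


def cascade_sizes(n_major, n_minor, n_batches):
    """Majority-class sample size for each batch.

    Assumption: sizes shrink linearly from the full majority size down to
    the minority size, so the last batch is fully balanced.
    """
    if n_batches < 2:
        return [n_minor]
    step = (n_major - n_minor) / (n_batches - 1)
    return [round(n_major - k * step) for k in range(n_batches)]


def cascade_batches(major, minor, n_batches, seed=0):
    """Build one training set per batch by under-sampling the majority class.

    Every batch keeps all minority instances; only the majority class is
    randomly under-sampled (without replacement) to the scheduled size.
    """
    rng = random.Random(seed)
    batches = []
    for size in cascade_sizes(len(major), len(minor), n_batches):
        batches.append((rng.sample(major, size), list(minor)))
    return batches


def weighted_vote(batch_probs, weights):
    """Combine per-batch positive-class probabilities with batch weights.

    Assumption: a weighted average thresholded at 0.5 gives the final label.
    """
    score = sum(w * p for w, p in zip(weights, batch_probs)) / sum(weights)
    return 1 if score >= 0.5 else 0


# Usage: 900 majority vs. 100 minority instances split into 5 batches.
major = list(range(900))
minor = list(range(900, 1000))
batches = cascade_batches(major, minor, n_batches=5)
print([len(m) for m, _ in batches])
```

In a full pipeline, each batch would train its own group of base classifiers, and `weighted_vote` would merge their outputs. Early batches see more majority-class data, while later batches are closer to balanced, so less majority information is discarded than with a single under-sampling pass.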
Related work
Recently, many scholars have studied the diagnosis of rhinitis. Saidi et al. [7] proposed a method for detecting allergic rhinitis using an electronic nose with a series of chemical sensors. Other scholars have used FIR to treat allergic rhinitis [8]. Infante et al. [9] identified allergic rhinitis by analyzing cough sounds, including automatic segmentation of cough sounds and feature extraction. Shi et al. [10] used complex entropy clustering technology to analyze rhinitis prescription
Problem description
Multi-label classification means that each instance in the data set is associated with multiple labels [21]. The rhinitis instances in this article cover a variety of rhinitis types, so their classification is a multi-label problem.
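To make the multi-label setting concrete, the sketch below encodes each instance's diagnoses as a row of binary indicators, one column per label, so that each column can be treated as its own two-class problem. The three label names are taken from the data set description in this article; the helper names `to_indicator` and `column` and the example instances are hypothetical.

```python
# Three of the diagnostic labels named in this article, used for illustration.
LABELS = ["Allergic rhinitis", "Sinusitis", "Nasal Septum Deviation"]


def to_indicator(label_sets, labels=LABELS):
    """Turn per-instance label sets into an n x m matrix of 0/1 sub-labels."""
    return [[1 if name in s else 0 for name in labels] for s in label_sets]


def column(matrix, j):
    """Extract the j-th sub-label of every instance (one binary task)."""
    return [row[j] for row in matrix]


# Hypothetical instances: the first has two diagnoses, the last has none
# of the three labels listed above.
instances = [
    {"Allergic rhinitis", "Sinusitis"},
    {"Nasal Septum Deviation"},
    set(),
]
Y = to_indicator(instances)
print(Y)             # one row per instance, one column per label
print(column(Y, 0))  # binary targets for "Allergic rhinitis"
```

Splitting the matrix by column is one common way to handle multi-label data. Class imbalance then appears per column: a rare diagnosis yields a column that is mostly zeros.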
For a data set with n instances and m labels, the i-th instance carries a sub-label for the j-th label. Each sub-label is a two-class label. Suppose the prediction function corresponding to the sub-label is
The datasets
In this study, we conduct classification experiments on 2231 clinical rhinitis instances from Shanghai Tongji Hospital Affiliated to Tongji University. The features of rhinitis are described in Table 2. The instances carry two types of labels: the diagnostic types of rhinitis, and two symptom labels, severity and duration. Rhinitis diagnosis is a multi-label classification task with 10 labels, i.e. Allergic rhinitis, Sinusitis, Nasal Septum Deviation, Chronic
Feature importance ranking
This article also studies the feature importance for each class of rhinitis instances. Fig. 11 and Fig. 12 show the top nine features, ranked by importance, for the classification of allergic rhinitis and nasal septal deviation, respectively. According to the medical principles of rhinitis, rhinitis causes nose allergies, resulting in fewer nasal discharges, and whitening of the nose and nasal cavity. It is usually accompanied by nasal itching, yellow nasal discharge, high white blood cells, red and
Conclusion
This paper proposes a cascaded under-sampling ensemble learning (CUEL) method to classify rhinitis instances with class imbalance. Cross validation was performed on 2231 clinical rhinitis instances from Shanghai Tongji Hospital Affiliated to Tongji University. The experiment showed that the average accuracy, true positive rate, and G-mean of the CUEL model were 90.71 %, 87.44 %, and 88.18 % respectively. Compared to typical classifiers (i.e. DecT, DNN, GBDT, SVM, Gcforest) the CUEL model has
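For readers unfamiliar with the reported metrics, the following sketch computes accuracy, true positive rate, and G-mean for a binary sub-label. It assumes the common definition of G-mean as the geometric mean of the true positive rate and the true negative rate; the excerpt does not state the exact formula used in the paper, and the function name `binary_metrics` and the example data are hypothetical.

```python
import math


def binary_metrics(y_true, y_pred):
    """Accuracy, TPR, TNR and G-mean for 0/1 labels.

    Assumption: G-mean = sqrt(TPR * TNR), the usual definition for
    imbalanced binary classification.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "tpr": tpr,
        "tnr": tnr,
        "g_mean": math.sqrt(tpr * tnr),
    }


# Hypothetical imbalanced example: 1 positive among 10 instances.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10
print(binary_metrics(y_true, always_negative))
```

A classifier that always predicts the majority class scores high accuracy here but a G-mean of zero, which is why G-mean is a useful companion to accuracy on imbalanced data.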
CRediT authorship contribution statement
Meng Zhang: Conceptualization, Methodology, Software, Data curation, Writing - original draft preparation. Jingdong Yang: Supervision, Validation, Writing - reviewing and editing, Submission. Shaoqing Yu: Supervision, Data curation, Medical case analysis.
Acknowledgements
We greatly appreciate the helpful comments of the anonymous reviewers, which have significantly improved the style and content of this paper. This research is funded by the National Natural Science Foundation of China (81973749, 8187040043).
Declaration of Competing Interest
The authors report no declarations of interest.
References (28)
- et al. The relationships between nasal hyperreactivity, quality of life, and nasal symptoms in patients with perennial allergic rhinitis. J. Allergy Clin. Immunol. (1996)
- et al. Allergic rhinitis: clinical practice guideline. Committee on practice standards, American academy of otolaryngic allergy. Otolaryngol. Head Neck Surg. (1996)
- et al. Learning from class imbalanced data: review of methods and applications. Expert Syst. Appl. (2017)
- et al. Cost-sensitive decision tree ensembles for effective imbalanced classification. Appl. Soft Comput. (2014)
- et al. Cost-sensitive linguistic fuzzy rule based classification systems under the MapReduce framework for imbalanced big data. Fuzzy Sets Syst. (2015)
- et al. A novel ensemble method for classifying imbalanced data. Pattern Recognit. (2015)
- et al. Impact of domestic air pollution from cooking fuel on respiratory allergies in children in India. Asian Pac. J. Allergy Immunol. (2008)
- et al. Allergic rhinitis and co-morbid asthma: perspective from India-ARIA Asia-Pacific workshop report. Asian Pac. J. Allergy Immunol. (2009)
- et al. Data mining in healthcare and biomedicine: a survey of the literature. J. Med. Syst. (2012)
- et al. A hybrid machine learning approach to cerebral stroke prediction based on imbalanced medical dataset. Artif. Intell. Med. (2019)