Original Research
Exploration of text matching methods in Chinese disease Q&A systems: A method using ensemble based on BERT and boosted tree models

https://doi.org/10.1016/j.jbi.2021.103683

Highlights

  • A single pre-trained model shows good performance.

  • Pseudo-labeling is used to expand the training set.

  • Graph features are highly important.

  • An ensemble of deep learning and boosted tree models greatly improves overall performance.

Abstract

Background

Text matching is one of the basic tasks in the field of natural language processing. Owing to the particularities of the Chinese language and of medical texts, text matching has particularly high application and research value in the medical field. In 2019, at the China Health Information Processing Conference (CHIP), 30,000 sets of real Chinese disease Q&A data covering diabetes, hypertension, hepatitis B, AIDS, and breast cancer were released for public evaluation. A total of 90 teams participated in the evaluation.

Purpose

To explore the best method for text matching on Chinese medical Q&A data by participating in an evaluation competition.

Method

After analyzing the Chinese medical Q&A data provided by the competition, we used the bidirectional encoder representations from transformers (BERT) model and a boosted tree model and compared their performance. At the same time, we analyzed the importance of the features extracted through feature engineering. Finally, we combined the BERT and boosted tree models into an ensemble and demonstrated its effectiveness through a correlation analysis.
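For illustration, the following is a minimal sketch of how probability outputs from a fine-tuned BERT classifier and a boosted tree classifier could be combined; the variable names, the equal weighting, and the 0.5 decision threshold are assumptions for exposition, not the exact combination rule used in the competition system.

```python
import numpy as np

# Hypothetical per-pair match probabilities from the two model families;
# the values and the simple weighted-average rule are illustrative only.
bert_probs = np.array([0.91, 0.12, 0.55, 0.78])  # fine-tuned BERT outputs
tree_probs = np.array([0.88, 0.20, 0.40, 0.81])  # boosted tree outputs

w_bert = 0.5  # assumed equal weighting between the two model families
ensemble_probs = w_bert * bert_probs + (1 - w_bert) * tree_probs
labels = (ensemble_probs >= 0.5).astype(int)  # binary "match / no match" decision
print(labels)  # [1 0 0 1]
```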

Results

The final F1 score of the ensemble model is 0.90825, ranking first among the 90 participating teams. The highest F1 score of a single BERT model is 0.87443, whereas the highest F1 score of a single boosted tree model is only 0.86915. The F1 score of the BERT multi-model ensemble is 0.87473 (an average increase of 0.756% over the single models), and the F1 score of the boosted tree multi-model ensemble is 0.86720 (an average decrease of 0.03% relative to the single models). In the feature importance experiment, the out-degree and in-degree of the Q&A sentences are the most important features. In the correlation experiment, the correlation coefficients between models of the same type are as high as 0.9, indicating high similarity, whereas the correlation coefficient between the two types of models is approximately 0.7, indicating a useful degree of diversity. With the ensemble of the two model types, the F1 score reached 0.90825, which is 3.88% higher than that of the best single model.
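For illustration, the sketch below shows one way the graph features (out-degree and in-degree of questions treated as nodes, with each Q&A pair as a directed edge) and the inter-model Pearson correlation could be computed; the toy data and naming are assumptions, not the paper's exact feature pipeline.

```python
from collections import Counter

import numpy as np

# Hypothetical question pairs, treated as directed edges question_a -> question_b.
pairs = [("q1", "q2"), ("q1", "q3"), ("q2", "q3"), ("q4", "q1")]
out_degree = Counter(a for a, _ in pairs)  # edges leaving each question
in_degree = Counter(b for _, b in pairs)   # edges entering each question

# Per-pair graph features: (out-degree of the source, in-degree of the target).
features = [(out_degree[a], in_degree[b]) for a, b in pairs]
print(features)  # [(2, 1), (2, 2), (1, 2), (1, 1)]

# Pearson correlation between two models' score vectors (illustrative values),
# used here as a proxy for how similar or diverse the models are before ensembling.
scores_a = np.array([0.9, 0.1, 0.6, 0.8])
scores_b = np.array([0.8, 0.2, 0.5, 0.9])
print(round(np.corrcoef(scores_a, scores_b)[0, 1], 3))  # 0.948
```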

Conclusion

In our study, the proposed model ensemble method was shown to effectively improve on the performance of a single model. It achieves good results on Chinese medical Q&A tasks and generalizes well.

Keywords

Text matching
Feature engineering
Boosted tree model
BERT
