Classification of Arabic healthcare questions based on word embeddings learned from massive consultations: a deep learning approach

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Automated question classification is a fundamental component of automated question-answering systems, which play a critical role in promoting medical and healthcare services. Developing an automated question classification system depends heavily on natural language processing and data mining techniques. Question classification methods based on classical machine learning techniques are limited in capturing the hidden relationships among features, as well as in handling complex languages and very large-scale datasets. Therefore, this paper proposes a deep learning approach for question classification, since deep learning methods can extract implicit, hidden relationships and automatically generate dense representations of features. The proposed question classification model is built on unidirectional and bidirectional long short-term memory networks (LSTM and BiLSTM) and is designed primarily to handle the Arabic language in the field of healthcare. The features are represented using a domain-specific word embedding model (Word2Vec) trained on around 1.5 million medical consultations from Altibbi, a telemedicine company that serves as the case study and as the source of the data. The proposed deep learning approach is a multi-class classification algorithm that automatically maps questions into 15 categories of medical specialities. The model is evaluated using several metrics, including accuracy, precision, recall, and F1-score. The proposed model achieved a classification accuracy of 87.2%.
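The four evaluation metrics named in the abstract can be illustrated with a minimal sketch for a multi-class setting. The abstract does not state how per-class precision, recall, and F1-score are aggregated, so macro-averaging is an assumption here; the function name `evaluate` and the speciality labels in the usage note are hypothetical and do not come from the paper.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging is an assumption made for this sketch; the paper's
    abstract does not say which averaging scheme was used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

For example, calling `evaluate(["cardiology", "cardiology", "dermatology"], ["cardiology", "dermatology", "dermatology"])` scores two of three questions as correct. With 15 speciality classes of unequal size, macro-averaging weights every class equally, whereas accuracy is dominated by the most frequent classes, which is one reason to report the metrics side by side.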

Notes

  1. https://www.altibbi.com/

Acknowledgements

This work has been supported in part by: Ministerio español de Economía y Competitividad under project TIN2017-85727-C4-2-P (UGR-DeepBio).

Author information

Corresponding author

Correspondence to Pedro A. Castillo.

Cite this article

Faris, H., Habib, M., Faris, M. et al. Classification of Arabic healthcare questions based on word embeddings learned from massive consultations: a deep learning approach. J Ambient Intell Human Comput 13, 1811–1827 (2022). https://doi.org/10.1007/s12652-021-02948-w

  • DOI: https://doi.org/10.1007/s12652-021-02948-w
