Investigating classification supervised learning approaches for the identification of critical patients’ posts in a healthcare social network☆
Introduction
Nowadays, investment in Information and Communication Technology (ICT) for health, aging and well-being (eHealth) is increasing rapidly all over the world. According to a recent report by Grand View Research Inc., the global eHealth market is expected to reach USD 308.0 billion by 2022. In particular, the transition of the healthcare industry to digital systems for the management and analysis of patients' health is expected to be the most important driver of the market [1]. The European Commission's eHealth Action Plan 2012–2020 has already provided a roadmap to empower patients and healthcare workers, to link up devices and technologies, and to invest in research towards the personalized medicine of the future [2]. In 2015, the European initiative FICHE aimed to accelerate small and medium enterprises (SMEs) developing new cutting-edge eHealth applications based on the FIWARE technology [3]. In 2017, the European Commission set up an internal task force bringing together technology and health policy makers to examine EU policy actions ensuring that the transformation of health care into a Digital Single Market (DSM) benefits people, health care systems and the economy [4]. Guaranteeing access to high-quality health care is a key objective of social protection systems in European countries, and health care represents the second largest social expenditure item after pensions. In this panorama, social media represent a tempting opportunity for healthcare operators to improve patients' well-being. Many social media tools are currently available over the Internet, such as social networking, professional networking, media sharing, content production including blogs (e.g., Tumblr) and micro-blogs (e.g., Twitter), knowledge/information aggregation (e.g., Wikipedia), and virtual reality and gaming environments (e.g., Second Life).
In particular, many Healthcare Social Networking (HSN) platforms have emerged with the purpose of enhancing patient care and education. Popular HSN platforms include Sermo, Doximity, Orthomind, QuantiaMD, WeMedUp and Digital Healthcare. However, these social networks require a substantial effort from medical personnel acting as moderators. In fact, healthcare social networks present potential risks for patients due to the possible distribution of poor-quality or wrong information and its misinterpretation. On the one hand, clinical operators want to promote the exchange of information among patients about a specific disease; on the other hand, they do not have enough time to read patients' posts and moderate them when required. Benefits of HSNs include:
- promoting networking and information exchange, enabling self-education among patients about particular diseases;
- sharing patients' experiences that can be helpful to other patients;
- supporting the treatment process;
- reducing patients' stress while they wait for a diagnosis or when they discover they are affected by a particular disease;
- promoting information gathering and prevention campaigns regarding specific diseases;
- optimizing the work of clinical personnel, who interact with patients knowledgeable about their diseases;
- promoting knowledge management;
- promoting research and monitoring activities.
On the other hand, HSNs present several risks, including:
- possible distribution of poor-quality or wrong information among patients;
- the need for qualified medical personnel who promptly read patients' posts and reply to them;
- medical operators often lacking the time to read patients' posts and reply to them;
- medical operators' reluctance to bear responsibility for the consequences for patients (worsening, risk of death or death) when they do not reply in time;
- possible legal issues for the medical personnel;
- risks for the reputation of the medical personnel.
In our previous work [5], in order to mitigate the aforementioned risks, we proposed a Patients' Posts Moderator (PPM) architecture blueprint, whose basic flowchart is shown in Fig. 1. Motivated by the fact that cognitive computing is currently emerging in all ICT fields [6], the basic idea behind that work was to analyze patients' posts automatically by means of machine learning techniques in order to identify possible critical issues, hence helping medical operators to take action when required. Both patients and medical personnel interact through an HSN platform. A Patients' Posts Analysis System (PaPAS) works as a batch process that continuously analyzes the patients' posts on the HSN platform. When a critical issue is detected, PaPAS generates an event that is caught and processed by a Complex Event Processing (CEP) component. An alert message is then sent to the relevant medical personnel, who can step into the HSN platform, reply to critical patients' discussion groups and/or trigger medical interventions (doctors can directly contact the patient or send an ambulance with medical equipment if required). The main purpose of PaPAS is to analyze patients' posts and evaluate possible critical issues that may trigger clinicians' intervention. It includes the following sub-components: (i) Extractor, whose role is to extract patients' content from the HSN platform; (ii) Selector, whose role is to select relevant keywords; (iii) Rank Generator, whose role is to rank the selected keywords; (iv) Categorisator, whose role is to categorize the various levels of seriousness; (v) Classificator, whose role is to classify patients' posts into different categories; and (vi) Evaluator, whose role is to assess the quality of the results.
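The batch flow described above (posts analyzed by PaPAS, critical events passed to a CEP component, alerts sent to medical staff) can be sketched as follows. This is a minimal illustration under stated assumptions, not the PPM implementation: the `Post` and `CriticalEvent` types, the functions `papas_batch` and `cep_handle`, and the choice of which levels count as critical are all hypothetical. The Selector, Rank Generator and Categorisator are folded into a single `classify` callable, and the Evaluator is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Post:
    """Hypothetical representation of a patient's post (not defined in the paper)."""
    post_id: int
    text: str


@dataclass
class CriticalEvent:
    """Hypothetical event emitted by PaPAS and consumed by the CEP component."""
    post_id: int
    level: str


# Assumption: which message levels count as critical is not stated in the paper.
CRITICAL_LEVELS = {"alarm", "suspect"}


def papas_batch(posts: Iterable[Post], classify: Callable[[str], str]) -> List[CriticalEvent]:
    """One batch pass of a simplified PaPAS: extract, classify, emit events.

    Selector, Rank Generator and Categorisator are folded into `classify` here.
    """
    events: List[CriticalEvent] = []
    for post in posts:                 # Extractor: walk the platform's posts
        text = post.text.strip()
        if not text:                   # ignore empty posts
            continue
        level = classify(text)         # Classificator: assign a message level
        if level in CRITICAL_LEVELS:
            events.append(CriticalEvent(post.post_id, level))
    return events


def cep_handle(events: List[CriticalEvent]) -> List[str]:
    """Simplified CEP step: turn each event into an alert message for medical staff."""
    return [f"ALERT post={e.post_id} level={e.level}" for e in events]


if __name__ == "__main__":
    def keyword_classify(text: str) -> str:
        """Toy keyword rule standing in for a trained classifier (hypothetical)."""
        lowered = text.lower()
        if "emergency" in lowered:
            return "alarm"
        if "worried" in lowered:
            return "suspect"
        return "notalarm"

    posts = [Post(1, "Thanks for the advice"),
             Post(2, "This is an emergency, please help"),
             Post(3, "I am worried about the results")]
    for message in cep_handle(papas_batch(posts, keyword_classify)):
        print(message)
```

Passing the classifier in as a callable keeps the batch loop independent of the learning algorithm, so the same pipeline could host any of the classifiers compared in this paper.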
This paper extends [5], focusing specifically on the Classificator sub-component. In particular, we implemented and compared different supervised classification algorithms (referred to as classifiers) for identifying critical patients' posts, i.e., posts containing poor-quality or wrong information that can trigger the intervention of medical personnel. We chose supervised rather than unsupervised learning algorithms because the scientific literature has shown that they are well suited to typical classification problems. Specifically, defining an n-gram as a contiguous sequence of n words in patients' posts, we built several datasets, which we used to train Bayesian, Linear and Support Vector Machine (SVM) classifiers and analyze their accuracy.
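To make the n-gram definition concrete, the sketch below extracts word n-grams from a post and counts the distinct n-grams in a collection. It is a hedged illustration only: lowercasing and whitespace tokenization are assumptions, the helper names `word_ngrams`, `ngram_counts` and `vocabulary` are hypothetical, and treating "vocabulary size" as the number of distinct n-grams is our reading rather than a definition given in the paper.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def word_ngrams(tokens: List[str], n: int) -> List[NGram]:
    """Return every contiguous sequence of n tokens, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text: str, n: int) -> Counter:
    """Bag-of-n-grams for one post (assumed: lowercasing and whitespace tokenization)."""
    return Counter(word_ngrams(text.lower().split(), n))


def vocabulary(texts: Iterable[str], n: int) -> Set[NGram]:
    """Distinct n-grams across a collection of posts; len() gives the vocabulary size."""
    vocab: Set[NGram] = set()
    for text in texts:
        vocab.update(word_ngrams(text.lower().split(), n))
    return vocab


if __name__ == "__main__":
    print(word_ngrams("i need help now".split(), 2))
    # [('i', 'need'), ('need', 'help'), ('help', 'now')]
    print(len(vocabulary(["i need help now", "i need advice"], 2)))
    # 4
```

Posts shorter than n words simply yield no n-grams, which is why larger n tends to produce sparser feature vectors per post.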
The remainder of the paper is organized as follows. In Section 2 we provide a brief overview of the major recent initiatives in the fields of machine learning and social media for eHealth. In Section 3, we present the adopted training dataset and software tools used to train classifiers. In Section 4, we present the adopted method focusing on data collection, dictionary arrangement, dataset preparation and choice of classifiers. Experimental results and discussion are provided in Section 5. Conclusion and future developments are summarized in Section 6.
Background and related work
The use of social media to improve healthcare quality is an emerging research topic [7]. In this Section, we briefly describe: (i) the impact and benefits of social media in healthcare; (ii) how social media are revolutionizing healthcare marketing as a whole; (iii) the most recent best practices experienced in clinical centers; (iv) potential risks for patients; (v) recent initiatives regarding the use of Twitter in healthcare; (vi) data mining in healthcare, and (vii) the use
Training dataset and analysis tools
Before selecting the dataset with which to train the classifier, we needed to choose the social network source.
Twitter is a popular social network based on tweets, i.e., short messages of up to 280 characters that can be shared (retweeted) or commented on, thus feeding a conversation. Tweets can include one or more hashtags (#), which are very important when topic-based data mining needs to be carried out.
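Topic-based selection by hashtag, as described above, can be sketched with a regular expression. This is a simplified illustration, not the paper's collection procedure: the pattern `#(\w+)` is an assumption that ignores Twitter's full hashtag rules, and the helpers `hashtags` and `filter_by_hashtag` are hypothetical. The #CSAQT hashtag is the one used in the case study.

```python
import re
from typing import Iterable, List, Set

# Assumed pattern: '#' followed by one or more word characters.
# Twitter's real hashtag rules are more involved than this.
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtags(tweet: str) -> Set[str]:
    """Return the hashtags in a tweet, lowercased and without the leading '#'."""
    return {tag.lower() for tag in HASHTAG_RE.findall(tweet)}


def filter_by_hashtag(tweets: Iterable[str], tag: str) -> List[str]:
    """Keep only tweets carrying the given hashtag (case-insensitive match)."""
    wanted = tag.lstrip("#").lower()
    return [t for t in tweets if wanted in hashtags(t)]


if __name__ == "__main__":
    sample = ["Sharing resources #CSAQT", "unrelated post", "#csaqt awareness week"]
    print(filter_by_hashtag(sample, "#CSAQT"))
```

Matching on whole hashtags rather than substrings avoids selecting tweets whose tags merely start with the same letters.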
Among the hashtags available in the healthcare domain, we focused on the Child Sex Abuse
Classifier training method
The classification process follows the phases highlighted in Fig. 2. It analyzes the measurements of a generic object in order to identify the category or class to which that object belongs. Typical examples of classification problems include the classification of credit card requests, the placement of a patient in a specific intensive-care unit, and so on. In our case study, the objects to be classified are users' tweets related to the #CSAQT hashtag. #CSAQT tweets are pre-processed to remove URLs,
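A minimal pre-processing sketch is shown below. Only URL removal is stated in the text above; the remaining pre-processing steps are not listed here, so the lowercasing and punctuation stripping are illustrative assumptions, and the function name `preprocess` and both regular expressions are hypothetical.

```python
import re

# URLs written with an explicit scheme or starting with "www." (assumed pattern).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Assumed step: anything that is not a word character, '#' or whitespace is dropped.
NON_WORD_RE = re.compile(r"[^\w#\s]")


def preprocess(tweet: str) -> str:
    """Clean one tweet before n-gram extraction."""
    text = URL_RE.sub(" ", tweet)        # remove URLs (the step named in the paper)
    text = text.lower()                  # assumed: case folding
    text = NON_WORD_RE.sub(" ", text)    # assumed: strip punctuation, keep hashtags
    return " ".join(text.split())        # collapse repeated whitespace


if __name__ == "__main__":
    print(preprocess("Read this: https://t.co/abc123 #CSAQT now!"))
```

Keeping the '#' character preserves hashtags as distinct tokens, which may or may not match the authors' choice.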
Experimental results
The experiments compared the six classification algorithms described in Section 4 on three datasets, considering the notincr, incr2step and incr approaches.
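The six algorithms themselves are not listed in this excerpt beyond the Bayesian, Linear and SVM families named in the Introduction. As a hedged illustration of the Bayesian family and of accuracy-based comparison, the sketch below implements multinomial Naive Bayes with add-one (Laplace) smoothing from scratch over word tokens. The class `MultinomialNB`, the `accuracy` helper and the toy sentences are invented for illustration and are not the authors' code or data; only the labels alarm, notalarm and suspect come from the paper.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over token counts."""

    def fit(self, docs: List[List[str]], labels: List[str]) -> "MultinomialNB":
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        n_docs = Counter(labels)
        # log P(c): fraction of training documents in class c
        self.log_prior = {c: math.log(n_docs[c] / len(labels)) for c in self.classes}
        counts: Dict[str, Counter] = defaultdict(Counter)
        for doc, c in zip(docs, labels):
            counts[c].update(doc)
        v = len(self.vocab)
        # log P(t | c) with add-one smoothing over the training vocabulary
        self.log_lik = {}
        for c in self.classes:
            total = sum(counts[c].values())
            self.log_lik[c] = {t: math.log((counts[c][t] + 1) / (total + v)) for t in self.vocab}
        return self

    def predict(self, doc: List[str]) -> str:
        def score(c: str) -> float:
            # tokens never seen during training are skipped
            return self.log_prior[c] + sum(self.log_lik[c][t] for t in doc if t in self.vocab)
        return max(self.classes, key=score)


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Invented toy examples, not taken from the study's dataset.
    train = [("please help me now", "alarm"), ("call an ambulance now", "alarm"),
             ("thank you for sharing", "notalarm"), ("great article thank you", "notalarm"),
             ("not sure something is wrong", "suspect"), ("i feel something is wrong", "suspect")]
    test = [("please help", "alarm"), ("thank you so much", "notalarm"),
            ("something is wrong", "suspect")]
    model = MultinomialNB().fit([t.split() for t, _ in train], [y for _, y in train])
    predictions = [model.predict(t.split()) for t, _ in test]
    print(predictions, accuracy([y for _, y in test], predictions))
```

Comparing classifiers fairly requires computing `accuracy` on the same held-out split for each model; the tokens fed to `fit` could equally be the n-gram tuples of any dataset-building approach.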
Fig. 3(a) and Fig. 3(b) show how the vocabulary sizes changed in the training and test phases, respectively, according to the three cycling approaches used to build the datasets and the number of n-grams, whereas Fig. 3(c) shows how the vocabulary sizes changed considering both training and test phases, according to
Conclusion and future developments
In this work, we investigated an NLP approach in a real HSN case study based on Twitter. The idea behind the analysis was to classify tweets into three message levels, i.e., alarm, notalarm and suspect, in order to create a tool, aimed at patients, their relatives and medical operators, that is able to address emergency events by warning of possible changes in the patients' health status.
The analysis followed three different approaches to n-gram dataset preparation, called notincr, incr2step
CRediT authorship contribution statement
Lorenzo Carnevale: Investigation, Software, Data curation, Writing - original draft. Antonio Celesti: Conceptualization, Methodology, Investigation, Validation, Writing - original draft, Writing - reviewing & editing. Giacomo Fiumara: Conceptualization, Methodology, Investigation, Writing - original draft. Antonino Galletta: Validation, Visualization. Massimo Villari: Visualization, Supervision.
Declaration of Competing Interest
No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.asoc.2020.106155.
Acknowledgment
This work was supported by the Italian Healthcare Ministry funded project Young Researcher (under 40 years) entitled "Do Severe acquired brain injury patients benefit from Telerehabilitation? A Cost-effectiveness analysis study" - GR-2016-02361306.
References (46)
- Personalized healthcare cloud services for disease risk assessment and wellness management using social media. Pervasive Mob. Comput. (2016).
- Healthcare hashtag index development: Identifying global impact in social media. J. Biomed. Inform. (2016).
- Applying mining techniques to analyze vestibular data. Procedia Comput. Sci. (2016).
- An inverse Bayesian scheme for the denoising of ECG signals. J. Netw. Comput. Appl. (2018).
- User recommendation in healthcare social media by assessing user similarity in heterogeneous network. Artif. Intell. Med. (2017).
- Healthcare support for underserved communities using a mobile social media platform. Inf. Syst. (2017).
- eHealth market projected to reach USD 308.0 billion by 2022 (2016).
- European Commission. eHealth action plan 2012-2020: Innovative healthcare for the 21st century (2012).
- Exploiting the FIWARE cloud platform to develop a remote patient monitoring system.
- European Commission. Transformation of health and care in the digital single market (2014).
- Intelligent equipment design assisted by cognitive internet of things and industrial big data. Neural Comput. Appl.
- Social media and healthcare quality improvement: A nascent field. BMJ Qual. Saf.
- Towards new social media logic in healthcare and its interplay with clinical logic.
- Social media disruptive change in healthcare: Responses of healthcare providers?
- Social media use in healthcare: A systematic review of effects on patients and on their relationship with healthcare professionals. BMC Health Serv. Res.
- Healthcare marketing and social media.
- Ethical information flows: Working with/against the healthcare industry's fascination with social media.
- The importance of patient engagement and the use of social media marketing in healthcare. Technol. Health Care.
- The social media contribution into healthcare practices among Russian young people. Ekon. Sotsiologiya.
- Evolution of social media in scientific research: A case of technology and healthcare professionals in Saudi Universities. J. Med. Imaging Health Inform.
- Social media and health care professionals: Benefits, risks, and best practices. P and T.
- The unintended consequences of social media in healthcare: New problems and new solutions. Yearb. Med. Inform.
☆ This paper is an extended, improved version of the paper "Applying Artificial Intelligence in Healthcare Social Networks to Identity Critical Issues in Patients' Posts" presented at AI4Health 2018 workshop and published in: BIOSTEC 2018, Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies, Volume 5: HEALTHINF, Funchal, Madeira, Portugal, 19–21 January, 2018, pp. 680-687, ISBN: 978-989-758-281-3, INSTICC, 2018.
1 On behalf of GNCS - Gruppo Nazionale per il Calcolo Scientifico - INdAM.