Elsevier

Information Fusion

Volume 55, March 2020, Pages 150-163

Employing online social networks in precision-medicine approach using information fusion predictive model to improve substance use surveillance: A lesson from Twitter and marijuana consumption

https://doi.org/10.1016/j.inffus.2019.08.006

Highlights

  • Twitter data is a source of information to analyse marijuana prevalence.

  • A marijuana use surveillance model that fuses information to support policymakers.

  • Social network analysis and opinion mining to analyse the population's views on marijuana.

Abstract

The impact that the connected community has on precision health or medicine, and vice versa, offers opportunities for many types of research and survey information, e.g., to predict trends in health-related issues, specifically people's behavior towards drug use. Here, precision medicine influences how the information is treated in order to obtain a better outcome that supports stakeholder decisions. Online social network analysis appears to be a good tool for quickly monitoring population behavior, since users freely share large amounts of information about their own lives on a day-to-day basis. This novel kind of data can be used to get additional real-time insights from people and to understand their actual behavior related to drug use (Cortés et al., 2017). The aim of this research is to generate an information fusion model of marijuana use tendency that is reliable enough to be employed by stakeholders. To that end, we: (a) collect and process data from Twitter; (b) design a set of algorithms to estimate the tendency of marijuana use in relation to age, location and gender, and use a set of processes and activities to verify that our model performs as expected; and (c) fuse the information in a model that characterizes the marijuana-using population in a way comparable to the national marijuana consumption survey, so that policy makers can use it to improve drug use prevention. First, we collected the data from Twitter accounts based in Chile using an algorithm for traversing graph data structures. Then, we estimated marijuana user prevalence for the period from 2006 to 2018 and, within each year, predicted the prevalence of the user population in relation to age (in ranges), location (regions) and gender.
Finally, we built indicators to explore the similarity between the data obtained through Twitter (our results) and the actual data collected by the National Service for the Prevention and Rehabilitation of Drug and Alcohol (SENDA), using the same variables analyzed in its own survey. When we compared the results of our algorithms and methods with those provided by SENDA, we observed that most of the indicators present similar trends, i.e., the variation of prevalence by year across age, location and gender showed similar changes in both analyses. The comparison also shows the algorithms' effectiveness and capacity to predict variations in complex cases such as marijuana use in the Chilean population. This study is a key opportunity to obtain information about marijuana use in a faster, low-cost and continuous way, and it is also an excellent marijuana surveillance tool for supporting policy maker and stakeholder decisions.
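
The comparison of trends between the Twitter-based estimates and the SENDA survey can be illustrated with a minimal sketch. The abstract does not specify how the similarity indicators were built, so the measure below (the share of consecutive common periods in which both series move in the same direction) is an assumption, as are the function names. The numbers are arbitrary placeholder values in arbitrary units; they are not results from the paper or from SENDA.

```python
from typing import Dict


def period_changes(series: Dict[int, float]) -> Dict[int, float]:
    """Change between each observed year and the previous observed year,
    keyed by the later year."""
    years = sorted(series)
    return {later: series[later] - series[earlier]
            for earlier, later in zip(years, years[1:])}


def sign(x: float) -> int:
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def trend_agreement(twitter: Dict[int, float], survey: Dict[int, float]) -> float:
    """Share of consecutive common periods in which both series change
    in the same direction (hypothetical indicator, not the paper's)."""
    # Restrict both series to the years present in each of them, since a
    # survey may be run less often than the Twitter estimates are produced.
    common_years = [y for y in twitter if y in survey]
    t_changes = period_changes({y: twitter[y] for y in common_years})
    s_changes = period_changes({y: survey[y] for y in common_years})
    if not t_changes:
        raise ValueError("need at least two common years to compare trends")
    matches = sum(sign(t_changes[y]) == sign(s_changes[y]) for y in t_changes)
    return matches / len(t_changes)


# Placeholder series in arbitrary units (NOT data from the paper or SENDA).
twitter_series = {2010: 1.0, 2012: 2.0, 2014: 3.0, 2016: 2.5}
survey_series = {2010: 1.0, 2012: 1.5, 2014: 2.5, 2016: 3.0}

agreement = trend_agreement(twitter_series, survey_series)
print(f"Share of periods with matching direction: {agreement:.2f}")
```

A direction-based measure ignores the size of each change, which suits a comparison of trends but would need a complementary measure if the magnitudes also had to match.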

Introduction

Precision medicine is a revolutionary field of modern medicine for disease treatment and prevention that takes into account each person's individual variability in genes, environment and lifestyle [2], [3]. The prospect of precision medicine driven by data science (machine learning algorithms) and artificial intelligence applications for tailoring diagnostics, treatments, prevention, patient health and outcomes is very promising. Personalized precision medicine offers several benefits: (a) it ensures that people get the correct treatment every time; (b) it emphasizes prevention rather than reaction; (c) it improves patient outcomes; and (d) it offers more cost-effective and more efficient health care. In parallel, it presents ongoing challenges, from regulation, reimbursement and clinical adoption to the economic value of the data. Precision medicine approaches that use shared decision-making with patients, policy makers and health workers offer an important opportunity to improve substance-use disorder (SUD) prevention and treatment outcomes [3], [4]. Health and care have experienced exponential positive growth through eight innovative technologies, among them the smartphone, genome sequencing, machine learning, blockchain and the connected community [5]. The connected community could have an impact on precision health by offering opportunities for research, distribution and survey information [1], [3], [6]. Every day, users generate large amounts of information as they express themselves and behave as they naturally do in their daily lives [7], [8]. This novel type of data can be used to get additional real-time insights from people and to understand their actual behavior related to drug use. Social media analyses to assess marijuana use have included manual labeling and the use of machine learning algorithms [9], [10], [11], [12], [13], [14], [15], [16].

Opinion mining techniques have been used to label tweets related to marijuana [17], [18], [19], [20], [21]. Despite certain limitations, the system proposed in this project works as a complement to the existing monitoring apparatus by identifying new trends in people's behavior towards drugs [1], [13], [14], [15], [16], [17]. United Nations members are engaged in intensive discussions on the way forward to address the world drug problem. Risk factors and circumstances that can render people more vulnerable to illicit drugs (long-term effect on the brain), as well as facilitate the establishment and expansion of illegal markets, are often related to issues of development, rule of law and governance. Policies can never be pursued in isolation, and drug control is no exception [11], [17], [18]. Significant changes are taking place in the policy landscape surrounding cannabis legalization, production and use, the prescription of cannabis-derived medicinal products to patients with certain medical conditions, and the question of whether recreational cannabis use is associated with physical health problems later in life [20], [21], [22]. Policymakers and stakeholders usually need ways to check or fast-track data on drug use and control [23], [24], [25].

In Chile, marijuana has received particular attention due to extensive debate and its relevance, as Chile has the highest marijuana prevalence rate in Latin America [26]. The National Service for the Prevention and Rehabilitation of Drug and Alcohol (SENDA) and its Chilean Drug Observatory conduct studies every two years to collect information on the extent of use and risk perception, among other variables. SENDA (2016) reports showed a systematic increase in drug consumption from 2010 onwards, with higher prevalence in the male population than in the female population. The 19-to-25 age group showed the highest increase in drug consumption compared with the rest of the population, rising from 24.0% in 2014 to 33.8% in 2016. The social environment can increase the risk of substance use, and school presents a high risk of consumption [27]. In late adolescence and early adulthood, people who associate with others who use marijuana increase their own risk of use [28]. Social network analysis (SNA) shows that greater proximity to peer substance users may be related to substance use, and that people who connect groups are especially affected [18]. Studies on the relationship between online network features and substance use by young adults showed that drug use was associated with more connections and with a high proportion of the network that discusses and accepts drug use [10], [13], [15], [21], [28], [29], [30]. Marijuana use by peers on social media networks has also been associated with individual marijuana use among young adults [21], [28].

The system proposed in this project works as a novel source of fast-track information fusion and control. SENDA conducts studies every two years, in a very time-consuming cycle and with continuous obstruction of the monitoring system. Therefore, our approach works as a complementary source of data to improve and enrich the quality of the information available to the existing SENDA monitoring apparatus. Furthermore, we propose a new way to increase the frequency of data collection from Twitter. Such data must be transformed into information to support future decision making. It is important to highlight that the economic value of data in low-income countries depends on clever ways to lower cost, and digital technology opens up this possibility. In this context, obtaining information quickly about people's substance use can help to indirectly prevent addiction or dependence in the population, and it is also an excellent marijuana surveillance tool to support policymaker decisions.

This paper addresses the research problem of characterizing a complex phenomenon such as marijuana consumption through the analysis of related tweets, with the research goal of constructing a reliable model that is comparable with the national marijuana consumption survey. In that sense, we have two main research questions to answer. Q1: Is it possible to extract valuable information about marijuana use from the marijuana-related texts written by Twitter users? Q2: Is it possible to develop a model whose outcome is comparable with the data provided by the national marijuana consumption survey? We work only with Twitter account holders living in Chile. We also considered only people whose language included Chilean slang, so we are confident that we include only Chilean people. This means that applying a similar analysis in another South American country would require considering the slang particular to that country. We assumed a relation between an account holder tweeting about marijuana and the probability of marijuana consumption.

The remainder of this work is structured as follows. Section 2 gives an overview of social network analysis in the social sciences, in particular its application to data collected from Twitter for studying a complex phenomenon such as marijuana use. Section 3 presents the methods and algorithms applied to analyze the marijuana use of Twitter account holders. The main results of this work are summarized in Section 4. Finally, Section 5 presents the conclusions of this work.

Section snippets

Related work

The scientific community has recently applied social network analysis (SNA) in the social sciences. SNA has found several solutions for social phenomena in a wide range of disciplines [6], [31]. It is possible to extract valuable information on important topics from the opinions recorded by social network users [32]. These opinions are texts written by the users themselves, with writing errors and mistakes, which must be considered in the creation of text processing algorithms for extracting

Framework

Precision health emphasizes prevention, and using shared decision-making with patients, policy makers and health workers offers an important opportunity to improve it in SUDs [3], [5], [6]. The connected community is one innovative technology that is changing health [5] and opens several opportunities for research, distribution and survey information [3], [4], [6], [39], [40]. This novel approach to obtaining data can be used to get additional real-time insights from people to understand their actual

Results

We combined a precision health approach with online social network analysis to get additional real-time insights from Chilean Twitter users and estimated the tendency of marijuana use. Furthermore, we employed statistics, probability analysis and the fusion of different models to characterize the user population by age, location and gender. Finally, we established an inference model for policy makers to improve SUD prevention. Our results are described in this section.
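
As an illustration of characterizing the user population by age, location and gender, the sketch below computes the share of accounts flagged as marijuana-related within each group. It is only a sketch under stated assumptions: the snippet does not describe the actual estimator, so a plain proportion is used here, and the `Account` fields, the `marijuana_related` flag and the sample records are all hypothetical placeholders rather than data from the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Sequence, Tuple


class Account(NamedTuple):
    """Hypothetical, already cleaned record for one Twitter account holder."""
    year: int
    age_range: str
    region: str
    gender: str
    marijuana_related: bool  # assumed output of an upstream tweet classifier


def prevalence_by(accounts: Iterable[Account],
                  fields: Sequence[str]) -> Dict[Tuple, float]:
    """Share of flagged accounts within each combination of the given fields."""
    totals: Dict[Tuple, int] = defaultdict(int)
    flagged: Dict[Tuple, int] = defaultdict(int)
    for account in accounts:
        group = tuple(getattr(account, name) for name in fields)
        totals[group] += 1
        flagged[group] += int(account.marijuana_related)
    return {group: flagged[group] / totals[group] for group in totals}


# Placeholder records for illustration only (NOT data from the study).
sample = [
    Account(2016, "19-25", "Metropolitana", "male", True),
    Account(2016, "19-25", "Metropolitana", "female", False),
    Account(2016, "26-34", "Valparaíso", "male", False),
    Account(2016, "19-25", "Valparaíso", "male", True),
]

by_gender = prevalence_by(sample, ["year", "gender"])
by_age = prevalence_by(sample, ["year", "age_range"])
for group, share in sorted(by_gender.items()):
    print(group, round(share, 2))
```

Passing the grouping fields as a parameter lets the same function produce the per-year breakdowns by age range, region or gender without duplicating the counting logic.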

Marijuana use tendency estimation using information fusion from Twitter data collection

Marijuana use of account holders. The data collection of account holders (connected community) is a key stage in the marijuana use analysis. We emphasize that the data associated with an account holder could be wrong for several reasons, e.g., an incorrect birth date, an imprecise location, a false gender or even a lie written in a tweet. So, any effort to clean up and pre-process the data to obtain a version as close as possible to reality will benefit the whole action of extracting

Conclusions and future work

This study proposes a social network analysis in which significant information is generated from a connected community on Twitter. These conversations (data) can be used to predict trends in health-related issues, specifically behavior related to drug use. Here, precision medicine influences how the information is treated in order to obtain a better outcome that supports stakeholder decisions. Therefore, we generated an information fusion predictive model to improve substance use epidemiological

Acknowledgment

The authors would like to acknowledge the continuous support of Fondef Project IT16I10055, the Complex Engineering Systems Institute (CONICYT PIA FB0816) and the Chilean Society of Neurology Psychiatry and Neurosurgery (SONEPSYN).

References (65)

  • V.D. Cortés et al., Twitter for marijuana infodemiology, Proceedings of the International Conference on Web Intelligence (2017)
  • G.S. Ginsburg et al., Precision medicine: from science to value, Health Affairs (2018)
  • National Institutes of Health, Precision medicine initiative, 2015, (http://www.nih.insov/precisionmedicine/) [Online;...
  • U.E. Ghitza, Needed relapse-prevention research on novel framework (ASPIRE model) for substance use disorders treatment, Front. Psychiatry (2015)
  • C. Gretton, M. Honeyman, The digital revolution: eight technologies that will change health and care, 2016,...
  • S.P. Borgatti et al., Network analysis in the social sciences, Science (2009)
  • J.D. Velasquez, Web site keywords: a methodology for improving gradually the web site text content, Intell. Data Anal. (2012)
  • H. Dai et al., Mining social media data for opinion polarities about electronic cigarettes, Tob. Control (2017)
  • L. Shutler et al., Drug use in the twittersphere: a qualitative contextual analysis of tweets about prescription drugs, J. Addict. Dis. (2015)
  • M.J. Paul et al., You are what you tweet: analyzing twitter for public health, Proceedings of the 5th International AAAI Conference on Weblogs and Social Media (ICWSM) (2011)
  • M. Chary et al., Leveraging social networks for toxicovigilance, J. Med. Toxicol. (2013)
  • D. Nguyen et al., “how old do you think i am?” a study of language and age in twitter, Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media (2013)
  • R. Daniulaityte et al., “When ‘bad’ is ‘good’”: identifying personal communication and sentiment in drug-related tweets, JMIR Public Health Surv. (2016)
  • Q. Tian et al., Finding needles of interested tweets in the haystack of twitter network, Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (2016)
  • S.H. Cook et al., Online network influences on emerging adults’ alcohol and drug use, J. Youth Adolesc. (2013)
  • S. Katragadda et al., Detecting adverse drug effects using link classification on twitter data, Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2015)
  • R. Harpaz et al., Text mining for adverse drug events: the promise, challenges, and state of the art, Drug Saf. (2014)
  • The United Nations Office on Drugs and Crime (UNODC), Global overview of drug demand and supply: Latest trends,...
  • National Service for the Prevention and Rehabilitation of Drug and Alcohol (SENDA), Eleventh National Study of Drugs in...
  • S.A. Stoddard et al., Permissive norms and young adults alcohol and marijuana use: the role of online communities, J. Stud. Alcohol Drugs (2012)
  • J. Villena Román et al., TASS-workshop on sentiment analysis at SEPLN, Procesamiento del Lenguaje Natural (2013)
  • H. Kwak et al., What is twitter, a social network or a news media?, Proceedings of the 19th International Conference on World Wide Web (2010)