
1 Introduction

Overweight and obesity are now a global phenomenon, found in economically developed and developing countries (e.g., the United States [1], European countries [2], South Africa [3], China [4]) as well as in regions that experience a double burden with the concomitant problem of malnutrition [5]. While there are ongoing debates on a possible plateau or even decrease of overweight and obesity in the next generation, updated prevalence data for children suggest that severe obesity is on the rise [6]. There is a plethora of interventions to prevent overweight and obesity in both children [7] and adults [8], and an equally impressive number of interventions for treatment [9, 10]. Yet, individuals struggle to achieve a healthy weight over a sustained period of time. For example, a review of weight management interventions found a weight loss of 1.54 kg over two years [11], which is far from the 5% weight loss recommended to produce health benefits [12]. These challenges have led to the realization that a simple solution would not suffice [13]: the health system needs to cope with the complexity of obesity [14,15,16].

The notion of complexity covers multiple characteristics, such as the vast individual differences (or heterogeneity) between weight-related factors [17, 18], or the nonlinear ways in which factors interact to form a system. The obesity system has been the subject of numerous studies [19,20,21,22]. This system involves factors from a broad array of sectors (e.g., built environment, eating disorders, weight stigma [23, 24]), with interactions within as well as across sectors. Accurately modeling this system facilitates the development of integrated policies building on cross-sectoral efforts [25, 26]. If policies are developed separately along traditional themes (e.g., public planning works on the environment, doctors work on diseases and physiology, mental health experts work on psychology), then we have a heavily fragmented approach to obesity (Fig. 1a). Efforts such as the Foresight Obesity Map [20, 27] or the Provincial Health Services Authority's series of maps [24, 28, 29] thus support the development of synergistic policies working on integrated thematic clusters (Fig. 1b).

Given the importance of developing accurate models of the obesity system, the modeling process often seeks to be comprehensive by including experts and community members [19, 24, 30,31,32,33,34]. While many qualitative modeling processes can produce models in the form of maps [35] (e.g., cognitive/concept mapping, causal loop diagrams), they are generally conducted with a facilitator. Some of the limitations (e.g., costs, the need for a trained facilitator) may be addressed through emerging technologies [36]. However, one limitation remains: participants may not openly express their beliefs (e.g., weight discrimination) when perceiving that these beliefs may not be well received by a facilitator or the research team. In contrast, the naturally occurring exchange of perspectives in social media provides an unobtrusive approach to collecting beliefs on the causes and consequences of obesity. Mining social media may thus provide the views of community members [37,38,39,40].

Fig. 1. The Provincial Health Services Authority's series of maps [24, 28, 29] suggests that typical categories lead to fragmented approaches (a), whereas themes specific to overweight and obesity can support more integrated options (b). These maps are concept maps, as they articulate how concepts (labeled circles) are related (curves).

While obtaining a model via social media can inform policymakers about popular support for possible policies [41], the model may stand in stark contrast with an expert-based model [34]. Identifying and reconciling these differences is an important step to integrate social computing (and specifically social web mining) with policy making. In this paper, we contrast how mining social media instead of expert reports affects the validation of a large conceptual model of obesity. This overarching goal is achieved through three consecutive steps. First, we assemble a social media dataset (consisting of several million tweets) and several expert reports (totaling hundreds of pages). Second, we employ an innovative multi-step process to examine a conceptual model using both the social media dataset and the expert reports. Finally, we contrast the structure of these models using network methods.

The remainder of this paper is organized as follows. In Sect. 2, we provide background information on the application of social web mining to health, and on the use of conceptual models in obesity research. In Sect. 3, we briefly explain our approach to validate a conceptual model from text. In Sect. 4, we perform this inference on both expert reports and tweets, and we examine how the conceptual models differ. Finally, these differences are discussed and contextualized in Sect. 5.

2 Background

2.1 Social Web Mining for Health

The social media of interest in this paper is Twitter, in which users post and interact through short messages known as ‘tweets’. Twitter has been used for many studies on obesity and weight-related behaviors. For instance, Harris and colleagues collected 1,110 tweets and read them to understand how childhood obesity was discussed [42], while Lydecker et al. read 529 tweets to identify the main themes related to fatness [43]. Similarly, So and colleagues analyzed the common features of the 120 tweets that were most frequently shared (i.e., retweeted) to understand what information individuals preferred to relay when it came to obesity [37]. Reading tweets to identify themes (i.e., content analysis) is a typical task to understand the arguments that a specific population uses on a subject of interest. Broader examples in health include the content analysis of 700 tweets [44] and 625 tweets [45] to examine the type of claims that health professionals make online, or an examination of 8,934 tweets documenting cyberincivility among nurses and nursing students [46]. While such content analyses make a valuable contribution to the body of knowledge on arguments in public health, they do not employ computational methods to automate (parts of) the analysis and thus scale it to a larger dataset. Automation can be as simple as counting how many times keywords of interest appear across tweets. Turner-McGrievy and Beets used Hashtagify.me to automatically count keywords in tens of thousands of tweets on weight loss, health, diet, and fitness. By dividing the analysis across time periods, they were able to examine whether there are times of the year when individuals would be more likely to consider weight loss, thus contributing to the timing of interventions [48]. Similarly, Sui et al. used the intensity of topics on Twitter as part of an effort to identify the public interest in intensive obesity treatment [49]. Such studies illustrate the important shift from having humans read and code all tweets to relying on a machine to handle most of a (much larger) dataset. The latter is the focus of data mining applied to the ‘social web’ (i.e., social web mining), which includes social networking sites such as Twitter but also encompasses blogs and micro-blogging. As Twitter has been the social platform of interest for many studies, the term ‘Twitter mining’ has also emerged to refer specifically to the application of social web mining to Twitter [50].

Social web mining started to garner attention in the late 2000s to early 2010s. The application of social web mining to health was discussed in 2010 by Boulos et al. [51] and in 2011 by Paul and Dredze [52], showing how a broad range of public health applications could benefit from mining Twitter. Studies have been able to mine a staggering volume of data, going well over what a team of humans could handle. For example, Eichstaedt et al. mapped 148 million tweets to counties in an effort to relate language patterns to county-level heart disease mortality [53]. At an even larger scale, Ediger and colleagues used a Cray computer to approximate centrality within two hours on a dataset of interactions between Twitter users comprising 1.47 billion edges [54]. While these cases are noteworthy for their volume of data, studies employing social web mining for obesity research typically involve millions of tweets. Using 2.2 million tweets, Chou and colleagues found that tweets (as well as Facebook posts) often stigmatized individuals living with overweight and obesity [38]. In two studies on obesity and weight-related factors, Karami analyzed 6 million [39] and 4.5 million tweets [40]. In a study of health-related statistics, Culotta mined 4.3 million tweets and found that the data was correlated with obesity [56]. Given that obesity is driven by many factors (e.g., eating behaviors, physical activity behaviors), there is also a wealth of large-scale studies on such factors, such as the work of Abbar et al. on 503 million tweets regarding food [57]. Finally, the value proposition of several new platforms is not the analysis of one particular dataset, but rather the ongoing ability to monitor diet or physical activity. This is particularly the case for the Lexicocalorimeter, which measures calories in each US state via Twitter [58], and to a lesser extent for the National Neighborhood Dataset of Zhang et al., which tracks diet and physical activity through Twitter [59].

Several commentaries [60] and reviews [61,62,63] have explored whether this abundance of studies has contributed to public health. Findings depend on the specific aspect of health concerned. Social media has yet to impact practices in public health surveillance [62], but a review centered on chronic disease found a benefit on clinical outcomes in almost half of the studies [61], and a review specific to obesity highlighted a modest impact on weight [63].

2.2 Conceptual Models in Obesity Research

Although our work will involve the identification of themes, our endeavor is very different from the studies reviewed in the previous section, which focused on identifying themes and their variations across time, places, or communities of users. Our objective is to contrast conceptual models that have been automatically extracted from tweets and expert reports. As noted in the introduction, models of complex systems such as obesity support several important policy-making and analytical tasks. In this section, we briefly review the features that models often seek to capture when it comes to complex health systems, and how models are used in obesity research specifically. Penn detailed key characteristics of complex health systems that justify the development of models (emphases added):

“Many problems that society wishes to address in population health are clearly problems of managing complex adaptive systems. They involve making interventions in systems with multiple interacting causal connections, which span domains from physiological to economic. Additionally, of course, the individuals whose health we ultimately wish to improve adapt and change their behavior in response to medical or policy interventions.” [64]

Several of these points were echoed by Silverman in justifying the use of systems-based simulation for population health research [65]. Modeling changes in the heterogeneous health behaviors of individuals often uses the simulation technique of Agent-Based Modeling, and has been done in obesity research on multiple occasions [66,67,68,69,70]. Such models can be very detailed and use widely different architectures to capture the cognitive processes of the agents. Validating them using text is thus an arduous task. Modeling interacting causes across domains has been achieved in obesity research through a variety of techniques. System Dynamics (SD) makes it possible to represent nonlinear interactions between weight-related factors over different time scales and at different strengths [71, 72]. However, much like agent-based modeling, the great level of detail supported by SD makes it difficult to derive or validate such models from text. Fuzzy Cognitive Maps (FCMs) are a simpler alternative that eliminates the notion of time to focus on the different strengths of causal relations [34, 73,74,75]. Such models can be compared [34], but validating them from text still requires a trained analyst [76]. An even greater simplification is to use conceptual rather than simulation models. Conceptual models cannot run scenarios or what-if questions, and cannot ‘generate’ numbers. Instead, their focus is to capture relevant factors and whether they are connected [77]. Conceptual models can be compared [78] and validated using text, as shown in our previous work [77].

There are several types of conceptual models [35]. We recently detailed the differences between causal maps, mind maps, and concept maps [36]. In short, this paper focuses on concept maps (Fig. 1), which are undirected networks representing concepts as nodes and relationships as edges. Like the other forms of conceptual models mentioned above, a concept map supports policy-oriented tasks such as identifying clusters [27] (e.g., to coordinate actors across domains on one problem such as food) or finding feedback loops [24, 28, 29] (e.g., to use as leverage points in an intervention).

Fig. 2. Our process in seven steps to validate a conceptual model using textual data.

3 Validating a Conceptual Model from Text

The process starts with a conceptual model that we seek to validate and a text corpus used for validation. Intuitively, our process uses the concepts’ names to find relevant parts of the corpus and identify which concepts tend to co-occur. Technical aspects include handling variations in language (as we cannot rigidly assume that a concept’s name will appear verbatim), identifying themes, and mapping themes from the corpus back to concepts in the conceptual model. Our process uses seven steps, illustrated with a theoretical example in Fig. 2. The first two steps are performed for each concept node (a code sketch follows the list):

(1.a) We replace all concept names and words from the corpus with their base form (i.e., lemma). This is accomplished through lemmatization, which uses a morphological analysis to remove inflectional endings. This step ensures that minor variations of a term are all mapped to the same one (e.g., ‘flooding’ and ‘floods’ are both mapped to ‘flood’).

(1.b) Each lemmatized concept name is expanded with derivationally related forms. For instance, instead of only searching for ‘flood’ in the corpus, we will also accept words such as ‘deluge’.

(2) For each concept (i.e., the expanded lemma), we retrieve all parts of the corpus that contain it. For instance, the concept ‘flooding’ will lead to retrieving all tweets that include the lemmas ‘flood’ or ‘deluge’.
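The following is a minimal sketch of steps (1.a)–(2), assuming NLTK’s WordNet interface. The function names and the token-level matching are illustrative simplifications rather than our exact implementation, which relies on the libraries listed in Table 1 (e.g., POS tagging via Stanford CoreNLP is omitted here).

```python
# Minimal sketch of steps (1.a)-(2): lemmatize a concept name, expand it
# with WordNet lemmas and their derivationally related forms, and retrieve
# the matching (already lemmatized) tweets. Names are illustrative.
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet

lemmatizer = WordNetLemmatizer()

def expand_concept(name):
    """Steps (1.a)-(1.b): lemma of a concept name plus related forms."""
    lemma = lemmatizer.lemmatize(name.lower(), pos="v")  # 'flooding' -> 'flood'
    forms = {lemma}
    for synset in wordnet.synsets(lemma):
        for wn_lemma in synset.lemmas():
            forms.add(wn_lemma.name().replace("_", " "))
            for related in wn_lemma.derivationally_related_forms():
                forms.add(related.name().replace("_", " "))
    return forms

def retrieve(tweets, forms):
    """Step (2): keep tweets containing any expanded form of the concept."""
    return [t for t in tweets if any(f in t.split() for f in forms)]

print(retrieve(["flood damage reported", "deluge of rain", "sunny all week"],
               expand_concept("flooding")))
```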

Upon completion of step 2, we have related a portion of the corpus to each concept node. We then find the themes in each portion of the corpus using three parameters (a code sketch follows the list):

(3) We apply the Latent Dirichlet Allocation (LDA) model to find prevalent themes. The two parameters for this step are the number of themes and the number of words per theme.

(4) We gather words across themes into a single set of words. This set is cleaned by removing words that are already present in the set of derivationally related forms of the node. In other words, we only look for concepts that the node could be associated with, but not equivalent to.

(5) Since concepts’ names are entities, a concept can only be associated with an entity. Consequently, we remove all non-entities from the set of words.

(6) At this step, we have a set of entities that a concept node could be associated with. However, some of the entities may be noise rather than meaningful associations. We thus sort the entities by tf-idf (term frequency–inverse document frequency), computed over the set of tweets in which each word appears. We use a threshold parameter to identify which entities have a sufficient tf-idf to be selected.
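Steps (3)–(6) can be sketched with scikit-learn as a stand-in implementation. The entity filter of step (5) is omitted for brevity (it would apply a named-entity recognizer to the pooled words), `norm=None` yields unnormalized tf-idf scores compatible with integer thresholds such as those used in our grid search, and all names are illustrative.

```python
# Sketch of steps (3)-(6): LDA themes, pooled top words minus the node's
# own forms, then a tf-idf threshold to filter noise.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def candidate_entities(docs, known_forms, n_themes=5, n_words=10, threshold=2.0):
    counts = CountVectorizer()
    X = counts.fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=n_themes).fit(X)  # step (3)
    vocab = counts.get_feature_names_out()
    pooled = set()
    for topic in lda.components_:                                  # step (4)
        pooled.update(vocab[i] for i in topic.argsort()[-n_words:])
    pooled -= known_forms            # drop the node's own related forms
    vocabulary = sorted(pooled)
    # step (6): unnormalized tf-idf so integer thresholds (e.g., 2-9) apply
    tfidf = TfidfVectorizer(vocabulary=vocabulary, norm=None)
    scores = np.asarray(tfidf.fit_transform(docs).max(axis=0).todense()).ravel()
    return {w for w, s in zip(vocabulary, scores) if s >= threshold}
```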

Upon completion of step 6, we have found the entities that a concept node could be associated with. The final step goes back to the conceptual model to see if the association exists (a sketch follows):

(7) For each node, we compare its associated entities with its connected nodes and their derivationally related forms. If there is a match, then the text corpus has confirmed an association between the two concepts. If no match is found, the association is not confirmed. Note that associated entities that do not match any connected nodes suggest additional connections, which is different from validation, as we seek to confirm existing connections.
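Assuming the map is stored as an undirected networkx graph and that the previous steps produced, for each node, a set of associated entities and a set of expanded forms, step (7) reduces to a neighborhood intersection (variable names assumed):

```python
# Sketch of step (7): an edge (u, v) of the conceptual model is confirmed
# when the entities associated with u overlap the expanded forms of v, or
# vice versa. `associated` and `forms` map each node to a set of words.
import networkx as nx

def confirmed_edges(model, associated, forms):
    confirmed = set()
    for u, v in model.edges:
        if (associated.get(u, set()) & forms.get(v, set())
                or associated.get(v, set()) & forms.get(u, set())):
            confirmed.add((u, v))
    return confirmed

# e.g., len(confirmed_edges(...)) / model.number_of_edges() gives the
# fraction of the map validated by the corpus
```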

Fig. 3. Alternative view of our process, including libraries and APIs.

This process is also depicted in Fig. 3, listing the libraries that can be used for each step. The specific versions of the libraries used in our experiments are included in Sect. 4.

4 Comparing Conceptual Models from Twitter and Expert Reports

4.1 Datasets and Pre-processing

The conceptual model that we seek to validate was developed with the Provincial Health Services Authority (PHSA) of British Columbia to explore the interrelationships involved in obesity and well-being. The model was presented in 2015 at the Canadian Obesity Summit [24] and tested with policy makers in 2016 [29]. The model is now part of the ActionableSystems tool [28] and can be downloaded at https://osf.io/7ztwu/ under ‘Sample maps’ (file Drasic et al (edges).csv). The model consists of 98 nodes and 177 edges. From here on, we will refer to it as ‘the PHSA map’.
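As an illustration, such an edge list can be loaded as an undirected network in a few lines; the CSV layout assumed below (one edge per row, two concept columns) is a guess about the file’s format rather than a documented specification.

```python
# Hypothetical loader for the PHSA map from the edge list distributed with
# ActionableSystems (https://osf.io/7ztwu/); column layout is assumed.
import csv
import networkx as nx

model = nx.Graph()
with open("Drasic et al (edges).csv", newline="") as f:
    for row in csv.reader(f):
        if len(row) >= 2:
            model.add_edge(row[0].strip().lower(), row[1].strip().lower())

print(model.number_of_nodes(), model.number_of_edges())  # expect 98 and 177
```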

To validate the PHSA map, we used two datasets. Our first dataset (‘the Twitter dataset’) consists of 6,633,625 tweets in the English language on obesity, collected from Oct. 2, 2018 to Oct. 4, 2018. The number of tweets was chosen to be in line with comparable studies at the interface of natural language processing and obesity research [38,39,40]. The keywords used to collect the tweets included each of the 98 concept names in the PHSA map as well as their synonyms, automatically retrieved through WordNet. For instance, we used not only ‘obesity’ but also words such as ‘fatness’, ‘corpulent’, ‘embonpoint’ and ‘fleshiness’. Similarly, physical activity was expanded to include many forms such as calisthenics, isometrics, jogging, jump rope, and so on. The rationale is that the map contains abstract concepts, but individuals may speak of specific instances or use a variety of words to describe the same abstraction. After collecting a large number of tweets, natural language applications require extensive pre-processing. The impact of each option (and their interactions) on results obtained from Twitter has been extensively described for sentiment analysis [79,80,81] and for more generic tasks such as classification [82]. Some of these options are summarized in Fig. 4 and include the removal of parts deemed unnecessary for analysis (e.g., hashtags, URLs, numbers, non-English words) or the mapping of data into forms that can be more conveniently processed (e.g., expanding acronyms and abbreviations, replacing emojis, spell checking). The pre-processing options used for our dataset are depicted in Fig. 5. These options were chosen specifically for our research question: for instance, we remove stop words because they cannot be meaningful concept names in a model, but other analyses (e.g., attributing tweets to specific writers) may have kept such words. The order of the steps also matters: for instance, we cannot perform part-of-speech tagging and lemmatization (step 5) before ensuring that all the words have been corrected (step 3). After pre-processing, our dataset included 1,791,333 tweets.
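A minimal pre-processing sketch in the spirit of Fig. 5 is shown below, using NLTK as in our pipeline; the regular expression and the placement of the (omitted) spell-correction pass are illustrative.

```python
# Minimal pre-processing sketch: strip URLs, hashtags, mentions, and
# numbers, then remove stop words and lemmatize. Order matters: spelling
# would be corrected before lemmatization (step 3 in Fig. 5).
import re
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

STOP = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def preprocess(tweet):
    tweet = re.sub(r"https?://\S+|[#@]\w+|\d+", " ", tweet.lower())
    tokens = [t for t in word_tokenize(tweet) if t.isalpha() and t not in STOP]
    # a spell-correction pass would run here, before lemmatization
    return [lemmatizer.lemmatize(t) for t in tokens]

print(preprocess("Tackling #obesity: 2 new studies https://example.org"))
```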

Fig. 4. Typical pre-processing techniques applied to tweets.

Fig. 5. Pre-processing techniques applied to our Twitter dataset in a specific order. We used a Spell Checker library in step 3, the Natural Language Toolkit (NLTK) for steps 1–4, and the Stanford CoreNLP library for step 5.

Fig. 6. Average number of edges confirmed (out of 177 in the PHSA map) for each combination of parameter values over ten experiments.

The second dataset is formed of three reports on obesity: the 2010 report from the White House Task Force on Childhood Obesity [83], the 2013 report to the Provincial Health Services Authority [84], and its 2015 update (whose findings are published in [24]). We combined the three reports with the PyPDF2 library, leading to 310 pages, and we kept 247 pages after removing those that were either blank or only contained images. Pages were then transformed into raw text using the pdftotext library and divided into 4,302 sentences using the full stop (‘.’). Pre-processing was finally applied, using the same script as for tweets, while noting that several options such as removing emojis would not be triggered. The resulting dataset had 3,447 sentences.
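A sketch of this report pipeline follows, under stated assumptions: we combined PDFs with PyPDF2 and extracted text with pdftotext, whereas for brevity the sketch uses PyPDF2 (classic pre-2.0 API) for both, and the file names are made up.

```python
# Sketch of the report pipeline: extract text per page, skip blank or
# image-only pages, and split on the full stop. File names are assumed.
import PyPDF2

sentences = []
for path in ["whitehouse_2010.pdf", "phsa_2013.pdf", "phsa_2015.pdf"]:
    reader = PyPDF2.PdfFileReader(open(path, "rb"))
    for i in range(reader.getNumPages()):
        text = reader.getPage(i).extractText()
        if text.strip():  # blank or image-only pages yield no text
            sentences += [s.strip() for s in text.split(".") if s.strip()]

print(len(sentences))  # sentences then go through the same pre-processing
```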

4.2 Validating the Model for Each Dataset

The methods introduced in Sect. 3 are implemented in Python, relying on the libraries listed in Table 1. While our implementation was able to cope with millions of tweets, we note that a larger volume of data may also require a distributed database architecture and an efficient search engine such as Elasticsearch [85].

Table 1. Libraries used in each step (Sect. 3) of our experiments.

Our approach has three parameters: the number of themes, the number of words per theme, and the tf-idf threshold to eliminate noise. Hyperparameter optimization was thus necessary to use each dataset most efficiently and to fairly compare their potential in validating a model. To optimize performance with expert reports, we performed a grid search by varying the number of topics and words per topic from 5 to 50 in increments of 5, and the tf-idf threshold from 2 to 9 in increments of 1. This resulted in 800 combinations of parameter values. As there is randomness in the LDA model, we performed ten experiments per combination of parameter values, leading to a total of 8,000 experiments. At most, our process validated an average of 136.5 edges (77.11% of the map) using 50 topics, 50 words per topic, and a tf-idf threshold of 8 (Fig. 6).
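The following is an illustrative version of this grid search; `validate`, which stands in for the full seven-step process and returns the number of confirmed edges for one run, is hypothetical.

```python
# Illustrative grid search over the three parameters, with ranges from the
# text: 10 x 10 x 8 = 800 combinations, ten runs each (8,000 experiments).
import itertools
import random
import statistics

def validate(n_topics, n_words, threshold):
    # Hypothetical stand-in for the seven-step process: one stochastic run
    # returning the number of confirmed edges (out of 177).
    return random.randint(0, 177)

def mean_score(params, runs=10):
    # ten runs per combination because LDA is stochastic
    return statistics.mean(validate(*params) for _ in range(runs))

grid = itertools.product(range(5, 55, 5),  # number of topics: 5..50
                         range(5, 55, 5),  # words per topic: 5..50
                         range(2, 10))     # tf-idf threshold: 2..9
best = max(grid, key=mean_score)
print(best)
```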

A grid search was also performed on the Twitter dataset. However, our current implementation takes approximately five days to compute the results for one combination of parameter values (a single experiment), using a server-grade workstation (Dual Xeon Gold 6140). Given this limitation, we used single experiments and a coarser grid. At most, our process validated 101 edges (57.06%) using 50 topics, 50 words per topic, and a tf-idf threshold of 9.

5 Discussion

A focus group with a few participants may only discuss some of the interrelationships at work in overweight and obesity, and participants may avoid sharing opinions that are potentially disapproved of by others. In contrast, social media such as Twitter provide access to a massive number of participants who can use conditions of anonymity to share opinions more freely. Social web mining applied to Twitter thus comes with the potential to explore many interrelationships in an unobtrusive fashion. In particular, crowdsourcing over Twitter holds the promise of easily building large conceptual models, under the assumption that at least some groups of users will touch on each part of the model. Our study questions this potential and promise by analyzing whether millions of tweets are more useful for developing a conceptual model of obesity than a handful of reports.

Although conceptual models can be automatically compared [78], developing a model from each dataset (tweets vs. reports) and comparing them would not tell us which one is ‘better’. Our study question thus requires a reference. We use a previously developed conceptual model of obesity and well-being as this reference, and we establish how much of this model would have been obtained if we used either tweets or reports. In other words, we measured the percentage of the model’s structure that is confirmed by each dataset.

While both datasets were able to cover over half of the model, we note that this took only three expert reports on one side, compared to millions of tweets on the other. In addition, despite the abundance of tweets, the three expert reports touched on more relationships. Within our application context, these results suggest that an exclusive reliance on social media may result in oversimplifying a complex system, thus limiting the potential to automatically develop models using such a source. We note that a comprehensive analysis across subjects and using a variety of maps would be needed to assess whether our results, produced on one model (the Provincial Health Services Authority map) and one application subject (obesity), can be generalized to other models and subjects.

There are several limitations to this study, which we intend to address in our future research. First, one of the premises of big data research is that a large volume may compensate for many imperfections in the individual data points. Although we used a similar number of tweets to other studies at the interface of natural language processing and obesity research [38,39,40], it is possible that some of the interrelationships of the model we seek to validate are rare and thus only detectable in even larger datasets. Repeating this study with significantly larger datasets could elucidate this question. However, we then run into the second issue: our process to validate a causal map against textual data is very computationally intensive. The search space to optimize the result is defined by three parameters, and the process involves randomness, thus requiring several experiments for each combination of parameter values. On a server-grade workstation, a single combination with a CPU-based implementation requires on the order of days. Optimizing results and using larger datasets will thus require implementations that scale, with a particularly promising option consisting of a GPU-based implementation. Alternatively, we may reduce the search space if we can better characterize the impact that parameters generally have on the results, and then devise more computationally efficient processes. For instance, the tf-idf threshold plays an essential role in driving performance (Fig. 6) but may be replaced by additional pre-processing steps preventing the inclusion of noise, such as classifiers removing unwanted documents [87].

6 Conclusion

Both social media data and expert reports may be used to take into account popular perspectives and expert opinions when creating large conceptual models. In the case of obesity, we found that three expert reports confirmed 77% of the model’s relationships, while millions of tweets on obesity and its cognates covered fewer interrelationships. Creating models using social media only may thus result in an oversimplification of complex problems.