Abstract
While user-generated online content (UGC) is increasingly available, public opinion studies have yet to fully exploit the abundance and richness of online data. This study contributes to the practical knowledge of UGC and of the machine learning techniques that can be used to analyze it. To this end, we explore the potential of UGC and apply natural language pre-processing, text mining, and sentiment analysis to the question of public satisfaction with healthcare systems. Concretely, we analyze 634 online comments reflecting attitudes towards healthcare services in different countries. Our analysis identifies the frequency of healthcare-related topics in the text of the comments and attempts to classify and rank national healthcare systems based on the respondents’ sentiment scores. We describe our approach, summarize our main findings, and compare them with the results from cross-national surveys. Finally, we outline the typical limitations inherent in the analysis of user-generated online content and suggest avenues for future research.
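To make the pipeline named in the abstract concrete, the following is a minimal sketch of its three steps (pre-processing, term-frequency counting, and lexicon-based sentiment scoring aggregated by country). It is an illustration under stated assumptions, not the study's implementation: the lexicon, the stop-word list, the sample comments, the country labels, and all function names are hypothetical, and a real analysis would likely rely on an established sentiment lexicon rather than the toy one shown here.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy lexicon and stop-word list, for illustration only.
LEXICON = {
    "excellent": 2.0, "good": 1.0, "free": 0.5,
    "slow": -1.0, "expensive": -1.5, "terrible": -2.0,
}
STOPWORDS = {"the", "is", "a", "and", "but", "in", "are", "of", "to", "for"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def term_frequencies(comments):
    """Count how often each token occurs across all comments."""
    counts = Counter()
    for comment in comments:
        counts.update(tokenize(comment["text"]))
    return counts


def sentiment_score(text):
    """Mean lexicon weight over matched tokens; 0.0 when no token matches."""
    weights = [LEXICON[t] for t in tokenize(text) if t in LEXICON]
    return sum(weights) / len(weights) if weights else 0.0


def rank_countries(comments):
    """Average comment-level scores per country and sort from high to low."""
    scores = defaultdict(list)
    for comment in comments:
        scores[comment["country"]].append(sentiment_score(comment["text"]))
    means = {country: sum(vals) / len(vals) for country, vals in scores.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Made-up comments with placeholder country labels.
comments = [
    {"country": "Country A", "text": "The care is excellent and free."},
    {"country": "Country A", "text": "Good doctors but slow waiting lists."},
    {"country": "Country B", "text": "Expensive and terrible insurance paperwork."},
]

if __name__ == "__main__":
    print(term_frequencies(comments).most_common(5))
    print(rank_countries(comments))
```

Averaging comment-level scores per country is only one possible aggregation; with few comments per country, such a ranking is sensitive to individual comments and should be validated against external benchmarks such as survey results.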





Availability of data and materials
Not applicable.
Code availability
Not applicable.
Notes
For a comprehensive comparison between social media and surveys, see Schober et al. [37].
Aaron Carroll is a health services researcher and professor of pediatrics at Indiana University School of Medicine.
Austin Frakt is a director of the Partnered Evidence-Based Policy Resource Center at the V. A. Boston Healthcare System, associate professor with Boston University’s School of Public Health; and adjunct associate professor with the Harvard T. H. Chan School of Public Health.
A common notion in computing, ‘garbage in, garbage out’, highlights the importance of the quality of input data.
For an excellent discussion of the importance of validation see Grimmer and Stewart [15].
References
Badawy, A., & Ferrara, E. (2018). The rise of jihadist propaganda on social networks. Journal of Computational Social Science, 1(2), 453–470.
Blendon, R. J., Benson, J., Donelan, K., Leitman, R., Taylor, H., Koeck, C., & Gitterman, D. (1995). Who has the best health care system? A second look. Health Affairs, 14(4), 220–230.
Bleich, S. N., Özaltin, E., & Murray, C. J. (2009). How does satisfaction with the health-care system relate to patient experience? Bulletin of the World Health Organization, 87, 271–278.
Bonikowski, B. (2017). Big data: challenges and opportunities for comparative historical sociology. Trajectories Newsletter of the ASA Comparative and Historical Section, 28(2), 29–32.
Bonoli, G., & Palier, B. (1998). Changing the politics of social programmes: Innovative change in British and French welfare reforms. Journal of European Social Policy, 8(4), 317–330.
Cammett, M., Lynch, J., & Bilev, G. (2015). The influence of private health care financing on citizen trust in government. Perspectives on Politics, 13(4), 938–957.
Caren (2012). https://nealcaren.github.io/.
Cohen, G. (1996). Age and health status in a patient satisfaction survey. Social Science & Medicine, 42(7), 1085–1093.
Couper, M. P. (2011). The future of modes of data collection. Public Opinion Quarterly, 75(5), 889–908.
Enghoff, O., & Aldridge, J. (2019). The value of unsolicited online data in drug policy research. International Journal of Drug Policy, 73, 210–218.
Feinerer, I. (2008). An introduction to text mining in R. R News: The Newsletter of the R Project, 8(2), 19.
Gelissen, J. (2000). Popular support for institutionalised solidarity: A comparison between European welfare states. International Journal of Social Welfare, 9(4), 285–300.
Gevers, J., Gelissen, J., Arts, W., & Muffels, R. (2000). Public health care in the balance: Exploring popular support for health care systems in the European Union. International Journal of Social Welfare, 9(4), 301–321.
Golato, A. (2017). Naturally occurring data. In The Routledge Handbook of Pragmatics (pp. 21–26). Routledge.
Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267–297.
Groves, R. M. (2011). Three eras of survey research. Public Opinion Quarterly, 75(5), 861–871.
Hall, J. A., & Dornan, M. C. (1990). Patient sociodemographic characteristics as predictors of satisfaction with medical care: A meta-analysis. Social Science & Medicine, 30(7), 811–818.
Harford, T. (2014). Big data: A big mistake? Significance, 11(5), 14–19.
Havey, N. F. (2020). Partisan public health: How does political ideology influence support for COVID-19 related misinformation? Journal of Computational Social Science, 3(2), 319–342.
He, W., Tian, X., Tao, R., Zhang, W., Yan, G., & Akula, V. (2017). Application of social media analytics: a case of analyzing online hotel reviews. Online Information Review, 41, 921–935.
Hutto, C. J., & Gilbert, E. (2014). VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Eighth International AAAI Conference on Weblogs and Social Media.
Japec, L., Kreuter, F., Berg, M., Biemer, P., Decker, P., Lampe, C., Lane, J., O’Neil, C., & Usher, A. (2015). Big data in survey research: AAPOR task force report. Public Opinion Quarterly, 79(4), 839–880.
Jensen, C., & Naumann, E. (2016). Increasing pressures and support for public healthcare in Europe. Health Policy, 120(6), 698–705.
Kleinberg, B., van der Vegt, I., & Gill, P. (2021). The temporal evolution of a far-right forum. Journal of Computational Social Science, 4(1), 1–23.
Kohl, J., & Wendt, C. (2004). Satisfaction with health care systems. A comparison of EU countries. In W. Glatzer, S. V. Below, & M. Stoffregen (Eds.), Challenges for Quality of Life in the Contemporary World (pp. 311–331). Kluwer Academic Publishers.
Kurian, J. C. (2015). Facebook use by the open access repository users. Online Information Review, 39, 903–922.
Manosevitch, E., & Walker, D. (2009, April). Reader comments to online opinion journalism: A space of public deliberation. In International Symposium on Online Journalism (Vol. 10, pp. 1–30).
Missinne, S., Meuleman, B., & Bracke, P. (2013). The popular legitimacy of European healthcare systems: A multilevel analysis of 24 countries. Journal of European Social Policy, 23(3), 231–247.
Mossialos, E. (1997). Citizens’ views on health care systems in the 15 member states of the European Union. Health Economics, 6(2), 109–116.
Naeem, B., Khan, A., Beg, M. O., & Mujtaba, H. (2020). A deep learning framework for clickbait detection on social area network using natural language cues. Journal of Computational Social Science, 3, 1–13.
Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1–2), 1–135.
Piña-García, C. A., Mario-Siqueiros-García, J., Robles-Belmont, E., Carreón, G., Gershenson, C., & Amador-Díaz-López, J. (2018). From neuroscience to computer science: a topical approach on Twitter. Journal of Computational Social Science, 1(1), 187–208.
Rahmqvist, M., & Bara, A. C. (2010). Patient characteristics and quality dimensions related to patient satisfaction. International Journal for Quality in Health Care, 22(2), 86–92.
Robinson, K. M. (2001). Unsolicited narratives from the Internet: A rich source of qualitative data. Qualitative Health Research, 11(5), 706–714.
Ryan, G., & Bernard, H. (2003). Techniques to identify themes. Field Methods, 15(1), 85–109.
Santana, A. D. (2011). Online readers’ comments represent new opinion pipeline. Newspaper Research Journal, 32(3), 66–81.
Schober, M. F., Pasek, J., Guggenheim, L., Lampe, C., & Conrad, F. G. (2016). Social media analyses for social measurement. Public Opinion Quarterly, 80(1), 180–211.
Shahsavari, S., Holur, P., Wang, T., Tangherlini, T. R., & Roychowdhury, V. (2020). Conspiracy in the time of corona: Automatic detection of emerging COVID-19 conspiracy theories in social media and the news. Journal of Computational Social Science, 3(2), 279–317.
Souma, W., Vodenska, I., & Aoyama, H. (2019). Enhanced news sentiment analysis using deep learning methods. Journal of Computational Social Science, 2(1), 33–46.
van der Vegt, I., Mozes, M., Gill, P., & Kleinberg, B. (2021). Online influence, offline violence: Language use on YouTube surrounding the ‘Unite the Right’ rally. Journal of Computational Social Science, 4(1), 333–354.
Uyheng, J., & Carley, K. M. (2020). Bots and online hate during the COVID-19 pandemic: Case studies in the United States and the Philippines. Journal of Computational Social Science, 3(2), 445–468.
Wang, A. H.-E., Lee, M.-C., Wu, M.-H., & Shen, P. (2020). Influencing overseas Chinese by tweets: Text-images as the key tactic of Chinese propaganda. Journal of Computational Social Science, 3(2), 469–486.
Wendt, C., Kohl, J., Mischke, M., & Pfeifer, M. (2010). How do Europeans perceive their healthcare system? Patterns of satisfaction and preference for state involvement in the field of healthcare. European Sociological Review, 26(2), 177–192.
Acknowledgements
The author would like to thank Seppe vanden Broucke for his valuable suggestions in the earlier stages of the paper.
Funding
Not applicable.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Cite this article
Ruelens, A. Analyzing user-generated content using natural language processing: a case study of public satisfaction with healthcare systems. J Comput Soc Sc 5, 731–749 (2022). https://doi.org/10.1007/s42001-021-00148-2
DOI: https://doi.org/10.1007/s42001-021-00148-2