Abstract
Topic modeling has been used for many applications, but has not yet been applied to science and health communication research. In this paper, we explore topic modeling in this new domain by investigating the coverage of cancer in New York Times news items since 1970 with the Latent Dirichlet Allocation (LDA) model. Content analysis of cancer in print media has been performed before, but at a much smaller scale and with manual rather than computational analysis. We collected 45,684 articles concerning cancer via the New York Times API and built the LDA model on them.
Our results show a predominance of breast cancer in news articles compared with other types of cancer, consistent with previous studies. Additionally, our topic model shows six distinct topics: research on cancer, lifestyle and mortality, the healthcare system, business and insurance issues regarding cancer treatment, environmental politics, and American politics on cancer-related policies.
Since topic modeling is a computational technique, the model has more difficulty understanding the meaning of the analyzed text than (most) humans do. Therefore, future research will be set up to let the public contribute to the analysis of a topic model.
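To make the LDA step described above concrete, the following is a minimal sketch of fitting a topic model to tokenized documents. It is not the authors' pipeline (their Python code is in the repository listed under Notes). It uses a small collapsed Gibbs sampler written with the standard library only, which is one common way to fit LDA and may differ from the inference method the authors used. The toy documents, the function names `fit_lda` and `top_words`, and the values of `alpha`, `beta`, and `iterations` are all assumptions made for illustration.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling. `docs` is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables used by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1

                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]

                # Resample the topic and restore the counts.
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    return doc_topic, topic_word


def top_words(topic_word, n=5):
    """Return up to `n` highest-count words for each topic."""
    return [
        [w for w, c in counts.most_common(n) if c > 0]
        for counts in topic_word
    ]


# Invented toy corpus; real input would be preprocessed article text.
docs = [
    "breast cancer screening research trial".split(),
    "cancer research trial drug study".split(),
    "insurance company cost treatment coverage".split(),
    "insurance cost hospital treatment bill".split(),
]

doc_topic, topic_word = fit_lda(docs, num_topics=2)
topics = top_words(topic_word)
for k, words in enumerate(topics):
    print(f"topic {k}: {', '.join(words)}")
```

The word lists printed per topic are what a human reader then has to label (for example "research on cancer"), which is the interpretation step the abstract notes is hard to automate.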
N. Hariman and M. de Vries contributed equally to this paper.
Notes
1. Python code can be found at github.com/NHariman/LDA-model-SCS-2018.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Hariman, N., de Vries, M., Smeets, I. (2019). Topic Modeling for Exploring Cancer-Related Coverage in Journalistic Texts. In: Atzmueller, M., Duivesteijn, W. (eds) Artificial Intelligence. BNAIC 2018. Communications in Computer and Information Science, vol 1021. Springer, Cham. https://doi.org/10.1007/978-3-030-31978-6_4
DOI: https://doi.org/10.1007/978-3-030-31978-6_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31977-9
Online ISBN: 978-3-030-31978-6
eBook Packages: Computer Science, Computer Science (R0)