
Topic Modeling for Exploring Cancer-Related Coverage in Journalistic Texts

  • Conference paper
Artificial Intelligence (BNAIC 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1021)

Abstract

Topic modeling has been used in many applications but has not yet been applied to science and health communication research. In this paper, we explore topic modeling in this new domain by investigating the coverage of cancer in New York Times news items published since 1970, using the Latent Dirichlet Allocation (LDA) model. Content analyses of cancer in print media have been performed before, but on a much smaller scale and with manual rather than computational analysis. We collected 45,684 articles concerning cancer via the New York Times API and built the LDA model on them.

Our results show that breast cancer predominates in news articles compared with other types of cancer, in line with previous studies. Additionally, our topic model yields six distinct topics: research on cancer, lifestyle and mortality, the healthcare system, business and insurance issues regarding cancer treatment, environmental politics, and American politics on cancer-related policies.

Because topic modeling is a purely computational technique, the model has more difficulty understanding the meaning of the analyzed text than (most) humans do. Future research will therefore be set up to let the public contribute to the analysis of a topic model.
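To make the LDA step in the abstract more concrete, here is a minimal sketch of fitting a topic model to tokenized documents. It is not the authors' code (that is in the repository linked from the paper). It uses collapsed Gibbs sampling, which is one common way to fit LDA and not necessarily the inference method used in the paper. The function name `fit_lda`, the hyperparameter values, and the four toy documents are all invented for illustration and do not come from the New York Times corpus.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch only).

    docs is a list of token lists. Returns (topic_word, doc_topic), where
    topic_word[k][w] counts how often word w is assigned to topic k and
    doc_topic[d][k] counts how many tokens of document d are in topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    doc_topic = [[0] * num_topics for _ in docs]
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            topic_word[z][w] += 1
            topic_total[z] += 1
            doc_topic[d][z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                doc_topic[d][z] -= 1

                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]

                # Sample a new topic and add the token back.
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                topic_word[z][w] += 1
                topic_total[z] += 1
                doc_topic[d][z] += 1

    return topic_word, doc_topic


# Toy documents, made up for this example (already tokenized and lowercased).
docs = [
    ["breast", "cancer", "screening", "mammogram", "screening"],
    ["cancer", "research", "trial", "drug", "research"],
    ["insurance", "cost", "treatment", "insurance", "coverage"],
    ["drug", "trial", "research", "cancer", "treatment"],
]

topic_word, doc_topic = fit_lda(docs, num_topics=2)

for k, counts in enumerate(topic_word):
    top_words = [w for w, c in counts.most_common(3) if c > 0]
    print(f"topic {k}: {top_words}")
```

On a corpus of real size, a library implementation with proper preprocessing (stop-word removal, lemmatization) would be the practical choice. The sketch only shows the core idea: each token is repeatedly reassigned to a topic in proportion to how common that topic is in its document and how common the word is in that topic.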

N. Hariman and M. de Vries contributed equally to this paper.


Notes

  1. Python code can be found at github.com/NHariman/LDA-model-SCS-2018.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Hariman, N., de Vries, M., Smeets, I. (2019). Topic Modeling for Exploring Cancer-Related Coverage in Journalistic Texts. In: Atzmueller, M., Duivesteijn, W. (eds) Artificial Intelligence. BNAIC 2018. Communications in Computer and Information Science, vol 1021. Springer, Cham. https://doi.org/10.1007/978-3-030-31978-6_4


  • DOI: https://doi.org/10.1007/978-3-030-31978-6_4


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-31977-9

  • Online ISBN: 978-3-030-31978-6

  • eBook Packages: Computer Science, Computer Science (R0)
