DOI: 10.1145/3351108.3351134
research-article

A Computational Analysis of News Media Bias: A South African Case Study

Published: 17 September 2019

Abstract

News media in South Africa are assumed to be unbiased and objective in their reporting of the news. Indeed, editors are required to uphold an objective and balanced view with no favour to external political or corporate interests. This assumption of objectivity is tested on a large scale by computationally analysing 30 000 articles published by five media houses: News24, SABC, EWN, ENCA, and IOL. Using topic modelling, 38 topics are extracted from the corpus, and sentiment is computed for each topic. The study highlights various cases of both over- and under-reporting by media houses on particular topics. We also identify various tonality biases by media houses.

Published In

SAICSIT '19: Proceedings of the South African Institute of Computer Scientists and Information Technologists 2019
September 2019
352 pages
ISBN: 9781450372657
DOI: 10.1145/3351108
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. NLP
  2. media bias
  3. news
  4. south-african politics
  5. topic modelling

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SAICSIT '19

Acceptance Rates

Overall Acceptance Rate 187 of 439 submissions, 43%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 67
  • Downloads (Last 6 weeks): 4
Reflects downloads up to 20 Jan 2025

Cited By

  • (2023) Protests and Media Representations: An Intersectional Analysis of the Marikana Massacre (2012), the Johannesburg Protests, and the Phoenix Massacre (2021). E-Journal of Humanities, Arts and Social Sciences. 10.38159/ehass.2023415 (59-77). Online publication date: 8-Feb-2023
  • (2023) An investigation of media reports of digital surveillance within the first year of the COVID-19 pandemic. Frontiers in Digital Health. 10.3389/fdgth.2023.12156855. Online publication date: 24-Jul-2023
  • (2022) Devotees on an Astroturf: Media, Politics, and Outrage in the Suicide of a Popular FilmStar. Proceedings of the 5th ACM SIGCAS/SIGCHI Conference on Computing and Sustainable Societies. 10.1145/3530190.3534801 (453-475). Online publication date: 29-Jun-2022
  • (2020) Undergoing the experiences of the COVID-19 pandemic as ruptures in American civil society create conditions for right action. Reflective Practice. 10.1080/14623943.2020.1821632 (1-17). Online publication date: 16-Sep-2020
