Abstract
Algorithmic decision making is used in an increasing number of fields. Letting automated processes make decisions raises the question of their accountability. In the field of computational journalism, the algorithmic accountability framework proposed by Diakopoulos formalizes this challenge by considering algorithms as objects of human creation, with the goal of revealing the intent embedded into their implementation. A consequence of this definition is that ensuring accountability essentially boils down to a transparency question: given the appropriate reverse-engineering tools, it should be feasible to extract design criteria and to identify intentional biases. General limitations of this transparency ideal have been discussed by Ananny and Crawford (New Media Soc 20(3):973–989, 2018). We further focus on its technical limitations. We show that even if reverse-engineering concludes that the criteria embedded into an algorithm correspond to its publicized intent, adversarial behaviors may still make the algorithm deviate from its expected operation. We illustrate this issue with an automated news recommendation system, and show how the classification algorithms used in such systems can be fooled with hard-to-notice modifications of the articles to classify. We therefore suggest that robustness against adversarial behaviors should be taken into account in the definition of algorithmic accountability, to better capture the risks inherent in algorithmic decision making. Finally, we discuss the various challenges that this new technical limitation raises for journalism practice.
Notes
As discussed by Katz and Lindell, “a common mistake is to think that definitions are not needed or are trivial to come up with, because everyone has an intuitive idea of what” (for example) “security means”. It turns out that specifying what (for example) security means has been an iterative process in which definitions were introduced, invalidated by counter-examples, and refined. As our following discussions will show, a similar situation holds for algorithmic accountability.
Alternatively, we selected this dictionary based on “term frequency—inverse document frequency” (TF-IDF), in order to detect salient words for articles with different tags (Ramos 2003). Both options gave similar results; the reported results use the most frequent words method.
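For illustration, the TF-IDF weighting mentioned here can be sketched in a few lines of standard-library Python. This is a minimal implementation of the plain tf × log(N/df) scheme from Ramos (2003), not the authors’ actual tooling; the toy corpus and function name are ours:

```python
import math
from collections import Counter

def tfidf_scores(docs):
    """Compute per-document TF-IDF scores from tokenized documents.

    docs: list of token lists. Returns one {term: score} dict per document,
    using the plain tf * log(N / df) weighting from Ramos (2003).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return scores

# Toy corpus: "politics" is salient in the first document only.
docs = [["politics", "vote", "news"], ["sports", "match", "news"]]
s = tfidf_scores(docs)
```

Note how a term that appears in every document (“news”) gets a score of zero, which is exactly why TF-IDF surfaces tag-specific words rather than generally frequent ones.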
Since our study is based on a French-speaking newspaper, the article is translated.
Yet, as will be mentioned next, transparency may facilitate the generation and exploitation of adversarial examples, and therefore make the robustness requirement harder to achieve.
References
Ananny M, Crawford K (2018) Seeing without knowing: limitations of the transparency ideal and its application to algorithmic accountability. New Media Soc 20(3):973–989
Anderson C (2012) Towards a sociology of computational and algorithmic journalism. New Media Soc 15(7):1005–1021
Araujo T, Helberger N, Kruikemeier S, de Vreese CH (2019) In AI we trust? Perceptions about automated decision-making by artificial intelligence. AI Soc 35:611–623
Arnt A, Zilberstein S (2003) Learning to perform moderation in online forums. Web Intell 2003:637–641
Barocas S, Hardt M, Narayanan A (2019) Fairness and machine learning. https://fairmlbook.org/
Beck U (1992) Risk society: towards a new modernity. Sage, London
Biggio B, Nelson B, Laskov P (2012) Poisoning attacks against support vector machines. Proc ICML 2012:1467–1474
Bishop CM (2007) Pattern recognition and machine learning, 5th edn. Springer, Berlin
Bodó B (2019) Selling news to audiences—a qualitative inquiry into the emerging logics of algorithmic news personalization in European quality news media. Digit Journal 7(8):1054–1075
Broussard M (2018) Artificial unintelligence: how computers misunderstand the world. MIT Press, Cambridge
Broussard M, Diakopoulos N, Guzman AL, Abebe R, Dupagne M, Chuan C-H (2019) Artificial intelligence and journalism. Journal Mass Commun Q 96(3):673–695
Bucher T (2018) If then: algorithmic power and politics. Oxford University Press, Oxford
Burrell J (2016) How the machine ‘thinks’: understanding opacity in machine learning algorithms. Big Data Soc. https://doi.org/10.1177/2053951715622512
Carlson M, Lewis SC (2015) Boundaries of journalism: professionalism, practices, and participation. Routledge, UK
Coddington M (2015) Clarifying journalism’s quantitative turn. Digit Journal 3(3):331–348
Crain M (2018) The limits of transparency: data brokers and commodification. New Media Soc 20(1):88–104
Crawford K, Schultz J (2014) Big data and due process: toward a framework to redress predictive privacy harms. Boston College Law Rev 55:93
Dagiral E, Parasie S (2016) La « Science des Données » à la Conquête des Mondes Sociaux. Ce que le « Big Data » doit aux Épistémologies Locales, In Big Data et Traçabilité Numérique. Les Sciences Sociales Face à la Quantification Massive des Individus, 85–104, Collège de France, Paris.
Datta A, Tschantz MC, Datta A (2015) Automated experiments on ad privacy settings: a tale of opacity, choice, and discrimination. PoPETs 1:92–112
Diakopoulos N (2015) Algorithmic accountability: journalistic investigation of computational power structures. Digit Journal 3(3):398–415
Diakopoulos N (2019a) Automating the news: how algorithms are rewriting the media. Harvard University Press, Cambridge
Diakopoulos N (2019b) Towards a design orientation on algorithms and automation in news production. Digit Journal 3(3):1180–1184
Eykholt K, Evtimov I, Fernandes E, Li B, Rahmati A, Xiao C, Prakash A, Kohno T, Song D (2018) Robust physical-world attacks on deep learning visual classification. Proc CVPR 2018:1625–1634
Giddens A (1999) Risk and responsibility. Mod Law Rev 62(1):1–10
Goodfellow IJ, McDaniel PD, Papernot N (2018) Making machine learning robust against adversarial inputs. Commun ACM 61(7):56–66
Hassan N, Arslan F, Li C, Tremayne M (2017) Toward automated fact-checking: detecting check-worthy factual claims by ClaimBuster. Proc KDD 2017:1803–1812
Helberger N (2019) On the Democratic Role of News Recommenders. Digital J 7(8):993–1012. https://doi.org/10.1080/21670811.2019.1623700
Karlsson M (2010) Rituals of transparency: evaluating online news outlets’ uses of transparency rituals in the United States, United Kingdom and Sweden. Journal Studies 11:535–545
Karppinen KE (2018) Journalism, pluralism and diversity. In: Journalism. De Gruyter, Berlin
Katz J, Lindell Y (2014) Introduction to modern cryptography, 2nd edn. CRC Press, Boca Raton
Kormelink TG, Meijer IC (2014) Tailor-made news: meeting the demands of news users on mobile and social media. Journal Studies 15(5):632–641
Kunert J, Thurman N (2019) The form of content personalisation at mainstream transatlantic news outlets. Journal Pract 13(7):759–780
Lai S, Xu L, Liu K, Zhao J (2015) Recurrent convolutional neural networks for text classification. Proc AAAI 2015:2267–2273
Lepri B, Oliver N, Letouzé E, Pentland A, Vinck P (2018) Fair, transparent, and accountable algorithmic decision-making processes. Philos Technol 31(4):611–627
Lewis SC, Usher N (2013) Open source and journalism: toward new frameworks for imagining news innovation. Media Cult Soc 35(5):602–619
Lewis SC, Westlund O (2016) Mapping the human–machine divide in journalism. In: The SAGE handbook of digital journalism. Sage, London
Lewis SC, Guzman AL, Schmidt TR (2019) Automation, journalism, and human–machine communication: rethinking roles and relationships of humans and machines in news. Digit Journal 7(4):409–427
López-Cózar ED, Robinson-García N, Torres-Salinas D (2014) The google scholar experiment: how to index false papers and manipulate bibliometric indicators. JASIST 65(3):446–454
McGregor L, Murray D, Vivian NG (2019) International human rights law as a framework for algorithmic accountability. ICLQ 68(2):309–343
Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. https://arxiv.org/abs/1301.3781
Milano S, Taddeo M, Floridi L (2020) Recommender systems and their ethical challenges. AI Soc 35:957–967
Milosavljević M, Vobič I (2019) Human still in the loop: editors reconsider the ideals of professional journalism through automation. Digit Journal 7(8):1098–1116
Mittelstadt B (2016) Automation, algorithms, and politics: auditing for transparency in content personalization systems. Int J Commun 10:12
Nielsen R (2016) The many crises of western journalism: a comparative analysis of economic crises, professional crises, and crises of confidence. In: The crisis of journalism reconsidered. Cambridge University Press, Cambridge
Perra N, Rocha LEC (2019) Modelling opinion dynamics in the age of algorithmic personalisation. Nat Sci Rep 9(1):7261
Ramos J (2003) Using TF-IDF to determine word relevance in document queries, Proceedings of the First Instructional Conference on Machine Learning, pp 133–142
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323(6088):533–536
Sandvig C, Hamilton K, Karahalios K, Langbort C (2014) Auditing algorithms: research methods for detecting discrimination on internet platforms. In: Data and discrimination: converting critical concerns into productive inquiry, 22 pp
Steinhardt J, Koh PW, Liang P (2017) Certified defenses for data poisoning attacks. Proc NIPS 2017:3517–3529
Thurman N, Moeller J, Helberger N, Trilling D (2018) My friends, editors, algorithms, and I: examining audience attitudes to news selection. Digit Journal 7(4):447–469
Thurman N, Lewis SC, Kunert J (2019) Algorithms, automation, and news. Digit Journal 7(8):980–992
Tramèr F, Papernot N, Goodfellow I, Boneh D, McDaniel P (2017) The space of transferable adversarial examples. https://arxiv.org/abs/1704.03453
Tramèr F, Kurakin A, Papernot N, Goodfellow IJ, Boneh D, McDaniel PD (2018) Ensemble adversarial training: attacks and defenses. Proc ICLR 2018
Ward S (2015) Radical media ethics: a global perspective. Wiley, New Jersey
Ward S (2018) Epistemologies of journalism. Journalism 19:63–82
Wing J (2008) Computational thinking and thinking about computing. Philos Trans R Soc A Math Phys Eng Sci 366:3717–3725
Acknowledgements
François-Xavier Standaert is a senior associate researcher of the Belgian Fund for Scientific Research. This work has been funded in part by the European Union through the ERC Consolidator Grant SWORD (724725).
Appendices
Appendix A: details on the classifiers
The multinomial NB classifier was applied directly to the histograms of 20,000 words provided by the bag-of-words NLP. The only technical tweak was the use of a smoothing factor of 0.09 to deal with words having an estimated probability of zero.
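The role of the smoothing factor can be illustrated with a minimal multinomial naive Bayes in standard-library Python. This is a sketch of the technique, not the scikit-learn implementation the study relies on; the toy corpus, the function names, and the tags are ours, and the smoothing factor 0.09 is taken from the text:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=0.09):
    """Train a multinomial naive Bayes on bag-of-words documents.

    alpha is the smoothing factor: it assigns a small probability mass
    to words never observed in a class, instead of probability zero.
    """
    vocab = {w for doc in docs for w in doc}
    class_words = defaultdict(Counter)   # word counts per class
    class_docs = Counter(labels)         # document counts per class
    for doc, y in zip(docs, labels):
        class_words[y].update(doc)
    model = {}
    for y in class_docs:
        denom = sum(class_words[y].values()) + alpha * len(vocab)
        model[y] = (
            math.log(class_docs[y] / len(docs)),  # log prior
            {w: math.log((class_words[y][w] + alpha) / denom) for w in vocab},
            math.log(alpha / denom),              # fallback for unseen words
        )
    return model

def predict(model, doc):
    def log_posterior(y):
        prior, logp, unseen = model[y]
        return prior + sum(logp.get(w, unseen) for w in doc)
    return max(model, key=log_posterior)

# Toy training set with two tags.
docs = [["election", "vote"], ["match", "goal"], ["vote", "debate"]]
labels = ["politics", "sports", "politics"]
m = train_nb(docs, labels)
```

Without smoothing (alpha = 0), a single out-of-class word would drive a document’s class probability to zero, which is exactly the degenerate behavior the 0.09 factor avoids.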
For the MLP classifier, we selected the best parameters thanks to a grid search set to optimize the classifier’s accuracy, with one to four layers and a number of neurons per layer ranging from 10 to 1,000. Several solutions led to similar results and we eventually selected a classifier with 3 layers: the first one uses 141 neurons (corresponding to the square root of the 20,000 words output by the bag-of-words NLP) with a ReLU activation function, the hidden layer uses 31 neurons (corresponding to the geometric mean of 141 and 7, i.e., \(\sqrt{141\times 7}\)) with a ReLU activation function, and the last one uses 7 neurons (i.e., our number of classes) with a tanh activation function. Using more layers did not lead to significant concrete improvements in our case study.
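The layer-sizing heuristic described above can be reproduced with a few lines of arithmetic. The helper function below is hypothetical (it is our naming of the rule stated in the text: input width ≈ square root of the vocabulary size, hidden width ≈ geometric mean of its neighbours, output width = number of classes):

```python
import math

def mlp_layer_sizes(vocab_size, n_classes):
    """Sizing heuristic from the text: sqrt of the vocabulary for the
    first layer, geometric mean of the neighbouring layers for the
    hidden one, and one output neuron per class."""
    first = round(math.sqrt(vocab_size))
    hidden = round(math.sqrt(first * n_classes))
    return (first, hidden, n_classes)

# With the 20,000-word bag-of-words dictionary and 7 tags:
sizes = mlp_layer_sizes(20_000, 7)
```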
Finally, we used different numbers and types of layers for the RNN: bidirectional, convolutional (combined with pooling), dense, and LSTM (Long Short-Term Memory) with dropout. The one that provided the best results in our context used the following parameters:
| Layer (type) | Output shape | # of params |
|---|---|---|
| Embedding_2 | (None, 100, 200) | 4,000,200 |
| Bidirectional_2 | (None, 100, 200) | 180,600 |
| Conv1d_2 | (None, 98, 5000) | 3,005,000 |
| Global_max_pooling_1d_2 | (None, 5000) | 0 |
| Dense_3 | (None, 100) | 500,100 |
| Dropout_2 | (None, 100) | 0 |
| Dense_4 | (None, 7) | 707 |
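Most of the parameter counts in this table can be sanity-checked from the layer shapes alone. The check below assumes a vocabulary of 20,000 words plus one padding index and a convolution window of width 3 (inferred from the sequence length shrinking from 100 to 98); the bidirectional layer is omitted since its count depends on the internals of the recurrent cell:

```python
# Sanity-check of the parameter counts reported in the table above,
# derived from the layer shapes alone. Assumptions (inferred, not
# stated explicitly): vocabulary of 20,000 words + 1 padding index,
# convolution window of width 3 (sequence length 100 -> 98).
vocab, embed_dim = 20_000 + 1, 200

embedding_params = vocab * embed_dim         # one weight per (token, dimension)
conv1d_params = (3 * embed_dim + 1) * 5000   # window of 3 embeddings, 5000 filters, biases
dense_3_params = (5000 + 1) * 100            # pooled 5000-vector -> 100 neurons
dense_4_params = (100 + 1) * 7               # 100 neurons -> 7 output classes
```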
The implementations of all the machine learning tools that we used are based on the scikit-learn library, available at https://scikit-learn.org/stable/. As for the NLP part of the tool, we used the SnowballStemmer library for word stemming (https://kite.com/python/docs/nltk.SnowballStemmer), in which stopword removal can be activated as an option, and we used the Word2Vec models made available by Jean-Philippe Fauconnier for the word embeddings (http://fauconnier.github.io/). We report the learning curves of the three combinations of NLP and ML tools in Fig. 4. As can be observed, the amount of profiling data (i.e., 4,000 articles, given that we estimate the accuracy with fivefold cross-validation) is sufficient for the NB and MLP classifiers to approach convergence, which confirms that the amount of collected data is sufficient for those classifiers to provide meaningful outcomes. The RNN shows slightly worse results in our case, which may be due both to the simple (topic) feature that we aim to capture and to a lack of data for this more data-demanding machine learning algorithm.
Appendix B: additional figures
See Figs. 4 and 5.
Cite this article
Descampe, A., Massart, C., Poelman, S. et al. Automated news recommendation in front of adversarial examples and the technical limits of transparency in algorithmic accountability. AI & Soc 37, 67–80 (2022). https://doi.org/10.1007/s00146-021-01159-3