Automated news recommendation in front of adversarial examples and the technical limits of transparency in algorithmic accountability

  • Original Article
  • AI & SOCIETY

Abstract

Algorithmic decision making is used in an increasing number of fields. Letting automated processes take decisions raises the question of their accountability. In the field of computational journalism, the algorithmic accountability framework proposed by Diakopoulos formalizes this challenge by considering algorithms as objects of human creation, with the goal of revealing the intent embedded into their implementation. A consequence of this definition is that ensuring accountability essentially boils down to a transparency question: given the appropriate reverse-engineering tools, it should be feasible to extract design criteria and to identify intentional biases. General limitations of this transparency ideal have been discussed by Ananny and Crawford (New Media Soc 20(3):973–989, 2018). We further focus on its technical limitations. We show that even if reverse-engineering concludes that the criteria embedded into an algorithm correspond to its publicized intent, it may be that adversarial behaviors make the algorithm deviate from its expected operation. We illustrate this issue with an automated news recommendation system, and show how the classification algorithms used in such systems can be fooled with hard-to-notice modifications of the articles to classify. We therefore suggest that robustness against adversarial behaviors should be taken into account in the definition of algorithmic accountability, to better capture the risks inherent to algorithmic decision making. We finally discuss the various challenges that this new technical limitation raises for journalism practice.

Figs. 1–3

Notes

  1. https://www.vie-publique.fr/sites/default/files/rapport/pdf/144000541.pdf.

  2. As discussed by Katz and Lindell, “a common mistake is to think that definitions are not needed or are trivial to come up with, because everyone has an intuitive idea of what” (for example) “security means”. It turns out that specifying what (for example) security means has been an iterative process, in which definitions were introduced, invalidated by counter-examples, and refined. As the following discussion will show, a similar situation holds for algorithmic accountability.

  3. Alternatively, we selected this dictionary based on the “Term Frequency–Inverse Document Frequency” (TF-IDF), in order to detect salient words for articles with different tags (Ramos 2003). Both options gave similar results; the results we report are for the most-frequent-words method.

  4. https://www.mcafee.com/blogs/other-blogs/mcafee-labs/model-hacking-adas-to-pave-safer-roads-for-autonomous-vehicles/

  5. Since our study is based on a French-speaking newspaper, the article is translated.

  6. Yet, as will be mentioned next, transparency may facilitate the generation and exploitation of adversarial examples, and therefore make the robustness requirement harder to reach.

References

  • Ananny M, Crawford K (2018) Seeing without knowing: limitations of the transparency ideal and its application to algorithmic accountability. New Media Soc 20(3):973–989

  • Anderson C (2012) Towards a sociology of computational and algorithmic journalism. New Media Soc 15(7):1005–1021

  • Araujo T, Helberger N, Kruikemeier S, de Vreese CH (2019) In AI we trust? Perceptions about automated decision-making by artificial intelligence. AI Soc 35:611–623

  • Arnt A, Zilberstein S (2003) Learning to perform moderation in online forums. Web Intell 2003:637–641

  • Barocas S, Hardt M, Narayanan A (2019) Fairness in machine learning. https://fairmlbook.org/

  • Beck U (1992) Risk society: towards a new modernity. Sage, London

  • Biggio B, Nelson B, Laskov P (2012) Poisoning attacks against support vector machines. Proc ICML 2012:1467–1474

  • Bishop CM (2007) Pattern recognition and machine learning, 5th edn. Springer, Berlin

  • Bodó B (2019) Selling news to audiences—a qualitative inquiry into the emerging logics of algorithmic news personalization in European quality news media. Digit Journal 7(8):1054–1075

  • Broussard M (2018) Artificial unintelligence: how computers misunderstand the world. MIT Press, Cambridge

  • Broussard M, Diakopoulos N, Guzman AL, Abebe R, Dupagne M, Chuan C-H (2019) Artificial intelligence and journalism. Journal Mass Commun Q 96(3):673–695

  • Bucher T (2018) If then: algorithmic power and politics. Oxford University Press, Oxford

  • Burrell J (2016) How the machine ‘thinks’: understanding opacity in machine learning algorithms. Big Data Soc. https://doi.org/10.1177/2053951715622512

  • Carlson M, Lewis SC (2015) Boundaries of journalism: professionalism, practices, and participation. Routledge, UK

  • Coddington M (2015) Clarifying journalism’s quantitative turn. Digit Journal 3(3):331–348

  • Crain M (2018) The limits of transparency: data brokers and commodification. New Media Soc 20(1):88–104

  • Crawford K, Schultz J (2014) Big data and due process: toward a framework to redress predictive privacy harms. Boston College Law Rev 55:93

  • Dagiral E, Parasie S (2016) La « Science des Données » à la conquête des mondes sociaux. Ce que le « Big Data » doit aux épistémologies locales. In: Big Data et Traçabilité Numérique. Les Sciences Sociales Face à la Quantification Massive des Individus. Collège de France, Paris, pp 85–104

  • Datta A, Tschantz MC, Datta A (2015) Automated experiments on ad privacy settings: a tale of opacity, choice, and discrimination. PoPETs 1:92–112

  • Diakopoulos N (2015) Algorithmic accountability: journalistic investigation of computational power structures. Digit Journal 3(3):398–415

  • Diakopoulos N (2019a) Automating the news: how algorithms are rewriting the media. Harvard University Press, Cambridge

  • Diakopoulos N (2019b) Towards a design orientation on algorithms and automation in news production. Digit Journal 3(3):1180–1184

  • Eykholt K, Evtimov I, Fernandes E, Li B, Rahmati A, Xiao C, Prakash A, Kohno T, Song D (2018) Robust physical-world attacks on deep learning visual classification. Proc CVPR 2018:1625–1634

  • Giddens A (1999) Risk and responsibility. Mod Law Rev 62(1):1–10

  • Goodfellow IJ, McDaniel PD, Papernot N (2018) Making machine learning robust against adversarial inputs. Commun ACM 61(7):56–66

  • Hassan N, Arslan F, Li C, Tremayne M (2017) Toward automated fact-checking: detecting check-worthy factual claims by ClaimBuster. Proc KDD 2017:1803–1812

  • Helberger N (2019) On the democratic role of news recommenders. Digit Journal 7(8):993–1012. https://doi.org/10.1080/21670811.2019.1623700

  • Karlsson M (2010) Rituals of transparency: evaluating online news outlets’ uses of transparency rituals in the United States, United Kingdom and Sweden. Journal Stud 11:535–545

  • Karppinen KE (2018) Journalism, pluralism and diversity. In: Journalism. De Gruyter, Berlin

  • Katz J, Lindell Y (2014) Introduction to modern cryptography, 2nd edn. CRC Press, Boca Raton

  • Kormelink TG, Meijer IC (2014) Tailor-made news: meeting the demands of news users on mobile and social media. Journal Stud 15(5):632–641

  • Kunert J, Thurman N (2019) The form of content personalisation at mainstream transatlantic news outlets. Journal Pract 13(7):759–780

  • Lai S, Xu L, Liu K, Zhao J (2015) Recurrent convolutional neural networks for text classification. Proc AAAI 2015:2267–2273

  • Lepri B, Oliver N, Letouzé E, Pentland A, Vinck P (2018) Fair, transparent, and accountable algorithmic decision-making processes. Philos Technol 31(4):611–627

  • Lewis SC, Usher N (2013) Open source and journalism: toward new frameworks for imagining news innovation. Media Cult Soc 35(5):602–619

  • Lewis SC, Westlund O (2016) Mapping the human–machine divide in journalism. In: SAGE handbook of digital journalism. Sage, Thousand Oaks

  • Lewis SC, Guzman AL, Schmidt TR (2019) Automation, journalism, and human–machine communication: rethinking roles and relationships of humans and machines in news. Digit Journal 7(4):409–427

  • López-Cózar ED, Robinson-García N, Torres-Salinas D (2014) The Google Scholar experiment: how to index false papers and manipulate bibliometric indicators. JASIST 65(3):446–454

  • McGregor L, Murray D, Ng V (2019) International human rights law as a framework for algorithmic accountability. ICLQ 68(2):309–343

  • Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. https://arxiv.org/abs/1301.3781

  • Milano S, Taddeo M, Floridi L (2020) Recommender systems and their ethical challenges. AI Soc 35:957–967

  • Milosavljević M, Vobič I (2019) Human still in the loop: editors reconsider the ideals of professional journalism through automation. Digit Journal 7(8):1098–1116

  • Mittelstadt B (2016) Automation, algorithms, and politics: auditing for transparency in content personalization systems. Int J Commun 10:12

  • Nielsen R (2016) The many crises of western journalism: a comparative analysis of economic crises, professional crises, and crises of confidence. In: The crisis of journalism reconsidered. Cambridge University Press, Cambridge

  • Perra N, Rocha LEC (2019) Modelling opinion dynamics in the age of algorithmic personalisation. Sci Rep 9(1):7261

  • Ramos J (2003) Using TF-IDF to determine word relevance in document queries. In: Proceedings of the First Instructional Conference on Machine Learning, pp 133–142

  • Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323(6088):533–536

  • Sandvig C, Hamilton K, Karahalios K, Langbort C (2014) Auditing algorithms: research methods for detecting discrimination on internet platforms. In: Data and discrimination: converting critical concerns into productive inquiry, p 22

  • Steinhardt J, Koh PW, Liang P (2017) Certified defenses for data poisoning attacks. Proc NIPS 2017:3517–3529

  • Thurman N, Moeller J, Helberger N, Trilling D (2018) My friends, editors, algorithms, and I: examining audience attitudes to news selection. Digit Journal 7(4):447–469

  • Thurman N, Lewis SC, Kunert J (2019) Algorithms, automation, and news. Digit Journal 7(8):980–992

  • Tramèr F, Papernot N, Goodfellow I, Boneh D, McDaniel P (2017) The space of transferable adversarial examples. https://arxiv.org/abs/1704.03453

  • Tramèr F, Kurakin A, Papernot N, Goodfellow IJ, Boneh D, McDaniel PD (2018) Ensemble adversarial training: attacks and defenses. Proc ICLR 2018

  • Ward S (2015) Radical media ethics: a global perspective. Wiley, New Jersey

  • Ward S (2018) Epistemologies of journalism. Journalism 19:63–82

  • Wing J (2008) Computational thinking and thinking about computing. Philos Trans R Soc A Math Phys Eng Sci 366:3717–3725


Acknowledgements

François-Xavier Standaert is a senior associate researcher of the Belgian Fund for Scientific Research. This work has been funded in part by the European Union through the ERC Consolidator Grant SWORD (724725).

Author information


Correspondence to Antonin Descampe or François-Xavier Standaert.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: details on the classifiers

The multinomial NB classifier was used directly on the 20,000-word histograms provided by the bag-of-words NLP. The only technical tweak was the use of a smoothing factor of 0.09 to deal with words having an estimated probability of zero.
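As a minimal sketch of this setup with scikit-learn (which our implementations are based on), the smoothing factor corresponds to the alpha parameter of MultinomialNB; the toy texts below are illustrative stand-ins for the 20,000-word newspaper corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy tagged articles standing in for the newspaper corpus.
train_texts = [
    "the match ended with a late goal and a penalty",
    "the striker scored twice before the final whistle",
    "parliament passed the budget after a long debate",
    "the minister defended the new budget in parliament",
]
train_tags = ["sport", "sport", "politics", "politics"]

vec = CountVectorizer()
X = vec.fit_transform(train_texts)  # word-count histograms (bag of words)

# alpha is the smoothing factor (0.09 above): it prevents zero probabilities
# for words that never occur in a class's training articles.
clf = MultinomialNB(alpha=0.09).fit(X, train_tags)
print(clf.predict(vec.transform(["the team scored a goal"])))
```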

For the MLP classifier, we selected the best parameters thanks to a grid search set to optimize the classifier’s accuracy, with one to four layers and a number of neurons per layer ranging from 10 to 1,000. Several solutions led to similar results, and we eventually selected a classifier with 3 layers: the first one uses 141 neurons (corresponding to the square root of the 20,000 words output by the bag-of-words NLP) with a ReLU activation function, the hidden layer uses 31 neurons (corresponding to the geometric mean of 141 and 7, i.e., \(\sqrt{141\times 7}\)) with a ReLU activation function, and the last one uses 7 neurons (i.e., our number of classes) with a tanh activation function. Using more layers did not lead to significant improvements in our case study.
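A rough sketch of this architecture with scikit-learn's MLPClassifier follows. Note that MLPClassifier applies a single activation to all hidden layers and fixes the output activation itself (softmax for classification), so the tanh output layer described above cannot be reproduced exactly; the data are random stand-ins, scaled down from the 20,000-word histograms:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Random stand-ins for the article histograms (scaled down from 20,000 words).
X = rng.poisson(0.05, size=(150, 2000)).astype(float)
y = rng.integers(0, 7, size=150)  # 7 topic classes

# Hidden layers of 141 and 31 ReLU neurons, as selected by the grid search;
# scikit-learn adds the 7-class output layer itself (softmax, not tanh).
clf = MLPClassifier(hidden_layer_sizes=(141, 31), activation="relu",
                    max_iter=10, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:3]))
```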

Finally, we tried different numbers and types of layers for the RNN: bidirectional, convolutional (combined with pooling), dense, and LSTM (Long Short-Term Memory) with dropout. The one that provided the best results in our context used the following parameters:

Layer (type)              Output shape       # of params
Embedding_2               (None, 100, 200)   4,000,200
Bidirectional_2           (None, 100, 200)   180,600
Conv1d_2                  (None, 98, 5000)   3,005,000
Global_max_pooling_1d_2   (None, 5000)       0
Dense_3                   (None, 100)        500,100
Dropout_2                 (None, 100)        0
Dense_4                   (None, 7)          707

  1. Total params: 7,686,607
  2. Trainable params: 3,686,407
  3. Non-trainable params: 4,000,200
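The parameter counts in this summary can be cross-checked by hand. The arithmetic below assumes a 20,001-word embedding of dimension 200, a 100-unit bidirectional recurrent layer, a Conv1d with kernel size 3 and 5,000 filters, and the dense layers listed above; note that the 180,600 figure for the bidirectional layer matches a 100-unit GRU (3 gates) rather than an LSTM (4 gates, which would give 240,800):

```python
# Cross-check the parameter counts of the RNN summary above (pure arithmetic).
emb = 20001 * 200                        # Embedding_2: (vocab+1) x dim = 4,000,200
units, dim = 100, 200
# Bidirectional_2: 2 directions x 3 gates x (units*(dim+units) + units)
bi = 2 * 3 * (units * (dim + units) + units)        # 180,600
conv = 3 * (2 * units) * 5000 + 5000     # Conv1d_2: kernel 3, 200 in-channels, 5000 filters
pool = 0                                 # Global_max_pooling_1d_2 has no weights
d3 = 5000 * 100 + 100                    # Dense_3
drop = 0                                 # Dropout_2 has no weights
d4 = 100 * 7 + 7                         # Dense_4
total = emb + bi + conv + pool + d3 + drop + d4
print(total, total - emb)                # 7686607 total; 3686407 trainable
```

The non-trainable count (4,000,200) is exactly the embedding layer, consistent with frozen pre-trained Word2Vec embeddings.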

The implementations of all the machine learning tools that we used are based on the scikit-learn library, available at https://scikit-learn.org/stable/. For the NLP part of the tool, we used the SnowballStemmer library for word stemming (https://kite.com/python/docs/nltk.SnowballStemmer), in which stopword removal can be activated as an option, and we used the Word2Vec models made available by Jean-Philippe Fauconnier for word embedding (http://fauconnier.github.io/). We report the learning curves of the three combinations of NLP and ML tools in Fig. 4. As can be observed, the amount of profiling data (i.e., 4,000, given that we estimate the accuracy with fivefold cross-validation) is sufficient for the NB and MLP classifiers to approach convergence, which confirms that the amount of collected data is sufficient for those classifiers to provide meaningful outcomes. The RNN shows slightly worse results in our case, which may be due both to the simple (topic) feature that we aim to capture and to a lack of data for this more data-demanding machine learning algorithm.
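Learning curves of the kind shown in Fig. 4 can be produced with scikit-learn's learning_curve helper under fivefold cross-validation; the random data below is an illustrative stand-in for the article histograms:

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import MultinomialNB

rng = np.random.default_rng(1)
# Random word histograms standing in for the collected article corpus.
X = rng.poisson(0.05, size=(300, 500)).astype(float)
y = rng.integers(0, 7, size=300)  # 7 topic classes

# Five-fold cross-validated learning curve: accuracy as a function of the
# number of training articles, as plotted in Fig. 4.
sizes, train_scores, val_scores = learning_curve(
    MultinomialNB(alpha=0.09), X, y, cv=5,
    train_sizes=np.linspace(0.2, 1.0, 5))
print(sizes)                      # training-set sizes actually used
print(val_scores.mean(axis=1))   # mean validation accuracy per size
```

Convergence shows up as the validation accuracy flattening out as the training size grows; a curve that is still rising indicates the model would benefit from more data, which is the diagnosis made for the RNN above.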

Appendix B: additional figures

See Figs. 4 and 5.

Fig. 4: Learning curves of the experimented NLP + ML tools

Fig. 5: MLP classifier: minimum article perturbation as a function of the class distance


About this article


Cite this article

Descampe, A., Massart, C., Poelman, S. et al. Automated news recommendation in front of adversarial examples and the technical limits of transparency in algorithmic accountability. AI & Soc 37, 67–80 (2022). https://doi.org/10.1007/s00146-021-01159-3
