Something Old, Something New, Something Borrowed, Something Blue: Part 3 – An Elephant Never Forgets

Mark Dougherty

doi:10.1515/jisys-2014-0148

Open Access Published by De Gruyter June 3, 2016

From the journal Journal of Intelligent Systems

https://doi.org/10.1515/jisys-2014-0148

Abstract

Forgetting is an oft-forgotten art. Many artificial intelligence (AI) systems deliver good performance when first implemented; however, as the contextual environment changes, they become out of date and their performance degrades. Learning new knowledge is part of the solution, but forgetting outdated facts and information is a vital part of the process of renewal. However, forgetting proves to be a surprisingly difficult concept to either understand or implement. Much of AI is based on analogies with natural systems, and although all of us have plenty of experiences with having forgotten something, as yet we have only an incomplete picture of how this process occurs in the brain. A recent judgment by the European Court concerns the “right to be forgotten” by web index services such as Google. This has made debate and research into the concept of forgetting very urgent. Given the rapid growth in requests for pages to be forgotten, it is clear that the process will have to be automated and that intelligent systems of forgetting are required in order to meet this challenge.

Keywords: Adaptive memory; forgetting

I can’t remember to forget you

Shakira and Rihanna

1 Introduction

The literary classic The Moonstone [3] is often held up as the first detective novel written in the English language. Not only was it groundbreaking in terms of genre, it also takes up a philosophical issue that is becoming increasingly pressing in the modern information society. The plot of The Moonstone is closely bound up with the concepts of memory, forgetting, and amnesia. Yet one of the most lasting statements in the entire book actually comes in the introduction:

“In this matter of the Diamond,” he said, “the characters of innocent people have suffered under suspicion already – as you know. The memories of innocent people may suffer, hereafter, for want of a record of the facts to which those who come after us can appeal.”

Collins understood the capacity of information in the public domain both to ruin someone’s reputation and, turning the question on its head, to construct a defense. What is, and is not, in the public domain is important.

Collins also unwittingly offers another farsighted glimpse into the future information society:

“When you come to fix your memory with a date in this way, it is wonderful what your memory will pick up for you upon that compulsion.”

What Collins is describing (which actually seems rather an unlikely skill for most humans to possess) is exactly what computers are good at. Data stored in databases can be indexed according to certain keys. Given a particular unique search term, a search engine can provide you with a set of links to relevant information sources.
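The idea of indexing by key can be illustrated with a minimal sketch of an inverted index, the kind of structure that lets a search term be mapped directly to the pages that contain it. The page identifiers and texts below are hypothetical, and real search engines are of course far more elaborate than this.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lower-cased word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def lookup(index, term):
    """Return the sorted page ids stored under a single search term."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical pages, invented purely for illustration.
documents = {
    "page-1": "the diamond was lost in 1848",
    "page-2": "a record of the facts about the diamond",
    "page-3": "memory fixed with a date",
}

index = build_index(documents)
print(lookup(index, "diamond"))   # pages filed under the key "diamond"
print(lookup(index, "elephant"))  # a term that was never indexed
```

Once the index is built, a lookup costs a single dictionary access, however many pages are stored. Nothing in the structure ever expires, which is part of why forgetting has to be engineered in deliberately.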

In the information age, vast quantities of data are stored every day. In many countries, we have the right as individuals to inspect information held on us by official government agencies. There are procedures available to us to demand that incorrect information is updated or deleted. However, a great deal of information is held not by government agencies but by private companies and individuals, and much of it is published online.

This has led to a situation where material that can be extremely damaging to an individual is made available to the general public without much chance of redress. A non-exhaustive list of examples includes the following:

Revenge porn (i.e. naked or pornographic pictures posted online by former partners, when there was never an intention that such material would be made public);
Information about criminal convictions that circumvents legislation intended to rehabilitate criminals into society;
Information concerning bankruptcies; again, rehabilitation legislation is in danger of becoming powerless;
Negative online reviews about products, services, and so on;
Defamatory or libelous remarks.

Several aspects of the nature of the Internet complicate the situation. It is not easy for a private person to sue for libel or insist on corrections if accusations or false information are posted anonymously on a web server in a far-flung country. The Internet facilitates anonymous communication, is international, and is also distributed. The mythical hydra comes to mind; as soon as one head is cut off, two more grow in its place. Like an elephant sleeping in the jungle, the Internet never forgets.

2 The Science of Forgetting

Forgetting as an aspect of human cognition has been extensively researched by both psychologists and neuroscientists [11]. There are many competing and conflicting theories, and the only really certain statement we can make is that we still have a great deal to research and learn about the subject. A key aspect is that memories are distributed across a massively parallel and networked processor – the mass of neurons in the brain – and this makes it a very hard problem to research. What has been understood, though, is that learning is much easier than trying to explicitly “unlearn” something. For example, sports scientists have long realized that learning the correct technique from the beginning is very important for an athlete [1]. Unlearning an incorrect technique and bad habits is a struggle for any athlete or musician.

Understanding how to re-train neural networks [and other similar artificial intelligence (AI) paradigms] in order to forget certain memories yet retain others presents very significant problems. The cascade-correlation algorithm [5] was an early attempt to solve this enigma; however, scaling up both this approach and others to real-world problem domains has proven elusive and requires considerable computational resources.

Truth maintenance systems for rule-based systems [4] run into comparable problems with the level of computational resources required. Here, the main problem is how to identify and deal with situations where rules conflict with one another. Likewise, one of the most troublesome aspects of any AI system is keeping it up to date, especially when we extend our ideas to a multi-agent situation. Tuyls et al. [10] give a good overview of these issues.

The question of computers “forgetting” has also become increasingly important within the field of digital forensics [2]. A forensic image of a hard disk or computer memory often contains a large jumble of “forgotten” data and scraps of files and other information. It is often possible, with painstaking analysis, to reconstruct at least a partial picture of what the computer has recently been used for, but the work is laborious and often requires human expertise. However, a key observation is that keeping a computer system forensically clean, with no traces of past activity, is extremely intricate and difficult. It comes as no surprise to find that the same situation is mirrored with the Internet – erasing all traces of information that has, at some point, been online is an impossibility.

3 The “Right To Be Forgotten” Ruling

This is the background to a recent ruling from the Court of Justice of the European Union, which has demanded that web search services such as Google and Yahoo provide a service that facilitates a “right to be forgotten.” The idea is that although it might be impossible to demand that information is removed from servers outside of our jurisdiction, we can at least make it harder for casual Internet users to search for and find such information. This can be achieved by search providers simply not returning results for certain “blacklisted” search terms. As the search providers are in our jurisdiction, this ought (at least in theory) to be possible. For a search term to be blacklisted, the individual concerned needs to apply to the web search service and provide a justification as to why the search term should be blacklisted.

There are many problems and issues with this approach:
Some commentators see this law as a direct attack on free speech.
The information is not actually removed and could possibly be found by other means.
There does not seem to be a well-developed legal process.
The sheer number of “right to be forgotten” requests is a considerable burden to the search providers, who are thus forced to bear the costs of policing activities over which they have no control and for which they bear no responsibility.

The free speech argument is that a free Internet is fundamental to journalism and democracy. Journalists complain that investigative journalism will be hampered, and that rich and powerful people and organizations will be able to manipulate the ruling in order to suppress justifiable criticisms and prevent grassroots democracy. For example, the BBC journalist Robert Peston was informed by Google that they would no longer list a link to his blog on the financial crisis of 2007 because of possible embarrassment to financial executives mentioned in the blog [9]. A particularly difficult aspect is that historical research or even war crimes investigations might be hampered in the future, if old archival material about controversial happenings is made inaccessible [6].

The information is not actually removed. The websites serving “blacklisted” information still exist. Trying to “fence in” websites with blacklisted material is a fruitless exercise, as technology such as Tor (https://www.torproject.org/) can circumvent firewalls. All Google does is decline to list the relevant link when it is searched for. Moreover, the link is currently suppressed only if the search term includes the name of the person or organization who requested its removal. Thus, a creative search term can often still find the relevant page. This is particularly the case if the person publishing the material publishes an embedded link to the “hidden” page on another page that contains some useful search terms (but NO actual names!). To take the example mentioned above, a search for “Robert Peston blog” quickly turned up an article by Peston [8] complaining about Google’s decision and which, voilà, included a link to the blacklisted blog. Just for interest’s sake, I have included the link to the original blog in the bibliography [7], and the reader can thus bypass Google’s blacklisting for this particular case. Furthermore, new searchable websites such as hiddenfromgoogle.com are springing up with the explicitly stated aim of listing all of the links removed by Google. As mentioned earlier, the Internet is like a hydra and cutting off one head does no good. In fact, some commentators argue that asking Google to remove a link is counterproductive, as it shows that you have something to hide and draws attention to something that might otherwise be quietly forgotten.
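The weakness of name-conditional delisting can be made concrete with a toy model. The sketch below is not a description of how Google actually implements the ruling; it simply assumes, as described above, that a link is hidden only when the query contains the name of the person who asked for its removal. The name "Jane Doe", the example.org URLs, and the function name are all hypothetical.

```python
def filtered_search(query, pages, delistings):
    """Return URLs whose text contains every query word, hiding a delisted
    URL only when the query also contains the requester's name."""
    lowered = query.lower()
    words = lowered.split()
    hits = [
        url for url, text in pages.items()
        if all(word in text.lower().split() for word in words)
    ]
    # Assumption: suppression is triggered by the requester's name alone.
    hidden = {url for name, url in delistings if name.lower() in lowered}
    return [url for url in hits if url not in hidden]


# Hypothetical pages and a single hypothetical delisting request.
BLOG = "https://example.org/blog/bank-losses"
COMPLAINT = "https://example.org/news/delisting-complaint"

pages = {
    BLOG: "jane doe bank losses blog 2007",
    COMPLAINT: "reporter complains about delisting of bank losses blog",
}
delistings = [("Jane Doe", BLOG)]

print(filtered_search("Jane Doe bank", pages, delistings))     # name present
print(filtered_search("bank losses blog", pages, delistings))  # name absent
```

In this model, the query that names the requester returns nothing, whereas the nameless query returns both the original blog and the page complaining about its removal. The information has not been forgotten; only one route to it has been closed.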

The process for deciding whether a link should be removed fails several obvious criteria that would guarantee due legal process. The process is run by Google, not an independent body. The process is not conducted in public. The owner of the published material is not informed until the decision has been made and therefore has no chance to argue their case. There seems to be no well-developed appeals mechanism.

The burden to the search providers is considerable. In fairness to Google, it was the European Court of Justice that made the ruling and forced Google to provide the “right to be forgotten” service. It seems bizarre for a court of law to effectively delegate making large numbers of legal rulings to a private information technology company. The sheer number of requests received also makes it impossible to spend much time deliberating each one. In his article, Peston reported that Google received 50,000 requests in just a few days. As Google is a private, for-profit company, it obviously wants to minimize the resources committed to running the process. In the future, this process will have to be largely automated if it is to function at all, but this raises a whole new range of legal and ethical issues about the validity of making quasi-legal judgments using AI systems and only very limited human input. However, it will certainly be an interesting technical challenge to meet.

4 Analysis

The “right to be forgotten” ruling seems to be very poorly thought out. It is not sufficient for a legal ruling to be “correct” from a legal standpoint. It has to be possible to enforce it from a practical perspective. As in many other situations where the legal profession gets involved in disputes about the digital world, there seems to be a marked lack of understanding as to how the Internet works. Trying to remove information, links, or in any other way censor the Internet, is an extremely challenging and complicated technical task. Even complex strategies can be subverted by clever publishers, creative users, or perhaps a bit of both. As Collins further states in The Moonstone:

“Every human institution (Justice included) will stretch a little, if only you pull it in the right way.”

There is also little understanding about the sheer size of the Internet, which can be simply overwhelming. If 50,000 requests come in just a few days, how many people and what resources will it take to manage them? How easy is the process to automate?
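A back-of-the-envelope calculation gives a feel for the scale. Only the figure of 50,000 requests comes from the reporting cited above; the four-day window, the ten minutes of deliberation per request, and the eight-hour working day are assumptions chosen purely for illustration.

```python
import math


def reviewers_needed(requests, days, minutes_per_request, hours_per_day):
    """Smallest whole number of reviewers able to clear the backlog in time."""
    total_minutes = requests * minutes_per_request
    minutes_per_reviewer = days * hours_per_day * 60
    return math.ceil(total_minutes / minutes_per_reviewer)


# 50,000 requests is the reported figure; every other value is assumed.
print(reviewers_needed(50_000, days=4, minutes_per_request=10, hours_per_day=8))
print(reviewers_needed(50_000, days=4, minutes_per_request=1, hours_per_day=8))
```

Under these assumptions, several hundred full-time reviewers would be needed just to keep pace, and even a cursory one-minute glance at each request would occupy a sizeable team.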

In a sense, these problems should not have come as a surprise. As already discussed, forgetting is one of the aspects of human cognition that is least understood, and computer scientists working in AI have yet to solve major problems in this area. Unlearning (or forgetting) old knowledge or empirically derived relationships is far harder than learning new knowledge. Where does this leave us? What we have in front of us is a whole new game of cat and mouse. On one side are publishers, who will try to find a myriad of ways of keeping their material available and searchable, with of course the collusion of users and protest organizations who want to keep the Internet free and open. On the other side, major players like Google, and perhaps in the future Internet service providers, will be struggling to contain a battle that they do not want to fight and do not want to spend resources on. I know which side my money is on!

Corresponding author: Prof. Mark Dougherty, Högskolan Dalarna – Data, Falun, Falun 79188, Sweden

Bibliography

[1] J. Brown, Modern psychologies of sports, J. Health Phys. Educ. 8 (1946), 138–191, doi:10.1080/23267240.1937.10619724.

[2] E. Casey, Digital Evidence and Computer Crime: Forensic Science, Computers and the Internet, Academic Press, New York, 2011.

[3] W. Collins, The Moonstone. Wordsworth Classics edition, 1992. First published 1868.

[4] J. Doyle, A truth maintenance system, Artif. Intell. 12 (1979), 251–272, doi:10.1016/0004-3702(79)90008-0.

[5] S. E. Fahlman and C. Lebiere, The cascade-correlation learning architecture, in: Advances in Neural Information Processing Systems, vol. II, pp. 524–532, Morgan Kaufmann, San Mateo, CA, 1990.

[6] D. McGoldrick, Developments in the right to be forgotten, Hum. Rts. L. Rev. 13 (2014), 4, doi:10.1093/hrlr/ngt035.

[7] R. Peston, Merrill’s mess, BBC 2007, http://www.bbc.co.uk/blogs/legacy/thereporters/robertpeston/2007/10/merrills_mess.html (downloaded 25 August, 2014).

[8] R. Peston, Why has Google cast me into oblivion? BBC 2014, http://www.bbc.com/news/business-28130581 (downloaded 25 August, 2014).

[9] P. Scheer, The right to be forgotten is already messing up journalism, Truthdig 2014, http://www.truthdig.com/eartotheground/item/the_right_to_be_forgotten_is_already_messing_up_journalism_20140702 (downloaded 25 August, 2014).

[10] K. Tuyls, P. J. ’t Hoen, K. Verbeeck and S. Sen, Learning and adaption in multi-agent systems, in: Lecture Notes in Computer Science, vol. 3898, Springer Verlag, Berlin, 2006.

[11] J. Wixted, The psychology and neuroscience of forgetting, Annu. Rev. Psychol. 55 (2004), 235–269, doi:10.1146/annurev.psych.55.090902.141555.

Received: 2014-10-10

Published Online: 2016-6-3

Published in Print: 2017-7-26

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
