
1 Introduction

Until now, humans have had to do all of the thinking. However, we are at the beginning of a new era in human history—the cognitive augmentation era. Cognitive systems are being developed in every domain in which a human can be an expert. We are about to be inundated with a host of intelligent applications, devices, products, and services fueled by a confluence of deep learning, big data, the Internet of Things, and natural language interfaces. These systems will communicate with us in spoken natural language and will also know things about us from our emails, tweets, and daily activities. In much the same way we work with our family members, loved ones, and co-workers, humans and cognitive systems will work together in natural, collegial, peer-to-peer partnerships. The age of the augmented human is upon us.

In 2011, a cognitive computing system called Watson, built by IBM, defeated two of the most successful human Jeopardy! champions of all time [1]. Watson communicated in natural language and reasoned deeply about its answers using several different techniques from artificial intelligence research. In 2016, Google DeepMind's AlphaGo defeated the reigning world champion in Go using a deep neural network and advanced Monte Carlo tree search [2]. Although this was not the first time computers have beaten human champions (in checkers, chess, and various card games, for example), Watson and AlphaGo are different: each learned how to play its respective game using a variety of deep learning techniques [3, 4], and each learned and practiced to ultimately achieve expert-level performance within its domain.

These systems were not built just to play games. Watson and AlphaGo represent a new kind of computer system built as a platform for a new kind of application [5, 6]. This new type of system is intended to act as a partner with and alongside humans. John Kelly, Senior Vice President and Director of Research at IBM, describes the coming revolution in cognitive augmentation as follows [6]:

The goal isn't to… replace human thinking with machine thinking. Rather…humans and machines will collaborate to produce better results, each bringing their own superior skills to the partnership. The machines will be more rational and analytic, and, of course, possess encyclopedic memories and tremendous computational abilities. People will provide judgment, intuition, empathy, a moral compass and human creativity.

Since 2011, IBM has been actively commercializing Watson technology to serve (and in many ways create) the emerging multi-billion dollar cognitive computing market. The Cognitive Business Solutions group consults with companies to create cognitive systems, or cogs. The Watson Health group's focus is to commercialize Watson technology for the health sector [7,8,9,10]. In her keynote address at the 2016 Consumer Electronics Show, IBM Chairwoman, President, and CEO Ginni Rometty announced more than 500 partnerships with companies and organizations across 17 industries, each building new applications and services utilizing cognitive computing technology based on Watson [11, 12]. Many of the systems currently under development are intended for use by the average person.

IBM is not alone. Most major technology companies are actively researching and developing new artificial intelligence-based products and services. Voice-activated personal assistants will be one of the first battlegrounds. Apple's Siri, Microsoft's Cortana, Google Now, Facebook's M, and Amazon Echo's Alexa each accept natural-language requests from users, reply in natural language, and perform services on behalf of the user [13,14,15,16,17]. Currently, these tools simply retrieve information and perform minor clerical tasks such as creating appointment calendar items, but each is steadily increasing in the complexity and variety of tasks it can perform. The voice-controlled assistant represents the primary user interface connecting hundreds of millions of people to their technology, so the major technology companies are understandably competing for control in this area.

As cogs become able to perform higher-order cognitive processing, human-cog partnerships of the future will go far beyond what is possible today. Cogs will be able to consume vast quantities of unstructured data and information and deeply reason to arrive at novel conclusions and revelations, as well as, or better than, any human expert. Cogs will then become colleagues, co-workers, and confidants instead of tools. Because cogs will interact with us in natural language and be able to converse with us at human levels, humans will form relationships with cogs much like we do with friends, fellow workers, and family members. These systems may very well lead to the democratization of expertise [49].

In the early 1960s, Engelbart was among the first to describe human-computer partnerships and famously developed a framework modeling a human being as part of an integrated system consisting of the human, language (concepts, symbols, representations), artifacts (physical objects), methodologies (procedures, know-how), and training, or H-LAM/T [18]. As shown in Fig. 1, Engelbart's framework envisions a human interacting with an artificial entity (or entities) working together on a cognitive task. To perform the task, the system as a whole executes a series of cognitive processes, with the human performing some (called explicit-human processes) and the artificial entities performing others (called explicit-artifact processes). Other processes are performed by a combination of human and machine (called composite processes). In Engelbart's framework, the cognitive ability of the human can be augmented by making improvements to any component (human, artifact, language, methodologies, or training). When the paper was written, artifacts were seen as making it easier for the human to perform the cognitive task; Engelbart's artifacts perform a relatively small amount of cognitive work themselves and instead enable the human to perform their own cognitive work more efficiently. However, in the cog era, cogs will be artifacts capable of performing cognitive work of their own at a level rivaling or surpassing humans. This represents a fundamental shift in what we mean when we say "cognitive augmentation."

Fig. 1. Engelbart's H-LAM/T Framework.

The entire discussion of augmented cognition then becomes a matter of how cognitive processes are distributed across the human and cog infrastructure. However, we currently lack the metrics and general theory to describe cognitive processes or their distribution, and we are not yet able to measure how much the human is augmented. To develop such metrics, we must first define the very nature of a cognitive process. At the very core of this effort are basic questions like: "What is information?" "What is cognition?" and "What does it mean to be cognitively augmented?"

2 Literature

This work lies at the intersection of three different fields: knowledge management, information theory, and cognitive computing.

2.1 Knowledge Management and Information Science

From the knowledge management and information science fields, we adopt a widely accepted view of data, information, knowledge, and wisdom (DIKW) [19]. The DIKW hierarchy represents information as processed data, knowledge as processed information, and wisdom as processed knowledge. Each level is of a higher value than the level below it because of the processing and therefore represents a higher level of understanding. What is lacking, however, are metrics allowing us to measure data, information, knowledge, and wisdom and to measure the effect of processing as we climb the levels. The metrics we present here can be used in this manner.

2.2 Physics and Thermodynamics

There is a long history of entropic measures of information. In physics, particularly statistical mechanics and thermodynamics, such metrics are associated with the concept of order and disorder. In the mid-1800s, Clausius coined the term entropy to describe the amount of heat energy dissipated across a system boundary, work that ultimately led to the Second Law of Thermodynamics. In the late 1800s, Boltzmann related the thermodynamic entropy of a system, S, to the number of equally likely arrangements (states), W, where k is Boltzmann's constant [20].

$$ S = k\,\ln \,W. $$
(1)

Boltzmann’s entropy is a measure of the amount of disorder in a system. In 1929, Szilard was one of the first to examine the connection between thermodynamic entropy and information by analyzing the decrease in entropy in a thought experiment called Maxwell’s Demon [21]. Szilard reasoned the reduction of entropy is compensated by a gain of information:

$$ S = - k\sum\limits_{i} {p_{i} \,\ln \,p_{i} } , $$
(2)

where \( p_i \) is the probability of the ith event, outcome, or state in the system. In 1944, Schrödinger wondered how biological systems (highly ordered systems) can become so structured in apparent violation of the Second Law of Thermodynamics, and realized the organism increases its order by decreasing the order of its environment [22]:

$$ S = k\,\ln \,D $$
(3a)
$$ - S = k\,\ln (1/D) $$
(3b)

Schrödinger calls −S, the negative entropy (or negentropy), a measure of order in a system. Brillouin refined the idea and described living systems as importing and storing negentropy [23]. The ideas of Schrödinger, Szilard, and Brillouin involve a flow of information from one entity to another and use entropy to measure the flow.

2.3 Information Theory

In the field of information theory, Hartley was one of the first to define the information content, H, of a message of N symbols chosen from an alphabet of S symbols as [24]

$$ H = \log S^{N} = N\log S $$
(4)

Since \( S^N \) messages are possible, one can view the number of messages as the number of possible arrangements or states and therefore see the connection to the thermodynamic entropy equations. Hartley's equation represents a measure of disorder in the probability distribution across the possible messages; Hartley simply defines this measure of disorder to be equivalent to the information content of a message. In 1948, Shannon developed the basis for what has become known as information theory [25,26,27]. Shannon's equation for entropy, H, is

$$ H = - K\sum\limits_{i = 1}^{v} {p(i)\log_{2} p(i)} , $$
(5)

where p(i) is the probability of the ith symbol in a set of v symbols and K is an arbitrary constant enabling the equation to yield any desired units. Shannon, as did Hartley, equates order/disorder and information content. Shannon's equation gives the average information content per symbol. The information content, I, of a message consisting of m symbols is

$$ I = mH = - mK\sum\limits_{i = 1}^{v} {p(i)\log_{2} p(i)} $$
(6)
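To make Eqs. 5 and 6 concrete, here is a minimal Python sketch with K = 1 (giving units of bits); the alphabet and probabilities are illustrative:

```python
import math

def shannon_entropy(probs, K=1.0):
    """Eq. 5: average information content per symbol (bits when K = 1)."""
    return -K * sum(p * math.log2(p) for p in probs if p > 0)

# A three-symbol alphabet with probabilities 1/2, 1/4, 1/4:
H = shannon_entropy([0.5, 0.25, 0.25])
print(H)        # 1.5 bits per symbol
print(4 * H)    # Eq. 6: a 4-symbol message carries I = mH = 6.0 bits
```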

Shannon’s information theory is the most widely used measure of information but yields anomalous results for our purposes. In the 1960’s, Chaitin and others developed the concept of algorithmic information content as a measure of information [28,29,30,31]. The algorithmic information content, I, of a string of symbols, w, is defined as the size of the minimal program, s, running on the universal Turing machine that generates the string

$$ I(w) = \left| s \right|, $$
(7)

where the vertical bars indicate the length, or size, of the program s. This definition of information concerns the compressibility of a string of symbols. A string with regular patterns can be "compressed" to a much shorter representation, whereas the shortest description of a string of random symbols is a verbatim listing, the string itself. This description also equates order/disorder to information content, although in a different manner.
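Although algorithmic information content is uncomputable in general, compressed size is a standard practical proxy; a small sketch makes the compressibility point concrete:

```python
import os
import zlib

patterned = b"AB" * 500          # 1,000 bytes with an obvious regular pattern
random_ish = os.urandom(1000)    # 1,000 bytes with no exploitable structure

print(len(zlib.compress(patterned)))   # small: the pattern compresses drastically
print(len(zlib.compress(random_ish)))  # ~1,000 or more: randomness barely compresses
```

In 1990, Stonier suggested an exponential relationship between entropy, S, and information, I [32,33,34]: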

$$ \begin{aligned} & I = I_{0} \, e^{ - S/K} \\ & {\text{where }}S = - k\sum\limits_{i} {p_{i} \ln p_{i} } \\ \end{aligned} $$
(8)

where k is Boltzmann's constant and \( I_0 \) is the amount of information in the system at zero entropy. The measure of entropy, S, is the thermodynamic form of Shannon's equation, which is mathematically equivalent to Szilard's and Boltzmann's earlier general entropic equations. Stonier's equation is interesting because it is mathematically similar to the one we will use, as described below, but differs theoretically.

We employ a relatively new form of information theory, called representational information theory (RIT) [35, 36]. Unlike entropy-based traditional information theory metrics, such as those discussed above, the fundamental unit in RIT is the concept. The value of a piece of information in RIT is relative to the concept to which the information refers and how well the information represents the concept. This brings meaning and understanding to information calculus for the first time. We will discuss RIT in detail in a later section.

2.4 Cognition, Cognitive Systems, Artificial Intelligence

There is no widely accepted general theory of cognition although the topic has been studied for decades by researchers in several different disciplines. At the outset of this research, it seemed reasonable to look to the cognitive science field for metrics of information value, content, and measures of cognition. However, all findings in this area are unsatisfactory for our purposes because any metrics developed apply only to the specific implementation, the cognitive architecture, for which they were developed. A number of cognitive architectures and models have been developed. Computational cognitive architectures such as Soar [37,38,39], ACT and ACT-R [40], CLARION [41, 42], and EPIC [43] are ultimately based on the idea of reducing human cognition to symbol manipulation—Newell and Simon's famous physical symbol system hypothesis [44]. In these architectures, reasoning is achieved by the matching, selection, and execution of if-then statements called production rules representing procedural, declarative, and episodic knowledge. However, we seek metrics applicable to all forms of inference and cognitive processing, fully recognizing not all cognitive systems are production systems.

Connectionist models and architectures are based on mental or behavioral phenomena being the emergent result of interconnected networks of simple units, such as artificial neural networks (ANNs), and include Holographic Associative Memory [45], Hierarchical Temporal Memory [46], Society of Mind [47], and, more recently, Google DeepMind [48]. However, we find these unsatisfactory for our purposes because any metrics from this field are too heavily invested in the neural network architecture and do not apply generically. We seek a more general description of cognition, one independent of implementation details.

Vigo’s RIT, described in detail in the next section, is such an effort to define the general principles of human conceptual behavior. At the core is the concept and the effort associated with learning the concept given one or more representations of the concept. We feel this is a superior model of cognition for our purposes because we can use it to define a cognitive process, define the value of information, and, when combined with the DIKW view of information, construct cognitive performance metrics applicable to any biological or artificial system performing cognitive processing.

3 Representational Information Theory

There are two problems with using entropic information measures for the cognitive augmentation metrics we seek here. The first problem is that entropic measures do not take meaning into account. Intuitively, we know information has different value based on the context of the production, consumption, and processing of the information. Therefore, a measure of information must be relative to context in some manner. In this section, we describe how representational information theory measures information relative to a concept. The second problem is that entropic measures ascribe the highest information content to totally random ensembles. This contradicts intuition. Imagine a completely random collection of letters compared to a novel. Regardless of who reads it, the random letters will never convey a large amount of information, whereas the novel, even though not random, conveys a tremendous amount of information to a reader. It is true the amount of information actually conveyed varies depending on the reader's own ability, knowledge, and comprehension; however, this just further illustrates the need for a relative measure of information.

As introduced above, we employ representational information theory (RIT), in which the fundamental component is the concept [35, 36]. The value of a piece of information in RIT is always relative to the concept to which the information refers. Incorporating the concept into the measure of information is different from any other version of information theory, and it is this that brings meaning and understanding to information calculus for the first time.

In RIT, a concept is a mental construct in a mind, be it biological or artificial. One should think of concepts as existing in abstract concept space. The only way to experience a concept in the real world is via some kind of representation. A representation can be a description, model, or a collection of example objects belonging to or describing the concept. Thus, one should think of a representation as an instance of a concept. Some representations convey the concept better than others. The more ambiguous the representation, the more difficult (or complex) it is to discern the concept to which it refers. RIT measures the strength of a representation by a metric called structural complexity, where the complexity refers to how difficult it is to discern the concept from the representation. Representations with a low level of structural complexity convey the concept more easily; representations with a high level of structural complexity convey the concept with more difficulty. The goal, of course, is representations with minimal structural complexity.

In RIT, structural complexity is proportional to the size of the representation and inversely proportional to a quality of the representation called invariance. Invariance captures how robust a representation is in the face of change. For example, consider the concept sports car and a representation of this concept consisting of instances of red, white, blue, and yellow sports cars. If we change the color of one of the cars, say change the blue one to an orange sports car, one could still easily discern the concept. However, if we change one of the cars to a truck, ambiguity and uncertainty are introduced. More interpretations of the representation become possible, so discerning the concept sports car is more complex. With respect to the sports car concept, this representation is invariant to color but not to type of car. Stated alternatively, the representation is variant with respect to type but invariant with respect to color. In general, stable, robust, highly invariant representations convey concepts more easily (with less complexity in understanding). In RIT, the structural complexity of a representation is given by

$$ \psi (F) = \frac{p}{{f(\Phi (F))}} $$
(9)

where F is the representation of a concept, p is the size of the representation, \( \Phi \) is the invariance of the representation, and f is a monotonic function. Of course, this raises the question of the nature of f. Researchers in the field of human cognitive behavior have shown empirically that discerning the concept becomes exponentially more difficult as the ambiguity of a representation increases [35, 36]. This allows us to assign the exponential function to f above, yielding

$$ \psi (F) = p\,e^{ - \Phi (F)} = p\,e^{ - \left[ {\sum\nolimits_{i = 1}^{D} {\left\| {\frac{{\hat{\partial }F(x_{1} , \ldots ,x_{D} )}}{{\hat{\partial }x_{i} }}} \right\|^{2} } } \right]^{1/2} } $$
(10)

where the partial differential represents the sensitivity of each item in the representation to change. The similarity of Eq. 10 to Stonier's equation, Eq. 8, is striking and reassuring since Stonier deduced the exponential relationship in an entirely different way than RIT. To illustrate the calculation of structural complexity, consider the following representation, S, a collection of four animals (two typical pets, a bird, and a cow):

This representation can represent a number of different concepts, but here we consider just two: animal and pet. We have assigned a "fitness" value between 0% and 100% indicating the robustness of each of the items in representing the concepts animal and pet. (These fitness values are for illustration only and are not the result of any rigorous study.) The fitness values capture the fact that while all four items are unambiguously animals, birds and cows are not usually viewed as pets, though birds are considered pets more so than cows. Substituting these values into Eq. 10 gives us

$$ \psi (S) = \left| S \right|e^{ - \sqrt {v_{1}^{2} + v_{2}^{2} + \cdots + v_{n}^{2} } } $$
(11)

where \( p = |S| = 4 \) and \( v_i \) is the fitness value of the ith item in the representation. We can now calculate the structural complexity of this representation relative to each concept:

$$ \begin{array}{*{20}l} {\psi (s\,|\,animal) = 4e^{{ - \sqrt {1^{2} + 1^{2} + 1^{2} + 1^{2} } }} } \hfill & {\psi (s\,|\,pet) = 4e^{{ - \sqrt {1^{2} + .5^{2} + 1^{2} + .8^{2} } }} } \hfill \\ {\psi (s\,|\,animal) = 4e^{ - 2} } \hfill & {\psi (s\,|\,pet) = 4e^{ - 1.7} } \hfill \\ {\psi (s\,|\,animal) = 0.5413} \hfill & {\psi (s\,|\,pet) = 0.7307} \hfill \\ \end{array} $$

These calculations indicate it is easier to learn the concept animal than it is to learn the concept pet given this particular representation. As another illustration of structural complexity, consider the case in which we change one of the animals in the above S to an airplane:

Now, the representation contains three excellent examples of the concept animal but one example not representing the concept at all. The structural complexity of this representation with respect to the concept animal is:

$$ \begin{array}{*{20}l} {\psi (s\,|\,animal) = 4e^{{ - \sqrt {1^{2} + 1^{2} + 1^{2} + 0^{2} } }} } \hfill \\ {\psi (s\,|\,animal) = 4e^{ - 1.732} } \hfill \\ {\psi (s\,|\,animal) = 0.7077} \hfill \\ \end{array} $$

indicating, as one might expect, that it is more difficult to discern the concept animal from this new representation because of the presence of the airplane. Structural complexity gives us a new way to calculate the value of information, and a new way to calculate information content, with dependence on the concept in question. For example, with respect to the concept animal, the first representation is more valuable than the second by virtue of its lower structural complexity. We next show how this is used to define cognitive processes and cognition.
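Before doing so, note that Eq. 11 is simple to compute; a minimal Python sketch (using the illustrative fitness values assumed above) reproduces all three structural complexities:

```python
import math

def structural_complexity(fitness):
    """Eq. 11: psi(S) = |S| * exp(-sqrt(v1^2 + ... + vn^2))."""
    return len(fitness) * math.exp(-math.sqrt(sum(v * v for v in fitness)))

print(structural_complexity([1, 1, 1, 1]))       # psi(S | animal)  ~ 0.5413
print(structural_complexity([1, 0.5, 1, 0.8]))   # psi(S | pet)     ~ 0.7307
print(structural_complexity([1, 1, 1, 0]))       # psi(S' | animal) ~ 0.7077, airplane substituted
```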

4 Definition of a Cognitive Process

We adopt the view of data, information, knowledge, and wisdom known as DIKW [19]. The DIKW hierarchy represents information as processed data, knowledge as processed information, and wisdom as processed knowledge. Each level is of a higher value than the level below it because of the processing. Data is considered to be of the lowest value and the closest to the physical world (and therefore the least abstract of the levels). Data is generated when physical phenomena are sensed. For example, the measure of the surge of electrical current from an optical sensor on a piece of rotating machinery is data. If multiple surges are sensed and counted over a measured period of time, then revolutions per minute (RPM) can be determined. RPM is information; it cannot be sensed directly from the environment. It must be synthesized as the result of some amount of processing in which two pieces of data, electrical current surges and time, are compared. This illustrates how data is transformed into information by the effort of the processing involved. Likewise, information can be processed and transformed into knowledge. In our example, if we combine the information RPM with other information, say humidity and temperature, dependencies are revealed that explain the behavior of the RPM. Ultimately, knowledge is transformed into wisdom.
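The data-to-information step in this example is simple enough to express as code; a minimal sketch (the one-pulse-per-revolution sensor is an assumption for illustration):

```python
def rpm_from_pulses(pulse_count, elapsed_seconds):
    """Combine two pieces of data -- counted sensor pulses (assumed one per
    revolution) and elapsed time -- into information: revolutions per minute."""
    return pulse_count / elapsed_seconds * 60

print(rpm_from_pulses(150, 5.0))   # 1800.0 RPM
```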

We maintain that the transformation of data, information, knowledge, and wisdom is the essential aspect of a cognitive action we call a cognitive process. Instead of repeating the four words over and over, we refer generically to any data, information, knowledge, and wisdom as information stock. We can then visualize a cognitive process as the transformation of information stock as shown in Fig. 2, where \( S_{in} \) is the information stock in its original form and \( S_{out} \) is the information stock after the transformation.

Fig. 2. A cognitive process as a transformation of information stock.

The structural complexity of the information stock prior to and after execution of the cognitive process, \( \psi_{in} \) and \( \psi_{out} \) respectively, is calculated using Eq. 11. The difference in structural complexity represents a certain amount of cognitive gain, denoted by G and explained in more detail in the next section. Cognitive gain is achieved by the expenditure of a certain amount of cognitive work, denoted by W and also explained in more detail below. This affords us a precise definition of a cognitive process:

Definition 1.

A cognitive process is defined as the transformation of information stock from an input form to an output form achieving a certain amount of cognitive gain requiring the expenditure of a quantity of cognitive work.

5 Cognitive Gain

If we use Eq. 11 from representational information theory (RIT) to calculate the structural complexity of the information stock before and after the cognitive process is executed, we can then calculate the amount of change effected by the cognitive process's transformation, \( \psi_{out} - \psi_{in} \). RIT represents the change in structural complexity as a percentage using

$$ G = \frac{{\psi_{out} - \psi_{in} }}{{\psi_{in} }} $$
(12)

and calls this representational information. We call this the cognitive gain of a cognitive process. Note that a cognitive process can either increase or decrease the structural complexity of the information stock it transforms (or leave it the same), so G can be negative, positive, or zero. Recalling that structural complexity measures the ambiguity of a representation relative to a concept, a negative cognitive gain represents a transformation resulting in a more robust representation of a concept, while a positive cognitive gain represents a transformation resulting in a less robust representation. Stated another way, if a cognitive process achieves a reduction in structural complexity, it moves the information stock closer to the concept at hand; if it achieves an increase in structural complexity, it has moved the information stock further away from the concept at hand. Using RIT, we are now able to express and calculate cognitive processing in terms relative to a specific concept.

6 Cognitive Work

Cognitive gain is certainly a useful measure of a cognitive process but is not sufficient. It is possible for a cognitive process to produce, after some amount of transformation, an output with exactly the same structural complexity as the input, resulting in zero cognitive gain. While it is true zero cognitive gain was achieved in this case, it is also true that something happened, and this escapes measurement by the cognitive gain formula alone. For this reason, we developed the concept of cognitive work.

Cognitive work is a measure of all transformation of information stock regardless of the cognitive gain achieved. If we look at only the input and output, we miss everything that might have gone on inside and during the cognitive process. Any information stock transformation achieved during the execution of the cognitive process but not represented in the output is not visible to the outside world. We call such internal transformations lost, as represented in Fig. 3.

Fig. 3. A cognitive process with full accounting of information stock transformations.

Cognitive work, then, is an accounting of all information stock transformations achieved by a cognitive process as given by

$$ W = \left| {\psi_{out} - \psi_{in} } \right| + \psi_{lost} $$
(13)

We use the absolute value of the difference in structural complexity because we are concerned only with positive values; a negative amount of cognitive work is meaningless. Cognitive work is a measure of the total effort expended in the execution of a cognitive process. It is important to note it requires both cognitive gain and cognitive work to characterize a cognitive process. A cognitive process could perform an enormous amount of transformation and yet achieve very little, if any, real cognitive gain. In such a case, cognitive gain would be near zero but cognitive work would be large. A cognitive process could also achieve an enormous amount of cognitive gain yet do so with very little transformation. In this case, cognitive work would be small but cognitive gain large. Cognitive gain provides the magnitude and direction of the result while cognitive work provides the amount of effort.

With these metrics, we can compare and analyze all cognitive processes regardless of how the entity performs the cognitive process. The most efficient cognitive processes are those which achieve a large cognitive gain while expending little cognitive work. It is important not to forget the connection of cognitive gain and cognitive work to the concept. Since structural complexity is relative to the concept described by the information stock, cognitive gain and cognitive work are also relative to the concept at hand. A cognitive process achieves different amounts of cognitive gain and cognitive work depending on the concept being considered. As an example, consider a cognitive process that sums a list of numbers. Relative to the concept sorted numbers, this process achieves nothing but still expends an amount of cognitive work. However, relative to the concept average value, this process moves decidedly closer toward that goal and so achieves a substantial cognitive gain by expending an amount of cognitive work. This dependency on the relationship of information to concept is unique to RIT and a powerful notion in our cognitive work theory.
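Expressed as code, Eqs. 12 and 13 are one-liners; this minimal sketch is reused in the bubble sort walkthrough of the next section:

```python
def cognitive_gain(psi_in, psi_out):
    """Eq. 12: fractional change in structural complexity; negative gain
    means the output represents the concept more robustly than the input."""
    return (psi_out - psi_in) / psi_in

def cognitive_work(psi_in, psi_out, psi_lost):
    """Eq. 13: magnitude of the net change plus internal (lost) transformations."""
    return abs(psi_out - psi_in) + psi_lost
```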

7 Cognitive Gain and Cognitive Work of Bubble Sort

To illustrate the calculation of cognitive work and cognitive gain, we consider the bubble sort algorithm. An algorithm is a sequence of steps to perform a task and therefore is simply a step-by-step description of a cognitive process. By Definition 1 then, bubble sort achieves a cognitive gain and expends an amount of cognitive work.

Bubble sort is a well-known algorithm to sort any collection of items for which a “less than” relationship can be determined between the items. Here, unless otherwise stated, we assume “sorted” refers to items sorted in ascending order and “inversely sorted” refers to items in descending order. We can visualize bubble sort as a cognitive process accepting as input an unsorted collection of items and outputting the collection of items in a sorted order as shown in Fig. 4.

Fig. 4. Bubble sort as a cognitive process.

For this sample calculation, the items to be sorted are the letters C, B, D, A, E. The "fit" of each letter is based on its position relative to the position it should occupy when sorted. If a letter is in the correct sorted position, we assign a fit of 100% (1). If a letter is not in the correct position, we assign a fit based on the number of positions it is away from the correct position, deducting 20% (.2) per position. For example, if the letter B is in the 1st or 3rd position, its fitness is 80% (.8) because it is one position, or 20%, away from the correct position. Therefore, the input sequence C, B, D, A, E has fitness values of 0.6, 1.0, 0.8, 0.4, 1.0, respectively, allowing the structural complexity of the unsorted letters to be calculated as follows using Eq. 11:

$$ \begin{array}{*{20}l} {\psi \left( {C,B,D,A,E} \right) = 5e^{{ - \sqrt {.6^{2} + 1^{2} + .8^{2} + .4^{2} + 1^{2} } }} } \hfill \\ {\psi \left( {C,B,D,A,E} \right) = 5e^{{ - \sqrt {.36 + 1 + .64 + .16 + 1} }} } \hfill \\ {\psi \left( {C,B,D,A,E} \right) = 5e^{{ - \sqrt {3.16} }} } \hfill \\ {\psi \left( {C,B,D,A,E} \right) = 5e^{ - 1.7776} } \hfill \\ {\psi \left( {C,B,D,A,E} \right) = 5(0.1690)} \hfill \\ {\psi \left( {C,B,D,A,E} \right) = 0.8452} \hfill \\ \end{array} $$

The bubble sort cognitive process produces the sorted letters A, B, C, D, and E as output. Since, when sorted, each letter is in the correct position, we can assign a fitness of 100% \( (v_i = 1) \) to each, allowing us to calculate the structural complexity of the five sorted letters:

$$ \begin{array}{*{20}l} {\psi \left( {A,B,C,D,E} \right) = 5e^{{ - \sqrt {1^{2} + 1^{2} + 1^{2} + 1^{2} + 1^{2} } }} } \hfill \\ {\psi \left( {A,B,C,D,E} \right) = 5e^{ - 2.2361} } \hfill \\ {\psi \left( {A,B,C,D,E} \right) = 5(0.1069)} \hfill \\ {\psi \left( {A,B,C,D,E} \right) = 0.5344} \hfill \\ \end{array} $$

The cognitive gain achieved by the bubble sort algorithm in this case is:

$$ \begin{aligned} & G = \frac{{\psi_{out} - \psi_{in} }}{{\psi_{in} }} \\ & G = \frac{0.5344 - 0.8452}{0.8452} \\ & G = - 0.3677 \\ & G = - 36.77\% \\ \end{aligned} $$

Recall that the above is calculated relative to the concept sorted. The unsorted input does not represent the sorted concept as well as the sorted output, so, as expected, the structural complexity of the input (unsorted letters) is higher than that of the output (sorted letters). The bubble sort cognitive process reduces the structural complexity in this instance by 36.77%.

Bubble sort swaps letters when it finds an unsorted adjacent pair. Each time the algorithm swaps letters it creates an intermediate sequence, each resulting in a change in structural complexity of the information stock being transformed. To calculate the cognitive work of the above instance of bubble sort, we must consider all of the intermediate configurations the algorithm creates before it arrives at the final output. The following shows the five versions of the letters (including the three intermediate versions), the structural complexity of each, and the change at each swap:

C, B, D, A, E: ψ = 0.8452 (input)
B, C, D, A, E: ψ = 0.8645 (change of 0.0193)
B, C, A, D, E: ψ = 0.7420 (change of 0.1225)
B, A, C, D, E: ψ = 0.6317 (change of 0.1103)
A, B, C, D, E: ψ = 0.5344 (change of 0.0973, output)

This allows the calculation of cognitive work as follows:

$$ \begin{array}{*{20}l} {W = \left| {\psi_{out} - \psi_{in} } \right| + \psi_{lost} } \hfill \\ {W = \left| {0.5344 - 0.8452} \right| + (0.0193 + 0.1225 + 0.1103 + 0.0973)} \hfill \\ {W = 0.6602} \hfill \\ \end{array} $$
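The whole walkthrough can be automated. The sketch below (a minimal illustration, not the authors' code) instruments bubble sort to record the structural complexity after every swap, using the positional fitness scheme (0.2 per displaced position) described above; it reproduces G ≈ −0.3677 and W ≈ 0.6602 up to rounding:

```python
import math

def structural_complexity(fitness):
    """Eq. 11: psi = |S| * exp(-sqrt(sum of squared fitness values))."""
    return len(fitness) * math.exp(-math.sqrt(sum(v * v for v in fitness)))

def positional_fitness(seq, target):
    """Fitness of each item: 1 minus 0.2 per position away from its sorted
    position (assumes distinct items)."""
    return [1 - 0.2 * abs(i - target.index(x)) for i, x in enumerate(seq)]

def bubble_sort_metrics(items):
    target = sorted(items)

    def psi(s):
        return structural_complexity(positional_fitness(s, target))

    seq = list(items)
    psi_in = prev = psi(seq)
    psi_lost = 0.0
    swapped = True
    while swapped:                        # classic bubble sort
        swapped = False
        for i in range(len(seq) - 1):
            if seq[i] > seq[i + 1]:
                seq[i], seq[i + 1] = seq[i + 1], seq[i]
                cur = psi(seq)            # complexity of the intermediate sequence
                psi_lost += abs(cur - prev)
                prev = cur
                swapped = True
    psi_out = psi(seq)
    G = (psi_out - psi_in) / psi_in                 # Eq. 12
    W = abs(psi_out - psi_in) + psi_lost            # Eq. 13, per the accounting above
    return G, W

print(bubble_sort_metrics(list("CBDAE")))   # (~ -0.3677, ~ 0.6602)
```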

8 Cognitive Augmentation Metrics

The ability to calculate cognitive gain and cognitive work, as demonstrated above, gives us a new and powerful capability. In this section, we use these metrics to derive several other metrics to describe a human augmented by one or more artificial systems (cogs).

8.1 Cognitive Work and Gain of the Ensemble

Figure 1 shows Englebart’s original vision of human/computer symbiosis in which a human is augmented by one or more artificial systems. In Fig. 5, we update this vision to the cognitive systems era in which one or more humans work in partnership with one or more cogs to execute a cognitive task.

Fig. 5. Humans and cogs participating in a cognitively augmented ensemble.

The ensemble, inside the dashed border, receives information stock as input and transforms it to produce transformed information stock as output. In the execution of the cognitive task, the humans achieve a certain cognitive gain and expend a certain amount of cognitive work, \( G_H \) and \( W_H \) collectively. Likewise, the cogs achieve a cognitive gain and expend a certain amount of cognitive work, \( G_C \) and \( W_C \) collectively, where:

$$ \begin{array}{*{20}c} {W_{H} = \sum\limits_{i} {W_{H}^{i} } \quad G_{H} = \sum\limits_{j} {G_{H}^{j} } } \\ {W_{C} = \sum\limits_{i} {W_{C}^{i} } \quad G_{C} = \sum\limits_{j} {G_{C}^{j} } } \\ \end{array} $$
(14)

The total amount of cognitive gain and cognitive work by the ensemble is

$$ W^{*} = W_{H} + W_{C} \quad G^{*} = G_{H} + G_{C} $$
(15)

We have good reason to believe \( W^{*} \) is actually greater than the sum of the human and cog cognitive work but this will be the subject of a future paper.
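As a concrete sketch of Eqs. 14 and 15, with hypothetical per-agent measurements:

```python
# Hypothetical measurements for an ensemble of two humans and three cogs.
W_H = sum([0.31, 0.12])            # Eq. 14: total human cognitive work
G_H = sum([-0.21, -0.05])          #         total human cognitive gain
W_C = sum([0.85, 0.40, 0.22])      #         total cog cognitive work
G_C = sum([-0.30, -0.11, -0.02])   #         total cog cognitive gain

W_star = W_H + W_C                 # Eq. 15: ensemble cognitive work
G_star = G_H + G_C                 # Eq. 15: ensemble cognitive gain
print(W_star, G_star)              # ~1.9, ~-0.69
```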

8.2 Augmentation Factor

Given that we can calculate the individual cognitive contributions of the humans and the cogs, it is natural to compare their efforts. Doing so yields the augmentation factor, \( A^{+} \). Note we can use either the cognitive gain or the cognitive work, or both, to calculate the augmentation factor.

$$ A_{W}^{ + } = \frac{{W_{C} }}{{W_{H} }}\quad A_{G}^{ + } = \frac{{G_{C} }}{{G_{H} }} $$
(16)

Note humans working alone without the aid of artificial systems (\( W_C = G_C = 0 \)) are not augmented at all and have \( A^{+} = 0 \). As long as the humans are performing more cognitive work or cognitive gain than the cogs, \( A^{+} < 1 \). This is the world in which we have been living so far. However, when cogs start performing more cognitive work or cognitive gain than humans, \( A^{+} > 1 \) with no upward bound. We believe the cognitive systems era will see \( A^{+} \) continually increase. In fact, we propose \( A^{+} \) as a global metric for cognitive augmentation to be tracked over the coming years.
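A sketch of Eq. 16, reusing the hypothetical ensemble totals above:

```python
def augmentation_factor(cog, human):
    """Eq. 16: ratio of cog contribution to human contribution
    (either cognitive work or cognitive gain)."""
    return cog / human

print(augmentation_factor(1.47, 0.43))   # work-based A+ ~ 3.4: cog-dominated
print(augmentation_factor(0.0, 0.43))    # unaided human: A+ = 0.0
```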

8.3 Cognitive Efficiency

The goal of any cognitive effort is to have the most effect (measured as cognitive gain) while expending the least amount of effort (measured by cognitive work). Therefore, we define cognitive efficiency as

$$ \xi = \frac{G}{W} $$
(17)

Note, the above equation is written in a generic form. One can calculate the cognitive efficiency of just the human component (\( G_H / W_H \)), just the cog component (\( G_C / W_C \)), or the entire ensemble (\( G^{*} / W^{*} \)).
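In code, using the bubble sort instance computed earlier as input (the sign convention follows Eq. 12, so a complexity-reducing process has negative gain):

```python
def cognitive_efficiency(gain, work):
    """Eq. 17: cognitive gain achieved per unit of cognitive work expended."""
    return gain / work

print(cognitive_efficiency(-0.3677, 0.6602))   # ~ -0.557 for the bubble sort instance
```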

8.4 Cognitive Power

A common way to analyze performance is in the time domain. If we relate how much time it takes to achieve an amount of cognitive gain or expend an amount of cognitive work, we have the metric we call cognitive power.

$$ P_{G} = \frac{G}{t}\quad P_{W} = \frac{W}{t} $$
(18)

Again, note the above equation is written in a generic form. One can calculate the cognitive power of just the human component (\( G_H / t \) or \( W_H / t \)), just the cog component (\( G_C / t \) or \( W_C / t \)), or the entire ensemble (\( G^{*} / t \) or \( W^{*} / t \)).

In terms of cognitive power, artificial systems have a distinct advantage over humans. Computers are able to perform trillions of operations per second, so the time it takes them to achieve the same thing a human does can be many orders of magnitude smaller, yielding a correspondingly greater cognitive power.

8.5 Cognitive Density

Another useful way to analyze performance is in the energy domain. Relating how much energy it takes to achieve an amount of cognitive gain or expend an amount of cognitive work yields a metric we call cognitive density.

$$ D_{G} = \frac{G}{E}\quad D_{W} = \frac{W}{E} $$
(19)

The cognitive density of just the human component is \( G_H / E \) or \( W_H / E \), of just the cog component is \( G_C / E \) or \( W_C / E \), and of the entire ensemble is \( G^{*} / E \) or \( W^{*} / E \). Currently, electronic circuits in a computer require milliwatts of electrical power, on par with neurons in the human brain, but the large computers on which cognitive systems are implemented consume many times that to operate. As cognitive systems improve and become implementable on portable, handheld electronic devices, cognitive density with respect to energy will change dramatically.
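Both time- and energy-domain metrics are simple ratios; a sketch with hypothetical timing and energy figures:

```python
def cognitive_power(amount, seconds):
    """Eq. 18: cognitive gain or cognitive work per unit time."""
    return amount / seconds

def cognitive_density(amount, joules):
    """Eq. 19: cognitive gain or cognitive work per unit energy."""
    return amount / joules

# Hypothetical: the earlier bubble sort done in 2 s on a 20 W device (40 J):
print(cognitive_power(0.6602, 2.0))      # work-based power   ~ 0.330 per second
print(cognitive_density(0.6602, 40.0))   # work-based density ~ 0.0165 per joule
```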

8.6 Discussion: Watson vs. Humans

An interesting illustration of the cognitive augmentation metrics introduced here is to consider what occurred in 2011 when IBM Watson played human champions in Jeopardy! Both Watson and the human players received the same clue and both were able to correctly state the answer (in the form of a question, of course). In terms of a cognitive process, the information stock input and output were exactly the same for Watson and the humans. Since their outputs were the same, each achieved exactly the same cognitive gain, \( G_{Human} = G_{Watson} \).

However, to answer each clue, Watson performed billions of operations and analyzed potentially millions of pieces of information as it reasoned its way to the answer. Most observers would agree Watson performed a much greater amount of cognitive work than the humans simply because it performed millions of transformations. If we use cognitive gain as the basis, Watson and the humans exhibited the same cognitive power. If we use cognitive work as the basis, Watson exhibited a much greater cognitive power (simply by virtue of transforming so much information stock in so little time compared to the humans), \( P_{Watson} \gg P_{Human} \).

On the other hand, if we agree Watson performed more cognitive work, then the humans achieved a much greater cognitive efficiency because they expended far less cognitive work than Watson yet achieved the same cognitive gain. The 2011 version of Watson consumed on the order of 20 kW of electrical power while the human brain consumes about 20 W. In terms of cognitive gain, humans therefore achieved a cognitive density 1000 times greater than Watson. However, in terms of cognitive work, even though Watson consumed 1000 times more energy, most would agree it performed millions of times more cognitive work and therefore achieved a greater cognitive density.

In terms of cognitive power, the rules of Jeopardy! assured both human and computer had to answer in about the same amount of time. Since each achieved the same cognitive gain, one would reason they exhibited the same cognitive power. However, in terms of cognitive work, Watson achieved a significantly greater cognitive power.

9 Conclusion and Future Work

We have employed a new kind of information theory, called representational information theory, to form the basis of a set of metrics we can use to characterize and analyze the coming revolution in the cognitive systems era. In the near future, we will all be interacting with cogs capable of performing human-level cognition. All of us will become augmented humans whose cognitive ability is enhanced by working in partnership with cogs. The metrics presented here (cognitive work, cognitive gain, cognitive power, cognitive efficiency, cognitive density, and the augmentation factor) can be used to study this new era of human/computer interaction.

We have offered a new, generic definition of cognition in defining a cognitive process as the transformation of information stock resulting in a cognitive gain at the expenditure of an amount of cognitive work. We think these metrics could be adopted and used in the field of cognitive systems architectures as an implementation-independent way to compare architectures.

Similarly, we feel neuroscientists, psychologists, behaviorists, and computer scientists studying human cognitive performance can adopt our notions of cognition, cognitive work, cognitive gain, and the other metrics as an implementation-independent way to compare and contrast research findings. For example, we urge researchers in computational complexity to use cognitive gain and cognitive work to analyze complexity classes such as "NP-complete" and "NP-hard." Currently these can be analyzed statistically, but the notions of cognitive gain and cognitive work give the possibility of measuring specific instances of algorithms.

Finally, we certainly hope the cognitive systems and cognitive augmentation community will adopt, further explore, and extend the cognitive augmentation metrics presented here. We urge researchers to include in their analyses of cognitive systems discussion and calculations using the metrics introduced here.