1 Introduction

Bitcoin is the most popular decentralized digital currency and the largest of its kind in terms of total market value. As of May 2017, the bitcoins in circulation are worth over 28B US dollars in total. As opposed to traditional currencies, Bitcoin does not rely on a trusted entity like a bank or governmental authority. Instead, it is based on an open social model of trust and on incentivized collaboration. After an initial period when it was only known to a small group of enthusiasts and libertarians, Bitcoin has recently gained considerable popularity. According to the “State of Bitcoin” [2], more than 100,000 merchants now accept payments in Bitcoin. Developers have started to add it to their applications as a standard form of payment, and financial institutions have recently launched initiatives to explore its potential.

The average number of daily Bitcoin transactions, as of May 2017, is around 280,000. This number is known to be heavily influenced by several phenomena. Wallet shuffling and coin mixing are just two examples of activities that generate transactions not directly related to real purchases of goods or services. Another example comes from the activity of some exchanges (i.e., organizations that allow users to trade Bitcoin for fiat currency and vice versa) that use long transaction chains to issue payments to customers who decide to withdraw bitcoins. Organizations of this kind aggregate several deposits into a single large transaction and then repeatedly issue payments, spending at each step the large change of the previous transaction. Although these long transaction chains are triggered by human activity (i.e., by users who decide to trade bitcoins), they are still generated by automatic mechanisms that inflate the raw number of daily Bitcoin transactions relative to those associated with the explicit exchange of goods or services between users.

On top of this, it is also believed that organizations with interests in Bitcoin generate transactions with the mere objective of attracting investors and inflating the exchange rate. Being generated by computer programs, these “artificial” transactions often introduce regular patterns in the blockchain. Visual systems [5, 9] and previous analytical papers [10, 14] have pinpointed various suspicious structures, ranging from binary tree-like distributions and fork-merge patterns to long and “peeling” chains [10].

In this paper we focus on long transaction chains and consider the frequency at which these chains evolve over time. In brief, we label each transaction with its LLC, namely the length of the longest chain the transaction lies on, and we analyze the statistical distribution of the LLCs over both short and long intervals of time. We then introduce the concept of Bitcoin Heartbeat, namely an average measure of the pace at which long chains in Bitcoin have grown over its history.

The paper is structured as follows. Section 2 gives a short description of the Bitcoin transaction graph. Section 3, after providing the reader with some context on long transaction chains, shows the results of our experiments on the distribution of long chains. Section 4 focuses on a specific set of transactions and analyzes how the chains they lie on change through time. Section 5 introduces the concept of Bitcoin Heartbeat. Section 6 concludes the paper.

2 The Bitcoin Transaction Graph

In this section we give a simplified description of Bitcoin transactions and we define the Bitcoin transaction graph. For a broader introduction to Bitcoin see e.g. the original paper [11] and recent surveys [4, 16].

A transaction (in what follows tx) t has a set of inputs \(i^1_t,\dots ,i^h_t\) and a set of outputs \(o^1_t,\dots ,o^k_t\), each associated with a cryptographic identifier, called address, and a bitcoin amount. A tx transfers bitcoins from its inputs to its outputs. Outputs of txs are denoted txos. At a certain time T, each txo of a tx t can be unspent (utxo) or spent (stxo). The only way to spend a utxo \(o_t\) of t is to use it as the input \(i_{t'}\) of a tx \(t'\) (with \(t \ne t'\)). In this way, bitcoins flowing from one tx to the other create the so-called “chain of ownership”.

Following the authors of [13], we define a directed graph, called Transaction graph (tx-graph), as follows. Nodes are txs. Nodes t and \(t'\) are connected by a directed edge \((t,t')\) if one output \(o_t\) of t is used as an input \(i_{t'}\) of \(t'\). More precisely, the tx-graph is acyclic, because transactions are never issued twice, and it is a multigraph, since several outputs of t can correspond to several inputs of \(t'\). For the sake of simplicity, we will refer to the tx-graph without the attributes “directed”, “acyclic” and “multi”. An example of a tx-graph can be found in Fig. 1. The graph defined above differs from the user-graphs defined, for example, in [6, 7, 12], where nodes represent users and edges represent transactions involving pairs of users. The user-graph is obtained by contracting the tx-graph by means of a heuristic described in [13]. The heuristic assumes that all the addresses associated with the inputs of a multi-input tx belong to the same user and can therefore be clustered together. An example of a user-graph can be found in Fig. 2. For the rest of this paper we will always refer to the raw tx-graph.
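Although user-graphs are not used in the rest of the paper, the multi-input clustering rule can be illustrated with a minimal union-find sketch. The function name `cluster_addresses` and the input format (one list of input addresses per tx) are assumptions made for illustration only; they do not reproduce the implementation of [13].

```python
def cluster_addresses(inputs_per_tx):
    """Group addresses that appear together as inputs of the same tx.

    inputs_per_tx: iterable of lists, one list of input addresses per tx
    (hypothetical format). Returns a sorted list of sorted address clusters.
    """
    parent = {}

    def find(a):
        # Register unseen addresses as their own cluster representative.
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for addrs in inputs_per_tx:
        addrs = list(addrs)
        if not addrs:
            continue
        find(addrs[0])
        # All inputs of a multi-input tx are assumed to belong to one user.
        for other in addrs[1:]:
            union(addrs[0], other)

    clusters = {}
    for a in parent:
        clusters.setdefault(find(a), set()).add(a)
    return sorted(sorted(c) for c in clusters.values())


# Toy example: addresses A, B, C are inputs of one tx; C and D of another.
clusters = cluster_addresses([["A", "B", "C"], ["C", "D"], ["E"]])
```

In the toy example, A, B, C and D end up in one cluster because C links the two multi-input txs, while E stays alone.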

Fig. 1. An example of tx-graph with 3 txs (1, 2 and 3). Tx 1 has 3 inputs (\(i_1\), \(i_2\) and \(i_3\)) associated with the addresses A, B and C and 3 outputs (\(o_1\), \(o_2\) and \(o_3\)) associated with D, E and F. These outputs are spent in txs 2 and 3. Some inputs of txs 2 and 3 come from outputs of txs that are not part of the drawing (dashed arrows).

Fig. 2. The corresponding user-graph obtained by the heuristic described in [13]. Addresses associated with all the inputs of the 3 txs are grouped into single nodes. For the purpose of this paper we do not consider these types of graphs.

The Blockchain is divided into “pages” called blocks. Each block contains, roughly, the txs issued in a time interval of ten minutes. The sequence number of a block is its height. For a tx t we denote by b(t) the block of t. As of May 2017, the Blockchain consists of about 460,000 blocks and contains about 220M txs, which is the number of nodes of the tx-graph.

Given a set S of txs, the subgraph of the tx-graph induced by S is the graph whose nodes coincide with S and whose edges are the edges of the tx-graph between vertices of S. For a given pair of blocks \((b',b'')\) such that \(b'<b''\) we define \(G(b',b'')\), as the tx-graph induced by the txs \(t_i\) s.t. \(b'\le b(t_i) \le b''\).
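The definition of \(G(b',b'')\) can be illustrated with a short sketch. The record type `Tx` and its fields (`txid`, `height`, `spends`) are hypothetical names introduced only for this example; `spends` lists the ids of the txs whose outputs are spent, with one entry per spent output so that repeated entries model parallel edges of the multigraph.

```python
from collections import namedtuple

# Hypothetical tx record: id, block height b(t), and the ids of the txs
# whose outputs it spends (one entry per spent output).
Tx = namedtuple("Tx", ["txid", "height", "spends"])


def induced_tx_graph(txs, b_lo, b_hi):
    """Return (nodes, edges) of the tx-graph induced by b_lo <= b(t) <= b_hi."""
    nodes = {t.txid for t in txs if b_lo <= t.height <= b_hi}
    # Keep an edge (src, t) only if both endpoints fall inside the window.
    edges = [(src, t.txid)
             for t in txs if t.txid in nodes
             for src in t.spends if src in nodes]
    return nodes, edges


txs = [
    Tx("a", 100, []),
    Tx("b", 101, ["a", "a"]),  # spends two outputs of a: two parallel edges
    Tx("c", 102, ["b"]),
    Tx("d", 105, ["c"]),       # outside the window used below
]
nodes, edges = induced_tx_graph(txs, 100, 102)
```

Edges whose source or target lies outside the block window are dropped, which is what makes the graph an induced subgraph rather than a neighborhood.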

3 Long Transaction Chains

Understanding whether a chain of txs has been generated automatically or reflects a chain of human purchases is a challenging task. In fact, the “chain of ownership” naturally introduces long chains in the tx-graph. What is often not natural is the velocity at which these chains are generated. With this in mind, Blockchain.info [1] has developed a heuristic to rule out high-velocity chains that are probably not in one-to-one correspondence with chains of real purchases of goods and services. The heuristic works as follows. Every 24 h a counter resets and keeps track of the lengths of the new chains by looking at how many times tx outputs are spent on the same day. Data are summarized in Fig. 3. This simple heuristic provides a first estimate of the extent of the phenomenon we are looking at: only about 40% of the daily txs do not belong to chains longer than 10.

Fig. 3. Number of confirmed txs per day. Red series includes all the txs; green (grey, yellow, blue) series excludes txs belonging to chains longer than 10000 (1000, 100, 10). (Color figure online)

3.1 What Happens in a Day

To understand the nature of long tx chains in Bitcoin, we designed Algorithm 1.

Algorithm 1. LLC labeling of the nodes of a tx-graph (listing not reproduced).

The algorithm receives as input the tx-graph \(G(b',b'')\), for a pair of blocks \((b',b'')\), and labels each node v of \(G(b',b'')\) with a quantity LLC that represents the length of the longest chain of \(G(b',b'')\) that v belongs to. In the algorithm, the word “source” (“sink”) refers to a node with in-degree (out-degree) equal to zero. The LLC labeling is computed by leveraging a topological ordering algorithm described in [15].
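The labeling described above can be sketched as follows. This is not a transcription of Algorithm 1. It is a minimal illustration under two assumptions: chain length is measured in edges (so an isolated node has \(LLC = 0\)), and the topological order is obtained with Kahn's algorithm, which may differ from the algorithm of [15]. The longest chain through v is then the longest path from any source to v plus the longest path from v to any sink.

```python
from collections import defaultdict, deque


def llc_labels(nodes, edges):
    """Label each node with the length (in edges) of the longest chain through it."""
    succ = defaultdict(set)
    pred = defaultdict(set)
    for u, v in edges:          # parallel edges collapse: they do not affect lengths
        succ[u].add(v)
        pred[v].add(u)

    # Topological order (Kahn's algorithm), starting from the sources.
    indeg = {v: len(pred[v]) for v in nodes}
    queue = deque(v for v in nodes if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(indeg):
        raise ValueError("graph is not acyclic")

    # up[v]: longest path from a source to v; down[v]: longest path from v to a sink.
    up = {v: 0 for v in order}
    for u in order:
        for v in succ[u]:
            up[v] = max(up[v], up[u] + 1)
    down = {v: 0 for v in order}
    for u in reversed(order):
        for v in succ[u]:
            down[u] = max(down[u], down[v] + 1)

    return {v: up[v] + down[v] for v in order}


# Toy graph: chain a -> b -> c, shortcut a -> c, extra source e -> c, isolated d.
labels = llc_labels({"a", "b", "c", "d", "e"},
                    [("a", "b"), ("b", "c"), ("a", "c"), ("e", "c")])
```

Both passes visit each edge once, so the labeling runs in time linear in the size of the graph.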

We performed a first experiment by computing graph \(G_1 = G(417113,417256)\), which corresponds approximately to 24 h of activity. We then ran Algorithm 1 to label each node with its LLC. \(G_1\) has 219,084 nodes and 264,084 edges. About 10% of the nodes (20,975) have \(LLC = 0\) and about 7% of the nodes (15,775) have \(LLC = 1\). Figure 4 shows the probability density function of LLC using logarithmic scales on both axes. We left nodes with \(LLC = 0\) or 1 out of the chart in order to be able to draw on logarithmic axes. We also computed a power trendline for values of LLC lower than 100 (see the dashed red line on the chart and the equation in the top-right corner). We found that the left part of the distribution seems to follow a power law (we recall that straight lines on doubly logarithmic axes correspond to power-law curves of the form \(y = c\,x^{a}\) on linear axes). We ran the same experiment on 30 different, randomly selected days, obtaining very similar charts and interpolations. Additionally, Fig. 5 shows the cumulative distribution function of LLC. We can observe that about 60% of the daily txs have \(LLC \le 200\) and that about 90% of them have \(LLC \le 700\).
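A power trendline of this kind can be obtained with an ordinary least-squares fit on the logarithms of the data. The sketch below assumes that the trendline is fitted to the raw counts of txs per LLC value, with \(2 \le LLC < 100\); the function names and the exact fitting range are illustrative assumptions, not the procedure used to produce Fig. 4.

```python
import math
from collections import Counter


def fit_power_law(llc_values, lo=2, hi=100):
    """Fit count(x) ~ c * x**a for lo <= x < hi by least squares in log-log space."""
    counts = Counter(v for v in llc_values if lo <= v < hi)
    xs = [math.log10(k) for k in counts]
    ys = [math.log10(n) for n in counts.values()]
    if len(xs) < 2:
        raise ValueError("need at least two distinct LLC values to fit a line")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx                # slope of the straight line on log-log axes
    c = 10 ** (my - a * mx)      # intercept, converted back to linear scale
    return c, a


def fraction_at_most(llc_values, threshold):
    """Empirical CDF: fraction of txs with LLC <= threshold."""
    return sum(1 for v in llc_values if v <= threshold) / len(llc_values)


# Synthetic data following count(x) = 1024 / x**2 exactly.
sample = [2] * 256 + [4] * 64 + [8] * 16 + [16] * 4
c, a = fit_power_law(sample)
```

On the synthetic sample the fit recovers an exponent close to -2, which is the slope of the straight line that the data form on doubly logarithmic axes.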

Fig. 4. Probability density function of LLC on 144 blocks using log scales. (Color figure online)

Fig. 5. Cumulative distribution function of LLC on 144 blocks.

3.2 Extending the Analysis to a Wider Range of Blocks

To better understand the LLC distribution, we computed graph \(G_2 = G(413000, 419143)\), which spans 6144 consecutive blocks (the maximum number of blocks we managed to load in the memory of a single machine), and we ran Algorithm 1. Graph \(G_2\) has 9,344,879 nodes and 18,375,705 edges. Txs for which \(LLC = 0\) are 0.49% of the total, whereas txs for which \(LLC = 1\) are 0.27% of the total. Figure 6 shows the probability density function of LLC. Values on the y-axis have been normalized by multiplying them by 144/6144, where 144 is the average number of blocks per day and 6144 is the number of considered blocks. We have therefore obtained the average number of transactions per day (y-axis) with a certain value of LLC (x-axis). We can observe that also in this case values of LLC lower than about 100 seem to follow a power law and that, even though values are much higher, the interpolated trendline for values lower than 100 has essentially the same shape and slope as before. The CDF (reported in Fig. 7), instead, has a very different shape. This can be explained as follows: even though the average number of nodes (txs) per day is roughly the same as in the previous experiment, the number of edges increased considerably, since it grows nonlinearly with the number of blocks involved. We have that 10% of the samples have LLC \(\le \) 20,000, whereas 50% have LLC \(\le \) 50,000.
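The normalization of the y-axis amounts to rescaling a histogram of LLC values by the ratio between blocks per day and blocks considered. A minimal sketch, where the function name and the input format (a flat list of LLC labels) are assumptions made for illustration:

```python
from collections import Counter

BLOCKS_PER_DAY = 144  # average number of blocks per day, as in the text


def daily_llc_histogram(llc_values, n_blocks):
    """Average number of txs per day for each LLC value over n_blocks blocks."""
    scale = BLOCKS_PER_DAY / n_blocks
    return {llc: count * scale for llc, count in Counter(llc_values).items()}


# Toy input: 6144 txs with LLC = 5 and 3072 txs with LLC = 7 over 6144 blocks.
hist = daily_llc_histogram([5] * 6144 + [7] * 3072, 6144)
```

With 6144 blocks the scale factor is 144/6144, so a count of 6144 txs over the whole interval corresponds to 144 txs per day.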

Fig. 6. Probability density function of frequency of LLC on 6144 blocks.

Fig. 7. Cumulative distribution function of LLC on 6144 blocks.

3.3 Trying to Separate Human and Non-human Activity

As reported in [8], power-law distributions have been termed “the signature of human activity”. As is clear from Figs. 4 and 6, the probability density functions of LLC in our experiments do not exhibit the classical power-law shape in their entirety. In particular, our distributions lack very long steady tails. In fact, for values higher than about 100, LLC does not seem to follow any regular trendline. Looking at Figs. 4 and 6, we suspect that the power-law portions of the distributions represent human activity, whereas the rest represents algorithmically generated txs. Zooming into the figures, we observed that for values higher than about 100 a series of consecutive peaks appears. Such peaks might be interpreted as a sequence of automatic phenomena, each of which introduces new “artificial” transactions in the blockchain at its own frequency. Examples of these peaks can be observed in Fig. 8, where we zoomed into Fig. 6 and considered the number of daily txs for which \(1000\le LLC \le 10000\).

Fig. 8. Zoom of Fig. 6 showing number of daily txs with \(1000\le LLC \le 10000\). X-axis uses a linear scale and y-axis uses a logarithmic scale.

4 A Deeper Analysis of a Specific Set of Transactions

In this section we describe the outcome of an experiment aimed at understanding how LLC values change over time for a specific set of txs. We decided to deepen the analysis on a recent block and arbitrarily selected block \(B = 416000\). The number of txs in B is 1205. We computed graphs \(G_k = G(B-k+1, B+k)\) with \(k=3,6,12,\dots ,1536,3072\). Such graphs refer to sequences of blocks “centered” around B, including a number of blocks that grows exponentially. We therefore obtained graphs \(G(415998,416003), G(415995,416006), \dots , G(414465,417536), G(412929,419072)\), where the first graph refers to 6 blocks (corresponding to about one hour of activity) and the last graph refers to 6144 blocks (corresponding to about 42 days of activity). For each \(G_k\) we computed, with Algorithm 1, the LLC value of each tx in B. Then we normalized such values, i.e., we computed \((LLC \times 144)/b\), where b is the number of blocks of \(G_k\), obtaining the txs per day (TPD) belonging to the longest chain traversing each tx. Figure 9 shows the evolution of TPD for every tx in B. In order to be able to represent null values on logarithmic axes, we changed them to 0.01.
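The block windows and the TPD normalization can be reproduced with a few lines of arithmetic. The helper names `centered_windows` and `tpd` are introduced here for illustration only.

```python
def centered_windows(center, k_first=3, k_last=3072):
    """Yield (k, (first_block, last_block)) for G_k = G(B-k+1, B+k), doubling k."""
    k = k_first
    while k <= k_last:
        yield k, (center - k + 1, center + k)
        k *= 2


def tpd(llc, n_blocks, blocks_per_day=144):
    """Transactions per day: (LLC * 144) / b, with b the number of blocks of G_k."""
    return llc * blocks_per_day / n_blocks


windows = list(centered_windows(416000))
first_k, (first_lo, first_hi) = windows[0]
n_blocks_first = first_hi - first_lo + 1   # 2k blocks in each window
```

Each window contains 2k blocks, so the smallest graph covers 6 blocks and the largest covers 6144 blocks. As a worked example of the normalization, a tx with \(LLC = 50\) in the 6-block graph has a TPD of \(50 \times 144 / 6 = 1200\).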

Fig. 9. Evolution of the transactions per day for a specific set of txs. Each point in the plot refers to a specific transaction t of B. Its x-value is the number of blocks of a graph \(G_k\). Its y-value is the TPD for t in \(G_k\). Each tx is represented by a set of points, each showing its TPD in a graph \(G_k\). Such points are linked by a curve. The red curves refer to txs that “wake up” in \(G_{6}\) (in one hour). (Color figure online)

Interestingly, TPD for almost all txs, in the long run, converges to a value included in [300, 1300] as indicated by the red bar in the top-right corner of the figure. This suggests that, after some time, most txs in B will be connected to chains that evolve at the pace of h TPDs, with \(h \in [300,1300]\). Note that, for each graph \(G_k\) (see Fig. 9) there is a certain number of txs that “get in the game”, i.e., txs whose TPD for the graph \(G_{k}\) is 0 and for the subsequent graph is \(\ne 0\). We say that such txs “wake up”. We drew in red those txs “waking up” in graph \(G_{6}\). We decided to pay specific attention to those txs because, in a sense, they start belonging to some chains exactly at the same time. Txs with this characteristic will be the object of our last experiment.
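The “wake up” condition can be checked directly on the sequence of TPD values of a tx. The sketch below returns both graphs involved in the transition (the last one with null TPD and the first one with non-null TPD), since which of the two is taken as the graph where the tx “wakes up” is a matter of convention; the function name and the input format are assumptions made for illustration.

```python
def wake_up_transition(tpd_series):
    """Find where a tx goes from null to non-null TPD.

    tpd_series: list of (k, tpd) pairs ordered by growing k (hypothetical format).
    Returns (k_zero, k_nonzero) for the first such transition, or None.
    """
    for (k0, v0), (k1, v1) in zip(tpd_series, tpd_series[1:]):
        if v0 == 0 and v1 != 0:
            return k0, k1
    return None


# A tx that is isolated in the two smallest graphs and joins a chain afterwards.
transition = wake_up_transition([(3, 0), (6, 0), (12, 48.0), (24, 60.0)])
```

A tx that already lies on a chain in the smallest graph never satisfies the condition, so the function returns `None` for it.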

5 The Bitcoin Heartbeat

In this section we describe our last experiment, aimed at investigating how the interval of h values introduced in Sect. 4 changed over the Bitcoin history. We chose Jan. 2011 as the starting date because, according to [3], in this month the number of daily Bitcoin transactions started to be steadily above 1000. Following the same procedure as in Sect. 4, we built 22 families of graphs such that each of them refers to 6144 consecutive blocks. The 22 families of graphs correspond to intervals of blocks centered on a random block of the first day of the months Feb., May, Aug. and Nov. of years 2011–2015 and partially 2016 (only Feb. and May). The considered intervals span in total about 141,000 blocks. We then built, for each family, a chart similar to the one of Fig. 9. In particular, for each family we only considered txs “waking up” when graph \(G_{6}\) of the family is taken into account. Finally, we computed one h-interval for each family of graphs as in the previous section, restricting our attention to those txs.

Fig. 10. Evolution of the Bitcoin heartbeat. The x-axis is labeled with time, the y-axis with the frequency at which, on average, LLC values of “waking-up” txs grow in one hour. The standard deviation of each average value is represented using vertical dashed lines.

Since the h-interval is the set of frequency values to which txs tend to converge over time, we call its average value the Bitcoin Heartbeat. Figure 10 shows the evolution of the Bitcoin heartbeat. Note that, in the figure, frequencies are normalized to txs per hour rather than to TPDs as in Fig. 9. Each point represents the frequency at which, on average, LLC values of “waking up” txs grow in an hour. In May 2016 the heartbeat was around 52. This means that, at this pace, tx chains get longer on average by about 1248 (i.e., \(52 \times 24\)) txs per day.
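The conversion between the two units, and the average and standard deviation plotted for each family, can be sketched as follows. The sketch assumes that the heartbeat of a family is the plain mean of the long-run TPD values of its “waking up” txs, converted to txs per hour, and that the population standard deviation is used; both choices, as well as the function name and the input values, are illustrative assumptions.

```python
from statistics import mean, pstdev

HOURS_PER_DAY = 24


def heartbeat(long_run_tpds):
    """Return (mean, standard deviation) of the long-run frequencies in txs per hour."""
    per_hour = [v / HOURS_PER_DAY for v in long_run_tpds]
    return mean(per_hour), pstdev(per_hour)


# Two made-up long-run TPD values inside the interval [300, 1300].
beat, spread = heartbeat([1200, 1296])
tpd_equivalent = beat * HOURS_PER_DAY
```

With the two made-up values the mean is 52 txs per hour, which corresponds to \(52 \times 24 = 1248\) txs per day, the same arithmetic used for the May 2016 value.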

6 Conclusions

In this paper we have analyzed long chains in the Bitcoin transaction graph performing several experiments each spanning a considerable amount of time and involving a large number of blocks.

The experiments highlight the following. (i) The distribution of the lengths of the longest chains passing through transactions exhibits a shape that is hard to believe to be produced by explicit human activities. In fact, it consists of a low-frequency portion that resembles a power-law distribution and a high-frequency portion that contains several peaks. (ii) If we consider a sufficiently large amount of time, the transactions surprisingly tend to be traversed by long chains with frequencies distributed in a somewhat small interval. We call the average of such interval the Bitcoin Heartbeat. (iii) The Bitcoin Heartbeat has a rather stable value that has slowly grown in the recent Bitcoin history.

Although our observations highlight some interesting properties of the tx-graph, we believe that further investigation of these results is needed. We believe that a better understanding of the dynamics taking place in Bitcoin has two positive side effects. On the one hand, it stimulates new research in the field; on the other hand, it leads to a more conscious digital economy.