Abstract
Over the past few years, the number of daily Bitcoin transactions has grown persistently. This trend, however, is known to be influenced by a number of phenomena that generate long transaction chains that are not related to real purchases (e.g., wallet shuffling and coin mixing). For a transaction chain, we call transaction chain frequency the number of transactions in the chain divided by the time interval the chain spans. In this paper, we first analyze to what extent Bitcoin transactions are involved in high-frequency transaction chains, in the short and in the long term. Based on this analysis, we then argue that a large fraction of transactions do not refer to explicit human activity, namely to transactions between users that trade goods or services. Finally, we show that most transactions are involved in chains whose frequency is roughly stable over time; we call this frequency the Bitcoin Heartbeat.
1 Introduction
Bitcoin is the most popular decentralized digital currency and the largest of its kind in terms of total market value. As of May 2017, the bitcoins in circulation are worth over 28B US dollars. As opposed to traditional currencies, Bitcoin does not rely on a trusted entity like a bank or governmental authority. Instead, it is based on an open social model of trust and on incentivized collaboration. After an initial period when it was only known to a small group of enthusiasts and libertarians, Bitcoin has recently gained considerable popularity. According to the “State of Bitcoin” [2], more than 100,000 merchants now accept payments in Bitcoin. Developers have started to add it to their applications as a standard form of payment, and financial institutions have recently launched initiatives to explore its potential.
As of May 2017, the average number of daily Bitcoin transactions is around 280,000. This number is known to be heavily influenced by a number of phenomena. Wallet shuffling and coin mixing are just two examples of activities that generate transactions that are not directly related to real purchases of goods or services. Another example comes from the activity of some exchanges (i.e., organizations that allow users to trade Bitcoin for fiat currency and vice versa) that use long transaction chains to issue payments to customers who decide to withdraw bitcoins. Such organizations aggregate several deposits into a single large transaction and then repeatedly issue payments, at each step spending the large change of the previous transaction. Although these long transaction chains are triggered by human activity (i.e., by users who decide to trade bitcoins), they are still generated by automatic mechanisms that inflate the raw number of daily Bitcoin transactions relative to the number associated with the explicit exchange of goods or services between users.
On top of this, it is also believed that organizations with interests in Bitcoin generate transactions with the mere objective of attracting investors and inflating the exchange rate. Being generated by computer programs, these “artificial” transactions often introduce regular patterns in the blockchain. Visual systems [5, 9] and previous analytical papers [10, 14] have pinpointed various suspicious structures, ranging from binary tree-like distributions and fork-merge patterns to long and “peeling” chains [10].
In this paper we focus on long transaction chains and consider the frequency at which these chains evolve over time. In brief, we label each transaction with its LLC, namely with the length of the longest chain the transaction lies on, and we analyze the statistical distribution of the LLCs using both short and long intervals of time. We then introduce the concept of Bitcoin Heartbeat, namely an average measure of the pace at which long chains in Bitcoin have grown over its history.
The paper is structured as follows. Section 2 gives a short description of the Bitcoin transaction graph. Section 3, after providing the reader with some context on long transaction chains, shows the results of our experiments on the distribution of long chains. Section 4 focuses on a specific set of transactions and analyzes how the chains they lie on change through time. Section 5 introduces the concept of Bitcoin Heartbeat. Section 6 concludes the paper.
2 The Bitcoin Transaction Graph
In this section we give a simplified description of Bitcoin transactions and we define the Bitcoin transaction graph. For a broader introduction to Bitcoin see e.g. the original paper [11] and recent surveys [4, 16].
A transaction (in what follows tx) t has a set of inputs \(i^1_t,\dots ,i^h_t\) and a set of outputs \(o^1_t,\dots ,o^k_t\), each associated with a cryptographic identifier, called address, and a bitcoin amount. A tx transfers bitcoins from its inputs to its outputs. Outputs of txs are denoted txos. At a certain time T, each txo of a tx t can be unspent (utxo) or spent (stxo). The only way to spend a utxo \(o_t\) of t is to use it as the input \(i_{t'}\) of a tx \(t'\) (with \(t \ne t'\)). In this way, bitcoins flowing from one tx to the other create the so-called “chain of ownership”.
Following the authors of [13], we define a directed graph, called Transaction graph (tx-graph), as follows. Nodes are txs. Nodes t and \(t'\) are connected by a directed edge \((t,t')\) if one output \(o_t\) of t is used as an input \(i_{t'}\) of \(t'\). More precisely, the tx-graph is acyclic, because transactions are never issued twice, and it is a multigraph, since several outputs of t can correspond to several inputs of \(t'\). For the sake of simplicity, we will refer to the tx-graph without the attributes “directed”, “acyclic” and “multi”. An example of a tx-graph can be found in Fig. 1. The graph defined above differs from the user-graphs defined for example in [6, 7, 12], where nodes represent users and edges represent transactions involving pairs of users. The user-graph is obtained by contracting the tx-graph by means of a heuristic described in [13]. The heuristic establishes that all the addresses associated with the inputs of a multi-input tx belong to the same user and can therefore be clustered together. An example of a user-graph can be found in Fig. 2. For the rest of this paper we will always refer to the raw tx-graph.
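The tx-graph definition above can be sketched in a few lines of Python. This is only an illustration: the input layout (a mapping from each tx id to the list of `(prev_txid, output_index)` pairs it spends) and the name `build_tx_graph` are assumptions made for the example and do not come from the paper.

```python
def build_tx_graph(txs):
    """Build the tx-graph as (nodes, edges).

    `txs` is a hypothetical mapping {txid: [(prev_txid, output_index), ...]}.
    One edge (prev_txid, txid) is emitted per spent output, so parallel
    edges are kept and the result is a multigraph.
    """
    nodes = list(txs)
    edges = [
        (prev_txid, txid)
        for txid, inputs in txs.items()
        for prev_txid, _output_index in inputs
        if prev_txid in txs  # skip inputs spending txs outside the data set
    ]
    return nodes, edges


# Toy example: t3 spends two outputs of t1 and one output of t2.
toy_txs = {
    "t1": [],
    "t2": [("t1", 0)],
    "t3": [("t1", 1), ("t1", 2), ("t2", 0)],
}
toy_nodes, toy_edges = build_tx_graph(toy_txs)
```

In the toy example the pair (t1, t3) appears twice in `toy_edges`, which is exactly the multigraph case mentioned in the definition.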
The Blockchain is divided into “pages” called blocks. Each block contains, roughly, the txs issued in a time interval of ten minutes. The sequence number of a block is its height. For a tx t we denote by b(t) the block of t. As of May 2017, the Blockchain consists of about 460,000 blocks and contains about 220 M txs, which is the number of nodes of the tx-graph.
Given a set S of txs, the subgraph of the tx-graph induced by S is the graph whose nodes coincide with S and whose edges are the edges of the tx-graph between nodes of S. For a given pair of blocks \((b',b'')\) such that \(b'<b''\) we define \(G(b',b'')\) as the subgraph of the tx-graph induced by the txs \(t_i\) s.t. \(b'\le b(t_i) \le b''\).
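As a minimal sketch of the induced subgraph \(G(b',b'')\), assume the tx-graph is available as a list of `(t, t')` edges together with a hypothetical mapping `block_of` from each tx to its block height b(t). Both names are illustrative.

```python
def induced_tx_graph(edges, block_of, b_lo, b_hi):
    """Return (nodes, edges) of G(b_lo, b_hi).

    A tx is kept when b_lo <= b(t) <= b_hi; an edge is kept only when
    both of its endpoints are kept.
    """
    nodes = [t for t, height in block_of.items() if b_lo <= height <= b_hi]
    kept = set(nodes)
    sub_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return nodes, sub_edges


# Toy example with four txs in four consecutive blocks.
heights = {"a": 10, "b": 11, "c": 12, "d": 13}
all_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
sub_nodes, sub_edges = induced_tx_graph(all_edges, heights, 11, 13)
```

Edges that cross the boundary of the block interval, such as (a, b) and (a, d) in the toy data, are dropped, so chains are truncated at the first and last block of the interval.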
3 Long Transaction Chains
Understanding whether a chain of txs has been generated automatically or reflects a chain of human purchases is a challenging task. In fact, the “chain of ownership” naturally introduces long chains in the tx-graph. What is often not natural is the velocity at which these chains are generated. With this in mind, Blockchain.info [1] has developed a heuristic to rule out high velocity chains that are probably not in a one-to-one correspondence with chains of real purchases of goods and services (see Note 1). The heuristic works as follows. Every 24 h a counter resets and keeps track of the lengths of the new chains by looking at how many times tx outputs are spent on the same day. Data are summarized in Fig. 3. This simple heuristic provides a first estimate of the extent of the phenomena we are looking at: only about 40% of the total number of daily txs do not belong to chains longer than 10.
3.1 What Happens in a Day
To understand the nature of long tx chains in Bitcoin, we designed Algorithm 1.
The algorithm receives as input the tx-graph \(G(b',b'')\), for a pair of blocks \((b',b'')\), and labels each node v of \(G(b',b'')\) with a quantity LLC that represents the length of the longest chain of \(G(b',b'')\) that v belongs to. In the algorithm, the word “source” (“sink”) refers to a node with in-degree (out-degree) equal to zero. The LLC labeling is computed by leveraging a topological ordering algorithm described in [15].
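The listing of Algorithm 1 is not reproduced in this text, so the following Python sketch is only one plausible reading of the description above, not the authors' implementation. It assumes that chain length is measured in edges (so that an isolated tx has LLC = 0) and that the LLC of a node is the longest source-to-node path plus the longest node-to-sink path. The name `label_llc` is illustrative.

```python
from collections import defaultdict, deque


def label_llc(nodes, edges):
    """Return {node: LLC} for a directed acyclic (multi)graph.

    LLC is taken here to be the number of edges on the longest path
    passing through the node (an assumption; see the lead-in).
    Every edge endpoint must appear in `nodes`.
    """
    succ = defaultdict(list)
    indeg = {v: 0 for v in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    # Topological order (Kahn's algorithm): start from the sources.
    queue = deque(v for v in nodes if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(indeg):
        raise ValueError("graph contains a cycle")

    # Longest path from any source to each node.
    up = {v: 0 for v in nodes}
    for u in order:
        for v in succ[u]:
            up[v] = max(up[v], up[u] + 1)

    # Longest path from each node to any sink.
    down = {v: 0 for v in nodes}
    for u in reversed(order):
        for v in succ[u]:
            down[u] = max(down[u], down[v] + 1)

    return {v: up[v] + down[v] for v in nodes}


llc_example = label_llc(
    ["a", "b", "c", "d", "e"],
    [("a", "b"), ("b", "c"), ("a", "c"), ("a", "c"), ("e", "c")],
)
```

Both passes visit each edge once, so the labeling runs in time linear in the size of the graph, which matters for graphs with millions of nodes.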
We performed a first experiment computing graph \(G_1 = G(417113,417256)\), which corresponds approximately to 24 h of activity. We then ran Algorithm 1 to label each node with its LLC. \(G_1\) has 219,084 nodes and 264,084 edges. About 10% of the nodes (20,975) have \(LLC = 0\) and about 7% of the nodes (15,775) have \(LLC = 1\). Figure 4 shows the probability density function of LLC using logarithmic scales on both axes. We left nodes with \(LLC = 0\) or 1 out of the chart in order to be able to draw on logarithmic axes. We also computed a power trendline for values of LLC lower than 100 (see the dashed red line on the chart and the equation in the top-right corner). We found that the left part of the distribution seems to follow a power-law (we recall that straight lines on doubly logarithmic axes correspond to power-law curves of the form \(y = a x^{k}\) on linear axes). We ran the same experiment on 30 different, randomly selected days, obtaining very similar charts and interpolations. Additionally, Fig. 5 shows the cumulative distribution function of LLC. We can observe that about 60% of the daily txs have \(LLC \le 200\) and that about 90% of them have \(LLC \le 700\).
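A power trendline of the kind mentioned above can be obtained with an ordinary least-squares fit in log-log space. The sketch below assumes this fitting method, which the paper does not specify, and uses synthetic data rather than the measured LLC distribution. The function name `fit_power_law` is illustrative.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**k on log-log axes; returns (a, k).

    All xs and ys must be strictly positive, which is why LLC values
    0 and 1 have to be handled separately on logarithmic axes.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    k = sxy / sxx                        # slope of the straight line
    a = math.exp(mean_y - k * mean_x)    # intercept, mapped back to linear scale
    return a, k


# Synthetic data that follows y = 1000 * x**-1.5 exactly, for 2 <= x < 100.
synthetic_x = list(range(2, 100))
synthetic_y = [1000.0 * x ** -1.5 for x in synthetic_x]
fit_a, fit_k = fit_power_law(synthetic_x, synthetic_y)
```

The slope k of the fitted line is the exponent of the power-law, so comparing slopes across days is a direct way to compare the left parts of the distributions.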
3.2 Extending the Analysis to a Wider Range of Blocks
To better understand the LLC distribution, we computed graph \(G_2 = G(413000, 419143)\), which refers to 6144 consecutive blocks (the maximum number of blocks we managed to load in the memory of a single machine), and we ran Algorithm 1. Graph \(G_2\) has 9,344,879 nodes and 18,375,705 edges. Txs for which \(LLC = 0\) are 0.49% of the total, whereas txs for which \(LLC = 1\) are 0.27% of the total. Figure 6 shows the probability density function of LLC. Values on the y-axis have been normalized by multiplying them by 144/6144, where 144 is the average number of blocks per day and 6144 is the number of considered blocks. We have therefore obtained the average number of transactions per day (y-axis) with a certain value of LLC (x-axis). We can observe that in this case, too, values of LLC lower than about 100 seem to follow a power-law and that, even though values are much higher, the interpolated trendline for values lower than 100 has essentially the same shape and slope as before. The CDF (reported in Fig. 7), instead, has a very different shape. This can be explained as follows: even if the number of nodes, that is, the average number of txs per day, is roughly the same as in the previous experiment, the set of edges increased considerably, since its size grows nonlinearly with the number of involved blocks. We have that 10% of the samples have LLC \(\le \) 20,000, whereas 50% have LLC \(\le \) 50,000.
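The 144/6144 normalization and the empirical CDF described above can be sketched as follows. The helper names and the toy LLC values are illustrative assumptions; only the scaling factor (blocks per day divided by the number of considered blocks) comes from the text.

```python
from collections import Counter

BLOCKS_PER_DAY = 144  # average number of blocks per day, as stated in the text


def daily_llc_histogram(llc_values, n_blocks):
    """Average number of txs per day for each LLC value."""
    scale = BLOCKS_PER_DAY / n_blocks
    counts = Counter(llc_values)
    return {llc: count * scale for llc, count in sorted(counts.items())}


def llc_cdf(llc_values, threshold):
    """Fraction of txs whose LLC is at most `threshold`."""
    return sum(1 for v in llc_values if v <= threshold) / len(llc_values)


# Toy LLC labels for a graph assumed to span 288 blocks (two days).
toy_llc = [0, 1, 1, 5]
toy_hist = daily_llc_histogram(toy_llc, 288)
toy_fraction = llc_cdf(toy_llc, 1)
```

With 6144 blocks the scale factor is 144/6144, i.e. each raw count is divided by roughly 42.7, the number of days covered by the graph.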
3.3 Trying to Separate Human and Non-human Activity
As reported in [8], power-law distributions have been termed “the signature of human activity”. As is clear from Figs. 4 and 6, the probability density functions of LLC for our experiments do not exhibit the classical power-law shape in their entirety. In particular, our distributions lack very long steady tails. In fact, for values higher than about 100, LLC does not seem to follow any regular trendline. Looking at Figs. 4 and 6, we suspect that the power-law portions of the distributions represent human activity whereas the rest represents algorithmically generated txs. Zooming into the figures, we observed that for values higher than about 100 a series of consecutive peaks appears. Such peaks might be interpreted as a sequence of automatic phenomena, each of which introduces new “artificial” transactions in the blockchain at its own frequency. Examples of these peaks can be observed in Fig. 8, where we zoomed into Fig. 6 and considered the number of daily txs for which \(1000\le LLC \le 10000\).
4 A Deeper Analysis of a Specific Set of Transactions
In this section we describe the outcome of an experiment aimed at understanding how LLC values change over time for a specific set of txs. We decided to focus the analysis on a recent block and arbitrarily selected block \(B = 416000\). The number of txs in B is 1205. We computed graphs \(G_k = G(B-k+1, B+k)\) with \(k=3,6,12,\dots ,1536,3072\). Such graphs refer to sequences of blocks “centered” around B, including a number of blocks that grows exponentially. We therefore obtained graphs \(G(415998,416003), G(415995,416006), \dots , G(414465,417536), G(412929,419072)\), where the first graph refers to 6 blocks (corresponding to about one hour of activity) and the last graph refers to 6144 blocks (corresponding to about 42 days of activity). For each \(G_k\) we computed, with Algorithm 1, the LLC value of each tx in B. Then we normalized such values, i.e., we computed \((LLC \times 144)/b\), where b is the number of blocks of \(G_k\), obtaining the txs per day (TPD) belonging to the longest chain traversing each tx. Figure 9 shows the evolution of TPD for every tx in B. In order to be able to represent null values on logarithmic axes we changed them to 0.01.
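The block intervals and the TPD normalization above are simple enough to check with a short Python sketch. The function names are illustrative; the formulas \(G_k = G(B-k+1, B+k)\) and \((LLC \times 144)/b\) are taken from the text.

```python
def centered_windows(center_block, k_values):
    """Block intervals (B - k + 1, B + k) for each k, as in G_k."""
    return [(center_block - k + 1, center_block + k) for k in k_values]


def txs_per_day(llc, n_blocks, blocks_per_day=144):
    """Normalize an LLC value to txs per day: (LLC * 144) / b."""
    return llc * blocks_per_day / n_blocks


# k = 3, 6, 12, ..., 1536, 3072: eleven values that double at each step.
k_series = [3 * 2 ** i for i in range(11)]
windows = centered_windows(416000, k_series)
window_sizes = [hi - lo + 1 for lo, hi in windows]
```

Each window contains 2k blocks, so the series goes from 6 blocks to 6144 blocks, and a tx whose LLC equals the number of blocks in its window has a TPD of exactly 144.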
Interestingly, the TPD of almost all txs converges, in the long run, to a value in the interval [300, 1300], as indicated by the red bar in the top-right corner of the figure. This suggests that, after some time, most txs in B will be connected to chains that evolve at a pace of h TPD, with \(h \in [300,1300]\). Note that, for each graph \(G_k\) (see Fig. 9), there is a certain number of txs that “get in the game”, i.e., txs whose TPD for the graph \(G_{k}\) is 0 and for the subsequent graph is \(\ne 0\). We say that such txs “wake up”. We drew in red those txs “waking up” in graph \(G_{6}\). We decided to pay specific attention to those txs because, in a sense, they start belonging to some chains exactly at the same time. Txs with this characteristic will be the object of our last experiment.
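The “waking up” condition can be expressed as a comparison between the TPD values of two consecutive graphs. The sketch below assumes hypothetical dictionaries that map each tx of B to its TPD in a given graph; which pair of graphs is passed in (for example \(G_6\) and the graph that follows it) is left to the caller.

```python
def waking_up(tpd_current, tpd_next):
    """Txs whose TPD is 0 in one graph and non-zero in the subsequent one."""
    return {
        tx
        for tx, tpd in tpd_current.items()
        if tpd == 0 and tpd_next.get(tx, 0) != 0
    }


# Toy TPD values for three txs in two consecutive graphs.
tpd_small_graph = {"t1": 0, "t2": 48.0, "t3": 0}
tpd_large_graph = {"t1": 24.0, "t2": 36.0, "t3": 0}
woken = waking_up(tpd_small_graph, tpd_large_graph)
```

In the toy data only t1 wakes up: t2 was already on a chain in the smaller graph, and t3 is still isolated in the larger one.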
5 The Bitcoin Heartbeat
In this section we describe our last experiment, aimed at investigating how the interval h, introduced in Sect. 4, changed over the history of Bitcoin. We considered Jan. 2011 as the starting date because, looking at the numbers [3], in this month the number of daily Bitcoin transactions started to be steadily above 1000. Following the same procedure as in Sect. 4, we built 22 families of graphs such that each of them refers to 6144 consecutive blocks. The 22 families of graphs correspond to intervals of blocks centered on a random block of the first day of the months Feb., May, Aug. and Nov. of years 2011–2015 and partially 2016 (only Feb. and May). The considered intervals span in total about 141,000 blocks. We then built, for each family, a chart similar to the one of Fig. 9. In particular, for each family we only considered txs “waking up” when graph \(G_{6}\) of the family is taken into account. Finally, we computed one h-interval for each family of graphs as in the previous section, restricting the attention to those txs.
Since the h-interval is the set of frequency values to which txs tend to converge over time, we call its average value the Bitcoin Heartbeat. Figure 10 shows the evolution of the Bitcoin Heartbeat. Note that, in the figure, frequencies are normalized to txs per hour rather than to TPD as in Fig. 9. Each point represents the number of txs by which, on average, LLC values of “waking up” txs grow in an hour. In May 2016 this value was around 52. This means that, at the given pace, tx chains get longer by about 1248 (i.e., \(52 \times 24\)) txs per day on average.
6 Conclusions
In this paper we have analyzed long chains in the Bitcoin transaction graph performing several experiments each spanning a considerable amount of time and involving a large number of blocks.
The experiments highlight the following. (i) The distribution of the lengths of the longest chains passing through transactions exhibits a shape that is hard to believe to be produced by explicit human activities. In fact, it consists of a low frequency portion that resembles a power-law distribution and a high frequency portion that contains several peaks. (ii) If we consider a sufficiently large amount of time, the transactions surprisingly tend to be traversed by long chains with frequencies distributed in a rather small interval. We call the average of such an interval the Bitcoin Heartbeat. (iii) The Bitcoin Heartbeat has a rather stable value that has slowly grown in the recent Bitcoin history.
Although our observations highlight some interesting properties of the tx-graph, we believe that further reasoning is needed about these results. We believe that a better understanding of the dynamics taking place in Bitcoin has two positive side effects. On the one hand it stimulates new research in the field, and on the other hand it leads to a more conscious digital economy.
Notes
1. From Blockchain.info: “There are many legitimate reasons to create long transaction chains; however, they may also be caused by coin mixing or possible attempts to manipulate transaction volume.”
References
Blockchain.info. https://blockchain.info/
Coindesk. http://www.coindesk.com/state-of-bitcoin-blockchain-2016/
Evolution of the number of bitcoin transactions. https://blockchain.info/it/charts/n-transactions?timespan=all
Bonneau, J., Miller, A., Clark, J., Narayanan, A., Kroll, J.A., Felten, E.W.: SoK: research perspectives and challenges for bitcoin and cryptocurrencies. In: 2015 IEEE Symposium on Security and Privacy, pp. 104–121, May 2015
Di Battista, G., Di Donato, V., Patrignani, M., Pizzonia, M., Roselli, V., Tamassia, R.: Bitconeview: visualization of flows in the bitcoin transaction graph. In: 2015 IEEE Symposium on Visualization for Cyber Security (VizSec), pp. 1–8 (2015)
Di Francesco Maesa, D., Marino, A., Ricci, L.: Uncovering the Bitcoin Blockchain: an analysis of the full users graph. In: 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 537–546, October 2016
Di Francesco Maesa, D., Marino, A., Ricci, L.: Data-driven analysis of bitcoin properties: exploiting the users graph. Int. J. Data Sci. Anal. (2017). https://doi.org/10.1007/s41060-017-0074-x
Fabrikant, A., Koutsoupias, E., Papadimitriou, C.H.: Heuristically optimized trade-offs: a new paradigm for power laws in the internet. In: Widmayer, P., Eidenbenz, S., Triguero, F., Morales, R., Conejo, R., Hennessy, M. (eds.) ICALP 2002. LNCS, vol. 2380, pp. 110–122. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45465-9_11
McGinn, D., Birch, D., Akroyd, D., Molina-Solana, M., Guo, Y., Knottenbelt, W.: Visualizing dynamic bitcoin transaction patterns. Big Data http://hdl.handle.net/10044/1/32752
Meiklejohn, S., Pomarole, M., Jordan, G., Levchenko, K., McCoy, D., Voelker, G.M., Savage, S.: A fistful of bitcoins: characterizing payments among men with no names. In: Proceedings of the ACM Internet Measurement Conference, IMC, pp. 127–140 (2013)
Nakamoto, S.: Bitcoin: a peer-to-peer electronic cash system (2008). http://www.bitcoin.org/bitcoin.pdf
Ober, M., Katzenbeisser, S., Hamacher, K.: Structure and anonymity of the bitcoin transaction graph. Future Internet 5(2), 237–250 (2013). http://www.mdpi.com/1999-5903/5/2/237
Reid, F., Harrigan, M.: An analysis of anonymity in the bitcoin system. In: 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third International Conference on Social Computing, pp. 1318–1326, October 2011
Ron, D., Shamir, A.: Quantitative analysis of the full bitcoin transaction graph. In: Sadeghi, A.-R. (ed.) FC 2013. LNCS, vol. 7859, pp. 6–24. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39884-1_2
Skiena, S.S.: The Algorithm Design Manual, 2nd edn. Springer Publishing Company, Incorporated, London (2008). https://doi.org/10.1007/978-1-84800-070-4
Yli-Huumo, J., Ko, D., Choi, S., Park, S., Smolander, K.: Where is current research on blockchain technology?—a systematic review. PLoS One 11(10), 1–27 (2016)
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Di Battista, G., Di Donato, V., Pizzonia, M. (2018). Long Transaction Chains and the Bitcoin Heartbeat. In: Heras, D., et al. Euro-Par 2017: Parallel Processing Workshops. Euro-Par 2017. Lecture Notes in Computer Science(), vol 10659. Springer, Cham. https://doi.org/10.1007/978-3-319-75178-8_41
DOI: https://doi.org/10.1007/978-3-319-75178-8_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75177-1
Online ISBN: 978-3-319-75178-8