A reinforcement learning model for the reliability of blockchain oracles

https://doi.org/10.1016/j.eswa.2022.119160

Highlights

  • We propose BLOR, a Bayesian Bandit Learning model for Oracles Reliability.

  • BLOR identifies trustless and cost-efficient oracles.

  • BLOR integrates reinforcement learning with a Bayesian cost-driven reputation model.

  • BLOR is implemented in Ethereum and benchmarked against several algorithms.

  • BLOR shows steady performance in dynamic environments with low to high noise.

Abstract

Smart contracts are limited to operating on data that resides solely on the blockchain network. The need to recruit third parties, known as oracles, to assist smart contracts has been recognized since the emergence of blockchain technology. Oracles could be deviant and behave maliciously, or be selfish and hide their actual available resources to gain optimal profit. Current research proposals employ oracles as trusted entities with no robust assessment mechanism, which risks turning them into centralized points of failure. The need for an effective method to select the most economical and rewarding oracles, which are self-interested and act independently, has been largely neglected. Thus, this paper proposes a Bayesian Bandit Learning Oracles Reliability (BLOR) mechanism to identify trustless and cost-efficient oracles. Within BLOR, we learn the behavior of oracles by formulating a Bayesian cost-dependent reputation model and utilize reinforcement learning (the knowledge gradient algorithm) to guide the learning process. BLOR enables all blockchain validators to verify the obtained results while running the algorithm at the same time, by dealing with the randomness issue within the limited blockchain structure. We implement and experiment with BLOR using Python and the Solidity language on Ethereum. BLOR is benchmarked against several models and proves highly efficient in selecting the most reliable and economical oracles with a fair balance.

Introduction

Blockchain technology can cut out the role of middlemen by enabling self-enforcing digital contracts (called smart contracts), which execute in a safe, secure, and immutable way without any human involvement. The emergence of the blockchain as a revolutionary technology has been compared to that of the Internet, and it has been predicted that it will erode power from centralized authorities. With its deployment as a service (Lu, Xu, Liu, Weber, Zhu, & Zhang, 2019) and its integration with IoT (Baygin et al., 2022, Ho et al., 2021), blockchain is a promising approach to supporting business collaborations by ensuring transparency to all stakeholders if conflicts arise (Hull et al., 2016). However, the integration of blockchain with external data is one of the major obstacles preventing widespread adoption. Imagine that two people place a bet on who wins a football match and deposit their funds in a smart contract. Based on the result of the game, the smart contract should release the funds to the winner. However, a smart contract does not have access to data outside its network and must ask a trusted party to learn who won the match.

In blockchain, the term oracle refers to an entity that can access external data without compromising the integrity of the blockchain. Oracles are assumed to be trustworthy third-party agents that can communicate with the outside world and fetch data into the blockchain (Xu et al., 2016). Oracles are also able to connect the blockchain to external databases. This way, costly computations can be carried out outside of the blockchain. Oracles ensure the integrity of the retrieved data by providing some evidence (Kochovski, Gec, Stankovski, Bajec, & Drobintsev, 2019). Thus, cryptographic evidence such as that used by Oraclize, or trusted hardware-based evidence such as that used by the Town Crier system, which leverages Intel SGX (Zhang, Cecchetti, Croman, Juels, & Shi, 2016), is used as part of a number of oracle-based systems. Such evidence is not only insufficient to ensure that the data is tamper-proof, but also impractical in many real-world applications where the digital data is not available or human involvement is required.

Oracles could display ill-intentioned behaviors, be unable to perform their tasks due to lack of capacity, or be selfish by failing to report their real available resources (Lo, Xu, Staples, & Yao, 2020). Thus, putting in place a reliable mechanism to select the right oracles plays a significant role in a blockchain network’s success. There are several proposals for organizing one or more oracles as a group with trustworthy mechanisms, specifically designed for computer hardware and software (Berryhill and Veneris, 2019, Goel et al., 2020). However, these methods are not applicable when human intervention is involved or when the original data source is malicious. Moreover, these proposals sought to organize one or more oracles with enhanced security features or incentive mechanisms (Khosravifar, Bentahar, Moazin, & Thiran, 2010). To the best of our knowledge, there is no smart mechanism for selecting the most rewarding oracles among the existing ones in a market of oracles that might act selfishly to gain optimal profit.

In this paper, we utilize a Bayesian multi-armed bandit to learn the most rewarding oracles from the two perspectives of reliability and cost efficiency, to perform specific tasks within a blockchain. The multi-armed bandit is a reinforcement learning method that assumes the player does not know how much it will earn each time it plays a particular slot machine, but holds a distribution of belief, which could be wrong. The only way the player learns which machine has the highest expected reward is to try all machines, even those that do not appear to be the best. While trying these machines, the player may be earning lower rewards. The ultimate goal is to balance what we earn against what we learn (to improve future decisions) in order to maximize the expected sum of rewards. In our case, oracles are considered to be slot machines and blockchain beneficiaries are players who try to recruit the best oracles. Reinforcement learning methods have been applied in many real-world applications (Alagha et al., 2022, Rjoub et al., 2021, Rjoub, Wahab, et al., 2022, Sami et al., 2022, Sami, Mourad, et al., 2021, Sami, Otrok, et al., 2021) and their employment within blockchain has great advantages, including high accuracy, the ability to learn with few or no historical records, and low consumption of computational resources (Sutton & Barto, 2018). To the best of our knowledge, these methods have not yet been applied in the field of blockchain, and although doing so would be interesting and novel, it raises serious design and implementation challenges within current platforms.
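The belief-and-update loop described above can be sketched in a few lines. This is only an illustration, not BLOR itself: it assumes each oracle either succeeds or fails on a task (a Bernoulli outcome) and that the player holds a Beta belief over each oracle's success rate. The names `OracleBelief` and `pick_greedy` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OracleBelief:
    """Beta(alpha, beta) belief over one oracle's success probability (assumed model)."""
    alpha: float = 1.0  # prior pseudo-count of successes
    beta: float = 1.0   # prior pseudo-count of failures

    def update(self, success: bool) -> None:
        # Conjugate Bayesian update: each observation adds one pseudo-count.
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        # Posterior expected success rate.
        return self.alpha / (self.alpha + self.beta)


def pick_greedy(beliefs: list[OracleBelief]) -> int:
    """Pure exploitation: index of the oracle with the highest posterior mean."""
    return max(range(len(beliefs)), key=lambda i: beliefs[i].mean)


beliefs = [OracleBelief(), OracleBelief()]
# Oracle 0 is tried four times: three successes, one failure.
for outcome in (True, True, False, True):
    beliefs[0].update(outcome)
# Oracle 1 is tried once and fails.
beliefs[1].update(False)
```

A purely greedy player would now keep choosing oracle 0 and never return to oracle 1, even though a single failure says little about its true success rate. That is the tension between earning and learning that a bandit policy has to manage.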

Theoretical and practical challenges: The issue of selecting the most rewarding oracle is a decision-making problem that should capture the tension between exploration of new oracles and exploitation of the good and well-known ones. For a small number of simple choices, dynamic programming can compute the optimal solution. However, it is computationally very inefficient in the blockchain environment, given the growing number of oracles working for blockchains. There is a need for an algorithm that runs quickly with minimal computational overhead, because this algorithm has to be run by all blockchain validators (i.e., miners) acting within the network. Furthermore, current multi-armed bandit solutions assume that the player retains little information about the past, or switch between exploration and exploitation either randomly or after a fixed number of trials. These solutions are not practical for our problem, since oracles could be run and managed by intelligent agents that can change their behavior at any time. Another challenge of utilizing current solutions is that our decision-making procedure should be based not only on the oracles’ performance, but also on their cost of performing the task, considering applications’ limited budgets. There could be some reliable and high-performance oracles that are expensive, but current solutions would always select them based on their past performance records. We assume a fixed cost for each oracle and consider that the reputation and cost of other oracles in the market could change the behavior of each individual oracle.

To overcome the aforementioned challenges, we formulate a Bayesian cost-dependent reputation model to learn the behavior of oracles and utilize the knowledge gradient algorithm, which guides the learning process based on the marginal value of information. Using a Bayesian model for blockchain is complex, since the algorithm has to produce the same results in every run. This is because all the validators should verify the results, which happens only if all of them come up with the same results while running the algorithm. This adds further complexity, since all Bayesian reinforcement learning methods include randomness and use random variables. Lastly, the current platforms of blockchains and smart contracts are very limited: for example, no floating-point number can be defined within blockchain, and only a limited number of variables can be defined in Ethereum. This paper discusses how the proposed model and mechanism tackle these issues by formalizing the oracles’ performance optimization as a Bayesian bandit problem. Our algorithmic model defines a distribution over oracles with different reputations (representing their reliability and costs) to be used by blockchain participants to choose the best-performing oracles on future requests.
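To make the idea of a "marginal value of information" concrete, the sketch below computes the textbook knowledge gradient for independent normal beliefs and uses it in an online selection rule. It is not the paper's exact formulation: the independent normal belief model, the function names, and the parameters `noise_var` and `remaining` are assumptions made for illustration.

```python
import math


def _phi(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _Phi(z: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def knowledge_gradient(mu, var, noise_var):
    """KG value of measuring each alternative once (needs >= 2 alternatives, var > 0)."""
    kg = []
    for x in range(len(mu)):
        # Standard deviation of the change in the posterior mean after one observation.
        sigma_tilde = var[x] / math.sqrt(var[x] + noise_var)
        best_other = max(m for i, m in enumerate(mu) if i != x)
        zeta = -abs(mu[x] - best_other) / sigma_tilde
        kg.append(sigma_tilde * (zeta * _Phi(zeta) + _phi(zeta)))
    return kg


def choose_online(mu, var, noise_var, remaining):
    """Online KG: current expected reward plus value of information over remaining trials."""
    kg = knowledge_gradient(mu, var, noise_var)
    scores = [m + remaining * v for m, v in zip(mu, kg)]
    return max(range(len(mu)), key=lambda i: scores[i])
```

For example, `choose_online([0.5, 0.55], [1.0, 0.01], 1.0, 10)` prefers the first, more uncertain alternative, while setting `remaining` to 0 reduces the rule to picking the highest mean. The rule draws no random numbers, so identical beliefs lead to the same choice. An on-chain version would presumably still need integer approximations of `exp` and `erf`.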

Contributions: This paper contributes as follows:

  1. Formulating a new model using a Bayesian cost-dependent reputation model (BCRM) and knowledge gradient (KG) to find the most rewarding oracles. BCRM captures the behavior of the oracles elegantly, and KG addresses the exploration/exploitation dilemma in multi-armed bandit with very low computational cost and high accuracy.

  2. Proposing a framework to show how to employ the model within a blockchain where all the validators need to achieve a consensus. This framework incentivizes oracles to continuously act honestly and provide a fair balance of quality and price with minimal possibility of acting maliciously.

  3. Adapting a reinforcement learning algorithm for a blockchain environment with limited computational resources and capabilities (e.g., there is no floating-point number in Ethereum). Designing and implementing a reinforcement learning solution for the oracle selection problem is an objective yet to be achieved.
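One common way to work without floating-point numbers is scaled-integer (fixed-point) arithmetic. The sketch below emulates it in Python. The scale factor and all function names are assumptions for illustration, not taken from the BLOR implementation. Python integers have arbitrary precision, so unlike a fixed-width contract integer they cannot overflow here.

```python
SCALE = 10**18  # assumed fixed-point scale: 18 decimal places


def to_fixed(numerator: int, denominator: int) -> int:
    """Represent numerator/denominator as a scaled integer (truncating)."""
    return numerator * SCALE // denominator


def fp_mul(a: int, b: int) -> int:
    """Multiply two scaled integers and rescale the product."""
    return a * b // SCALE


def fp_div(a: int, b: int) -> int:
    """Divide two scaled integers, keeping the result scaled."""
    return a * SCALE // b


def posterior_mean(successes: int, failures: int) -> int:
    """Success rate under an assumed Beta(1, 1) prior, as a scaled integer."""
    return to_fixed(successes + 1, successes + failures + 2)


half = to_fixed(1, 2)
quarter = fp_mul(half, half)
rate = posterior_mean(3, 1)  # (3 + 1) / (3 + 1 + 2) = 2/3, truncated toward zero
```

Because every operation is integer arithmetic with explicit truncation, any two parties running the same steps on the same inputs obtain bit-identical results, which is the property validators need.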

We simulated and implemented our proposed model using Python on Google Colab and Solidity on Ethereum. The implementation of BLOR deals with many challenges raised by the complexity of machine learning and the limitations of blockchain and Ethereum, such as floating-point numbers, randomness, and advanced mathematical operations that are not supported in blockchain. Since there is no real-world data on oracles working for blockchains, we simulated the behavior of 100 oracles over 1000 observations to assess the performance of our model and compare it with other algorithms.
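A simulation of this kind could be set up roughly as follows. Only the counts of 100 oracles and 1000 observations come from the text. The hidden success rates, the fixed costs, their ranges, and the function name `simulate` are assumptions made for illustration.

```python
import random

N_ORACLES = 100        # as stated in the text
N_OBSERVATIONS = 1000  # as stated in the text


def simulate(seed: int = 0):
    """Generate a reproducible synthetic oracle market (assumed distributions)."""
    rng = random.Random(seed)  # a fixed seed makes every run identical
    # Assumed ground truth: a hidden success rate and a fixed cost per oracle.
    reliability = [rng.uniform(0.2, 0.95) for _ in range(N_ORACLES)]
    cost = [rng.uniform(0.1, 1.0) for _ in range(N_ORACLES)]
    # outcomes[t][i] is True if oracle i would succeed at observation t.
    outcomes = [
        [rng.random() < reliability[i] for i in range(N_ORACLES)]
        for _ in range(N_OBSERVATIONS)
    ]
    return reliability, cost, outcomes


reliability, cost, outcomes = simulate(seed=42)
```

Seeding the generator keeps the experiment repeatable, so different selection policies can be compared on exactly the same sequence of oracle outcomes.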

The remainder of this paper is organized as follows: Section 2 explains the trust paradox of oracles and blockchains to motivate the problem statement. Section 3 discusses the related work. Section 4 presents BLOR as our proposed model and framework and provides an illustrative example to show how the model works. Section 5 provides a case study in which BLOR is applied. Experimental details and results are covered in Section 6. Lastly, the conclusion is drawn in Section 7.

Section snippets

Motivational scenario: Trust paradox of oracles and blockchains

Many blockchain platforms have been experimenting with the oracle idea since the beginning of Ethereum, but the oracle dilemma remains unsolved at a large scale. The most challenging part is that the majority of oracles require a level of trust, which directly opposes the trustless nature of blockchains. The main complication of using oracles is trusting them as outside sources of information. The trust issue connected with oracles is referred to as the oracle problem.

Fig. 1 presents the motivating

Background and related work

The literature review draws on three different areas: blockchain, multi-armed bandit, and crowdsensing. As blockchain oracle selection has been largely neglected in the literature, we were not able to find directly related work that compares different methods of third-party selection in a blockchain environment. Therefore, the most similar approach, namely “worker selection in a blockchain-based crowdsensing”, is reviewed in this section.

BLOR: A Markovian multi-armed bandit-based solution

The main concern of a blockchain-based system that requires data from the outside world is how to maximize total rewards from various oracles in an uncertain setting through trial and observation. BLOR provides an optimal solution using Bayes’ theorem and reinforcement learning techniques. In BLOR’s sequential decision process for choosing a proper, reliable, and cost-efficient oracle, two components have to be considered:

  • 1.

    Learning: BLOR utilizes observations to update its

A case study of cloudchain (cloud services trading over blockchain)

The aim of the Cloudchain case study is to present how BLOR can offer a unique smart model for employment of oracles and transform the way cloud services are delivered. Cloudchain (Taghavi, Bentahar, Otrok, & Bakhtiyari, 2018) is a blockchain-based platform designed to allow cloud providers to interact, co-operate and compete through outsourcing their pending or unmet computing demands.

With the help of smart contracts, Cloudchain is able to provide higher transparency, visibility, and reliance

Experimental results

Because there is no available dataset about blockchain oracles, in order to evaluate the performance of BLOR, we simulated 100 oracles operating within a blockchain over 1000 observations. We implemented and experimented with BLOR using Python on Google Colab and the Solidity language on Ethereum; the code is publicly available on GitHub. Because a bandit is an online learner, it needs a record of the oracles’ history prior to the current time step we are

Conclusion

Oracles gather information from the real world and transport it onto the blockchain for further use. Hence, the use of oracles is imperative to promote widespread adoption of smart contracts. Yet, research about oracles and their practical application is still immature. This paper tried to shed some light by addressing two major challenges in this area. The first challenge is about putting a smart mechanism in place to identify the trustless and cost-efficient oracles. This challenge was

CRediT authorship contribution statement

Mona Taghavi: Conceptualization, Formal analysis, Investigation, Methodology, Software, Writing – original draft. Jamal Bentahar: Conceptualization, Formal analysis, Investigation, Methodology, Validation, Funding acquisition, Writing – review & editing. Hadi Otrok: Conceptualization, Investigation, Methodology, Validation, Writing – review & editing. Kaveh Bakhtiyari: Conceptualization, Investigation, Formal analysis, Software, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

M. Taghavi was supported by NSERC Vanier, and J. Bentahar is supported by NSERC, FRQNT, and MITACS.

References (54)

  • Sami, H. et al. Graph convolutional recurrent networks for reward shaping in reinforcement learning. Information Sciences (2022)
  • Wahab, O.A. et al. Federated against the cold: A trust-based federated learning approach to counter the cold start problem in recommendation systems. Information Sciences (2022)
  • Auer, P. et al. Finite-time analysis of the multiarmed bandit problem. Machine Learning (2002)
  • Bentahar, J. et al. Quantitative group trust: A two-stage verification approach
  • Berryhill, R. et al. ASTRAEA: A decentralized blockchain oracle (2019)
  • Bhatia, G.K. et al. WorkerRep: Immutable reputation system for crowdsourcing platform based on blockchain (2020)
  • Buterin, V. et al. Casper the friendly finality gadget (2017)
  • Chatzopoulos, D. et al. Flopcoin: A cryptocurrency for computation offloading. IEEE Transactions on Mobile Computing (2017)
  • Chatzopoulos, D. et al. Privacy preserving and cost optimal mobile crowdsensing using smart contracts on blockchain
  • Ding, Y. et al. Blockchain-based credit and arbitration mechanisms in crowdsourcing
  • Drawel, N. et al. Formalizing group and propagated trust in multi-agent systems
  • Drawel, N. et al. Formal verification of group and propagated trust in multi-agent systems. Autonomous Agents and Multi-Agent Systems (2022)
  • Ellis, S., Juels, A., & Nazarov, S. (2017). Chainlink a decentralized oracle network. White paper,...
  • Frazier, P. et al. The knowledge-gradient policy for correlated normal beliefs. INFORMS Journal on Computing (2009)
  • Gao, S. et al. TrustWorker: A trustworthy and privacy-preserving worker selection scheme for blockchain-based crowdsensing. IEEE Transactions on Services Computing (2021)
  • Goel, N. et al. Infochain: A decentralized, trustless and transparent oracle on blockchain
  • Hull, R., Batra, V. S., Chen, Y.-M., Deutsch, A., Heath III, F. F. T., & Vianu, V. (2016). Towards a shared ledger...