A reinforcement learning model for the reliability of blockchain oracles
Introduction
Blockchain technology can reduce the role of middlemen by enabling self-enforcing digital contracts (called smart contracts), which execute in a safe, secure, and immutable way without any human involvement. The emergence of the blockchain as a revolutionary technology has been compared to that of the Internet, and it has been predicted that it will erode the power of centralized authorities. With its deployment as a service (Lu, Xu, Liu, Weber, Zhu, & Zhang, 2019) and its integration with IoT (Baygin et al., 2022, Ho et al., 2021), blockchain offers a promising approach to supporting business collaborations by ensuring transparency for all stakeholders if conflicts arise (Hull et al., 2016). However, integrating blockchain with external data is one of the major obstacles to widespread adoption. Imagine that two people place a bet on who wins a football match and deposit their funds in a smart contract. Based on the result of the game, the smart contract should release the funds to the winner. However, a smart contract has no access to data outside its network and must ask a trusted party who won the match.
In blockchain, the term oracle refers to an entity that can access external data without compromising the integrity of the blockchain. Oracles are assumed to be trustworthy third-party agents that can communicate with the outside world and fetch data into the blockchain (Xu et al., 2016). Oracles can also connect the blockchain to external databases, so that costly computations can be carried out outside of the blockchain. Oracles attest to the integrity of the retrieved data by providing evidence (Kochovski, Gec, Stankovski, Bajec, & Drobintsev, 2019). Thus, cryptographic evidence such as that used by Oraclize, or trusted hardware-based evidence such as that used by the Town Crier system, which leverages Intel SGX (Zhang, Cecchetti, Croman, Juels, & Shi, 2016), is used in a number of oracle-based systems. Such evidence is not only insufficient to ensure that the data is tamper-proof, but also impractical in many real-world applications where the digital data is not available or human involvement is required.
Oracles could display ill-intentioned behavior, be unable to perform their tasks due to a lack of capacity, or act selfishly by failing to report their real available resources (Lo, Xu, Staples, & Yao, 2020). Thus, a reliable mechanism for selecting the right oracles plays a significant role in a blockchain network’s success. There are several proposals for organizing one or more oracles as a group with trustworthy mechanisms, specifically designed for computer hardware and software (Berryhill and Veneris, 2019, Goel et al., 2020). However, these methods are not applicable when human intervention is involved or when the original data source is malicious. Moreover, these proposals sought to organize one or more oracles with enhanced security features or incentive mechanisms (Khosravifar, Bentahar, Moazin, & Thiran, 2010). To the best of our knowledge, there is no smart mechanism for selecting the most rewarding oracles among the existing ones in a market of oracles that might act selfishly to maximize their profit.
In this paper, we utilize a Bayesian multi-armed bandit to learn which oracles are the most rewarding, in terms of both reliability and cost efficiency, for performing specific tasks within a blockchain. The multi-armed bandit is a reinforcement learning method in which the player does not know how much each play of a particular slot machine will earn, but holds a belief distribution, which could be wrong. The only way for the player to learn which machine has the highest expected reward is to try all machines, even those that do not appear to be the best. While trying these machines, the player may earn lower rewards. The ultimate goal is to balance what we earn against what we learn (to improve future decisions) so as to maximize the expected sum of rewards. In our case, oracles are the slot machines and blockchain beneficiaries are the players who try to recruit the best oracles. Reinforcement learning methods have been applied in many real-world applications (Alagha et al., 2022, Rjoub et al., 2021, Rjoub, Wahab, et al., 2022, Sami et al., 2022, Sami, Mourad, et al., 2021, Sami, Otrok, et al., 2021) and their use within blockchain has great advantages, including high accuracy, the ability to learn with little or no historical record, and low consumption of computational resources (Sutton & Barto, 2018). To the best of our knowledge, these methods have not yet been applied in the field of blockchain; although doing so is interesting and novel, it raises serious design and implementation challenges on current platforms.
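The slot-machine analogy can be made concrete with a minimal sketch. It assumes each oracle returns either a correct or an incorrect report, keeps a Beta belief per oracle, and balances earning against learning with Thompson sampling. Thompson sampling is used here only because it is short to write down; it is not the knowledge gradient policy adopted in this paper, and the names `BetaBelief` and `run_bandit`, as well as the uniform prior, are illustrative assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Beta(alpha, beta) belief about an oracle's probability of a correct report."""
    alpha: float = 1.0  # assumed uniform prior: one pseudo-count of correct reports
    beta: float = 1.0   # and one pseudo-count of incorrect reports

    def mean(self) -> float:
        """Current estimate of the oracle's reliability."""
        return self.alpha / (self.alpha + self.beta)

    def update(self, correct: bool) -> None:
        """Bayesian update after observing one report from this oracle."""
        if correct:
            self.alpha += 1
        else:
            self.beta += 1


def run_bandit(true_reliability, rounds, seed=0):
    """Simulate a player recruiting one oracle per round.

    true_reliability is hidden from the player and only drives the simulation.
    """
    rng = random.Random(seed)
    beliefs = [BetaBelief() for _ in true_reliability]
    total_reward = 0
    for _ in range(rounds):
        # Draw one plausible reliability per oracle from the current beliefs;
        # uncertain oracles sometimes draw high, which is what drives exploration.
        draws = [rng.betavariate(b.alpha, b.beta) for b in beliefs]
        chosen = max(range(len(beliefs)), key=lambda i: draws[i])
        correct = rng.random() < true_reliability[chosen]
        beliefs[chosen].update(correct)
        total_reward += int(correct)
    return beliefs, total_reward
```

Calling `run_bandit([0.9, 0.5, 0.2], rounds=200)` returns the final beliefs and the number of correct reports collected. Note that this sketch relies on random draws, which is exactly the property that makes such methods hard to replicate identically across blockchain validators.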
Theoretical and practical challenges: Selecting the most rewarding oracle is a decision-making problem that should capture the tension between exploring new oracles and exploiting the good, well-known ones. For a small number of choices, dynamic programming can compute the optimal solution. However, it is computationally very inefficient in the blockchain environment, where the number of oracles working for blockchains keeps growing. There is a need for an algorithm that runs quickly with minimal computational overhead, because this algorithm has to be run by all blockchain validators (i.e., miners) acting within the network. Furthermore, current multi-armed bandit solutions assume that the player retains little information about the past, or switch between exploration and exploitation either randomly or after a fixed number of trials. These solutions are not practical for our problem, since oracles could be run and managed by intelligent agents that can change their behavior at any time. Another challenge of utilizing current solutions is that our decision-making procedure should be based not only on the oracles’ performance, but also on their cost of performing the task, given applications’ limited budgets. There could be some reliable, high-performance oracles that are expensive, but current solutions would always select them based on their past performance records. We assume a fixed cost for each oracle and consider that the reputation and cost of other oracles in the market could change the behavior of each individual oracle.
To overcome the aforementioned challenges, we formulate a Bayesian cost-dependent reputation model to learn the behavior of oracles and utilize the knowledge gradient algorithm, which guides the learning process based on the marginal value of information. Using a Bayesian model for blockchain is complex, since the algorithm has to produce the same results in every run. This is because all the validators should verify the results, which is only possible if all of them obtain the same results when running the algorithm. This adds further complexity, since Bayesian reinforcement learning methods generally involve randomness and use random variables. Finally, the current platforms for blockchains and smart contracts are very limited; for example, no floating-point number can be defined within the blockchain, and only a limited number of variables can be defined in Ethereum. This paper discusses how the proposed model and mechanism tackle and solve these issues by formalizing the oracles’ performance optimization as a Bayesian bandit problem. Our algorithmic model defines a distribution over oracles with different reputations (representing their reliability and costs) to be used by blockchain participants to choose the best-performing oracles for future requests.
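To illustrate how the marginal value of information can guide selection without any random draw, the following sketch computes a one-step knowledge gradient under an assumed Beta-Bernoulli belief per oracle. This is a simplification for illustration and not the BCRM formulation of this paper: it ignores cost, and the function names `kg_values` and `choose_oracle` are hypothetical. Exact rational arithmetic (`fractions.Fraction`) is used so that every validator running the same code on the same counts would obtain an identical result.

```python
from fractions import Fraction


def kg_values(beliefs):
    """Return (posterior means, knowledge gradient factors) for each oracle.

    beliefs is a list of (alpha, beta) integer pairs, one Beta belief per oracle.
    The KG factor of an oracle is the expected increase in the best posterior
    mean after observing one more report from that oracle.
    """
    means = [Fraction(a, a + b) for a, b in beliefs]
    values = []
    for i, (a, b) in enumerate(beliefs):
        best_other = max(
            (m for j, m in enumerate(means) if j != i), default=Fraction(0)
        )
        mean_if_correct = Fraction(a + 1, a + b + 1)
        mean_if_wrong = Fraction(a, a + b + 1)
        expected_best = (
            means[i] * max(best_other, mean_if_correct)
            + (1 - means[i]) * max(best_other, mean_if_wrong)
        )
        values.append(expected_best - max(best_other, means[i]))
    return means, values


def choose_oracle(beliefs, remaining):
    """Online KG rule: current mean plus KG factor weighted by remaining requests.

    Ties are broken toward the lowest index, so the choice is fully deterministic.
    """
    means, values = kg_values(beliefs)
    scores = [m + remaining * v for m, v in zip(means, values)]
    return max(range(len(beliefs)), key=lambda i: (scores[i], -i))
```

With beliefs `[(3, 2), (1, 1)]`, the first oracle has the higher mean (3/5 against 1/2) but a knowledge gradient of 0, while the second has a knowledge gradient of 1/30. With 10 requests remaining the rule explores the second oracle, and with none remaining it exploits the first.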
Contributions: This paper contributes as follows:
1. Formulating a new model using a Bayesian cost-dependent reputation model (BCRM) and knowledge gradient (KG) to find the most rewarding oracles. BCRM captures the behavior of the oracles elegantly, and KG addresses the exploration/exploitation dilemma in the multi-armed bandit with very low computational cost and high accuracy.
2. Proposing a framework to show how to employ the model within a blockchain where all the validators need to achieve a consensus. This framework incentivizes oracles to continuously act honestly and provide a fair balance of quality and price with minimal possibility of acting maliciously.
3. Adapting a reinforcement learning algorithm to the blockchain environment, with its limited computational resources and capabilities (e.g., there is no floating-point number in Ethereum). Designing and implementing a reinforcement learning solution for the oracle selection problem is an objective yet to be achieved.
We simulated and implemented our proposed model, BLOR, using Python on Google Colab and Solidity on Ethereum. The implementation of BLOR addresses many challenges raised by the complexity of machine learning and the limitations of blockchain and Ethereum, such as floating-point numbers, randomness, and advanced mathematical operations that are not supported in blockchain. Since there is no real-world data on oracles working for blockchains, we simulated the behavior of 100 oracles over 1000 observations to assess the performance of our model and compare it with other algorithms.
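One common way to work around the absence of floating-point numbers is fixed-point arithmetic on scaled integers. The sketch below shows the idea in Python using only integer operations; the scale factor of 10^18 and the helper names are assumptions chosen for illustration, not details taken from the BLOR implementation.

```python
SCALE = 10**18  # assumed fixed-point scale: 1.0 is represented as 10**18


def to_fixed(numerator: int, denominator: int) -> int:
    """Represent numerator/denominator as a scaled integer (rounded down)."""
    return numerator * SCALE // denominator


def mul_fixed(x: int, y: int) -> int:
    """Multiply two fixed-point values and rescale the result (rounded down)."""
    return x * y // SCALE


def posterior_mean_fixed(successes: int, failures: int) -> int:
    """Mean of a Beta belief with an assumed uniform prior, in fixed point."""
    return to_fixed(successes + 1, successes + failures + 2)
```

For example, `posterior_mean_fixed(2, 1)` yields the integer representing 0.6. Python integers never overflow, whereas a bounded integer type such as a 256-bit unsigned integer would require overflow checks on the intermediate products.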
The remainder of this paper is organized as follows: Section 2 explains the trust paradox of oracles and blockchains to motivate the problem statement. Section 3 discusses the related work. Section 4 presents BLOR as our proposed model and framework and provides an illustrative example to show how the model works. Section 5 provides a case study in which BLOR is applied. Experimental details and results are covered in Section 6. Lastly, the conclusion is drawn in Section 7.
Section snippets
Motivational scenario: Trust paradox of oracles and blockchains
Many blockchain platforms have been experimenting with the oracle idea since the beginning of Ethereum, but the oracle dilemma remains unsolved at large scale. The most challenging part is that the majority of oracles require a level of trust, which directly opposes the trustless nature of blockchains. The main complication of using oracles is trusting them as outside sources of information. The trust issue associated with oracles is referred to as the oracle problem.
Fig. 1 presents the motivating
Background and related work
The literature review draws on three areas: blockchain, multi-armed bandit, and crowdsensing. As blockchain oracle selection is somewhat neglected in the literature, we were not able to find directly related work that compares different methods of third-party selection in a blockchain environment. Therefore, the most similar approach, namely “worker selection in blockchain-based crowdsensing”, is reviewed in this section.
BLOR: A Markovian multi-armed bandit-based solution
The main concern of a blockchain-based system that requires obtaining data from the outside world is how to maximize total rewards from various oracles in an uncertain setting through trial and observation. BLOR provides an optimal solution using Bayes’ theorem and reinforcement learning techniques. In BLOR’s sequential decision process for choosing a proper, reliable, and cost-efficient oracle, two components have to be considered:
1. Learning: BLOR utilizes observations to update its
A case study of Cloudchain (cloud services trading over blockchain)
The aim of the Cloudchain case study is to present how BLOR can offer a unique smart model for employment of oracles and transform the way cloud services are delivered. Cloudchain (Taghavi, Bentahar, Otrok, & Bakhtiyari, 2018) is a blockchain-based platform designed to allow cloud providers to interact, co-operate and compete through outsourcing their pending or unmet computing demands.
With the help of smart contracts, Cloudchain is able to provide higher transparency, visibility, and reliance
Experimental results
Because no dataset on blockchain oracles is available, we evaluated the performance of BLOR by simulating 100 oracles operating within a blockchain over 1000 observations. We implemented and experimented with BLOR using Python on Google Colab and the Solidity language on Ethereum; the code is publicly available on GitHub. Because a bandit is an online learner, it needs a record of the oracles’ history prior to the current time step we are
Conclusion
Oracles gather information from the real world and transport it onto the blockchain for further use. Hence, the use of oracles is imperative to promote widespread adoption of smart contracts. Yet, research on oracles and their practical application is still very immature. This paper tried to shed some light by addressing two major challenges in this area. The first challenge is about putting a smart mechanism in place to identify the trustless and cost-efficient oracles. This challenge was
CRediT authorship contribution statement
Mona Taghavi: Conceptualization, Formal analysis, Investigation, Methodology, Software, Writing – original draft. Jamal Bentahar: Conceptualization, Formal analysis, Investigation, Methodology, Validation, Funding acquisition, Writing – review & editing. Hadi Otrok: Conceptualization, Investigation, Methodology, Validation, Writing – review & editing. Kaveh Bakhtiyari: Conceptualization, Investigation, Formal analysis, Software, Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
M. Taghavi was supported by NSERC Vanier, and J. Bentahar is supported by NSERC, FRQNT, and MITACS.
References (54)
- et al. Target localization using multi-agent deep reinforcement learning with proximal policy optimization. Future Generation Computer Systems (2022)
- et al. A blockchain-based approach to smart cargo transportation using UHF RFID. Expert Systems with Applications (2022)
- A taxonomy of blockchain consensus protocols: A survey and classification framework. Expert Systems with Applications (2021)
- et al. Specification and automatic verification of trust-based multi-agent systems. Future Generation Computer Systems (2020)
- et al. A blockchain-based system to enhance aircraft parts traceability and trackability for inventory management. Expert Systems with Applications (2021)
- et al. SenseChain: A blockchain-based crowdsensing framework for multiple requesters and multiple workers. Future Generation Computer Systems (2020)
- et al. Trust management in a blockchain based fog computing platform with trustless smart oracles. Future Generation Computer Systems (2019)
- et al. Prioritized experience replay based on multi-armed bandit. Expert Systems with Applications (2022)
- et al. Reliability analysis for blockchain oracles. Computers and Electrical Engineering (2020)
- et al. uBaaS: A unified blockchain as a service platform. Future Generation Computer Systems (2019)
- Graph convolutional recurrent networks for reward shaping in reinforcement learning. Information Sciences
- Federated against the cold: A trust-based federated learning approach to counter the cold start problem in recommendation systems. Information Sciences
- Finite-time analysis of the multiarmed bandit problem. Machine Learning
- Quantitative group trust: A two-stage verification approach
- ASTRAEA: A decentralized blockchain oracle
- WorkerRep: Immutable reputation system for crowdsourcing platform based on blockchain
- Casper the friendly finality gadget
- Flopcoin: A cryptocurrency for computation offloading. IEEE Transactions on Mobile Computing
- Privacy preserving and cost optimal mobile crowdsensing using smart contracts on blockchain
- Blockchain-based credit and arbitration mechanisms in crowdsourcing
- Formalizing group and propagated trust in multi-agent systems
- Formal verification of group and propagated trust in multi-agent systems. Autonomous Agents and Multi-Agent Systems
- The knowledge-gradient policy for correlated normal beliefs. INFORMS Journal on Computing
- TrustWorker: A trustworthy and privacy-preserving worker selection scheme for blockchain-based crowdsensing. IEEE Transactions on Services Computing
- Infochain: A decentralized, trustless and transparent oracle on blockchain