A reinforcement learning model for the reliability of blockchain oracles

doi:10.1016/j.eswa.2022.119160

Expert Systems with Applications

Volume 214, 15 March 2023, 119160

https://doi.org/10.1016/j.eswa.2022.119160

Highlights

• We propose BLOR, a Bayesian Bandit Learning model for Oracles Reliability.
• BLOR identifies trustless and cost-efficient oracles.
• BLOR integrates reinforcement learning into a Bayesian cost-driven reputation model.
• BLOR is implemented in Ethereum and benchmarked against several algorithms.
• BLOR shows steady performance in dynamic environments with low to high noise.

Abstract

Smart contracts are limited to operating on data that resides solely on the blockchain network. The need to recruit third parties, known as oracles, to assist smart contracts has been recognized with the emergence of blockchain technology. Oracles could be deviant and behave maliciously, or be selfish and hide their actual available resources to gain optimal profit. Current research proposals employ oracles as trusted entities with no robust assessment mechanism, which risks turning them into centralized points of failure. The need for an effective method to select the most economical and rewarding oracles, which are self-interested and act independently, is largely neglected. Thus, this paper proposes a Bayesian Bandit Learning Oracles Reliability (BLOR) mechanism to identify trustless and cost-efficient oracles. Within BLOR, we learn the behavior of oracles by formulating a Bayesian cost-dependent reputation model and utilize reinforcement learning (the knowledge gradient algorithm) to guide the learning process. BLOR enables all the blockchain validators to verify the obtained results while running the algorithm at the same time, by dealing with the randomness issue within the limited blockchain structure. We implement and experiment with BLOR using Python and the Solidity language on Ethereum. BLOR is benchmarked against several models and proves to be highly efficient in selecting the most reliable and economical oracles with a fair balance.

Introduction

Blockchain technology can cut out middlemen by enabling self-enforcing digital contracts (called smart contracts), which execute in a safe, secure, and immutable way without any human involvement. The emergence of the blockchain as a revolutionary technology has been compared to that of the Internet, and it has been predicted that it will erode power from centralized authorities. With its deployment as a service (Lu, Xu, Liu, Weber, Zhu, & Zhang, 2019) and its integration with IoT (Baygin et al., 2022; Ho et al., 2021), blockchain offers a promising approach to supporting business collaborations by ensuring transparency to all the stakeholders if conflicts arise (Hull et al., 2016). However, the integration of blockchain with external data is one of the major obstacles preventing widespread adoption. Imagine that two people place a bet on who wins a football match and deposit their funds in a smart contract. Based on the result of the game, the smart contract should release the funds to the winner. However, a smart contract does not have access to data outside its network and must ask a trusted party to learn who won the match.

In blockchain, the term oracle refers to an entity that can access external data without compromising the integrity of the blockchain. Oracles are assumed to be trustworthy third-party agents that can communicate with the outside world and fetch data into the blockchain (Xu et al., 2016). Oracles are also able to connect the blockchain to external databases. This way, costly computations can be carried out outside of the blockchain. Oracles ensure the integrity of the retrieved data by providing evidence (Kochovski, Gec, Stankovski, Bajec, & Drobintsev, 2019). Thus, cryptographic evidence such as that used by Oraclize,¹ or trusted hardware-based evidence such as that used by the Town Crier system, which leverages Intel SGX (Zhang, Cecchetti, Croman, Juels, & Shi, 2016), is used as part of a number of oracle-based systems. Such evidence is not only insufficient to ensure that the data is tamper-proof, but also impractical in many real-world applications where the digital data is not available or human involvement is required.

Oracles could display ill-intentioned behaviors, or be unable to perform their tasks due to lack of capacity, or act selfishly by failing to report their real available resources (Lo, Xu, Staples, & Yao, 2020). Thus, a reliable mechanism to select the right oracles plays a significant role in a blockchain network’s success. There are several proposals for organizing one or more oracles as a group with trustworthy mechanisms, specifically designed for computer hardware and software (Berryhill and Veneris, 2019; Goel et al., 2020). However, these methods are not applicable when human intervention is involved or when the original data source is malicious. Moreover, these proposals sought to organize one or more oracles with enhanced security features or incentive mechanisms (Khosravifar, Bentahar, Moazin, & Thiran, 2010). To the best of our knowledge, there is no smart mechanism for selecting the most rewarding oracles among the existing ones in a market of oracles that might act selfishly to gain optimal profit.

In this paper, we utilize a Bayesian multi-armed bandit to learn the most rewarding oracles, from the two perspectives of reliability and cost efficiency, for performing specific tasks within a blockchain. The multi-armed bandit is a reinforcement learning formulation in which the player does not know how much each play of a particular slot machine will earn, but holds a belief distribution, which could be wrong. The only way the player learns which machine has the highest expected reward is to try all machines, even those that do not appear to be the best. While trying these machines, the player may earn lower rewards. The ultimate goal is to balance what we earn against what we learn (to improve future decisions) so as to maximize the expected sum of rewards. In our case, oracles are the slot machines and blockchain beneficiaries are the players who try to recruit the best oracles. Reinforcement learning methods have been applied in many real-world applications (Alagha et al., 2022; Rjoub et al., 2021; Rjoub, Wahab, et al., 2022; Sami et al., 2022; Sami, Mourad, et al., 2021; Sami, Otrok, et al., 2021), and their employment within blockchain has great advantages, including high accuracy, the ability to learn with few or no historical records, and low consumption of computational resources (Sutton & Barto, 2018). To the best of our knowledge, these methods have not yet been applied in the field of blockchain, and although doing so would be interesting and novel, serious design and implementation challenges arise within current platforms.
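The idea of a player holding a belief distribution over each machine can be illustrated with a minimal sketch. It assumes binary task outcomes (success or failure) and a Beta prior per oracle; both are illustrative assumptions rather than the paper's reputation model, and the names `OracleBelief`, `oracle_a`, and `oracle_b` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OracleBelief:
    """Beta(alpha, beta) belief about one oracle's success probability (assumed model)."""
    alpha: float = 1.0  # prior pseudo-count of successful reports
    beta: float = 1.0   # prior pseudo-count of failed reports

    def update(self, success: bool) -> None:
        # Conjugate Bayesian update after observing one task outcome.
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        # Expected reliability under the current belief.
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        # Remaining uncertainty; it shrinks as observations accumulate.
        n = self.alpha + self.beta
        return (self.alpha * self.beta) / (n * n * (n + 1.0))


# Hypothetical history: oracle_a was tried four times, oracle_b only once.
beliefs = {"oracle_a": OracleBelief(), "oracle_b": OracleBelief()}
for outcome in (True, True, False, True):
    beliefs["oracle_a"].update(outcome)
beliefs["oracle_b"].update(True)

for name, belief in beliefs.items():
    print(name, round(belief.mean, 3), round(belief.variance, 4))
```

In this toy history both oracles end up with the same expected reliability, but the belief about `oracle_b` is more uncertain. That gap is what makes trying a less-observed oracle potentially worthwhile even when it does not currently look better.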

Theoretical and practical challenges: Selecting the most rewarding oracle is a decision-making problem that must capture the tension between exploring new oracles and exploiting the good, well-known ones. For simple problems with few choices, dynamic programming can compute the optimal solution. However, it is computationally inefficient in the blockchain environment, given the growing number of oracles working for blockchains. There is a need for an algorithm that runs quickly with minimal computational overhead, because it has to be run by all blockchain validators (i.e., miners) acting within the network. Furthermore, current multi-armed bandit solutions assume that the player retains little information about the past, or switch between exploration and exploitation either randomly or after a fixed number of trials. These solutions are not practical for our problem, since oracles could be run and managed by intelligent agents that can change their behavior at any time. Another challenge in utilizing current solutions is that our decision-making procedure should be based not only on the oracles’ performance, but also on their cost of performing the task, given applications’ limited budgets. Some reliable, high-performance oracles could be expensive, yet current solutions would always select them based on their past performance records. We assume a fixed cost for each oracle, and consider that the reputation and cost of other oracles in the market could change the behavior of each individual oracle.

To overcome the aforementioned challenges, we formulate a Bayesian cost-dependent reputation model to learn the behavior of oracles and utilize the knowledge gradient algorithm, which guides the learning process based on the marginal value of information. Using a Bayesian model for blockchain is complex, since the algorithm has to produce the same results in every run. This is because all the validators should verify the results, which is only possible if all of them obtain the same results when running the algorithm. This adds further complexity, since all Bayesian reinforcement learning methods include randomness and use random variables. Finally, current blockchain and smart contract platforms are very limited: for example, no floating-point number can be defined within the blockchain, and only a limited number of variables can be defined in Ethereum. This paper discusses how the proposed model and mechanism tackle and solve these issues by formalizing the oracles’ performance optimization as a Bayesian bandit problem. Our algorithmic model defines a distribution over oracles with different reputations (representing their reliability and costs) to be used by blockchain participants to choose the best-performing oracles for future requests.
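The two platform constraints mentioned here, reproducible results across validators and the absence of floating-point numbers, can be illustrated with a small sketch. It is not the paper's implementation: the scale factor `SCALE`, the helper names, and the seed value are hypothetical, and SHA-256 stands in for whatever hash function the target platform provides.

```python
import hashlib

SCALE = 10**6  # assumed fixed-point precision: 1.0 is stored as 1_000_000


def to_fixed(numerator: int, denominator: int) -> int:
    """Represent numerator/denominator as a scaled integer (rounded down)."""
    return (numerator * SCALE) // denominator


def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values and rescale the result."""
    return (a * b) // SCALE


def deterministic_draw(seed: bytes, round_index: int, modulus: int) -> int:
    """Pseudo-random integer in [0, modulus) derived only from shared inputs."""
    digest = hashlib.sha256(seed + round_index.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") % modulus


# Two validators using the same (hypothetical) shared seed get identical draws.
shared_seed = b"hypothetical-shared-seed"
validator_1 = [deterministic_draw(shared_seed, i, 100) for i in range(5)]
validator_2 = [deterministic_draw(shared_seed, i, 100) for i in range(5)]

reputation = to_fixed(3, 4)                          # 0.75 stored as 750000
discounted = fixed_mul(reputation, to_fixed(9, 10))  # 0.75 * 0.9 stored as 675000
```

Because every quantity is an integer and every draw depends only on inputs that all validators share, each validator that reruns the computation reaches the same values, which is the property the consensus check needs.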

Contributions: This paper contributes as follows:

1. Formulating a new model using a Bayesian cost-dependent reputation model (BCRM) and knowledge gradient (KG) to find the most rewarding oracles. BCRM captures the behavior of the oracles, and KG resolves the exploration/exploitation dilemma in the multi-armed bandit with very low computational cost and high accuracy.
2. Proposing a framework to show how to employ the model within a blockchain where all the validators need to achieve consensus. This framework incentivizes oracles to continuously act honestly and provide a fair balance of quality and price, with minimal possibility of acting maliciously.
3. Adapting a reinforcement learning algorithm to the blockchain environment, with its limited computational resources and capabilities (e.g., there is no floating-point number in Ethereum). Designing and implementing a reinforcement learning solution for the oracle selection problem is an objective yet to be achieved.
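To make the knowledge gradient idea more concrete, the sketch below uses the textbook KG policy for independent normal beliefs with known measurement noise. This is an assumption for illustration: it may differ from the paper's BCRM-based formulation, it uses floating-point arithmetic for readability, and the function names and numeric beliefs are hypothetical. It requires at least two oracles and strictly positive variances.

```python
import math


def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def knowledge_gradient(means, variances, noise_variance):
    """Expected one-step gain in the best posterior mean from measuring each arm."""
    values = []
    for i, (mu, var) in enumerate(zip(means, variances)):
        # Standard deviation of the change in the posterior mean after one measurement.
        sigma_tilde = var / math.sqrt(var + noise_variance)
        best_other = max(m for j, m in enumerate(means) if j != i)
        zeta = -abs(mu - best_other) / sigma_tilde
        values.append(sigma_tilde * (zeta * normal_cdf(zeta) + normal_pdf(zeta)))
    return values


def online_kg_choice(means, variances, noise_variance, remaining):
    """Choose the arm with the best mean plus value of information over remaining rounds."""
    kg = knowledge_gradient(means, variances, noise_variance)
    scores = [m + remaining * v for m, v in zip(means, kg)]
    return max(range(len(scores)), key=scores.__getitem__)


def update_belief(mean, variance, observation, noise_variance):
    """Conjugate normal update of one arm's belief after an observed reward."""
    precision = 1.0 / variance + 1.0 / noise_variance
    new_mean = (mean / variance + observation / noise_variance) / precision
    return new_mean, 1.0 / precision


# Hypothetical beliefs: oracle 0 is well known, oracles 1 and 2 are uncertain.
means = [0.70, 0.65, 0.50]
variances = [0.001, 0.04, 0.04]
exploit = online_kg_choice(means, variances, noise_variance=0.05, remaining=0)
explore = online_kg_choice(means, variances, noise_variance=0.05, remaining=10)
```

With no rounds left, the policy simply exploits the oracle with the highest mean (index 0). With ten rounds left, the uncertain runner-up (index 1) is chosen, because learning about it could still change later decisions.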

We simulated and implemented our proposed model using Python on Google Colab and Solidity on Ethereum. The implementation of BLOR deals with many challenges raised by the complexity of machine learning and the limitations of blockchain and Ethereum, such as floating-point numbers, randomness, and advanced mathematical functions that are not supported in blockchain. Since there is no real-world data on oracles working for blockchains, we simulated the behavior of 100 oracles over 1000 observations to assess the performance of our model and compare it with other algorithms.
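A simulation of this size can be set up in a few lines. The sketch below is not the authors' simulator: the reliability and cost distributions, the `COST_WEIGHT` trade-off, and the greedy baseline policy are all assumptions chosen only to show the shape of such an experiment. Only the counts of 100 oracles and 1000 observations come from the text.

```python
import random

NUM_ORACLES = 100        # from the text
NUM_OBSERVATIONS = 1000  # from the text
COST_WEIGHT = 0.5        # hypothetical trade-off between reliability and cost


def simulate(seed: int = 0):
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    # Hypothetical ground truth: success probability and normalized fixed cost.
    reliability = [rng.uniform(0.2, 0.95) for _ in range(NUM_ORACLES)]
    cost = [rng.uniform(0.0, 1.0) for _ in range(NUM_ORACLES)]

    successes = [0] * NUM_ORACLES
    trials = [0] * NUM_ORACLES
    total_reward = 0.0
    for t in range(NUM_OBSERVATIONS):
        if t < NUM_ORACLES:
            choice = t  # try every oracle once before exploiting
        else:
            # Greedy baseline: best empirical success rate net of weighted cost.
            choice = max(
                range(NUM_ORACLES),
                key=lambda i: successes[i] / trials[i] - COST_WEIGHT * cost[i],
            )
        outcome = 1 if rng.random() < reliability[choice] else 0
        successes[choice] += outcome
        trials[choice] += 1
        total_reward += outcome - COST_WEIGHT * cost[choice]
    return total_reward, trials


reward, trials = simulate(seed=42)
```

A purely greedy rule like this one can lock onto an oracle that happened to succeed on its first trial, which is the kind of behavior a policy that values information, such as knowledge gradient, is meant to avoid.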

The remainder of this paper is organized as follows: Section 2 explains the trust paradox of oracles and blockchains to motivate the problem statement. Section 3 discusses the related work. Section 4 presents BLOR as our proposed model and framework and provides an illustrative example to show how the model works. Section 5 provides a case study in which BLOR is applied. Experimental details and results are covered in Section 6. Lastly, the conclusion is drawn in Section 7.

Section snippets

Motivational scenario: Trust paradox of oracles and blockchains

Many blockchain platforms have been experimenting with the oracle idea since the beginning of Ethereum, but the oracle dilemma remains unsolved at a large scale. The most challenging part is that the majority of oracles require a level of trust, which directly opposes the trustless nature of blockchains. The main complication of using oracles is trusting them as outside sources of information. The trust issue connected with oracles is referred to as the oracle problem.

Fig. 1 presents the motivating

Background and related work

The literature review draws on three different areas: blockchain, multi-armed bandit, and crowdsensing. As blockchain oracle selection is largely neglected in the literature, we were not able to find directly related work with which to compare different methods of third-party selection in a blockchain environment. Therefore, the most similar approach, namely “worker selection in a blockchain-based crowdsensing”, is reviewed in this section.

BLOR: A Markovian multi-armed bandit-based solution

The main concern of a blockchain-based system that requires data from the outside world is how to maximize total rewards from various oracles in an uncertain setting through trial and observation. BLOR provides an optimal solution using Bayes’ theorem and reinforcement learning techniques. In BLOR’s sequential decision process for choosing a proper, reliable, and cost-efficient oracle, two components have to be considered:

1. Learning: BLOR utilizes observations to update its

A case study of cloudchain (cloud services trading over blockchain)

The aim of the Cloudchain case study is to present how BLOR can offer a unique smart model for employment of oracles and transform the way cloud services are delivered. Cloudchain (Taghavi, Bentahar, Otrok, & Bakhtiyari, 2018) is a blockchain-based platform designed to allow cloud providers to interact, co-operate and compete through outsourcing their pending or unmet computing demands.

With the help of smart contracts, Cloudchain is able to provide higher transparency, visibility, and reliance

Experimental results

Because there is no available dataset about blockchain oracles, we evaluated the performance of BLOR by simulating 100 oracles operating within a blockchain over 1000 observations. We implemented and experimented with BLOR using Python on Google Colab and the Solidity language on Ethereum; the code is publicly available on GitHub.³ Because a bandit is an online learner, it needs a record of the oracles’ history prior to the current time step we are

Conclusion

Oracles gather information from the real world and transport it onto the blockchain for further use. Hence, the use of oracles is imperative to promote a widespread adoption of smart contracts. Yet, research about oracles and their practical application is very immature. This paper tried to shed some light by addressing two major challenges in this area. The first challenge is about employing a smart mechanism in place to identify the trustless and cost-efficient oracles. This challenge was

CRediT authorship contribution statement

Mona Taghavi: Conceptualization, Formal analysis, Investigation, Methodology, Software, Writing – original draft. Jamal Bentahar: Conceptualization, Formal analysis, Investigation, Methodology, Validation, Funding acquisition, Writing – review & editing. Hadi Otrok: Conceptualization, Investigation, Methodology, Validation, Writing – review & editing. Kaveh Bakhtiyari: Conceptualization, Investigation, Formal analysis, Software, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

M. Taghavi was supported by NSERC Vanier, and J. Bentahar is supported by NSERC, FRQNT, and MITACS.

References (54)

Alagha, A. et al. (2022). Target localization using multi-agent deep reinforcement learning with proximal policy optimization. Future Generation Computer Systems.
Baygin, M. et al. (2022). A blockchain-based approach to smart cargo transportation using UHF RFID. Expert Systems with Applications.
Bouraga, S. (2021). A taxonomy of blockchain consensus protocols: A survey and classification framework. Expert Systems with Applications.
Drawel, N. et al. (2020). Specification and automatic verification of trust-based multi-agent systems. Future Generation Computer Systems.
Ho, G. et al. (2021). A blockchain-based system to enhance aircraft parts traceability and trackability for inventory management. Expert Systems with Applications.
Kadadha, M. et al. (2020). SenseChain: A blockchain-based crowdsensing framework for multiple requesters and multiple workers. Future Generation Computer Systems.
Kochovski, P. et al. (2019). Trust management in a blockchain based fog computing platform with trustless smart oracles. Future Generation Computer Systems.
Liu, X. et al. (2022). Prioritized experience replay based on multi-armed bandit. Expert Systems with Applications.
Lo, S.K. et al. (2020). Reliability analysis for blockchain oracles. Computers and Electrical Engineering.
Lu, Q. et al. (2019). uBaaS: A unified blockchain as a service platform. Future Generation Computer Systems.
Sami, H. et al. (2022). Graph convolutional recurrent networks for reward shaping in reinforcement learning. Information Sciences.
Wahab, O.A. et al. (2022). Federated against the cold: A trust-based federated learning approach to counter the cold start problem in recommendation systems. Information Sciences.
Auer, P. et al. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning.
Bentahar, J. et al. Quantitative group trust: A two-stage verification approach.
Berryhill, R. et al. (2019). ASTRAEA: A decentralized blockchain oracle.
Bhatia, G.K. et al. (2020). WorkerRep: Immutable reputation system for crowdsourcing platform based on blockchain.
Buterin, V. et al. (2017). Casper the friendly finality gadget.
Chatzopoulos, D. et al. (2017). Flopcoin: A cryptocurrency for computation offloading. IEEE Transactions on Mobile Computing.
Chatzopoulos, D. et al. Privacy preserving and cost optimal mobile crowdsensing using smart contracts on blockchain.
Ding, Y. et al. Blockchain-based credit and arbitration mechanisms in crowdsourcing.
Drawel, N. et al. Formalizing group and propagated trust in multi-agent systems.
Drawel, N. et al. (2022). Formal verification of group and propagated trust in multi-agent systems. Autonomous Agents and Multi-Agent Systems.
Ellis, S., Juels, A., & Nazarov, S. (2017). Chainlink a decentralized oracle network. White paper,...
Frazier, P. et al. (2009). The knowledge-gradient policy for correlated normal beliefs. INFORMS Journal on Computing.
Gao, S. et al. (2021). TrustWorker: A trustworthy and privacy-preserving worker selection scheme for blockchain-based crowdsensing. IEEE Transactions on Services Computing.
Goel, N. et al. Infochain: A decentralized, trustless and transparent oracle on blockchain.
Hull, R., Batra, V. S., Chen, Y.-M., Deutsch, A., Heath III, F. F. T., & Vianu, V. (2016). Towards a shared ledger...

