1 Introduction

In Multi-Agent Systems, several interacting intelligent agents pursue a goal by performing a set of tasks. To do so, they need to share information and coordinate to maximize the group gain. Among the different topics that orbit around multi-agent coordination [13], one of the most common is task assignment. In particular, when dealing with homogeneous teams, where any agent can assume any available role, fixing a role per agent would not be the wisest decision, since it would prevent roles from being dynamically exchanged between team members, thereby decreasing the overall performance. On the other hand, dynamically assigning roles in a team has some associated costs, namely processing time and computing power.

One environment that has been empowering research in multi-agent coordination on autonomous mobile robots is RoboCup [7] - an event held annually since 1997 that brings together researchers and robotics enthusiasts from all around the world in a series of competitions around R-Sports (Robotic-Sports) and a symposium. Among the different leagues present in RoboCup, one of the strongest themes is robotic soccer, for which the Federation defined a bold objective: “By the middle of the 21st century, a team of fully autonomous humanoid robot soccer players shall win a soccer game, complying with the official rules of FIFA, against the winner of the most recent World Cup”.

One of the RoboCup soccer leagues is the MSL (Middle-Size League) [14], a league that is very challenging, not only in terms of regulations, but also due to its rich environment (Fig. 1) - complex and semi-structured - that provides an excellent testbed for autonomous robotic teams in stochastic and highly dynamic environments.

Fig. 1. Middle-Size League finals in RoboCup 2013 at Eindhoven, Netherlands

A robotic soccer match has proved over the years to be an excellent testbed because it resembles the real world more closely than typical pre-conceived research lab setups, especially due to its natural characteristics: agents must be resilient both to unexpected situations (because the opponent team's actions can only be predicted up to a certain point) and to rapid changes of the world state at any time.

In this league, teams play with 5 field robots (including the goalkeeper) and an auxiliary computer that usually acts as the coach, providing the user interface for visualisation and establishing the link between the referee signals and the playing robots. Humans are not allowed to interact with any system that is taking part in the match. Additionally, since the coach computer is not allowed to have sensors and there are no extra sensors installed around the field, the playing robots must act as a sensor network for the coach, providing it with the necessary information to base its decisions on.

The work described in this paper was accomplished within the MSL context, more specifically in the CAMBADA (Cooperative Autonomous Mobile roBots with Advanced Distributed Architecture) team [10], the MSL Robotic Soccer team from the University of Aveiro. The project was founded in 2003 and is currently hosted by the IEETA IRIS (Intelligent Robotics and Intelligent Systems) group.

For inter-robot communication, this team uses the Realtime Data Base (RtDB) middleware [1], which provides seamless access to the complete state of the team through a distributed database, partially replicated to all team members. Robots push information such as the perceived ball position, self pose belief, coordination signals used to achieve team-work tasks and other state variables. A portion of the information held locally on an agent is periodically and asynchronously broadcast to all team members by a process (‘comm’) that runs on each robot's processing unit.
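The kind of state record each robot might push to such a database can be illustrated with a minimal sketch (the `AgentRecord` fields, the JSON encoding and the function names are illustrative assumptions; the actual RtDB uses its own binary representation and replication scheme):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class AgentRecord:
    agent_id: int      # unique robot identifier
    ball_pos: tuple    # perceived ball position (x, y) in field coordinates
    pose: tuple        # self pose belief (x, y, theta)
    role: str          # current coordination role
    timestamp: float   # local time of the last update

def serialise(record):
    """Encode the locally held record for broadcast by the 'comm' process."""
    return json.dumps(asdict(record)).encode()

def deserialise(payload):
    """Decode a record received from a teammate into the local replica."""
    fields = json.loads(payload)
    return AgentRecord(**{k: tuple(v) if isinstance(v, list) else v
                          for k, v in fields.items()})
```

In this sketch, the ‘comm’ process would periodically call `serialise` on the local record and broadcast the payload, while received payloads update the local replicas of the teammates' records.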

In a fast-paced and highly dynamic environment such as the RoboCup MSL, if task assignment constantly relied on negotiation techniques, it could easily become ineffective in time, due to the strict timing restrictions of the application and the delays caused by the negotiation itself. This is why fully-distributed techniques have generally been avoided for applications with realtime constraints. Role switches, however, occur at a low frequency, which promotes the use of the coach computer to perform role selection for the team. Yet, as discussed later, this constitutes a major single point of failure that, in the case of an actual collapse, may compromise the performance of the team.

In this paper, we give a thorough description of the leader election mechanism implemented in the CAMBADA team. We start by discussing the coordination techniques widely used and described in the literature in Sect. 2. In Sect. 3, we formally describe the consensus problem and present two of the most reputable solutions to it, as well as their limitations. Sections 4 and 5 describe the proposed solution and its results, respectively. We conclude the paper with some final considerations in Sect. 6.

2 State of the Art

2.1 Distributed Assignment

In multi-agent systems, a common distributed approach is to dynamically assign roles locally (on each agent) based on a set of pre-defined policies that depend on their world-state belief. These policies must be defined in a way that guarantees convergence and avoids conflicts between intentions and/or actions [4]. This is often achieved using policy reconstruction methods, which make explicit predictions about an agent's actions by explicitly running the decision-making algorithm of that agent, by using shared plans [5] or by using a learned model of the other agent's behaviour [3, 6]. Some related work achieves coordination in low-communication and time-critical environments, provided that agents can periodically have full connectivity [15].

However, each agent has a slightly different world-state belief at any given time. If it were possible, in a real application, to take an instantaneous snapshot of the beliefs of all agents at the same time, we would find (slight) differences between them. This is because the information residing in an agent's world state is affected by many disturbances (starting with the measurement itself: partial observability, imperfect modelling, noise, integration errors and even network delays). Therefore, the obvious drawback of the distributed approach is that agents base their decisions upon different beliefs, which can easily lead to lack of consensus and conflicting decisions. Most distributed policies designed to reach a consensus on task assignment assume a common world-state belief for all agents, which in real, highly dynamic realtime applications is rare.

To overcome this problem, the agents can, instead of deciding locally and instantly committing to that decision, broadcast their intention and then use distributed negotiation algorithms to deal with any conflicts [17]. However, this approach depends on the network conditions, and negotiation is not practical when the application demands a high level of reactivity.

2.2 Centralised Assignment

As opposed to fully distributed approaches, centralised architectures rely on a single agent to control and monitor the action plans of all other agents. Since this coordinator agent decides on the final plan (which includes all agents' partial plans), any conflicts between agents' plans can be taken into account during the planning process.

Centralising the decision to achieve consensus has some advantages over a distributed approach:

  • Centrally solves the problem of tightly-coupled coordination, which arises whenever one or more actions of one agent affect the optimal action choice of another team member, since decisions are based on a single belief of the world state.

  • From a software architecture point of view, it is simpler to implement and maintain.

Despite these positive aspects, a completely centralised approach has three main drawbacks with respect to decentralised solutions:

  • No redundancy. The coordinator agent constitutes a single point of failure - if it fails, the whole team may fail due to lack of coordination.

  • Network delays can propagate to actions. Decisions need to be communicated to the agents, which takes time that, in some applications, depending on the authority level of the coordinator agent, may be critical.

  • Limited scalability. A higher number of agents will require higher computational power on the coordinator agent to process and to devise a plan for the complete team of agents.

2.3 Centralised Assignment with Leader Election

When scalability is not a priority, some systems rely on a centralised decision taken by one of the participating agents. However, network delays make it unfeasible for the leader to make realtime decisions when the environment requires a high level of reactivity. Therefore, the leader has to provide the team with high-level coaching hints that work towards a group consensus, while leaving low-level decisions (the fast-paced actions that need to be taken locally) to the other team members.

The election of the coordinator agent is a fundamental part of this type of architecture to overcome a possible faulty coordinator. In case the coordinator fails, the agents need to recognise a coordinator failure and then coordinate to find the next leader - the agent that will replace the previous one.

3 The Consensus Problem

The consensus algorithms described below were designed to achieve consensus between processes or between server clusters, but in this section, we will refrain from using the term agent or server and will use the more inclusive term node. Consensus is a general term used to describe a state where participating nodes on a system agree on something, bound under certain conditions.

When applied to Multi-Agent Systems, consensus algorithms allow a group of agents to work coherently, enabling the system as a whole to survive in the event of sporadic failures of one or more of its members. For example, consensus algorithms have been successfully implemented for distributed storage on server clusters using log replication [11], as well as in robotic networks [9].

3.1 Paxos

Over the last decade, Paxos [8] has dominated the subject of consensus algorithms for software systems. Paxos has either been applied to or influenced many systems that solve a consensus problem. Multi-Paxos [16] is also referred to in the literature - it was proposed as an optimisation of Paxos, as it essentially skips one step, which has no impact on coherence provided that the leader remains the same and online for a long period of time.

However, the (Multi-)Paxos algorithm was conceived upon a complex theoretical model, which makes it less convenient to implement in real-world systems: in order to properly run it on practical systems, significant changes to its architecture are required [2].

3.2 Raft

As a response to the concerns mentioned above, Diego Ongaro and John Ousterhout developed the Raft [12] protocol, with understandability and implementability as primary goals, but without compromising the correctness or efficiency of (Multi-)Paxos. Using techniques such as decomposition and state space reduction, the authors were not only able to separate leader election, log replication and safety, but also to reduce the possible states of the protocol to a minimal functional subset.

Assumptions.

The following three assumptions are made by Raft:

  • Machines run asynchronously: there is no clock synchronisation between different systems and there are no upper bounds on message delays or computation times.

  • Unreliable links: possibility of indefinite network delays, packet loss, partitions, reordering, or duplication of messages.

  • Unreliable nodes: processes may crash, may eventually recover, and in that case rejoin the protocol. Byzantine failures are assumed not to occur (Fig. 2).

Fig. 2. Timeline of an example execution of Raft, adapted from [12]. In three of the presented terms, a leader was elected after an election period. Only in term \(i + 2\) was consensus not reached during the election process, so a new election starts.

Consensus by Strong Leadership.

Raft achieves the consensus by a strong leadership approach. In steady-state, a node in a Raft cluster is either a LEADER or a FOLLOWER. There can only be one leader in the cluster and when the leader becomes unavailable, an election occurs, and nodes can become CANDIDATEs.

In the original Raft system applied to log replication, the LEADER is fully responsible for managing log replication to the followers (the remaining nodes) and regularly informs the followers of its existence by sending heartbeat messages. Upon receiving a heartbeat, a FOLLOWER node resets a timer; whenever this timer reaches a timeout value, the node can become a CANDIDATE and initiate a new election.

Each leader is elected for a term - a discrete temporal identifier (counter). At most, one leader can be elected in a given term and the event of a new election marks the start of a new term.
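The follower-side timing logic described above can be sketched as follows (a simplified illustration; the class and method names are assumptions, and the randomised timeout range follows the values adopted later in Sect. 4.1):

```python
import random
import time

# Per-node randomised heartbeat timeout, in seconds (range from Sect. 4.1).
HEARTBEAT_TIMEOUT = random.uniform(0.250, 0.400)

class Follower:
    """Minimal follower-side heartbeat bookkeeping (illustrative sketch)."""
    def __init__(self):
        self.term = 0
        self.last_heartbeat = time.monotonic()

    def on_heartbeat(self, leader_term):
        # A heartbeat from a leader with an up-to-date term resets the timer.
        if leader_term >= self.term:
            self.term = leader_term
            self.last_heartbeat = time.monotonic()

    def should_become_candidate(self, now=None):
        # If no heartbeat arrives within the timeout, start an election.
        now = time.monotonic() if now is None else now
        return (now - self.last_heartbeat) > HEARTBEAT_TIMEOUT
```

Randomising the timeout per node makes simultaneous candidacies (and hence split votes) unlikely.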

During an election, three situations can occur:

  1. The majority of the nodes vote for the CANDIDATE, meaning this node can switch to the LEADER state and start sending heartbeat messages to the others in the cluster to establish authority.

  2. If other CANDIDATEs receive a packet, they check its term number. If the term number is greater than their own, they accept the node as the leader and return to the FOLLOWER state. If the term number is smaller, they reject the packet and remain CANDIDATEs.

  3. The CANDIDATE neither loses nor wins. If more than one node becomes a CANDIDATE at the same time, the vote can be split with no clear majority. In this case, a new election begins after one of the CANDIDATEs times out.
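The three outcomes above can be summarised in a small decision function (an illustrative sketch; the function and argument names are assumptions):

```python
def election_outcome(votes_received, cluster_size, my_term, observed_terms):
    """Classify an election round from a CANDIDATE's point of view.

    votes_received: votes this candidate has collected (including its own)
    observed_terms: term numbers seen in packets from other nodes
    """
    if votes_received > cluster_size // 2:
        return "LEADER"        # case 1: strict majority reached
    if any(t > my_term for t in observed_terms):
        return "FOLLOWER"      # case 2: a higher term wins; step down
    return "CANDIDATE"         # case 3: split vote; retry after a timeout
```

For instance, in a 5-node cluster, 3 votes constitute a majority, while 2 candidates with 2 votes each lead to case 3 and a new election.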

Limitations.

Two main limitations have been identified in the Raft protocol:

  • 1- and 2-active-node corner cases: when there are fewer than 3 nodes available, Raft will fail to elect a leader, because it is impossible to achieve a majority of votes in either case.

  • No prioritisation: all nodes are equally likely to become the leader. In some heterogeneous clusters, the user might want to defer the leadership to a node that has more computing power available.

4 Proposed Solution

Our leader election solution is based on the Raft algorithm, with some adaptations to overcome the aforementioned limitations. Furthermore, we have integrated it into the RtDB middleware as an asynchronous service. By doing so, the information about the current leader is available to all agents at any time, without re-configuration.

4.1 Timing Parameters

Three crucial aspects to consider when implementing this solution are the parameterisation of the sending frequency of heartbeat packets (\(f_{HB} = 1/\varDelta T_{HB}\)), the heartbeat timeout (\(T_{max,HB}\)) and the election timeout (\(T_{max,E}\)). Although Raft originally suggests times in the order of tens or hundreds of milliseconds, the selection of these times depends heavily on the application, the failure frequency and the communication medium between the nodes.

In most mobile robotic teams, the robots communicate with each other in one or more of the many different available forms of radio communication. In this particular application, robots are using the Wi-Fi (IEEE 802.11a standard) in a spectrally dense environment, with strict bandwidth limitations (currently 2.2 Mbit/s).

To select the heartbeat frequency \(f_{HB}\), a trade-off between delay in the start of a new election and bandwidth expense has to be considered, while accounting for the actual role of the leader and the frequency that its orders change. This is important because the robots will follow the latest available order while a new leader is being elected.

The heartbeat timeout \(T_{max,HB}\) should be selected in line with the packet loss experienced in the testbed environment. For example, by selecting a heartbeat timeout that is more than twice the maximum heartbeat period, a new election will only occur when at least two consecutive heartbeats are not received from the leader.

The election timeout \(T_{max,E}\) accounts for the time allowed for the exchange of vote packets and is important whenever no majority of votes is achieved by any of the candidates.

Based on these assumptions, the values were selected as follows, with the ranges defining the limits of random uniform distributions.

$$\begin{aligned} 40\,\mathrm{ms} \le \varDelta T_{HB} \le 60\,\mathrm{ms} \end{aligned}$$
(1)
$$\begin{aligned} 250\,\mathrm{ms} \le T_{max,HB} \le 400\,\mathrm{ms} \end{aligned}$$
(2)
$$\begin{aligned} T_{max,E} = 100\,\mathrm{ms} \end{aligned}$$
(3)
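Sampling these parameters per node can be sketched as follows (an illustrative sketch; the function name and the use of Python are assumptions). Note that the lower bound of \(T_{max,HB}\) exceeds twice the maximum heartbeat period, matching the guideline discussed above:

```python
import random

def sample_timing():
    """Draw the per-node timing parameters of Eqs. (1)-(3), in seconds."""
    dt_hb = random.uniform(0.040, 0.060)     # heartbeat period, 40-60 ms
    t_max_hb = random.uniform(0.250, 0.400)  # heartbeat timeout, 250-400 ms
    t_max_e = 0.100                          # election timeout, fixed 100 ms
    return dt_hb, t_max_hb, t_max_e
```

Each node draws its own values, so timeouts are decorrelated across the team and simultaneous candidacies become unlikely.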

4.2 The Backup State

To tackle the first limitation (failure to achieve a majority in an election with fewer than 3 active nodes), apart from the LEADER, FOLLOWER and CANDIDATE states of the original Raft algorithm, we have introduced the BACKUP state, which is entered whenever there are only 1 or 2 active nodes in the system. The complete state machine is presented in Fig. 3.

Fig. 3. State machine of the proposed solution

A node maintains a dynamic dictionary of timers, indexed by peer ID. The timer belonging to a peer node is reset whenever a heartbeat or acknowledge packet is received from it. The active nodes list is determined by the peer IDs in the dictionary whose elapsed timer is lower than the heartbeat timeout. When in the BACKUP state, the leader is determined by the lowest ID among the active nodes.
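The liveness bookkeeping and the BACKUP-state leader rule can be sketched as follows (an illustrative sketch; the class and method names are assumptions, and the timeout is fixed at the upper bound of \(T_{max,HB}\) for simplicity):

```python
import time

HEARTBEAT_TIMEOUT = 0.400  # seconds; upper bound of T_max_HB (Sect. 4.1)

class PeerTracker:
    """Tracks peer liveness and resolves the BACKUP-state leader."""
    def __init__(self, my_id):
        self.my_id = my_id
        self.last_seen = {}  # peer ID -> monotonic time of last packet

    def on_packet(self, peer_id, now=None):
        # Heartbeat or acknowledge packets reset the peer's timer.
        self.last_seen[peer_id] = time.monotonic() if now is None else now

    def active_nodes(self, now=None):
        now = time.monotonic() if now is None else now
        alive = {pid for pid, t in self.last_seen.items()
                 if now - t < HEARTBEAT_TIMEOUT}
        return alive | {self.my_id}  # a node always counts itself

    def backup_leader(self, now=None):
        # With fewer than 3 active nodes, the lowest ID leads deterministically.
        active = self.active_nodes(now)
        return min(active) if len(active) < 3 else None
```

With 3 or more active nodes, `backup_leader` yields no answer and the regular Raft election applies.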

4.3 Preferred Leader Agent

In our particular application, in order to free computational resources on the robots, it is wise to give preference to the coach computer as leader whenever it is available, while keeping the leader election functionality active as a redundancy mechanism for when it fails.

In order to achieve this priority for the coach agent, while keeping harmony and consistency among the voting agents and the voting process, we skip the heartbeat timer reset on the coach agent. When joining the network, the coach agent will start as a follower and will receive the heartbeat packets from the current leader, updating its term accordingly, but ignoring the heartbeat timer reset step. When reaching the heartbeat timeout, the coach agent will start a new election in a higher term and the previous leader will step down. This constitutes the only situation in which an agent intentionally takes over the team leadership.
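The coach exception amounts to skipping a single step of the follower logic, as this sketch illustrates (the class, flag and function names are assumptions, not the actual CAMBADA implementation):

```python
import time

class Node:
    """Follower-side heartbeat handling with the coach exception."""
    def __init__(self, is_coach=False):
        self.is_coach = is_coach
        self.term = 0
        self.last_heartbeat = time.monotonic()

    def on_heartbeat(self, leader_term, now=None):
        now = time.monotonic() if now is None else now
        if leader_term >= self.term:
            self.term = leader_term        # the coach still tracks the term...
            if not self.is_coach:
                self.last_heartbeat = now  # ...but skips the timer reset, so
                                           # its timeout fires and it takes over

def start_election_term(node):
    # A timed-out node starts an election in the next (higher) term.
    return node.term + 1
```

Because the coach never resets its timer, its timeout is guaranteed to fire, and its election in a higher term makes the current leader step down, as required.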

5 Experimental Setup and Results

To test this solution, an experimental setup was devised with 5 computers running the communication process with the leader election algorithm described in the previous section, plus an experiment coordinator. The coach agent was disconnected. The coordinator was prevented from becoming the leader, but participated in the voting phase and was responsible for monitoring the evolution of the leader selection, measuring times, forcing periodic communication failures (approximately every 5 s) on the elected leader, hence triggering a new election, and logging data for offline analysis. The setup is depicted in Fig. 4.

Fig. 4. Experimental setup

The system was set up and ran for 16 h and 37 min, producing a total of 11970 terms. From this dataset it is possible to statistically analyse the performance of the proposed solution with respect to term period, election time, occurrence of simultaneous multiple candidates in an election, leader attribution distribution and the number of failed elections due to lack of majority in the voting process. Among all samples, the measured term time was \(5000.15 \pm 92.73\) ms (mean and standard deviation), which is consistent with the experimental setup described above.

5.1 Failed Elections

Having the coordinator participate in the voting rounds makes it possible to tie a vote, because the total population consists of 6 agents. A failed election occurs whenever none of the candidates receives a majority of votes. Out of the total of 11970 terms, 2 failed elections were registered.

Fig. 5. Election time histogram

5.2 Election Time

Election times were inspected (Fig. 5) and it was verified that they follow a distribution that is consistent with the selected heartbeat timeout \(T_{max,HB} \in [250, 400]\) ms, drawn from a uniform distribution on that range.

It is also important to mention that, apart from the results presented in Fig. 5, there were 5 other samples with higher election times, namely 1.1, 2.2, 2.6, 2.7 and 2.8 s. These 5 samples were not included in the figure to improve the visibility of the remaining samples in the plot.

5.3 Simultaneous Multiple Candidates

The occurrence of multiple candidates in an election was also analysed. These results are shown in Fig. 6 (with the y axis in logarithmic scale), where we can see that in 97.4% of the samples there is only one candidate, in 2.5% two candidates and in 0.03% (only 3 times in the whole run) three candidates, which was the maximum count of multiple candidates in a single round.

Fig. 6. Number of candidates histogram - y axis in logarithmic scale

5.4 Leadership Attribution

A uniform distribution of leaders among the eligible agents was expected; however, there are small differences between the agents, as shown in Fig. 7a. Since agent 3 (the laptop closest to the access-point) showed the maximum number of election wins, we wanted to investigate whether this was merely a coincidence or whether the relative position to the access-point affects the priority of being selected as leader when there are multiple candidates.

Fig. 7. Leadership attribution analysis

To test that hypothesis, we analysed the leader attribution in the terms for which there was more than one candidate (Fig. 7b). These results do not show a clearly higher chance of agent 3 becoming the leader in conflict situations. Because many external factors can influence the communication medium in this setup, further tests must be performed with the laptops' positions shuffled between runs.

6 Conclusion

After discussing the original Raft approach to achieving consensus on machine clusters, two main limitations have been identified - a corner case when there are only 1 or 2 active nodes, and the lack of prioritisation among agents to become the leader. A solution based on the Raft leader election protocol has been described and successfully implemented to overcome these limitations.

An experimental setup was created to test our leader election solution by continuously forcing new elections. The obtained results are in line with the timings set for the asynchronous activity of this mechanism, chosen according to the requirements of the application - in this case, a robotic soccer team.

In a nutshell, the proposed solution is suitable to select a leader among the team agents. It accounts for the possibility of having a preferred leader agent, providing a fault-tolerant and reliable redundancy mechanism whenever the leader becomes inactive.