Abstract
Methods for socially-aware robot path planning are increasingly needed as robots and humans come to coexist in shared industrial spaces. The practice of clearly separated zones for humans and robots on shop floors is transitioning towards spaces where both humans and robots operate, often collaboratively. To allow for safer and more efficient manufacturing operations in shared workspaces, mobile robot fleet path planning needs to predict human movement. Accounting for the spatiotemporal nature of the problem, the present work introduces a spatiotemporal graph neural network approach that uses graph convolution and gated recurrent units, together with an attention mechanism, to capture the spatial and temporal dependencies in the data and predict human occupancy based on past observations. The obtained results indicate that the graph network-based approach is suitable for short-term predictions, but the rising uncertainty beyond the short term limits its applicability. Furthermore, the addition of learnable edge weights, a feature exclusive to graph neural networks, enhances the predictive capabilities of the model. Adding workspace context-specific embeddings to graph nodes has additionally been explored, bringing modest performance improvements. Further research is needed to extend the predictive capabilities beyond the range of scenarios captured through the original training, and towards establishing standardised benchmarks for testing human motion prediction in industrial environments.
1 Introduction
The coexistence of humans and automated vehicles in production spaces is expanding. While previously vehicles would operate only within designated spaces and corridors, the operating boundaries have become less restrictive with the emergence of cobots. In contrast to conventional robots, cobots are developed to collaborate either directly with human operators on a specific task or to physically interact with humans in a shared workspace. A cobot can be designed for tasks with various levels of autonomy and complexity. The specific type of robot–human interaction considered in this paper is cobots with the capability of sharing a dynamic workspace without a common task [9]. More specifically, the cobots in this case are autonomous mobile robots (AMR) tasked to move goods within an industrial environment. Navigation of robots in such spaces considers not only robot perception and navigation but also human–robot interaction and human behaviour analysis and modelling, as a basis for the prediction [17]. For AMR planning and operation, human behaviour analysis and modelling requires the ability to anticipate future human movement and can enhance the efficiency and safety of path planning. This is naturally a challenging task, as humans invariably exhibit unpredictability in their movements.
Assuming that the nature of the movement is determined by internal human motivation towards a goal, methods which seek to account for the intention in the human movement [12, 25] have been developed. Addressing movement in outdoor environments, they involve intentions such as starting, crossing or stopping. However, the set of possible intentions is defined by the context of the application scenario, and reported outdoor contexts, focussed on driver assistance systems or driverless operation of vehicles, concern a limited set of intentions, for example regarding crossing a street, especially close to intersections. These intentions are not sufficiently representative of intentions in working environments. For example, human indoor workspace movement intentions can be motivated by specific work environment goals. Such goals can be part of a work process and can therefore be somewhat well-defined for a given industrial environment. However, they differ from reported works on outdoor spaces and may differ from one environment to another. Furthermore, they may also differ for the same work environment, when operation sequences or business processes change. Therefore, the development of human movement prediction based on intent recognition is not sufficiently served by works focussed on pedestrian movement in outdoor spaces, and needs to be further developed for industrial work environment contexts. While such contexts could be inferred by continuous monitoring and tracking of human activity with the explicit aim to predict the movement of each monitored person, this may easily verge towards privacy breaches. It is therefore of interest to understand whether human movement prediction in indoor workspaces could be achieved via methods which do not rely on personalised tracking of individuals and their specific intentions.
This paper addresses the problem of human movement prediction for indoor workspaces in a way that the movement of specific individuals is not explicitly considered; instead, the workspace is seen as a heatmap occupancy grid where human presence is anonymised. Specifically, the aim is to investigate the extent to which future human movement prediction could be achieved based on a historical record of sequences of human presence in workspaces. The training of such data-driven predictive models could allow them to develop internal representations that capture movement intention patterns for a given workspace without explicit personalised human tracking. In a workspace, many workers simultaneously work on their own (different) tasks and interact with each other within a limited physical space. A shared workspace also adds cobots into the mix and thereby further increases the complexity of the setting. To account for the spatiotemporal nature of human movement in such workspaces, human movement prediction is posed as the spatiotemporal problem of predicting human presence in a grid cell (spatial dimension) at a specific future time point (temporal dimension). Given historical records of occupancy sequences in a heatmap grid format, the development of such human motion prediction can enable socially-aware AMR navigation in shared workspaces. The historical records are records of occupancy in the grid, showing movable (robots, humans) and immovable entities (e.g. walls, obstacles, production machines, etc.). Specifically, given a sequence of occupancy grid instances, the goal is to project the future sequence of human occupancy for each cell in the grid.
While there is substantial research on socially-aware robot navigation for outdoor spaces, there is limited work targeting the problem of human movement prediction in industrial workspaces. Models suitable to capture the complex spatial and temporal dependencies in the data are, for example, convolutional neural networks (CNN) or recurrent neural networks (RNN). Such models can be leveraged to take the spatiotemporal information into account and are frequently used in the literature for human movement predictions [23]. In this paper, however, human movement is predicted using a graph-based approach. By representing the problem using graphs, the spatial dependencies can be captured directly by the topology of a graph. Such graph neural networks (GNN) use the concept of message passing, meaning that each node representation is updated with the messages received from its connected neighbours [20]. A GNN can be combined with RNNs or CNNs, where the GNN captures the spatial dynamics and the RNN or CNN models the temporal dependency. Such spatiotemporal graph neural networks have been successfully applied to traffic speed forecasting, COVID-19 forecasting and trajectory prediction [31]. Graph neural networks have properties that render them suitable for spatiotemporal problems, yet such graph-based models have not been applied in the literature to predicting human motion in indoor workspaces. Given that graph neural networks have performed well in other spatiotemporal problems, e.g. traffic forecasting [13] and pedestrian trajectory prediction [16], the aim of this paper is to explore the extent to which graph-based neural networks can be used to predict human presence in the context of shared human–robot workspaces.
The following section discusses the background of the problem and relevant literature. Thereafter, Sect. 3 describes the methodology, including the necessary data preparation for a graph-based approach, introduces the specific graph neural network approach and elaborates on the training and evaluation process. Section 4 presents the obtained results, employing data derived from simulations emulating real environments, whereas Sect. 5 discusses these results, alongside their implications and limitations. Section 6 is the conclusion.
2 Background and related work
Human movement prediction is a spatiotemporal problem. In order to predict where a human will be at a specific time in the future, it is important to understand what motivates and influences the movement. Movement generally follows complex, nonlinear patterns. Humans are usually driven by their inner motivation towards some goal or intent. However, along their path they are influenced by social and physical environment constraints [15]. For example, obstacles such as walls are physical constraints. Social constraints can be social rules, norms and the actions of surrounding agents. These factors are generally not directly observable but could potentially be inferred from data or directly modelled from the context.
Overall, the human movement prediction problem is typically handled through three different modelling approaches: physics-based, planning-based and pattern-based methods [23]. Physics-based approaches are based on Newton's laws of motion. A set of equations is explicitly defined and employed to model future human movement. A popular example of this is the social force model by Helbing and Molnar [8], which models attraction and repulsion forces from other agents and obstacles [10, 11]. Based on the physics of movement, likely human trajectories can be predicted and robot path planning can produce sequences of waypoints which jointly meet robot movement targets, while maintaining sufficient distance from humans [10, 11]. Alternatively, and in contrast to physics-based models, planning-based methods explicitly reason on the longer-term goals of an agent's movement. Planning-based methods assume the rationality of a human and thus consider the impact of current actions on the future [32]. These models compute path hypotheses that would enable an agent to reach its goals by minimising the total cost of a sequence of motions, and thus not the cost of one action in isolation. Both physics- and planning-based approaches have disadvantages that make them less applicable to the human movement prediction problem. For example, physics-based models may struggle to capture real-world complexity. Furthermore, with planning-based models, the computational time increases exponentially with the prediction horizon, the size of the environment and the number of agents, which makes it hard to implement a planning-based model in complex situations such as a manufacturing floor.
Pattern-based approaches do not, in general, suffer from the same disadvantages and are therefore frequently reported in the literature. A pattern-based approach learns human movement from data of historical trajectories. This approach can discover behavioural patterns and make predictions about future human movement. If the shared workspace is quantised in the form of an occupancy grid, with each state in the grid being either occupied or not, the human movement problem can be seen as a transition modelling problem, and can then be modelled with hidden Markov models (HMM), which can be appropriate for modelling state transitions [14]. While HMMs can thus capture the temporal relationship in the data, further customisation is needed to achieve the same for the spatial relationships.
In the human trajectory prediction literature, CNNs and RNNs are frequently applied. These models are suitable for spatiotemporal problems. For example, Bartoli et al. [3] designed an LSTM model that is ‘context-aware’ and can learn and predict human motion in crowded spaces such as a sidewalk. Moreover, Zhang et al. [34] use an encoder-decoder architecture where the LSTM model is the fundamental block that captures the temporal structure of a trajectory. Nikhil and Morris [19] show that CNNs are also applicable and even argue that a CNN is superior to recurrent neural networks for trajectory prediction, since the high spatiotemporal correlation can be learned more efficiently due to the computational efficiency of the convolution operation. However, most frequently RNNs and CNNs are combined to take advantage of the specific strengths of both architectures. Generally, a CNN is used to capture the spatial dimension while the temporal dimension is captured by an RNN. For example, Zhao et al. [36] use a similar concept by using a CNN to capture scene features and an LSTM to fuse trajectories and features.
As an alternative to CNNs and/or RNNs, a graph-based approach can be taken, since many types of data can naturally be represented as a graph. A graph is a set of entities (nodes) and the relations (edges) between them, where each node, edge and/or the entire graph can store information [26]. GNNs are neural networks that can operate on graph data and work very well on data with high spatial dependencies, such as human movement data. The spatial dependencies are directly captured by the topology of the graph. The learning of a GNN is done via the message-passing formalism. The edge and node attributes of a graph generate compressed representations (messages). These messages are then passed between nodes based on the message-passing rule, after which they are used to update the node and/or edge representations [22]. Hereby, a GNN can capture the interactions between multiple entities and can help find relations and complicated patterns embedded in the data. The most prevalent architecture in the literature is the graph convolutional network (GCN), which generalises the concept of convolution to graphs.
The concept of a GNN can be extended to handle sequential data and consider both spatial and temporal dependencies. A spatiotemporal graph neural network (STGNN) fuses the conceptual ideas of graph representation learning and temporal deep learning [33]. An STGNN considers the spatial and temporal dependencies at the same time [31], by performing message passing at each time point using the GNN and incorporating the new temporal information using a temporal deep learning block. The temporal and spatial layers are trained jointly in a single machine learning model by exploiting the property that the STGNN model is end-to-end differentiable [22]. Generally, in STGNN models graph convolutions are used to capture the spatial dependence, whereas RNNs or CNNs are used to model the temporal dependency [31]. For example, Vemula et al. [28] make socially-aware predictions using a combination of graph neural networks and attention to capture the relative importance of each person in the crowd to predict human movement. Eiffert and Sukkarieh [6] build upon this work by predicting trajectories while accounting for the interaction of an agent with a robot's path. Similar to Vemula et al. [28], they use a graph-based approach where each node represents an agent and the edges represent the interactions between them. The STGNN model developed by Eiffert and Sukkarieh [6] is successfully applied to a variety of datasets, including crowds of pedestrians interacting with vehicles and bicycles, and livestock interacting with a robot. Alternatively, Choi et al. [5] use edge-level predictions on spatiotemporal graphs to make goal-oriented predictions. By leveraging a graph-based approach, their model incorporates the interactions between agents and the interactions with the environment. However, all the above graph-based approaches model the agents as nodes with the edges capturing the relationship between them. This approach requires that the agents can be accurately tracked over time. Therefore, it is not suitable in scenarios where agents are indistinguishable in the data, making it hard to observe individual trajectories.
An alternative graph-based approach, used in traffic forecasting, is to model each node as a specific location in the environment and to predict the number of agents at each location. Li et al. [13] introduced this modelling approach and captured the spatial features using random walks on graphs. Thereafter, these spatial features are connected with the temporal features using an encoder-decoder architecture. Following this work, Zhao et al. [35] introduce a spatiotemporal graph neural network that outperformed the state-of-the-art baselines, using GCNs to capture the spatial dependencies and gated recurrent units (GRU) to capture the temporal dependencies. Building upon this work, Bai et al. [2] improve the predictive capabilities by adding an attention framework to this graph-based architecture. The attention mechanism adjusts for the importance of different time points and incorporates global temporal information. Bai et al. [2] note good short- and long-term prediction capabilities and show that this spatiotemporal graph neural network excels at capturing the spatial and temporal dependencies simultaneously. As the traffic forecasting problem can naturally be extended to a human presence prediction problem, these approaches are suitable candidates for the posed problem.
Overall, the analysed literature at large does not focus on human movement prediction in a workspace but on pedestrians' movement on public roads [23]. A workspace is a more structured and less dynamic environment and could have restrictions and structures which are not seen in public spaces. Furthermore, the literature on graph-based models in human movement prediction is relatively limited, and graph-based models have not been applied in a manufacturing setting. Moreover, the modelling approach used in the graph-based human movement prediction literature is not suitable when accurately tracking agents over time is hard. Alternatively, treating a node as a fixed location in the environment has not been applied to the human movement problem in the literature. However, the ability of graph-based models to capture the spatial dynamics makes them highly relevant and applicable for human movement prediction within a manufacturing environment. Given the limited research in this area and the potential applicability of graph-based models, it is of interest to explore human movement prediction in shared workspaces with graph-based neural networks, on the basis of past movement data.
3 Methodology
This paper addresses the problem of human movement prediction in shared human–robot workspaces using a graph-based approach. Before introducing the data-driven model, Sect. 3.1 presents the data collection process. Human presence data are made available in an occupancy grid structure. The workspace and occupancy grid are converted into a graph representation, as outlined in Sect. 3.2. The specific graph-based neural network model applied on this representation is introduced in Sect. 3.3, which explains its workings and the reasoning for using it. This model is based on a similar graph-based spatiotemporal model identified in the literature review; however, some adjustments were made to make it appropriate for the problem formulation. After the model is introduced and input data are collected and processed, Sect. 3.4 explains the steps and methods employed for the training of the model. This section also discusses the evaluation metrics used to assess the performance of the model.
3.1 Data collection and preparation
When employing data-driven machine learning solutions, the availability of annotated data with access to ground truth is typically a challenge. This issue is even greater when the data represent human behaviours, because of both regulatory concerns (e.g. privacy protection) and the required scale of the training corpus. The use of simulations is a solution to overcome this issue. The presented work employs a simulation platform developed by THALES to provide large-scale pedestrian movement simulations. The simulator was originally developed to support control centres of critical infrastructure, such as railway stations (Fig. 1) and airports (Fig. 2), in support of dealing with the challenge of safe management of people in such spaces [18].
The simulator can be used in a serious game context to train control centre operators to react to rare incidents. Furthermore, it can be used as a digital twin of such operating spaces and create realistic data of human presence and movement [1] which can be used to train machine learning models. The simulator employs various approaches for moving agent trajectories and robot path planning, including the Field D* algorithm [7] or ORCA [27] for collision avoidance. Human movement behaviour modelling is still an open research area, as it remains a challenge to produce solutions which represent realistic phenomena with sufficient accuracy and are therefore appropriate for actionable data creation. The present work seeks to bridge the gap between simulated and real-life data by concentrating on sequences of workspace occupancy grid instances, which can be optimised to follow patterns of real behaviour as observed by applying visual analytics on real visual scenes. This avoids privacy concerns which would have existed if tracking of individual persons were applied. The occupancy grid approach creates a discretisation of the environment, and it becomes possible to follow blurred sequences of movement from individuals, without identification becoming possible, in a way similar to having a very low-resolution blurred scene image. The flexibility to customise synthetic 3D operating and workspace environments allows the generation of high-level human behaviours that define how humans behave, for instance through following specific work sequence patterns. Therefore, the work presented in this paper capitalises on the opportunities offered by such simulations to produce realistic and rich data, which are used to train models for human movement prediction.
The human movement prediction presented in this paper is part of a solution developed to ensure safe and efficient robot fleet path planning in shared human–robot Industry 4.0-enabled workspaces [21]. The overall mobile robot fleet path planning employs a variation of globally guided reinforcement learning [29]. It is beyond the scope of this paper to present the global guidance learning for path planning. Nonetheless, it can be said that in the path planning version without accounting for human movement prediction, the learned guidance steers the fleet of robots to attractor points. These attractors are the workstations where humans perform tasks. The added value of the human movement prediction is in influencing the planning to avoid areas in the grid with likely human presence and in optimising the fleet path guidance by taking likely human presence into account. The considered use case involves specific physical work environments. However, to explore the potential of developing a solution that is applicable to a wider range of environments, a simulation approach is taken to create a range of fictitious industrial work environments and generate a rich set of human and robot movement sequences in such spaces. The approach is described next, and the associated data are made available.
The simulation environment employed to generate the fictitious industrial workspaces, alongside the human–robot occupancy movement data, has been developed by THALES and is employed in broader studies to assess scenarios relevant not only to workspaces but also to indoor transportation hub public spaces, such as railway or airport terminals. Our experiments are conducted in a simulated industrial workspace layout, presenting different types of workstations and production machinery. Additionally, other static attractor points are defined, in the form of coffee/vending machines, to emulate worker short breaks and deviations from defined work sequences aligned with specific business processes. While multiple such scenarios were examined upon which human movement models were developed, the present work employs a simulated scenario with introduced randomness and approximately 100 workers operating in it, belonging to four categories of worker profiles. Each category of worker is associated with a distinct business process described through sequences of visits to workstations, opting for the workstation with the shortest queue within their category. Moreover, randomness is introduced by applying a normal distribution to govern the speed at which workers move, and a uniform distribution over the number of workers of each type. Furthermore, there is a 0.1 probability of workers deviating from their work process to visit another attractor point. An example layout of the manufacturing floor is shown in Fig. 3. The rectangles are workbenches and the small squares at the left border of the manufacturing floor are coffee stations, used as further attractors for human movement behaviour that deviates from work sequences. The proximity of the workbenches and the narrow passages between the walls add complexity to the simulation. Although the workers largely follow work sequence patterns, it is hard to infer an individual worker's type and work sequence from the data. The workers are unlabelled and indistinguishable in the data, accounting for privacy concerns. The prediction problem requires a model capable of capturing the underlying spatial and temporal dynamics of the data, which are influenced by the process sequences that the workers are part of.
Sections 3.2 and 3.3 introduce a spatiotemporal graph neural network to tackle this problem. The manufacturing floor is a 100 × 100 grid where each grid cell contains information about the number of workers and contextual information about the obstacles in that cell. The types of obstacles recorded in the simulation are walls, workbenches and coffee machines. The aim of incorporating this additional contextual information is to improve the predictive capabilities of the graph neural network by enabling it to make context-aware predictions. The obstacle information is extracted from the data and encoded as binary features, e.g. if a grid cell contains a coffee machine, a 1 is recorded and 0 otherwise. Z-score normalisation is applied to the data, which helps to suppress the impact of outliers and large values on the prediction and during training of the model. Empirical observations indicated that using normalised data improved the quality of the predictions. The next section introduces the graph architecture, while Sect. 3.3 presents in detail the specific version of the spatiotemporal graph network with an attention mechanism, which is implemented in the architecture.
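As a brief illustration of this preparation step, the sketch below assembles per-cell node feature vectors (occupancy count plus binary obstacle flags) and applies z-score normalisation to the occupancy channel. The array names, shapes and random placeholder data are assumptions for illustration, not the exact pipeline used in this work.

```python
import numpy as np

# Illustrative placeholders: T = 1000 snapshots over a 100 x 100 grid (assumed shapes)
T, G = 1000, 100
occupancy = np.random.poisson(0.05, size=(T, G, G)).astype(np.float32)  # workers per cell
walls = np.zeros((G, G), dtype=np.float32)     # 1 where a wall is present
benches = np.zeros((G, G), dtype=np.float32)   # 1 where a workbench is present
coffee = np.zeros((G, G), dtype=np.float32)    # 1 where a coffee machine is present

# z-score normalisation of the occupancy channel suppresses outliers and large values
mu, sigma = occupancy.mean(), occupancy.std()
occupancy_norm = (occupancy - mu) / (sigma + 1e-8)

# One feature vector per grid cell and time step: (T, 10000, 4)
static = np.stack([walls, benches, coffee], axis=-1).reshape(-1, 3)          # (10000, 3)
features = np.concatenate(
    [occupancy_norm.reshape(T, -1, 1),                                       # occupancy channel
     np.broadcast_to(static, (T, G * G, 3))],
    axis=-1)
print(features.shape)  # (1000, 10000, 4)
```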
3.2 Graph architecture
The simulation data have a grid structure where each grid cell contains information about the number of humans and the presence and type of obstacle(s) in a grid cell. This grid structure must be converted to a graph to be usable in a graph-based neural network. This paper adopts a node classification approach similar to the speed forecasting problem of Bai et al. [2]. Here each node represents a specific grid cell. The aim is to predict the human presence in that cell. Using the node classification approach, the graph structure of workspace G can be formulated as follows:
$$G=\left(V, E, {X}_{v\left(t\right)}\right)$$

where V is the set of nodes, E is the set of edges and \({X}_{v\left(t\right)}\) is the set of node features at time t. More specifically, each node in V corresponds to a grid cell in the data. For example, cell (0,0) is node 1, cell (0,5) is node 6 and cell (1,0) is node 101. The graph used has \(N=\left|V\right|= \text{10,000}\) total nodes. Furthermore, \({X}_{v\left(t\right)}\) is a set that contains a vector of node features for each node at time t. Each vector of node features contains information about the human occupancy and additional contextual information (whether a wall, coffee machine or workbench is present). Furthermore, the adjacency matrix A is an N × N matrix that stores the connectivity information of all nodes:

$$A\in {\mathbb{R}}^{N\times N},\quad {A}_{ij}=\left\{\begin{array}{ll}1& \text{if an edge connects node } i \text{ to node } j\\ 0& \text{otherwise}\end{array}\right.$$
To construct the graph from the grid, each node is connected to itself and all its first-degree neighbouring nodes, including the diagonals. As the data have a time interval of 1 s, connecting only the first-degree neighbours is sufficient, as it is unlikely that workers would travel a larger distance in 1 s. Furthermore, since workers can move in both directions between nodes, the graph is bidirectional. Moreover, each edge has two weights, since the actual direction of movement is important. These weights influence the effect of the node features in the graph convolution. With varying edge weights, the node features of different nodes do not contribute equally: the higher the edge weight, the greater the influence of these node features in the graph convolution. The edge weights are introduced as learnable parameters in the model and are updated after each training step. In this way, the model can capture underlying spatial dynamics in the workspace; by learning the edge weights, the trained model implicitly accounts for these different patterns. The weights are initialised by setting them equal to 1/8, where the value 8 corresponds to the average number of edges per node. To create a sparser graph, the nodes (and their connecting edges) that do not have any human presence during the entire simulation are omitted from the graph. The result is a graph with significantly fewer nodes and edges. Since these nodes were never visited during the entire simulation, it is reasonable to assume that they are unlikely to be visited in the future. The same modelling without this assumption is also applicable, only with added complexity. An example of the process of converting the grid to a graph is visualised in Fig. 4, where the dotted lines represent nodes and edges that are omitted.
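A minimal sketch of this grid-to-graph conversion is given below. It builds the bidirectional 8-neighbour (plus self-loop) edge list, drops cells that are never occupied in the simulation and initialises every edge weight to 1/8. The input array and function names are illustrative assumptions rather than the exact code used in this work.

```python
import numpy as np
import torch

def grid_to_graph(occupancy_history, grid=100):
    """occupancy_history: (T, grid, grid) array of worker counts per cell (assumed input)."""
    visited = occupancy_history.sum(axis=0).reshape(-1) > 0      # cells occupied at least once
    keep = np.flatnonzero(visited)                               # original cell ids kept as nodes
    remap = -np.ones(grid * grid, dtype=np.int64)
    remap[keep] = np.arange(len(keep))                           # new node ids after omission

    src, dst = [], []
    for cell in keep:
        r, c = divmod(cell, grid)
        for dr in (-1, 0, 1):                                    # 8 neighbours plus self-loop
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < grid and 0 <= nc < grid:
                    neighbour = nr * grid + nc
                    if visited[neighbour]:
                        src.append(remap[cell])                  # both directions are added,
                        dst.append(remap[neighbour])             # as the loop visits every cell

    edge_index = torch.tensor([src, dst], dtype=torch.long)
    edge_weight = torch.full((edge_index.size(1),), 1.0 / 8.0)   # learnable weights start at 1/8
    return edge_index, edge_weight, keep
```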
Following the outlined procedure, 6147 nodes are omitted from the graph. This results in a total of 3846 nodes in the final graph with 31,510 edges, meaning that each node has on average 8.17 edges. Without omitting these nodes, the graph would have 10,000 nodes with 88,804 edges. The constructed graph is a static one with a temporal signal, in the sense that the graph structure remains the same while node features change over time. For training, the time series of graphs is sliced into smaller sequences. The model uses a historical time series of length q to predict a time series of length p, i.e. the model predicts the next p graph representations using the q previous observations. To evaluate model performance, predictions are compared with targets, i.e. the graph representations from t + 1 up to and including t + p. Sequences of input data are created and split into train/test sets to control for overfitting. These sequences of node features, combined with the connectivity information, serve as input data for the graph-based model (Fig. 5). The employed spatiotemporal version of the graph neural network with built-in attention mechanism is outlined in the next section.
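Before turning to the network itself, the slicing of the temporal signal into training samples can be sketched as follows; the window lengths q and p, the feature array and the train/test split are illustrative assumptions.

```python
import numpy as np

def make_sequences(features, q=5, p=40):
    """features: (T, num_nodes, num_features) node features over time (assumed layout).

    Each sample uses q past graph snapshots as input and the occupancy channel
    (assumed to be feature 0) of the next p snapshots as target.
    """
    inputs, targets = [], []
    for t in range(q, features.shape[0] - p + 1):
        inputs.append(features[t - q:t])             # (q, num_nodes, num_features)
        targets.append(features[t:t + p, :, 0])      # (p, num_nodes), occupancy only
    return np.stack(inputs), np.stack(targets)

# Illustrative chronological train/test split
# X, Y = make_sequences(features)
# split = int(0.8 * len(X))
# X_train, Y_train, X_test, Y_test = X[:split], Y[:split], X[split:], Y[split:]
```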
3.3 Graph neural network
A spatiotemporal graph neural network (STGNN) is designed to predict the human presence in the workspace. Most of the human movement prediction literature models the agents as nodes and captures the interactions between them. However, in this case the workers are indistinguishable and hard to follow over time, meaning that such an approach would not be suitable. Therefore, a different spatiotemporal approach is used. Specifically, the graph-based model is based on the work of Bai et al. [2]. The model by Bai et al. [2] is designed for a traffic forecasting problem which shares a similarity with the problem posed in this paper. Both problems are spatiotemporal problems using node prediction on a static graph with temporal signals. Instead of predicting the traffic speed, in the present work the human occupancy at the nodes at a specific time is predicted, given the previous node attributes and topology of the graph. The graph-based model developed by Bai et al. [2] has good long- and short-term prediction capability in the traffic forecasting problem. Therefore, this STGNN architecture can also be leveraged for the human movement prediction task in this paper.
The model captures the spatial dependency with a 2-layered GCN on the graph. The 2-layered GCN takes both the first- and second-order adjacent node attributes into account for the spatial characteristics of the graph. The implication is that information from second-degree neighbours, which are not directly connected, is also considered by each node. In contrast to Bai et al. [2], the edge weights in our model are set as learnable parameters. Bai et al. [2] use a weighted graph, where all edges have a specified predetermined weight; these weights are manually set and fixed. The learned edge weights in our work are aimed at capturing the underlying spatial dynamics within the data. They are used as inputs to the GCN after a ReLU operation is performed on the edge weights, which ensures the stability of the model by making the edge weights non-negative. Following the GCN layers, the temporal dependencies are captured by GRUs, which learn short-term trends on the characteristics of the graph. The GRUs determine the human presence at a given time by using the hidden states from the previous moment and the information from the GCN at the current moment as input, thus retaining the temporal information through the gated mechanisms. The reset gate controls the degree to which irrelevant information is omitted for the forecast, whereas the update gate controls the quantity of information from the previous moment that should be considered for the current state. Putting both the GCNs and GRUs together yields the T-GCN model as developed by Zhao et al. [35]. Each graph representation goes through the GCN layers and is used as input by the GRU. Besides the input from the GCN, the GRU also receives information from the previous hidden state. Bai et al. [2] build upon this model by feeding the hidden states through an attention model. The attention model learns the importance of the occupancy information at every moment and learns the variation trends of the occupancy states. Finally, a fully connected layer produces the occupancy prediction in a single shot, i.e. it predicts the next p time steps at once. For all reported results p = 40, i.e. human occupancy is predicted 40 s into the future. The architecture of the model is visualised in Fig. 6. The next section outlines how model training is performed and assessed.
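For concreteness, a condensed sketch of such an architecture is given below, built on the A3TGCN layer of PyTorch Geometric Temporal (the attention temporal GCN following Bai et al. [2]) with edge weights registered as learnable parameters and passed through a ReLU. The hidden size, tensor shapes and class name are illustrative, and the sketch is not a line-by-line reproduction of the model used in the experiments.

```python
import torch
from torch import nn
from torch_geometric_temporal.nn.recurrent import A3TGCN  # attention temporal GCN (Bai et al.)

class OccupancySTGNN(nn.Module):
    def __init__(self, edge_index, num_edges, node_features=4, hidden=256, q=5, p=40):
        super().__init__()
        self.edge_index = edge_index                        # fixed graph topology
        # learnable edge weights, initialised to 1/8 (the average number of edges per node)
        self.edge_weight = nn.Parameter(torch.full((num_edges,), 1.0 / 8.0))
        self.tgnn = A3TGCN(in_channels=node_features, out_channels=hidden, periods=q)
        self.head = nn.Linear(hidden, p)                    # single-shot prediction of p steps

    def forward(self, x):
        # x: (num_nodes, node_features, q) -- the q historical snapshots per node
        w = torch.relu(self.edge_weight)                    # keep edge weights non-negative
        h = self.tgnn(x, self.edge_index, w)                # (num_nodes, hidden)
        return self.head(torch.relu(h))                     # (num_nodes, p) occupancy forecasts
```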
3.4 Model training and performance metrics
The evaluation assesses how well the model predicts human occupancy in both the short and long term. The model is implemented using the Python library PyTorch Geometric Temporal, created by Rozemberczki et al. [22] to handle spatiotemporal graph neural networks. To update the parameters of the network and measure the performance, a mean squared error (MSE) loss function is used, defined over the occupancy of all nodes and every prediction time step, as follows:
$$\text{MSE}=\frac{1}{N\cdot p}\sum_{t=1}^{p}\sum_{i=1}^{N}{\left({\widehat{y}}_{i}\left(t\right)-{y}_{i}\left(t\right)\right)}^{2}$$

where N is the total number of nodes, p is the number of seconds predicted, \({\widehat{y}}_{i}\left(t\right)\) denotes the predicted value of node i at prediction step t and \({y}_{i}\left(t\right)\) denotes the actual value. During the training of the model, L2 regularisation is added to limit the effects of overfitting.
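For completeness, a minimal training-loop sketch using this loss is given below. The optimiser settings mirror the reported hyperparameters (learning rate 0.01, L2 regularisation via weight decay 0.05, 5 epochs), while the model instance and the iterable of training sequences are illustrative assumptions reusing the earlier sketches; batching is omitted for brevity.

```python
import torch
from torch import nn

# Assumed to exist from the earlier sketches (illustrative names):
#   model            -- e.g. an OccupancySTGNN instance
#   train_sequences  -- iterable of (x, y) pairs, x: (num_nodes, feats, q), y: (num_nodes, p)

criterion = nn.MSELoss()  # MSE over all nodes and all p prediction steps
# L2 regularisation is applied through the optimiser's weight_decay term
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=0.05)

for epoch in range(5):
    for x_seq, y_seq in train_sequences:
        optimizer.zero_grad()
        y_hat = model(x_seq)
        loss = criterion(y_hat, y_seq)
        loss.backward()
        optimizer.step()  # also updates the learnable edge weights
```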
The network yields a regression output that can be converted into a binary classification by using a threshold. The performance of the network is evaluated using regression and classification metrics. The classification predictions are evaluated using accuracy, precision, recall and F-score. Furthermore, as the data are heavily unbalanced, with the majority of nodes being unoccupied, balanced accuracy is also included to provide a view of the model's performance that accounts for the data imbalance. In addition, the F2-score is used to evaluate performance. The general F-score formula is shown below, where the F2-score uses \(\beta =2\). Whereas the F1-score weights precision and recall equally, any \(\beta >1\) places more emphasis on recall than on precision.
$$\text{Precision}=\frac{TP}{TP+FP},\quad \text{Recall}=\frac{TP}{TP+FN}$$

$$\text{Balanced accuracy}=\frac{1}{2}\left(\frac{TP}{TP+FN}+\frac{TN}{TN+FP}\right)$$

$${F}_{\beta }=\left(1+{\beta }^{2}\right)\cdot \frac{\text{Precision}\cdot \text{Recall}}{{\beta }^{2}\cdot \text{Precision}+\text{Recall}}$$

where TP denotes the true positives, TN the true negatives, FP the false positives, FN the false negatives and \(\beta\) is the configuration parameter of the general F-score metric. Performance estimation does not account for the omitted nodes. Including the omitted nodes with a value of 0 would inflate the reported assessment metrics, but not accurately reflect the ability of the model to predict human occupancy. Only when plotting the occupancy grids are these omitted nodes added back. The next section presents the results of applying the graph-based model for the human movement prediction.
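As a final note on evaluation, the conversion of the regression output into classification metrics can be sketched as follows, using scikit-learn. The threshold value and array names are placeholders, and the omitted nodes are assumed to have been excluded beforehand.

```python
import numpy as np
from sklearn.metrics import (balanced_accuracy_score, fbeta_score,
                             precision_score, recall_score)

def evaluate(y_hat, y_true, threshold=0.1):
    """y_hat, y_true: (num_nodes, p) predicted and ground-truth occupancy values."""
    pred = (y_hat >= threshold).astype(int).ravel()   # threshold regression output -> binary
    true = (y_true > 0).astype(int).ravel()           # occupied vs unoccupied ground truth
    return {
        "balanced_accuracy": balanced_accuracy_score(true, pred),
        "precision": precision_score(true, pred, zero_division=0),
        "recall": recall_score(true, pred, zero_division=0),
        "f2": fbeta_score(true, pred, beta=2, zero_division=0),
    }
```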
4 Results
The obtained results are presented in this section. Section 4.1 specifically discusses the effect of adding learnable weights to the model. Section 4.2 evaluates the performance and behaviour of the model over time. In Sect. 4.3, the performance of the model with different contextual node information is assessed. The parameters used to train the model are empirically determined using the validation loss for different configurations. To determine the learning rate and the L2 regularisation parameter, a range of values between 0.001 and 0.3 has been tested. The best empirical results were found using a learning rate of 0.01 and an L2 regularisation of 0.05. Moreover, the model is trained with a batch size of 32 for 5 epochs. Furthermore, it is empirically determined that the model is best configured for this specific problem setting using 256 hidden units and an input time series of 5 s. Model training was performed using 2 × NVIDIA Tesla T4 GPUs from the Google Cloud Computing platform with 32 GB of memory in total. With 256 hidden units and a batch size of 32, the employed spatiotemporal graph formulation used most of the allocated memory. In this setting, model training took roughly 30 min for 5 epochs. However, the response time when passing new data through the trained network is practically instantaneous.
4.1 Learnable edge weights
Making the edge weights a learnable parameter is a departure from the model as developed by Bai et al. [2]. This section assesses whether adding the learnable edge weights improves the performance of the model compared to using an unweighted graph. After training, the average edge weight is equal to 0.138. This is close to the edge weight at initialisation; however, that does not imply that no learning occurred. The edge weights have a standard deviation of 0.142, and the maximum weight is equal to 1.182. The distribution of the edge weights is illustrated in Fig. 7, exhibiting significant variability as a result of the learning process. Due to the ReLU operation on the edge weights, all weights are non-negative. Moreover, it can be observed that the majority of the edge weights have a value close to or exactly zero. However, a significant number of edge weights have larger values than the values at initialisation. Overall, this indicates that during the training process, the model learns which edges are important and which are unimportant for the human movement prediction.
To provide a more robust performance assessment, the performance metrics presented in this section are averaged over the 40 predicted time periods. This actually under-reports the performance of shorter-term prediction (i.e. 10 time periods) by adding the larger errors observed at the end of the predictive horizon. However, it offers a broader assessment beyond short-term prediction. The model with learnable weights has an MSE of 0.4095 whereas the model without learnable weights has an MSE of 0.4501, which is a noticeable improvement. Table 1 shows the classification performance metrics for different thresholds. The table shows that the model with learnable weights yields a higher balanced accuracy, recall, precision and F2-score for almost all thresholds. The enhanced performance indicates that learnable weights improve the model’s predictive capabilities.
4.2 Model performance over extended time horizon
The previous sections presented the performance averaged over all 40 predicted time periods. However, averaging performance over a longer time period is less informative if there is a notable difference between shorter and longer time horizon predictive performance. Therefore, this section assesses how the model performance changes over a longer time horizon.
Figure 8 plots the balanced accuracy over time and shows that it decreases as the time horizon gets longer, until it reaches a flat level, presumably determined by the average occupancy of the grid, but shifted according to the cut-off determined by the threshold. The closer the performance gets to this level, the less meaningful and therefore less actionable the prediction. However, in the first approximately 8 time steps the performance is relatively high. Beyond this time horizon, the prediction ceases to be meaningful and at approximately 10 s it offers little more than a guess based on first-order statistics, i.e. the mean expected occupancy. A similar conclusion can be drawn from Figs. 9 and 10, which visually illustrate occupancy over time as coloured and black density heatmaps, respectively. Figure 9 shows the regression outputs as a heatmap over time. The regression outputs represent the model's belief about human presence in each cell. While the first two plots show concrete and discrete occupancy patterns, from time horizon 10 onwards the occupancy predictions merely reflect the overall movement patterns associated with the work processes and practically converge to the expected occupancy over time. Therefore, only the first two images show a clear, meaningful and potentially actionable prediction.
The case where the regression output is converted to a classification outcome, by applying a classification threshold, is shown in Fig. 10. Choosing a low classification threshold, such as up to 0.05, results in more grid cells being predicted as occupied, which would be consistent with a conservative ‘safety first’ interpretation for the AMR planner. In terms of accuracy, this would lead to a higher ‘false positive’ rate. The practical outcome of an excessively conservative threshold choice of 0.01 is that the AMR planner is left with too few options to optimise planning. Conversely, a higher threshold results in a reduced proportion of grid cells predicted as occupied. Very high threshold values would therefore lead to a higher ‘false negative’ rate, while allowing more flexibility for the AMR planner. However, too aggressive threshold choices, such as 0.35 or higher in the specific case, would leave the handling of human presence missed by the predictor to be processed during the real-time sensing and navigation of the robots.
4.3 Performance assessment with the addition of contextual node features
The results presented in the previous section were based on scenarios which did not explicitly take into account the structure of the workspace. One may nonetheless assume that this information is implicitly present in the data (and can therefore be learned from it). Cells with walls define movement boundaries. Cells with workbenches are human movement attractor areas, in particular with a specific sequence, according to job sequences. However, it is of interest to explore whether such contextual information can be made more explicit. To do so, additional contextual features were added to each node. Specifically, each node was set to carry information not only about the human occupancy, but also about the presence of a wall (fixed boundary not acting as an attractor), workbench (fixed physical element, which acts as the main job attractor) or a vending machine (fixed element, which acts as a non-job-related attractor). The experiments were repeated after introducing such modifications to the graph nodes of the network. Table 2 reports the MSE for a full experiment with the different combinations of node contextual features.
Compared to the results without any additional context information, all additional node features yield a small decrease in MSE and therefore an increase in performance, as seen in the table. The combination of several contextual node features further increases the performance of the model, and the best performing model is obtained when including all node features. It must be noted that the improvement in performance is modest. This is subject to interpretation, though. Specifically, results are presented for the whole duration of 40 time steps, which implies that the performance is actually under-reported. As in the case of the results presented in the previous section, it is the performance over the first approximately 8 steps that offers a meaningful prediction. Therefore, the incorporation of the accumulated MSE from all remaining steps blurs the picture, as the largest part of the reported error comes from the longer time horizon steps. The inclusion of learnable edge weights also appears to have a somewhat similar effect, and it remains unclear whether, in terms of performance, it would be better to encode contextual information into the data or simply rely on learnable weights. However, similar observations can be made regarding the range of time horizon steps and classification threshold values which lead to meaningful, and therefore actionable, predictions. This becomes more evident when focussing on prediction results from only the first 10 time horizon steps. Table 3 summarises such results, from experiments employing different combinations of contextual node features for a time horizon of 10 steps and for different classification thresholds.
The results provide evidence that a classification threshold of approximately 0.1 and the inclusion of all contextual features result in the best observed performance, as demonstrated by the balanced accuracy. Choosing a lower threshold, such as 0.05, leads to similar performance, which may be preferable when opting to err on the side of caution. In this case, the robot path planning would be more sensitive when choosing areas to avoid, as predicted to be occupied by humans.
5 Discussion
The key objective of the study was to assess to what extent graph-based neural networks are applicable to predicting human presence in a shared workspace. The results addressed this research question and offered insights into how graph neural networks, in the form employed in this research, perform in such a problem. The workspace is seen as an occupancy grid. The approach taken falls into the category of pattern-based methods, where the target is to predict future occupancy on the basis of past observations. The past observations can be the outcome of performing video analytics over videos acquired via a network of distributed cameras. In the considered simulation scenario, a richer set of simulated observations is produced, compared to what would have been possible with real video analytics from a dedicated workspace. Resembling real work environments, the workspace locations (grid cells) are characterised by a relatively low occupancy rate. This makes the available datasets unbalanced, with a much higher number of cells never visited, compared to cells which are visited. Motivated in part by the need to mitigate the low occupancy rate and in part by computational efficiency, the unvisited nodes were omitted during data preparation. This improved the predictive capabilities and reduced the size of the model. As a consequence, the model cannot predict movement to nodes that are omitted even though it would be a feasible movement for a worker, which is a limitation. Within the simulation this did not result in any problems, but in the real world, with more randomness and variation, this could be avoided by considering all cells, albeit with higher computational costs. Furthermore, it is assumed that connecting all nodes with only their first-degree neighbours is sufficient, since the 2-layered GCN can also consider information from second-degree neighbours without being directly connected to these nodes. The assumption that workers are unlikely to move a greater distance in 1 s appears to be adequate, since the results show no indication of it being insufficient. However, future work could relax such assumptions and generate data allowing more extensive movements which cover the whole workspace at higher movement speeds, modelled by larger network structures, at the expense of higher computational costs and increased imbalance in the employed datasets.

Ultimately, the outputs of the model can be used to improve AMR planning in a shared workspace. Anticipating future human movement can help the planning of AMRs to avoid potential collisions, making smart manufacturing operations safer and more efficient. It is worth noting that comparisons with other established solutions for human movement prediction are not directly applicable, due to the specific framing of the problem in the present study via a heatmap occupancy format, as part of a broader integrated solution for AMR planning via occupancy data. Nonetheless, it would be beneficial for future research to establish relevant benchmarks for industrial environment contexts, extending current initiatives proposing benchmarks for outdoors, shopping malls and simple room-based scenarios [24]. Furthermore, predictive approaches for human trajectories would be appropriate to integrate within broader safety assurance mechanisms for robotic systems [4] and digital twin solutions [30], for safer and trustworthy human–robot collaboration in manufacturing.
5.1 Learnable edge weights
Compared to the model developed by Bai et al. [2], the graph neural network in this paper makes the edge weights a learnable parameter. The edge weights influence the relative importance between connected nodes. By making these weights learnable, spatial patterns can be captured. These learned edge weights are introduced to implicitly account for different work sequences and other spatial patterns in the data, which otherwise would not have been accounted for. The results show that the model with learnable edge weights offers superior performance when compared with an unweighted graph. The model with learnable edge weights yields a lower mean squared error, and a higher balanced accuracy, precision and recall. Therefore, it is concluded that learnable edge weights capture some additional spatial dynamics in the data, compared to what would have been possible without them. The learnable weights learn the specific layout and movement dynamics in the factory and thus implicitly account for the different work sequences of the workers.
Adding learnable weights is a feature exclusive to graph neural networks. The inclusion of learnable weights in a graph neural network has interesting implications for generalisability. Since the specific layout and dynamics of the factory are learned, the model cannot directly be applied to a different factory (setting); it must be retrained using data from that specific setting. At the same time, this could be seen as an advantage for generalisation, to the extent that learnable weights can actually capture such implicit patterns in factory settings, so that such information may not have to be hard-encoded. Future research could explore different approaches that account for the work sequences and extend the generalisation capability of the model.
5.2 Occupancy prediction over time
The model makes a single-shot prediction for the next 40 time instances (seconds) based on the input data. The very short-term predictions show a high balanced accuracy. When evaluating the predictions over a longer time interval, the results show that the further the predictions are into the future, the worse the performance of the model becomes. The degradation of performance over longer prediction horizons is to be expected. From the performance metrics and visual inspection, it can be concluded that the model can provide meaningful predictions up until approximately 8 steps ahead. Predictions over longer time horizons are less meaningful and ultimately converge to the apparent average occupancy of cells, according to the number of cells and the average observed number of workers in the workspace. This implies that this graph-based model is only applicable for short-term predictions. This is a different conclusion compared to Bai et al. [2], who concluded that the model is applicable for both short-term and long-term forecasting tasks. However, the context of the prediction tasks is quite different: in our case, it is about human movement prediction in a shared workspace, whereas in the case of Bai et al. [2] it is about traffic forecasting, with the latter likely to show stronger patterns, including time/seasonal ones, which are far less observable in our case. To address the limitation related to short-term predictions, future research could explore combining graph-based models with planning-based models. By combining the pattern-based graph network with a planning-based model, the long-term goal intent of a worker might be explicitly accounted for, which could improve the performance of the graph-based model over a longer prediction horizon.
5.3 Contextual node features
By adding contextual information to the node attributes, the graph-based model can produce predictions which could be seen as context-aware. Assuming that a worker is more likely to move towards a workbench than to a wall, adding this information potentially improves the predictive capabilities of the model. The obtained results confirm this: the MSE gradually improves when adding additional contextual information, with the best results obtained when including all available contextual information. A similar pattern is observed with the classification metrics, where the best results are obtained with all the available contextual information. Although the contextual information improves the predictive capabilities of the model, the reported differences in performance are somewhat small. However, this is partly due to the choice made earlier regarding how to report error-based performance. Specifically, instead of monitoring and reporting performance over the shorter time horizon for which the prediction is meaningful (i.e. below 10 time steps and approximately 8 steps/seconds), the error is reported over 40 time steps. This implies that the largest part of the reported error is actually due to predictions beyond the shorter time horizon of good predictive performance, and the observed differences are therefore more significant than they appear. In the future, it is of interest to narrow down the performance reporting to shorter-term predictions.
More research is also needed into how best to embed contextual information, as the findings were inconclusive when considering the possibilities also offered by learnable weights. Future research could study the performance of different types of graph convolutional layers and aggregation methods to improve context-aware prediction. Different approaches could possibly be better at extracting context information and thereby increase the performance of the model in a shared workspace. Furthermore, the contextual information could also be considered explicitly by combining the graph neural network with a physics-based approach. For example, a variant of the social force model by Helbing and Molnar [8] could be considered, modelling workbenches as attraction forces and walls as repulsion forces. Thereby, the context information can be incorporated differently and enhance the performance of a graph neural network.
6 Conclusion
The main objective of this paper was to evaluate to what extent a graph-based neural network can predict human movement in a shared workspace. The literature analysis showed that graph-based models have not been applied to predict human movement in such spaces, yet they possess characteristics which are likely relevant to such tasks. The analysis found examples of similar spatiotemporal problems for which these graph-based models have been successfully applied, but not for human movement prediction. This paper has selected and implemented a graph neural network applied to a workspace setting, to evaluate the applicability of a graph-based model to the posed problem. Specifically, the model developed by Bai et al. [2] was selected based on the literature analysis. The model was further adapted to include learnable weights to better capture the spatial patterns and work sequences of the problem. This model was trained using data generated by a simulator already employed in other contexts to simulate human movement in transportation hubs. The nature of the problem implies that historical data would be imbalanced towards low occupancy, and this was reproduced in the simulated data.
Based on the results, it can be concluded that the implemented graph neural network yields sufficiently accurate predictions in the short term, i.e. for up to 8 time steps (seconds). For longer time horizons, the model predictions gradually shift towards the average occupancy rate reflected in the data and therefore miss the specific spatial and temporal intricacies of the human movement. It was additionally shown that adding learnable edge weights allows the graph-based model to better learn some of the underlying spatial dynamics in the data. The optimised weights improve model performance, but at the cost of generalisability; this is to be expected and can be mitigated in the future by training separate models for different workspaces and work sequences. The inclusion of contextual information appeared to improve the performance of the model, but more investigation is needed into how best to include it.
Overall, this paper contributes to safer and more efficient robot–human coexistence. It shows that graph-based neural networks are applicable to human movement prediction in shared workspaces, specifically in the short term, and that graph-based approaches can be used to make context-aware predictions. These predictions can serve as input to AMR planning in a shared workspace: by anticipating the likely future human occupancy, improved planning can reduce the risk that robots encounter workers and are forced to stop. Although graph models are promising, future research could explore additional ways to improve predictive performance, such as better incorporating workspace context information into the model or combining the graph-based model with a planning- and/or physics-based approach.
Data availability
Datasets and code generated during the current study are available at the following link: https://doi.org/10.34894/FECLC3
References
Addad B, Cavalletti C, Duqueroie B, Fojud A, Lorin S, Tumilowicz A (2022) Digital twins and data analysis for crowd management in high capacity stations. In: WCRR’22, 13th World Conference on Railway Research
Bai J, Zhu J, Song Y, Zhao L, Hou Z, Du R, Li H (2021) A3T-GCN: attention temporal graph convolutional network for traffic forecasting. ISPRS Int J Geo-Information 10(7):485. https://doi.org/10.3390/ijgi10070485
Bartoli F, Lisanti G, Ballan L, Del Bimbo A (2018) Context-aware trajectory prediction. In: 2018 24th International Conference on Pattern Recognition (ICPR). https://doi.org/10.1109/ICPR.2018.8545447
Bi ZM, Luo C, Miao Z, Zhang B, Zhang WJ, Wang L (2021) Safety assurance mechanisms of collaborative robotic systems in manufacturing. Robot Comput-Integr Manuf 67:102022. https://doi.org/10.1016/j.rcim.2020.102022
Choi C, Malla S, Patil A, Choi JH (2019) DROGON: a trajectory prediction model based on intention-conditioned behavior reasoning. arXiv preprint arXiv:1908.00024. http://arxiv.org/abs/1908.00024
Eiffert S, Sukkarieh S (2019) Predicting responses to a robot’s future motion using generative recurrent neural networks. arXiv preprint arXiv:1909.13486. http://arxiv.org/abs/1909.13486
Ferguson D, Stentz A (2006) Using interpolation to improve path planning the field D* algorithm. J Field Robot 23(2):79–101. https://doi.org/10.1002/rob.20109
Helbing D, Molnar P (1995) Social force model for pedestrian dynamics. Phys Rev E 51(5):4282–4286. https://doi.org/10.1103/PhysRevE.51.4282
Hentout A, Aouache M, Maoudj A, Akli I (2019) Human–robot interaction in industrial collaborative robotics: a literature review of the decade 2008–2017. Adv Robot 33(15–16):764–799. https://doi.org/10.1080/01691864.2019.1636714
Kamezaki M, Kobayashi A, Kono R, Hirayama M, Sugano S (2022) Dynamic waypoint navigation: model-based adaptive trajectory planner for human-symbiotic mobile robots. IEEE Access 10:81546–81555. https://doi.org/10.1109/ACCESS.2022.3194146
Kamezaki M, Tsuburaya Y, Kanada T, Hirayama M, Sugano S (2022) Reactive, proactive, and inducible proximal crowd robot navigation method based on inducible social force model. IEEE Robot Autom Letters 7(2):3922–3929. https://doi.org/10.1109/LRA.2022.3148451
Katyal KD, Hager GD, Huang C-M (2020) Intent-aware pedestrian prediction for adaptive crowd navigation. IEEE Int Conf on Robot Autom (ICRA) 2020:3277–3283. https://doi.org/10.1109/ICRA40945.2020.9197434
Li Y, Yu R, Shahabi C, Liu Y (2017) Graph convolutional recurrent neural network: data-driven traffic forecasting. arXiv preprint arXiv:1707.01926. http://arxiv.org/abs/1707.01926
Liu H, Wang L (2017) Human motion prediction for human-robot collaboration. J Manuf Syst 44:287–294. https://doi.org/10.1016/j.jmsy.2017.04.009
Luber M, Stork JA, Tipaldi GD, Arras KO (2010) People tracking with human motion predictions from social forces. IEEE Int Conf on Robot Autom 2010:464–469. https://doi.org/10.1109/ROBOT.2010.5509779
Mohamed A, Qian K, Elhoseiny M, Claudel C (2020) Social-STGCNN: a social spatio-temporal graph convolutional neural network for human trajectory prediction. IEEE/CVF Conf Computer Vision Pattern Recognit (CVPR) 2020:14412–14420. https://doi.org/10.1109/CVPR42600.2020.01443
Möller R, Furnari A, Battiato S, Härmä A, Farinella GM (2021) A survey on human-aware robot navigation. Robot Auton Sys. https://doi.org/10.1016/j.robot.2021.103837
Navarro L, Flacher F, Meyer C (2015) SE-Star: a large-scale human behavior simulation for planning, decision-making and training. In: Bordini R, Elkind E, Weiss G, Yolum P (eds) AAMAS’15: Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, pp 1939–1940
Nikhil N, Morris BT (2019) Convolutional Neural Network for Trajectory Prediction. In: Leal-Taixé L, Roth S (eds) Computer Vision—ECCV 2018 Workshops: Munich, Germany, September 8-14, 2018, Proceedings, Part III. Springer, Cham, pp 186–196. https://doi.org/10.1007/978-3-030-11015-4_16
Panagopoulos G, Nikolentzos G, Vazirgiannis M (2021) Transfer Graph Neural Networks for Pandemic Forecasting. Proc of the AAAI Conf on Artificial Int 35(6):4838–4845. https://doi.org/10.1609/aaai.v35i6.16616
Rožanec JM, Novalija I, Zajec P, Kenda K, Ghinani HT, Suh S, Veliou E, Papamartzivanos D, Giannetsos T, Menesidou SA, Alonso R, Cauli N, Meloni A, Recupero DR, Kyriazis D, Sofianidis G, Theodoropoulos S, Fortuna B, Mladenić D, Soldatos J (2022) Human-centric artificial intelligence architecture for industry 5.0 applications. Int J Prod Res 61(20):6847–6872. https://doi.org/10.1080/00207543.2022.2138611
Rozemberczki B, Scherer P, He Y, Panagopoulos G, Riedel A, Astefanoaei M, Kiss O, Beres F, López G, Collignon N, Sarkar R (2021) PyTorch Geometric Temporal: spatiotemporal signal processing with neural machine learning models. In: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp 4564–4573. https://doi.org/10.1145/3459637.3482014
Rudenko A, Palmieri L, Herman M, Kitani KM, Gavrila DM, Arras KO (2020) Human motion trajectory prediction: a survey. The Int J Robot Res 39(8):895–935. https://doi.org/10.1177/0278364920917446
Rudenko A, Palmieri L, Huang W, Lilienthal AJ, Arras KO (2022) The Atlas benchmark: an automated evaluation framework for human motion prediction. In: 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), pp 636–643. https://doi.org/10.1109/RO-MAN53752.2022.9900656
Saleh K, Hossny M, Nahavandi S (2018) Intent Prediction of Pedestrians via Motion Trajectories Using Stacked Recurrent Neural Networks. IEEE Trans Intell Veh 3(4):414–424. https://doi.org/10.1109/TIV.2018.2873901
Sanchez-Lengeling B, Reif E, Pearce A, Wiltschko AB (2021) A Gentle Introduction to Graph Neural Networks. Distill. https://doi.org/10.23915/distill.00033
van den Berg J, Guy SJ, Lin M, Manocha D (2011) Reciprocal n-body collision avoidance. In: Pradalier C, Siegwart R, Hirzinger G (eds) Robotics Research. Springer, Berlin, Heidelberg, pp 3–19. https://doi.org/10.1007/978-3-642-19457-3_1
Vemula A, Mülling K, Oh J (2017) Social attention: modeling attention in human crowds. arXiv preprint arXiv:1710.04689. http://arxiv.org/abs/1710.04689
Wang B, Liu Z, Li Q, Prorok A (2020) Mobile Robot Path Planning in Dynamic Environments Through Globally Guided Reinforcement Learning. IEEE Robot Autom Lett 5(4):6932–6939. https://doi.org/10.1109/LRA.2020.3026638
Wang S, Zhang J, Wang P, Law J, Calinescu R, Mihaylova L (2024) A deep learning-enhanced Digital Twin framework for improving safety and reliability in human–robot collaborative manufacturing. Robotics and Computer-Integrated Manufacturing. https://doi.org/10.1016/j.rcim.2023.102608
Wu Z, Pan S, Chen F, Long G, Zhang C, Yu PS (2021) A Comprehensive Survey on Graph Neural Networks. IEEE Trans Neural Networks Learning Sys 32(1):4–24. https://doi.org/10.1109/TNNLS.2020.2978386
Yi S, Li H, Wang X (2016) Pedestrian Behavior Modeling from Stationary Crowds with Applications to Intelligent Surveillance. IEEE Trans Image Process 25(9):4354–4368. https://doi.org/10.1109/TIP.2016.2590322
Yu B, Yin H, Zhu Z (2017) Spatio-temporal graph convolutional neural network: a deep learning framework for traffic forecasting. arXiv preprint arXiv:1709.04875. http://arxiv.org/abs/1709.04875
Zhang B, Wang T, Zhou C, Conci N, Liu H (2022) Human trajectory forecasting using a flow-based generative model. Engineering Applications of Artificial Intelligence. https://doi.org/10.1016/j.engappai.2022.105236
Zhao L, Song Y, Zhang C, Liu Y, Wang P, Lin T, Deng M, Li H (2020) T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction. IEEE Trans Intell Transp Syst 21(9):3848–3858. https://doi.org/10.1109/TITS.2019.2935152
Zhao T, Xu Y, Monfort M, Choi W, Baker C, Zhao Y, Wang Y, Wu YN (2019) Multi-Agent Tensor Fusion for Contextual Trajectory Prediction. IEEE/CVF Conf Computer Vision and Pattern Recognition (CVPR) 2019:12118–12126. https://doi.org/10.1109/CVPR.2019.01240
Acknowledgements
This work was supported by H2020 project STAR, grant ID 956573.
Funding
H2020 Industrial Leadership, 956573, Christos Emmanouilidis.
Author information
Contributions
Casper Dik was involved in conceptualization, data curation, investigation, methodology, software, experimental validation, visualisation and writing—original draft. Christos Emmanouilidis contributed to conceptualization, funding acquisition, supervision, methodology and writing—review and editing. Bertrand Duqueroie took part in conceptualisation, simulation, data generation, occupancy grid methodology and writing—review and editing.
Ethics declarations
Conflict of interest
The authors declare that they have no financial or non-financial interests that are directly or indirectly related to the submitted work. This work was supported through H2020 project STAR, grant ID 956573.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.