Algorithms for learning from spatial and mobility data
Date
30/11/2020
Author
Astefanoaei, Maria
Abstract
Data from the many mobile devices, location-based applications, and data-collection
sensors in use today can provide important insights into human and natural processes. These insights can inform decision making when designing and optimising infrastructure such as transportation or energy. However, extracting patterns related to
spatial properties is challenging because of the large quantity of data produced and the
complexity of the processes it describes. We propose scalable, multi-resolution approximation and heuristic algorithms that exploit spatial proximity to
solve fundamental data mining and optimisation problems with better running time
and accuracy. We observe that abstracting from individual data points and working
with units of neighbouring points, grouped according to various measures of similarity, improves
computational efficiency and reduces the effects of noise and overfitting. We consider three applications: mobility data compression, transit network planning, and solar
power output prediction.
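The idea of working with units of neighbouring points instead of individual points can be illustrated with a minimal sketch. It assumes, purely for illustration, that "neighbouring" means "falling in the same square grid cell"; the function name `aggregate_to_cells` and the `cell_size` parameter are hypothetical and not taken from the thesis.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def aggregate_to_cells(points: List[Point], cell_size: float) -> Dict[Cell, Tuple[Point, int]]:
    """Replace raw points by one representative (centroid, count) per square grid cell.

    Hypothetical illustration: a uniform grid is just one possible notion of
    "neighbouring points"; the thesis may group points differently.
    """
    # Running [sum_x, sum_y, count] for every occupied cell.
    acc: Dict[Cell, List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0])
    for x, y in points:
        # Floor division also maps negative coordinates to the correct cell.
        cell = (int(x // cell_size), int(y // cell_size))
        s = acc[cell]
        s[0] += x
        s[1] += y
        s[2] += 1
    # Averaging within a cell smooths out per-point noise (e.g. GPS jitter).
    return {cell: ((sx / n, sy / n), int(n)) for cell, (sx, sy, n) in acc.items()}
```

Downstream algorithms then operate on the occupied cells rather than on every raw point. The cell size trades spatial resolution against data volume, and repeating the aggregation at several cell sizes gives a simple multi-resolution view of the same data.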
Firstly, understanding transportation needs requires efficient ways
to represent large amounts of travel data. In analysing spatial trajectories (for example,
those of taxis travelling in a city), one of the main challenges is computing distances between
trajectories efficiently: because trajectories are large and complex, this task is computationally
expensive. We build data structures and algorithms that sketch trajectory data, making
queries such as distance computation, nearest neighbour search, and clustering, which
are key to finding mobility patterns, more computationally efficient. We use locality-sensitive
hashing, a technique that maps similar objects to the same hash value with high probability.
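The abstract does not say which hash family the thesis uses. As a minimal sketch of how locality-sensitive hashing speeds up similarity queries, the code below assumes trajectories are compared by the Jaccard similarity of the grid cells they visit (a simplification: distances such as the Fréchet distance need different hash families) and uses MinHash with banding. All names (`cells_visited`, `cell_key`, `MinHasher`, `candidate_pairs`) and parameters are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]
PRIME = (1 << 61) - 1  # Mersenne prime used for universal hashing


def cells_visited(traj: Sequence[Point], cell_size: float) -> Set[Cell]:
    """Discretise a trajectory into the set of grid cells its sample points fall in."""
    return {(int(x // cell_size), int(y // cell_size)) for x, y in traj}


def cell_key(cell: Cell) -> int:
    """Injective map from a (possibly negative) cell to a non-negative integer."""
    def zigzag(n: int) -> int:
        return 2 * n if n >= 0 else -2 * n - 1
    cx, cy = cell
    return (zigzag(cx) << 32) | zigzag(cy)  # injective while -2**31 <= cy < 2**31


class MinHasher:
    """MinHash: two sets agree on a signature entry with probability ~ their Jaccard similarity."""

    def __init__(self, num_hashes: int, seed: int = 0):
        rng = random.Random(seed)
        # Random linear hash functions h(k) = (a*k + b) mod PRIME.
        self.params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
                       for _ in range(num_hashes)]

    def signature(self, cells: Set[Cell]) -> List[int]:
        if not cells:
            raise ValueError("cannot sign an empty trajectory")
        keys = [cell_key(c) for c in cells]
        return [min((a * k + b) % PRIME for k in keys) for a, b in self.params]


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of agreeing entries, which estimates the Jaccard similarity of the cell sets."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def candidate_pairs(signatures: Dict[str, List[int]], bands: int) -> Set[Tuple[str, str]]:
    """Banding: trajectories sharing any identical band of rows become candidate neighbours."""
    buckets: Dict[Tuple[int, Tuple[int, ...]], List[str]] = {}
    for tid, sig in signatures.items():
        if len(sig) % bands:
            raise ValueError("signature length must be divisible by the number of bands")
        rows = len(sig) // bands
        for b in range(bands):
            buckets.setdefault((b, tuple(sig[b * rows:(b + 1) * rows])), []).append(tid)
    pairs: Set[Tuple[str, str]] = set()
    for ids in buckets.values():
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                pairs.add((min(ids[i], ids[j]), max(ids[i], ids[j])))
    return pairs
```

Each trajectory is reduced to a short fixed-length signature, and only the candidate pairs produced by banding need an exact distance computation, which avoids comparing every pair of trajectories. More bands with fewer rows each make retrieval more permissive; fewer, longer bands make it stricter.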
Secondly, building efficient infrastructure requires placing resources optimally to satisfy
travel demand. This is difficult because of external constraints (such as budget limits)
and the complexity of existing road networks, which offer a large number
of candidate locations. For this purpose, we present heuristic algorithms for efficient
transit network design, with a case study on cycling lane placement. The heuristic is
based on a new type of clustering by projection that is both computationally efficient
and gives good results in practice.
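The abstract does not describe how the projection-based clustering works. Purely as a hypothetical illustration of why projecting before clustering can be cheap, the sketch below projects demand points (for instance, trip origins) onto a single direction, such as the axis of a candidate corridor, and splits the sorted one-dimensional coordinates wherever the gap exceeds a threshold. The function names and the `angle` and `max_gap` parameters are assumptions, not the thesis's algorithm.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def project(points: Sequence[Point], angle: float) -> List[float]:
    """Scalar projection of each point onto the unit vector at `angle` (radians)."""
    ux, uy = math.cos(angle), math.sin(angle)
    return [x * ux + y * uy for x, y in points]


def cluster_by_projection(points: Sequence[Point], angle: float,
                          max_gap: float) -> List[List[int]]:
    """Sort points along one axis and cut wherever consecutive projections are > max_gap apart.

    Returns clusters as lists of point indices. Sorting dominates the cost, so this
    runs in O(n log n), whereas naive clustering on all pairwise distances is O(n^2).
    """
    if not points:
        return []
    proj = project(points, angle)
    order = sorted(range(len(points)), key=lambda i: proj[i])
    clusters = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if proj[cur] - proj[prev] > max_gap:
            clusters.append([])  # large gap along the axis: start a new cluster
        clusters[-1].append(cur)
    return clusters
```

Projection discards the coordinate perpendicular to the axis, so points that are far apart across the axis can land in the same cluster; a practical heuristic would need to restrict or repeat the projection, for example per corridor or over several angles.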
Lastly, we devise a novel method to forecast solar power output based on numerical
weather predictions, clear-sky predictions, and persistence data. An ensemble of a
multivariate linear regression model, a support vector machine model, and an artificial neural network gives more accurate predictions than any of the individual models.
Analysing the performance of the models in a suite of frameworks reveals that building
separate models for each self-contained area, based on weather patterns, gives better
accuracy than a single model that predicts the total. The ensemble can be further improved by weighting the individual models according to their performance. This suggests
that the models identify different patterns in the data, which motivated the choice of an
ensemble architecture.
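The abstract states only that the ensemble weights are performance-based. One common choice, assumed here, is to weight each model by the inverse of its mean absolute error on a validation set. The model names `"linear"`, `"svm"` and `"ann"` stand for already fitted models whose forecasts are computed elsewhere, and the helper functions are hypothetical.

```python
from typing import Dict, List, Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error on a held-out validation set."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def inverse_error_weights(val_errors: Dict[str, float], eps: float = 1e-12) -> Dict[str, float]:
    """Weight each model by 1 / (validation error), normalised so the weights sum to 1."""
    inv = {name: 1.0 / (err + eps) for name, err in val_errors.items()}
    total = sum(inv.values())
    return {name: w / total for name, w in inv.items()}


def weighted_ensemble(preds: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Combine per-model forecasts as a weighted average, time step by time step."""
    n = len(next(iter(preds.values())))
    return [sum(weights[m] * preds[m][i] for m in preds) for i in range(n)]
```

With equal weights this reduces to the plain average; under inverse-error weighting, a model with half the validation error of another receives twice its weight. For toy validation errors of 2.0, 1.0 and 1.0 for `"linear"`, `"svm"` and `"ann"`, the weights are 0.2, 0.4 and 0.4.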