Find you if you drive: Inferring home locations for vehicles with surveillance camera data
Introduction
Motivation. Nowadays, spatiotemporal trajectory data can be collected from a variety of sources, including location-sharing social networks (e.g., Foursquare check-ins), geo-tagged social media (e.g., Twitter and Weibo), location-based online services (e.g., Uber and Didi), and urban traffic surveillance systems (e.g., surveillance cameras and vehicle-mounted GPS). These spatiotemporal data provide a new dimension for understanding human behavior in physical space and benefit many aspects of life, such as transportation, healthcare, urban planning, and homeland security. Among these tasks, inferring users’ home locations has become increasingly important for real-world applications such as security, localized recommendation, advertisement targeting, and transportation scheduling. For example, if we infer that two users live in the same residential community, they may have similar daily needs, so we can recommend that they become online friends or suggest that they carpool when one user travels to places that the other frequently visits. In addition, if we know the home location of a vehicle, we can more easily interpret each of its trips and thus sense individual behavior patterns. Many studies have been conducted to infer home location from users’ spatiotemporal data in location-based social networks [1], [2], [3], [4], [5], [6], [7], online social media [8], [9], [10], [11], [12], [13], [14], and dense GPS trajectories [15], [16], [17], [18].
Although extensive research has been done on inferring home location from spatiotemporal data, the existing methods have two key limitations when applied to vehicles. First, the links between online users and drivers are unknown, so methods based on social networks and social media cannot be used to infer home locations for vehicles. Second, although methods based on GPS trajectories achieve high prediction accuracy, the GPS trajectories of private vehicles are not easily available to government departments and managers because of privacy issues.
In recent years, surveillance cameras have been widely deployed to monitor traffic. These AI-equipped cameras can recognize individual vehicle information (e.g., license plate, speed, and driving direction). Therefore, in urban surveillance systems, License Plate Recognition (LPR) data are available for almost all kinds of vehicles, whether or not they have GPS devices installed. These properties enable us to consider home location inference for all vehicles based on surveillance camera data. Although drivers’ personal information is filed with the department of transportation, most drivers do not live at their registered addresses for reasons such as changes of rental address, collective registered residence, multiple housing, and off-site work. Home location inference for vehicles with surveillance cameras benefits a variety of applications, ranging from homeland security and traffic scheduling to urban planning. Although surveillance camera data is not directly available to users and service providers, government departments can provide service providers with query interfaces to support more location-based recommendation and advertising services.
In this paper, we are interested in inferring the home location for vehicles with surveillance camera data in urban traffic surveillance systems. Despite the importance of inferring home locations for vehicles in urban management, to the best of our knowledge, this problem has not been considered in the previous literature.
Challenges. Despite the growing adoption of traffic surveillance cameras, their coverage in the city is still limited because of the cost of installation and maintenance. The observed trajectories are incomplete because they are obtained from static and discrete cameras. Therefore, the problem of home location inference with surveillance camera data faces multiple challenges:
- Sparsity. Surveillance cameras cover only some of the intersections and road segments in urban areas. A private vehicle is recorded at only a few cameras, and only on some days. Therefore, vehicle trajectories based on surveillance camera data are incomplete and highly sparse in both the spatial and temporal dimensions.
- Noise. The collected data is extremely “noisy”. For example, a vehicle may have different starting areas on different days, or inconsistent starting and ending areas on the same day. In addition, we find that some surveillance cameras may not work on some days, which causes inconsistent camera records.
- Static observation points. The locations of surveillance cameras are fixed, so vehicles are observed only at fixed locations (usually intersections) on the road network. That is, the stay points in the vehicle trajectories are static with respect to the locations of the cameras.
- Too many nearby residential communities. Since cameras are mainly located at road intersections, there are many residential communities near each of them. Moreover, a vehicle may be observed at multiple starting and ending areas, which makes it more challenging to infer the true home location. In fact, the nearest residential community may not be the home location of the vehicle because of the sparsity of vehicle trajectories and the partial coverage of surveillance cameras.
In the literature, a variety of studies have predicted home location from users’ check-in data and/or textual content in social media [8], [19], [20], [21], [22], [23], [24], [25], [26]. Basically, these methods apply supervised learning to predict users’ home locations from features of check-ins, places extracted from textual content, and user profiles. However, most of them achieve only coarse-grained location inference at the level of town, city, post-code, or state, with a large error. Another line of studies utilizes the locations of users’ friends to infer their home locations in Location-Based Social Networks (LBSNs) [1], [2], [4], [5], [7]. These works leverage social relationships and a subset of locatable friends to infer users’ home locations. However, such methods require the social relationships between users and partial ground truth, neither of which is available in surveillance camera data, so they cannot be applied directly.
Recently, several studies have focused on fine-grained semantic location inference based on GPS trajectories [15], [16], [18], [27], [28], [29]. In [15], four heuristic algorithms (i.e., Last Destination, Weighted Median, Largest Cluster, and Best Time) are proposed to infer users’ home locations from GPS trajectories, with a median error of 60 m. Wan et al. [17] propose to mine spatial–temporal semantic mobility patterns from trajectories of private vehicles based on their GPS data and POI data. These two approaches cannot be applied to our problem because they fail to handle the sparsity and static challenges in vehicles’ LPR data. To annotate mobility data, Wu et al. [28] propose to capture the relevant semantic words with respect to a mobility record using contextual social media. In their follow-up work, Wu et al. [18], [29] attempt to understand taxi traffic dynamics from multiple external data sources, including POI, weather, geo-tagged tweets, and collision records. They propose to use ridge regression with a polynomial kernel to describe the non-linear, non-additive relationships among the impacting factors. However, in our problem, it is impossible to match LPR data with noisy external social media and extract meaningful textual information, because the deployed surveillance cameras are sparse and stationary. Therefore, these approaches also cannot be applied to our problem.
Proposed solution. To address the aforementioned challenges, this article proposes a novel home location inference framework for vehicles in surveillance camera data that considers both spatial and temporal characteristics. First, we obtain a real-world road network with residential communities and surveillance cameras by projecting the collected contextual data onto the road network. Second, we propose a new discovery method that detects the potential home areas for each vehicle by clustering Origin–Destination (O-D) pairs in its camera-based trajectories. Specifically, we propose an O-D time pattern that distinguishes the home area candidate from the O-D clusters by leveraging time-aware constraints. Third, to find the home community, we propose a KDE-based inference method that detects the home community among the residential communities in or near the home area candidate. Finally, to improve prediction accuracy, we propose a local camera selection strategy that chooses suitable local cameras for each community candidate in the KDE-based model.
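The idea of matching O-D records against time-aware constraints can be illustrated with a minimal Python sketch. It is not the paper's method: in place of O-D clustering it simply counts camera IDs, and it assumes that the first record of a day is the origin, the last record is the destination, and that the hypothetical thresholds `morning_end` and `evening_start` separate "leaving home" from "returning home".

```python
from collections import Counter, defaultdict
from datetime import datetime


def daily_first_last(trajectory):
    """Map each calendar day to its (first, last) record.

    `trajectory` is a chronologically sorted list of (camera_id, datetime) tuples.
    """
    by_day = defaultdict(list)
    for camera_id, ts in trajectory:
        by_day[ts.date()].append((camera_id, ts))
    return {day: (recs[0], recs[-1]) for day, recs in by_day.items()}


def home_camera_candidate(trajectory, morning_end=10, evening_start=17):
    """Vote for cameras that look like home-side origins or destinations.

    Assumed time-aware constraint (illustrative only): an origin counts if it
    is seen before `morning_end` o'clock, a destination if it is seen at or
    after `evening_start` o'clock.
    """
    votes = Counter()
    for (o_cam, o_ts), (d_cam, d_ts) in daily_first_last(trajectory).values():
        if o_ts.hour < morning_end:
            votes[o_cam] += 1
        if d_ts.hour >= evening_start:
            votes[d_cam] += 1
    return votes.most_common(1)[0][0] if votes else None


# Two days of made-up records for one vehicle.
trajectory = [
    ("c1", datetime(2016, 8, 1, 7, 30)),
    ("c5", datetime(2016, 8, 1, 8, 0)),
    ("c1", datetime(2016, 8, 1, 18, 40)),
    ("c2", datetime(2016, 8, 2, 7, 45)),
    ("c5", datetime(2016, 8, 2, 8, 10)),
    ("c1", datetime(2016, 8, 2, 19, 0)),
]
print(home_camera_candidate(trajectory))  # c1
```

Voting over several days is one simple way to tolerate the noise described above, such as a different starting camera on some days.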
We use a large-scale real-world dataset collected from a provincial capital in China over the whole month of August 2016. It contains more than 11 million unique vehicles with about 405 million camera records. We conduct comprehensive experiments to demonstrate the effectiveness of our proposed method.
To summarize, we make the following contributions:
- We are the first to propose and formally define the problem of home location inference for vehicles with surveillance camera data in urban traffic surveillance systems.
- We propose a novel home area candidate discovery method that detects the most likely home areas for vehicles by clustering O-D pairs extracted from vehicle trajectories and matching them with time-aware constraints.
- We propose an effective home community inference method that uses KDE to model the density of each residential community with respect to the local cameras the vehicle passes. We design a local camera selection strategy to choose suitable local cameras for each community candidate in the KDE-based model.
- We conduct extensive evaluations on a large-scale real-world dataset. Experimental results demonstrate the effectiveness of our proposed method.
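As a rough illustration of the KDE-based contribution, the sketch below scores each candidate community by a weighted Gaussian kernel density of the cameras at which the vehicle was observed. The Gaussian kernel, the planar coordinates in metres, the `bandwidth_m` value, and all function names are assumptions made for this example; the paper's actual density model and its local camera selection strategy are not reproduced here.

```python
import math


def gaussian_kde_score(point, observations, bandwidth_m=300.0):
    """Weighted 2-D Gaussian kernel density at `point`.

    `point` is (x, y) in metres; `observations` is a list of ((x, y), weight),
    where the weight could be how often the vehicle passed that camera.
    """
    total_weight = sum(w for _, w in observations)
    if total_weight == 0:
        return 0.0
    norm = 2.0 * math.pi * bandwidth_m ** 2
    density = 0.0
    for (cx, cy), w in observations:
        d2 = (point[0] - cx) ** 2 + (point[1] - cy) ** 2
        density += w * math.exp(-d2 / (2.0 * bandwidth_m ** 2))
    return density / (total_weight * norm)


def infer_home_community(communities, observations, bandwidth_m=300.0):
    """Return the name of the candidate community with the highest density."""
    if not communities:
        return None
    return max(
        communities,
        key=lambda name: gaussian_kde_score(communities[name], observations, bandwidth_m),
    )


# Made-up candidates and camera observations (coordinates in metres).
communities = {"A": (0.0, 0.0), "B": (800.0, 0.0)}
observations = [((100.0, 0.0), 20), ((700.0, 0.0), 5)]
print(infer_home_community(communities, observations))  # A
```

Weighting by pass counts means that a community close to a frequently passed camera can outrank one that is merely nearest to some camera, which reflects the observation that the nearest community is not necessarily the home.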
Section snippets
Home location inference in social media
There have been many studies on how to infer the home locations of users in social media. A line of methods has been proposed to predict users’ city-level location on Twitter based purely on the content of their tweets [8], [19], [20], [21], [22], [23], [24], [25], [26]. Mahmud et al. [24], [30] present a method that uses an ensemble of statistical and heuristic classifiers to predict locations and makes use of a geographic gazetteer dictionary to identify place-name entities for inferring
Problem definition
In this section, we define the key notations used in the paper and then formally define our problem. We define a camera record as follows:
Definition 1 Camera Record. A camera record is a triple (vehicle, camera, timestamp) that represents a vehicle passing through a camera at a given timestamp.
Definition 2 Vehicle Trajectory. The trajectory of a vehicle is the sequence of its (camera, timestamp) tuples in chronological order, where each tuple indicates that the vehicle passed through that camera at that timestamp.
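The two definitions map naturally onto a small data structure. The following Python sketch is illustrative only; the field names `vehicle_id`, `camera_id`, and `timestamp` and the helper `build_trajectories` are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple


class CameraRecord(NamedTuple):
    """One camera record: a vehicle seen at a camera at a timestamp."""
    vehicle_id: str   # e.g. a recognised license plate (hypothetical field name)
    camera_id: str
    timestamp: datetime


def build_trajectories(records):
    """Group records by vehicle and sort each group chronologically.

    Returns {vehicle_id: [(camera_id, timestamp), ...]}.
    """
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec.vehicle_id].append((rec.camera_id, rec.timestamp))
    return {v: sorted(points, key=lambda p: p[1]) for v, points in grouped.items()}


# Made-up, unordered records for two vehicles.
records = [
    CameraRecord("V1", "c3", datetime(2016, 8, 1, 18, 5)),
    CameraRecord("V2", "c7", datetime(2016, 8, 1, 9, 0)),
    CameraRecord("V1", "c1", datetime(2016, 8, 1, 7, 30)),
]
trajectories = build_trajectories(records)
```

Here `trajectories["V1"]` lists camera `c1` before `c3`, because the tuples are sorted by timestamp rather than by arrival order in the raw data.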
Note
Overview
Fig. 1 shows the overall framework of our proposed home location inference method. Our framework is mainly composed of three parts: data preprocessing, home area candidate discovery, and KDE-based home community inference. The data preprocessing mainly includes extracting vehicle trajectories from surveillance camera data and matching cameras and residential communities with the road network. The second part aims to find the candidate area of home location for each vehicle by clustering stay points
Experiments
In this section, we conduct quantitative evaluations to demonstrate the effectiveness of the proposed framework on real-world datasets.
Conclusion and future work
In this paper, we propose a home location inference framework for vehicles, called HomInf, which effectively predicts the home community for each vehicle with surveillance camera data. Our framework mainly consists of three parts: data preprocessing, home area candidate discovery and KDE-based home community inference. First, we obtain a context-rich road network with residential communities and cameras by collecting and preprocessing multiple contextual data with surveillance camera data.
CRediT authorship contribution statement
Kai Chen: Methodology, Software, Data curation, Visualization, Writing - original draft. Yanwei Yu: Conceptualization, Supervision, Writing - review & editing, Funding acquisition. Peng Song: Data curation, Investigation. Xianfeng Tang: Validation, Writing - review & editing. Lei Cao: Writing - review & editing. Xiangrong Tong: Resources, Funding acquisition.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The authors would like to thank the anonymous reviewers for their valuable comments and helpful suggestions. This work is partially supported by the National Natural Science Foundation of China under Grant Nos. 61773331, 61703360, 61572418 and 61403328.
References (51)
- et al. Home location profiling for users in social media. Inf. Manage. (2016)
- et al. Where are you from: Home location profiling of crowd sensors from noisy and sparse crowdsourcing data
- et al. Smopat: Mining semantic mobility patterns from trajectories of private vehicles. Inform. Sci. (2018)
- et al. An unsupervised approach to inferring the localness of people using incomplete geotemporal online check-in data. ACM Trans. Intell. Syst. Technol. (TIST) (2017)
- et al. Beware of what you share: Inferring home location in social networks
- et al. Where are you settling down: Geo-locating twitter users based on tweets and social networks
- et al. Towards social user profiling: unified and discriminative influence model for inferring home locations
- et al. Exploiting text and network context for geolocation of social media users (2015)
- et al. We know where you are: Home location identification in location-based social networks
- et al. A multi-indicator approach for geolocalization of tweets