Abstract
Crime risk prediction is helpful for urban safety and citizens' quality of life. However, existing crime studies have focused on coarse-grained prediction and usually fail to capture the dynamics of urban crimes. The key challenge is data sparsity, since 1) not all crimes are recorded, and 2) crimes usually occur with low frequency. In this paper, we propose an effective framework to predict fine-grained and dynamic crime risks on each road using heterogeneous urban data. First, to address the issue of unreported crimes, we propose a cross-aggregation soft-impute (CASI) method that imputes possibly unreported crimes. Then, we use a novel crime risk measurement that captures crime dynamics from the perspective of influence propagation, taking into account both time-varying and location-varying risk propagation. Based on the dynamically calculated crime risks, we design contextual features (i.e., POI distributions, taxi mobility, and demographic features) from various urban data sources, and propose a zero-inflated negative binomial regression (ZINBR) model to predict future crime risks on roads. Experiments using real-world data from New York City show that our framework can accurately predict road crime risks and outperforms other baseline methods.
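The abstract names a zero-inflated negative binomial regression (ZINBR) model for road-level crime counts but does not give its exact form. As a hedged illustration only, the sketch below implements a textbook ZINB regression: an NB2 count component (variance mu + alpha * mu^2) with a log link for the mean, plus a logit-linked probability of a structural zero. This specification is an assumption and not necessarily the authors' model. The function names, the dispersion parameter `alpha`, and the feature and coefficient values in the usage line are hypothetical.

```python
import math
from typing import Sequence


def _dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def _sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def nb_logpmf(y: int, mu: float, alpha: float) -> float:
    """Log-pmf of an NB2 negative binomial with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha  # "size" parameter; alpha > 0 controls overdispersion
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def zinb_pmf(y: int, mu: float, alpha: float, pi: float) -> float:
    """P(Y = y): a structural zero with probability pi, otherwise a draw from NB(mu, alpha)."""
    nb = math.exp(nb_logpmf(y, mu, alpha))
    if y == 0:
        return pi + (1.0 - pi) * nb  # zeros can come from either component
    return (1.0 - pi) * nb


def zinb_params(x, beta, z, gamma):
    """Count mean via a log link, zero-inflation probability via a logit link (assumed links)."""
    return math.exp(_dot(beta, x)), _sigmoid(_dot(gamma, z))


def zinb_expected_count(x, beta, z, gamma) -> float:
    """E[Y] = (1 - pi) * mu, usable as a predicted crime count for one road."""
    mu, pi = zinb_params(x, beta, z, gamma)
    return (1.0 - pi) * mu


def zinb_neg_log_likelihood(ys, xs, zs, beta, gamma, alpha) -> float:
    """Objective a numerical optimizer could minimize to fit beta, gamma and alpha."""
    total = 0.0
    for y, x, z in zip(ys, xs, zs):
        mu, pi = zinb_params(x, beta, z, gamma)
        total -= math.log(zinb_pmf(y, mu, alpha, pi))
    return total


if __name__ == "__main__":
    # Hypothetical road: count features = [intercept, scaled POI density],
    # zero-inflation features = [intercept].
    print(zinb_expected_count([1.0, 0.3], [0.5, 1.2], [1.0], [-1.0]))
```

The zero-inflation component absorbs the many roads with no recorded crime, while the negative binomial component handles overdispersion among roads that do have crimes; this is the usual motivation for ZINB over Poisson regression with sparse counts. Fitting the coefficients (for example, by minimizing `zinb_neg_log_likelihood` with a numerical optimizer) is not shown.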
Acknowledgements
This work was partly supported by the National Natural Science Foundation of China (Grant No. 61772460) and the Ten Thousand Talent Program of Zhejiang Province (2018R52039).
Author information
Binbin Zhou received her MPhil degree in computer science from The Hong Kong Polytechnic University, China, in 2011. She is currently pursuing the PhD degree in the Department of Computer Science, Zhejiang University, China. Her research interests include urban computing, spatio-temporal data analysis, and intelligent transportation.
Longbiao Chen is an assistant professor with the Fujian Key Laboratory of Sensing and Computing for Smart City, Xiamen University, China. He received PhD degrees in computer science from Zhejiang University, China, in 2016 and Sorbonne University, France, in 2018. His research interests include ubiquitous computing, urban computing, and big data analytics.
Fangxun Zhou received the BSc degree in digital media technology from Zhejiang University, China, in 2018. He is currently pursuing the master's degree in computer science and technology at Zhejiang University, China.
Shijian Li received the PhD degree from Zhejiang University, China, in 2006. He is currently a professor with the College of Computer Science and Technology, Zhejiang University, China. His research interests include sensor networks, ubiquitous computing, and social computing. Professor Li serves as an editor of the International Journal of Distributed Sensor Networks and as a reviewer or PC member for more than ten conferences.
Sha Zhao received her BSc degree in computer science from Jinan University, China, in 2011, and her PhD degree in computer science from Zhejiang University, China, in 2016. She is currently a postdoctoral researcher with the College of Computer Science and Technology, Zhejiang University, China. Her research interests include pervasive computing, mobile sensing, and machine learning. Dr. Zhao has served as a reviewer for, or a member of, several conferences.
Gang Pan received his BSc and PhD degrees in computer science from Zhejiang University, China, in 1998 and 2004, respectively. He is currently a professor with the College of Computer Science and Technology, Zhejiang University, China. His research interests include pervasive computing, computer vision, and pattern recognition. Prof. Pan has served as a program committee member for more than ten prestigious international conferences, such as ICCV and CVPR.
Cite this article
Zhou, B., Chen, L., Zhou, F. et al. Dynamic road crime risk prediction with urban open data. Front. Comput. Sci. 16, 161609 (2022). https://doi.org/10.1007/s11704-021-0136-z
DOI: https://doi.org/10.1007/s11704-021-0136-z