Street-view change detection with deconvolutional networks

  • Published in: Autonomous Robots

Abstract

We propose a system for structural change detection in street-view videos captured by a vehicle-mounted monocular camera over time. Our approach is motivated by the need for more frequent and efficient updates of the large-scale maps used in autonomous vehicle navigation. The method chains a multi-sensor fusion SLAM pipeline with fast dense 3D reconstruction, which together provide coarsely registered image pairs to a deep Deconvolutional Network (DN) for pixel-wise change detection. We investigate two DN architectures for change detection: the first stacks contraction and expansion blocks, while the second builds on Fully Convolutional Networks. To train and evaluate our networks, we introduce a new urban change detection dataset that is an order of magnitude larger than existing datasets and contains challenging changes due to seasonal and lighting variations. Our method outperforms the existing literature on this dataset, which we make available to the community, as well as on an existing panoramic change detection dataset, demonstrating its wide applicability.
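The core idea of the abstract — stack a coarsely registered image pair along the channel axis, pass it through a contraction stage and an expansion ("deconvolution") stage, and read off a per-pixel change probability — can be sketched in NumPy. This is an illustrative, untrained single-block toy, not the authors' architecture: the filter counts, kernel sizes, strides, and random weights are all hypothetical.

```python
import numpy as np

def conv2d(x, w, stride=2):
    """Strided valid cross-correlation. x: (C, H, W), w: (F, C, k, k) -> (F, oh, ow)."""
    F, C, k, _ = w.shape
    oh = (x.shape[1] - k) // stride + 1
    ow = (x.shape[2] - k) // stride + 1
    out = np.zeros((F, oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[:, i*stride:i*stride+k, j*stride:j*stride+k]
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

def deconv2d(x, w, stride=2):
    """Transposed convolution ('deconvolution'): upsamples (F, h, w) back to (C, H, W)."""
    F, C, k, _ = w.shape
    H = (x.shape[1] - 1) * stride + k
    W = (x.shape[2] - 1) * stride + k
    out = np.zeros((C, H, W))
    for i in range(x.shape[1]):
        for j in range(x.shape[2]):
            # scatter-add each input pixel's contribution into the output window
            out[:, i*stride:i*stride+k, j*stride:j*stride+k] += np.tensordot(
                x[:, i, j], w, axes=([0], [0]))
    return out

def change_mask(img_t0, img_t1, seed=0):
    """One contraction + one expansion block on a stacked RGB pair (untrained weights)."""
    rng = np.random.default_rng(seed)
    x = np.concatenate([img_t0, img_t1], axis=0)      # (6, H, W): registered pair
    w_down = 0.1 * rng.standard_normal((8, 6, 4, 4))  # contraction filters
    w_up = 0.1 * rng.standard_normal((8, 1, 4, 4))    # expansion filters
    h = np.maximum(conv2d(x, w_down), 0.0)            # ReLU after downsampling
    logits = deconv2d(h, w_up)                        # back to input resolution
    return 1.0 / (1.0 + np.exp(-logits[0]))           # per-pixel change probability
```

Calling `change_mask(np.zeros((3, 32, 32)), np.ones((3, 32, 32)))` returns a 32 × 32 map of values in (0, 1); in the real system these probabilities come from trained contraction/expansion weights rather than random ones.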


Notes

  1. Available from: http://3dvis.ri.cmu.edu/data-sets/localization/.

  2. http://ghsi.github.io/proj/RSS2016.html.

  3. Note that the VL-CMU dataset provides data from two vertically-mounted single-line LiDAR scanners. However, no calibration between the LiDARs and the cameras is available.

  4. http://ghsi.github.io/proj/RSS2016.html.


Acknowledgements

We are grateful to Toshiba Research Europe for supporting the project and hosting the dataset, to the conference reviewers for their helpful feedback, to the Spanish MEC Project TRA2014-57088-C2-1-R and to NVIDIA Corporation for their generous support with hardware. This work was mainly done while the authors were working and interning at Toshiba.

Author information

Correspondence to Pablo F. Alcantarilla.


This is one of several papers published in Autonomous Robots comprising the “Special Issue on Robotics Science and Systems”.


About this article


Cite this article

Alcantarilla, P.F., Stent, S., Ros, G. et al. Street-view change detection with deconvolutional networks. Auton Robot 42, 1301–1322 (2018). https://doi.org/10.1007/s10514-018-9734-5


Keywords

Navigation