Street-view change detection with deconvolutional networks

  • Published in: Autonomous Robots

Abstract

We propose a system for structural change detection in street-view videos captured by a vehicle-mounted monocular camera over time. Our approach is motivated by the need for more frequent and efficient updates of the large-scale maps used in autonomous vehicle navigation. The method chains a multi-sensor fusion SLAM pipeline with fast dense 3D reconstruction, which together provide coarsely registered image pairs to a deep Deconvolutional Network (DN) for pixel-wise change detection. We investigate two DN architectures for change detection: the first stacks contraction and expansion blocks, while the second builds on Fully Convolutional Networks. To train and evaluate our networks, we introduce a new urban change detection dataset that is an order of magnitude larger than existing datasets and contains challenging changes due to seasonal and lighting variations. Our method outperforms the existing literature on this dataset, which we make available to the community, as well as on an existing panoramic change detection dataset, demonstrating its wide applicability.
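The core idea of the abstract — stack a coarsely registered image pair along the channel axis, pass it through a contraction stage and an expansion ("deconvolution") stage, and read off a per-pixel change probability — can be sketched in NumPy. This is an illustrative, untrained single-block toy, not the authors' architecture: the filter counts, kernel sizes, strides, and random weights are all hypothetical.

```python
import numpy as np

def conv2d(x, w, stride=2):
    """Strided valid cross-correlation. x: (C, H, W), w: (F, C, k, k) -> (F, oh, ow)."""
    F, C, k, _ = w.shape
    oh = (x.shape[1] - k) // stride + 1
    ow = (x.shape[2] - k) // stride + 1
    out = np.zeros((F, oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[:, i*stride:i*stride+k, j*stride:j*stride+k]
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

def deconv2d(x, w, stride=2):
    """Transposed convolution ('deconvolution'): upsamples (F, h, w) back to (C, H, W)."""
    F, C, k, _ = w.shape
    H = (x.shape[1] - 1) * stride + k
    W = (x.shape[2] - 1) * stride + k
    out = np.zeros((C, H, W))
    for i in range(x.shape[1]):
        for j in range(x.shape[2]):
            # scatter-add each input pixel's contribution into the output window
            out[:, i*stride:i*stride+k, j*stride:j*stride+k] += np.tensordot(
                x[:, i, j], w, axes=([0], [0]))
    return out

def change_mask(img_t0, img_t1, seed=0):
    """One contraction + one expansion block on a stacked RGB pair (untrained weights)."""
    rng = np.random.default_rng(seed)
    x = np.concatenate([img_t0, img_t1], axis=0)      # (6, H, W): registered pair
    w_down = 0.1 * rng.standard_normal((8, 6, 4, 4))  # contraction filters
    w_up = 0.1 * rng.standard_normal((8, 1, 4, 4))    # expansion filters
    h = np.maximum(conv2d(x, w_down), 0.0)            # ReLU after downsampling
    logits = deconv2d(h, w_up)                        # back to input resolution
    return 1.0 / (1.0 + np.exp(-logits[0]))           # per-pixel change probability
```

Calling `change_mask(np.zeros((3, 32, 32)), np.ones((3, 32, 32)))` returns a 32 × 32 map of values in (0, 1); in the real system these probabilities come from trained contraction/expansion weights rather than random ones.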


Notes

  1. Available from: http://3dvis.ri.cmu.edu/data-sets/localization/.

  2. http://ghsi.github.io/proj/RSS2016.html.

  3. Note that the VL-CMU dataset provides data from two vertically-mounted single-line LiDAR scanners. However, no calibration between the LiDARs and the cameras is available.

  4. http://ghsi.github.io/proj/RSS2016.html.


Acknowledgements

We are grateful to Toshiba Research Europe for supporting the project and hosting the dataset, to the conference reviewers for their helpful feedback, to the Spanish MEC Project TRA2014-57088-C2-1-R and to NVIDIA Corporation for their generous support with hardware. This work was mainly done while the authors were working and interning at Toshiba.

Author information

Correspondence to Pablo F. Alcantarilla.


This is one of several papers published in Autonomous Robots comprising the “Special Issue on Robotics Science and Systems”.


About this article


Cite this article

Alcantarilla, P.F., Stent, S., Ros, G. et al. Street-view change detection with deconvolutional networks. Auton Robot 42, 1301–1322 (2018). https://doi.org/10.1007/s10514-018-9734-5


Keywords

Navigation