EKLT: Asynchronous Photometric Feature Tracking Using Events and Frames

Gehrig, Daniel; Rebecq, Henri; Gallego, Guillermo; Scaramuzza, Davide

doi:10.1007/s11263-019-01209-w

Published: 22 August 2019

International Journal of Computer Vision, volume 128, pages 601–618 (2020)

A Correction to this article was published on 20 September 2019

This article has been updated

Abstract

We present EKLT, a feature tracking method that leverages the complementarity of event cameras and standard cameras to track visual features with high temporal resolution. Event cameras are novel sensors that output pixel-level brightness changes, called “events”. They offer significant advantages over standard cameras, namely a very high dynamic range, no motion blur, and a latency in the order of microseconds. However, because the same scene pattern can produce different events depending on the motion direction, establishing event correspondences across time is challenging. By contrast, standard cameras provide intensity measurements (frames) that do not depend on motion direction. Our method extracts features on frames and subsequently tracks them asynchronously using events, thereby exploiting the best of both types of data: the frames provide a photometric representation that does not depend on motion direction and the events provide updates with high temporal resolution. In contrast to previous works, which are based on heuristics, this is the first principled method that uses intensity measurements directly, based on a generative event model within a maximum-likelihood framework. As a result, our method produces feature tracks that are more accurate than the state of the art, across a wide variety of scenes.

Change history

20 September 2019
The original version of this article inadvertently omitted the footnote “The best result per row is highlighted in bold” in Table 7. This has been corrected by publishing this erratum, which gives the correct version of Table 7 together with its caption.

Notes

Event cameras such as the DVS (Lichtsteiner et al. 2008) respond to logarithmic brightness changes, i.e., $L\doteq \log I$, with brightness signal I, so that (1) represents logarithmic changes.
Eq. (3) can be shown (Gallego et al. 2015) by substituting the brightness constancy assumption (i.e., optical flow constraint) $ \frac{\partial L}{\partial t}(\mathbf {u}(t),t) + \nabla L(\mathbf {u}(t),t) \cdot \dot{\mathbf {u}}(t) = 0, $ with image-point velocity $\mathbf {v}\equiv \dot{\mathbf {u}}$, in Taylor’s approximation $\Delta L(\mathbf {u},t) \doteq L(\mathbf {u},t) - L(\mathbf {u},t - \Delta \tau ) \approx \frac{\partial L}{\partial t}(\mathbf {u},t) \Delta \tau $.
The datasets are publicly available at: http://rpg.ifi.uzh.ch/direct_event_camera_tracking/.
Code can be found here: https://github.com/uzh-rpg/rpg_feature_tracking_analysis.
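Substituting the brightness constancy assumption into Taylor's approximation, as described in the note on Eq. (3), gives $\Delta L(\mathbf{u},t) \approx -\nabla L(\mathbf{u},t) \cdot \mathbf{v}\,\Delta \tau$. The following is a minimal numerical sketch of that approximation, not the authors' code. The synthetic log-brightness pattern, the velocity `V`, and all function names are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical image-point velocity in pixels per second (illustrative value).
V = np.array([3.0, -2.0])

def log_brightness(u, t, v=V):
    """Synthetic log-brightness L(u, t): a smooth pattern translating with velocity v."""
    x0, y0 = u[0] - v[0] * t, u[1] - v[1] * t
    return np.sin(0.3 * x0) + 0.5 * np.cos(0.2 * y0)

def spatial_gradient(u, t, v=V):
    """Analytic spatial gradient of the synthetic pattern at pixel u and time t."""
    x0, y0 = u[0] - v[0] * t, u[1] - v[1] * t
    return np.array([0.3 * np.cos(0.3 * x0), -0.1 * np.sin(0.2 * y0)])

def exact_increment(u, t, dtau, v=V):
    """Brightness increment L(u, t) - L(u, t - dtau)."""
    return log_brightness(u, t, v) - log_brightness(u, t - dtau, v)

def linearized_increment(u, t, dtau, v=V):
    """Linearized increment: -grad L(u, t) . v * dtau."""
    return -float(spatial_gradient(u, t, v) @ v) * dtau

u = np.array([12.0, 7.0])
exact = exact_increment(u, t=0.5, dtau=1e-3)
approx = linearized_increment(u, t=0.5, dtau=1e-3)
print(exact, approx)  # the two values should agree closely for small dtau
```

The agreement degrades as `dtau` grows, since the sketch keeps only the first-order Taylor term.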

References

Agarwal, S., Mierle, K., et al. (2010–2019). Ceres solver. http://ceres-solver.org.
Alzugaray, I., & Chli, M. (2018). Asynchronous corner detection and tracking for event cameras in real time. IEEE Robotics and Automation Letters, 3(4), 3177–3184.
Baker, S., & Matthews, I. (2004). Lucas-Kanade 20 years on: A unifying framework. International Journal of Computer Vision, 56(3), 221–255.
Bardow, P., Davison, A. J., & Leutenegger, S. Simultaneous optical flow and intensity estimation from an event camera. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 884–892).
Barranco, F., Teo, C. L., Fermuller, C., & Aloimonos, Y. (2015). Contour detection and characterization for asynchronous event sensors. In International conference on computer vision (ICCV).
Benosman, R., Ieng, S.-H., Clercq, C., Bartolozzi, C., & Srinivasan, M. (2012). Asynchronous frameless event-based optical flow. Neural Networks, 27, 32–37.
Besl, P. J., & McKay, N. D. (1992). A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis & Machine Intelligence, 14(2), 239–256.
Brandli, C., Berner, R., Yang, M., Liu, S.-C., & Delbruck, T. (2014). A 240 $\times$ 180 130 dB 3 $\mu$s latency global shutter spatiotemporal vision sensor. IEEE Journal of Solid-State Circuits, 49(10), 2333–2341.
Bryner, S., Gallego, G., Rebecq, H., & Scaramuzza, D. (2019). Event-based, direct camera tracking from a photometric 3D map using nonlinear optimization. In IEEE international conference on robotics and automation (ICRA).
Chaudhry, R., Ravichandran, A., Hager, G., & Vidal, R. Histograms of oriented optical flow and Binet–Cauchy kernels on nonlinear dynamical systems for the recognition of human actions. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1932–1939).
Clady, X., Ieng, S.-H., & Benosman, R. (2015). Asynchronous event-based corner detection and matching. Neural Networks, 66, 91–106.
Clady, X., Maro, J.-M., Barré, S., & Benosman, R. B. (2017). A motion-based feature for event-based pattern recognition. Frontiers in Neuroscience, 10, 594.
Delmerico, J., Cieslewski, T., Rebecq, H., Faessler, M., & Scaramuzza, D. (2019). Are we ready for autonomous drone racing? The UZH-FPV drone racing dataset. In IEEE international conference on robotics and automation (ICRA).
Evangelidis, G. D., & Psarakis, E. Z. (2008). Parametric image alignment using enhanced correlation coefficient maximization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(10), 1858–1865.
Forster, C., Zhang, Z., Gassner, M., Werlberger, M., & Scaramuzza, D. (2017). SVO: Semidirect visual odometry for monocular and multicamera systems. IEEE Transactions on Robotics, 33(2), 249–265.
Gallego, G., Delbruck, T., Orchard, G., Bartolozzi, C., Taba, B., Censi, A., et al. (2019). Event-based vision: A survey. arXiv:1904.08405.
Gallego, G., Forster, C., Mueggler, E., & Scaramuzza, D. (2015). Event-based camera pose tracking using a generative event model. arXiv:1510.01972.
Gallego, G., Lund, J. E. A., Mueggler, E., Rebecq, H., Delbruck, T., & Scaramuzza, D. (2018). Event-based, 6-DOF camera tracking from photometric depth maps. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(10), 2402–2412.
Gallego, G., Rebecq, H., & Scaramuzza, D. (2018). A unifying contrast maximization framework for event cameras, with applications to motion, depth, and optical flow estimation. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3867–3876).
Gallego, G., & Scaramuzza, D. (2017). Accurate angular velocity estimation with an event camera. IEEE Robotics and Automation Letters, 2(2), 632–639.
Gehrig, D., Rebecq, H., Gallego, G., & Scaramuzza, D. (2018). Asynchronous, photometric feature tracking using events and frames. In European conference on computer vision (ECCV) (pp. 766–781).
Harris, C., & Stephens, M. (1988). A combined corner and edge detector. In Proceedings of the fourth alvey vision conference (Vol. 15, pp. 147–151).
Kim, H., Handa, A., Benosman, R., Ieng, S.-H., & Davison, A. J. (2014). Simultaneous mosaicing and tracking with an event camera. In British machine vision conference (BMVC).
Klein, G., & Murray, D. (2009). Parallel tracking and mapping on a camera phone. In IEEE ACM international symposium mixed and augmented reality (ISMAR).
Kogler, J., Sulzbachner, C., Humenberger, M., & Eibensteiner, F. Address-event based stereo vision with bio-inspired silicon retina imagers. In Advances in theory and applications of stereo vision (pp. 165–188). InTech.
Kueng, B., Mueggler, E., Gallego, G., & Scaramuzza, D. (2016). Low-latency visual odometry using event-based feature tracks. In IEEE international conference on intelligent robots and systems (IROS) (pp. 16–23).
Lagorce, X., Meyer, C., Ieng, S.-H., Filliat, D., & Benosman, R. (2015). Asynchronous event-based multikernel algorithm for high-speed visual features tracking. IEEE Transactions on Neural Networks and Learning Systems, 26(8), 1710–1720.
Lichtsteiner, P., Posch, C., & Delbruck, T. (2008). A 128$\times $128 120 dB 15 $\mu $s latency asynchronous temporal contrast vision sensor. IEEE Journal of Solid-State Circuits, 43(2), 566–576.
Lucas, B. D., & Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. In International joint conference on artificial intelligence (IJCAI) (pp. 674–679).
Maqueda, A. I., Loquercio, A., Gallego, G., García, N., & Scaramuzza, D. (2018). Event-based vision meets deep learning on steering prediction for self-driving cars. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 5419–5427).
Mueggler, E., Bartolozzi, C., & Scaramuzza, D. (2017). Fast event-based corner detection. In British machine vision conference (BMVC).
Mueggler, E., Huber, B., & Scaramuzza, D. (2014). Event-based, 6-DOF pose tracking for high-speed maneuvers. In IEEE international conference on intelligent robots and systems (IROS) (pp. 2761–2768). Event camera animation: https://youtu.be/LauQ6LWTkxM?t=25.
Mueggler, E., Rebecq, H., Gallego, G., Delbruck, T., & Scaramuzza, D. (2017). The event-camera dataset and simulator: Event-based data for pose estimation, visual odometry, and SLAM. The International Journal of Robotics Research, 36(2), 142–149.
Munda, G., Reinbacher, C., & Pock, T. (2018). Real-time intensity-image reconstruction for event cameras using manifold regularisation. International Journal of Computer Vision, 126(12), 1381–1393.
Mur-Artal, R., Montiel, J. M. M., & Tardós, J. D. (2015). ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 31(5), 1147–1163.
Ni, Z., Bolopion, A., Agnus, J., Benosman, R., & Régnier, S. (2012). Asynchronous event-based visual shape tracking for stable haptic feedback in microrobotics. IEEE Transactions on Robotics, 28(5), 1081–1089.
Ni, Z., Ieng, S.-H., Posch, C., Régnier, S., & Benosman, R. (2015). Visual tracking using neuromorphic asynchronous event-based cameras. Neural Computation, 27(4), 925–953.
Rebecq, H., Gallego, G., Mueggler, E., & Scaramuzza, D. (2018). EMVS: Event-based multi-view stereo—3D reconstruction with an event camera in real-time. International Journal of Computer Vision, 126(12), 1394–1414.
Rebecq, H., Horstschaefer, T., & Scaramuzza, D. (2017). Real-time visual-inertial odometry for event cameras using keyframe-based nonlinear optimization. In British machine vision conference (BMVC).
Rebecq, H., Horstschäfer, T., Gallego, G., & Scaramuzza, D. (2017). EVO: A geometric approach to event-based 6-DOF parallel tracking and mapping in real-time. IEEE Robotics and Automation Letters, 2(2), 593–600.
Rebecq, H., Ranftl, R., Koltun, V., & Scaramuzza, D. (2019). Events-to-video: Bringing modern computer vision to event cameras. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3857–3866).
Reinbacher, C., Graber, G., & Pock, T. (2016). Real-time intensity-image reconstruction for event cameras using manifold regularisation. In British machine vision conference (BMVC).
Rosten, E., & Drummond, T. (2006). Machine learning for high-speed corner detection. In European conference on computer vision (ECCV) (pp. 430–443).
Scheerlinck, C., Barnes, N., & Mahony, R. (2018). Continuous-time intensity estimation using event cameras. In Asian conference on computer vision (ACCV).
Tedaldi, D., Gallego, G., Mueggler, E., & Scaramuzza, D. (2016). Feature detection and tracking with the dynamic and active-pixel vision sensor (DAVIS). In International conference on event-based control, communication and signal processing (EBCCSP).
Vasco, V., Glover, A., & Bartolozzi, C. (2016). Fast event-based Harris corner detection exploiting the advantages of event-driven cameras. In IEEE international conference on intelligent robots and systems (IROS).
Vidal, A. R., Rebecq, H., Horstschaefer, T., & Scaramuzza, D. (2018). Ultimate SLAM? Combining events, images, and IMU for robust visual SLAM in HDR and high speed scenarios. IEEE Robotics and Automation Letters, 3(2), 994–1001.
Zhou, H., Yuan, Y., & Shi, C. (2009). Object tracking using SIFT features and mean shift. Computer Vision and Image Understanding, 113(3), 345–352.
Zhu, A. Z., Atanasov, N., & Daniilidis, K. (2017). Event-based feature tracking with probabilistic data association. In IEEE international conference on robotics and automation (ICRA) (pp. 4465–4470).
Zhu, A. Z., Thakur, D., Ozaslan, T., Pfrommer, B., Kumar, V., & Daniilidis, K. (2018). The multivehicle stereo event camera dataset: An event camera dataset for 3D perception. IEEE Robotics and Automation Letters, 3(3), 2032–2039.

Acknowledgements

This work was supported by the DARPA FLA program, the Swiss National Center of Competence Research Robotics, through the Swiss National Science Foundation, and the SNSF-ERC starting grant.

Author information

Authors and Affiliations

Robotics and Perception Group, Department of Informatics, University of Zurich, Zurich, Switzerland
Daniel Gehrig, Henri Rebecq, Guillermo Gallego & Davide Scaramuzza
Department of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
Daniel Gehrig, Henri Rebecq, Guillermo Gallego & Davide Scaramuzza

Corresponding author

Correspondence to Daniel Gehrig.

Additional information

Communicated by Vittorio Ferrari.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original version of this article was revised due to an error in the footnote of Table 7.

Multimedia Material: A supplemental video for this work is available at https://youtu.be/ZyD1YPW1h4U.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 199139 KB)

A Appendix

A.1 Objective Function Comparison Against ICP-Based Method (Kueng et al. 2016)

As mentioned in Sect. 4, one of the advantages of our method is that data association between events and the tracked feature is implicitly established by the pixel-to-pixel correspondence of the compared patches (2) and (3). This means that we do not have to explicitly estimate it, as was done in Kueng et al. (2016) and Zhu et al. (2017), which saves computational resources and prevents false associations that would yield bad tracking behavior. To illustrate this advantage, we compare the cost function profiles of our method and Kueng et al. (2016) (ICP), which minimizes the alignment error (Euclidean distance) between two 2D point sets: $\{\mathbf {p}_i\}$ from the events (data) and $\{\mathbf {m}_j\}$ from the Canny edges (model),

$$\{\mathtt{R}, \mathbf{t}\} = \arg\min_{\mathtt{R}, \mathbf{t}} \sum_{(\mathbf{p}_i, \mathbf{m}_i) \in \text{Matches}} b_i \left\| \mathtt{R}\mathbf{p}_i + \mathbf{t} - \mathbf{m}_i \right\|^2. \tag{16}$$

Here, $\mathtt{R}$ and $\mathbf{t}$ are the alignment parameters and $b_i$ are weights. At each step, events are associated with model points by assigning each $\mathbf{p}_i$ to the closest point $\mathbf{m}_j$ and rejecting matches that are too far apart (more than 3 pixels). By varying the parameter $\mathbf{t}$ around the estimated value while keeping $\mathtt{R}$ fixed, we obtain a slice of the cost function profile. The resulting cost function profiles for our method (7) and for the ICP-based method (16) are shown in Fig. 18.
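The slicing procedure described above can be sketched for the ICP-style cost (16) as follows. This is a toy illustration under stated assumptions, not the implementation of Kueng et al. (2016). It performs a single nearest-neighbor association per evaluated alignment, uses unit weights $b_i$ by default, and builds synthetic point sets from two vertical edges. The names `icp_cost` and `cost_slice` are hypothetical.

```python
import numpy as np

def icp_cost(points, model, R, t, max_dist=3.0, weights=None):
    """Evaluate the cost (16) for a fixed alignment (R, t).

    Each transformed event point is matched to its closest model point;
    matches farther apart than max_dist (in pixels) are rejected.
    """
    moved = points @ R.T + t  # R p_i + t for every event point
    # Pairwise distances between transformed event points and model points.
    dists = np.linalg.norm(moved[:, None, :] - model[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    d = dists[np.arange(len(points)), nearest]
    keep = d <= max_dist
    b = np.ones(len(points)) if weights is None else np.asarray(weights, dtype=float)
    return float(np.sum(b[keep] * d[keep] ** 2))

def cost_slice(points, model, R, t_est, offsets):
    """Vary the x-component of t around t_est while keeping R fixed."""
    return np.array([icp_cost(points, model, R, t_est + np.array([dx, 0.0]))
                     for dx in offsets])

# Toy data (assumption): two vertical edges at x = 0 and x = 10, ten points each.
ys = np.arange(10.0)
model = np.vstack([np.column_stack([np.zeros(10), ys]),
                   np.column_stack([np.full(10, 10.0), ys])])
points = model.copy()  # events lying exactly on the model edges
R = np.eye(2)
offsets = np.arange(-12.0, 13.0)
profile = cost_slice(points, model, R, np.zeros(2), offsets)
print(np.column_stack([offsets, profile]))
```

With these toy inputs, the cost is zero at the true alignment but also falls back to zero once every match exceeds the rejection distance (for example at an offset of 5 pixels), and again where one edge lands on the other (an offset of 10 pixels). This is one way in which alignment-dependent data association can produce a non-convex profile.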

For simple black and white scenes (first row of Fig. 18), all generated events belong to strong edges. In contrast, for more complex, highly textured scenes (second row), events are generated more uniformly across the patch. Our method clearly shows a convex cost function in both situations. The method of Kueng et al. (2016), by contrast, exhibits several local minima and very broad basins of attraction, making exact localization of the optimal registration parameters challenging. The broadness of the basin of attraction, together with the multitude of local minima, can be explained by the fact that the data association changes with each value of the alignment parameters. As a result, several alignment parameters may lead to a partial overlap of the point clouds, resulting in a suboptimal solution.

To show how non-smooth cost profiles affect tracking performance, we show the feature tracks in the last column of Fig. 18. The ground truth derived from KLT is marked in green. Our tracker (in blue) follows the ground truth with high accuracy. The method of Kueng et al. (2016) (in red), on the other hand, exhibits jumping behavior, leading to early divergence from the ground truth.

About this article

Cite this article

Gehrig, D., Rebecq, H., Gallego, G. et al. EKLT: Asynchronous Photometric Feature Tracking Using Events and Frames. Int J Comput Vis 128, 601–618 (2020). https://doi.org/10.1007/s11263-019-01209-w

Received: 31 January 2019
Accepted: 05 August 2019
Published: 22 August 2019
Issue Date: March 2020
DOI: https://doi.org/10.1007/s11263-019-01209-w
