Abstract
We present EKLT, a feature tracking method that leverages the complementarity of event cameras and standard cameras to track visual features with high temporal resolution. Event cameras are novel sensors that output pixel-level brightness changes, called “events”. They offer significant advantages over standard cameras, namely a very high dynamic range, no motion blur, and a latency in the order of microseconds. However, because the same scene pattern can produce different events depending on the motion direction, establishing event correspondences across time is challenging. By contrast, standard cameras provide intensity measurements (frames) that do not depend on motion direction. Our method extracts features on frames and subsequently tracks them asynchronously using events, thereby exploiting the best of both types of data: the frames provide a photometric representation that does not depend on motion direction and the events provide updates with high temporal resolution. In contrast to previous works, which are based on heuristics, this is the first principled method that uses intensity measurements directly, based on a generative event model within a maximum-likelihood framework. As a result, our method produces feature tracks that are more accurate than the state of the art, across a wide variety of scenes.
Change history
20 September 2019
The original version of this article unfortunately omitted the footnote “The best result per row is highlighted in bold” in Table 7. This has been corrected by publishing this erratum. The correct version of Table 7 with the caption has been given below:
Notes
Eq. (3) can be shown (Gallego et al. 2015) by substituting the brightness constancy assumption (i.e., optical flow constraint) \( \frac{\partial L}{\partial t}(\mathbf {u}(t),t) + \nabla L(\mathbf {u}(t),t) \cdot \dot{\mathbf {u}}(t) = 0, \) with image-point velocity \(\mathbf {v}\equiv \dot{\mathbf {u}}\), in Taylor’s approximation \(\Delta L(\mathbf {u},t) \doteq L(\mathbf {u},t) - L(\mathbf {u},t - \Delta \tau ) \approx \frac{\partial L}{\partial t}(\mathbf {u},t) \Delta \tau \).
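Written out as a single chain, the substitution described in this note reads as follows. This is a sketch under the assumption that Eq. (3), which is not reproduced in these notes, is the linearized brightness increment expressed through the image gradient and the image-point velocity:

\[ \Delta L(\mathbf {u},t) \approx \frac{\partial L}{\partial t}(\mathbf {u},t)\, \Delta \tau = -\nabla L(\mathbf {u},t) \cdot \mathbf {v}\, \Delta \tau . \]

The first step is Taylor’s approximation and the second replaces \(\frac{\partial L}{\partial t}\) by \(-\nabla L \cdot \mathbf {v}\) using the brightness constancy assumption. The sign shows why the same scene pattern yields different increments for different motion directions: reversing \(\mathbf {v}\) flips the sign of \(\Delta L\).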
The datasets are publicly available at: http://rpg.ifi.uzh.ch/direct_event_camera_tracking/.
Code can be found here: https://github.com/uzh-rpg/rpg_feature_tracking_analysis.
References
Agarwal, S., Mierle, K., et al. (2010–2019). Ceres solver. http://ceres-solver.org.
Alzugaray, I., & Chli, M. (2018). Asynchronous corner detection and tracking for event cameras in real time. IEEE Robotics and Automation Letters, 3(4), 3177–3184.
Baker, S., & Matthews, I. (2004). Lucas-Kanade 20 years on: A unifying framework. International Journal of Computer Vision, 56(3), 221–255.
Bardow, P., Davison, A. J., & Leutenegger, S. (2016). Simultaneous optical flow and intensity estimation from an event camera. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 884–892).
Barranco, F., Teo, C. L., Fermuller, C., & Aloimonos, Y. (2015). Contour detection and characterization for asynchronous event sensors. In International conference on computer vision (ICCV).
Benosman, R., Ieng, S.-H., Clercq, C., Bartolozzi, C., & Srinivasan, M. (2012). Asynchronous frameless event-based optical flow. Neural Networks, 27, 32–37.
Besl, P. J., & McKay, N. D. (1992). A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis & Machine Intelligence, 14(2), 239–256.
Brandli, C., Berner, R., Yang, M., Liu, S.-C., & Delbruck, T. (2014). A 240 \(\times \) 180 130 dB 3 \(\mu \)s latency global shutter spatiotemporal vision sensor. IEEE Journal of Solid-State Circuits, 49(10), 2333–2341.
Bryner, S., Gallego, G., Rebecq, H., & Scaramuzza, D. (2019). Event-based, direct camera tracking from a photometric 3D map using nonlinear optimization. In IEEE international conference on robotics and automation (ICRA).
Chaudhry, R., Ravichandran, A., Hager, G., & Vidal, R. (2009). Histograms of oriented optical flow and Binet–Cauchy kernels on nonlinear dynamical systems for the recognition of human actions. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1932–1939).
Clady, X., Ieng, S.-H., & Benosman, R. (2015). Asynchronous event-based corner detection and matching. Neural Networks, 66, 91–106.
Clady, X., Maro, J.-M., Barré, S., & Benosman, R. B. (2017). A motion-based feature for event-based pattern recognition. Frontiers in Neuroscience, 10, 594.
Delmerico, J., Cieslewski, T., Rebecq, H., Faessler, M., & Scaramuzza, D. (2019). Are we ready for autonomous drone racing? The UZH-FPV drone racing dataset. In IEEE international conference on robotics and automation (ICRA).
Evangelidis, G. D., & Psarakis, E. Z. (2008). Parametric image alignment using enhanced correlation coefficient maximization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(10), 1858–1865.
Forster, C., Zhang, Z., Gassner, M., Werlberger, M., & Scaramuzza, D. (2017). SVO: Semidirect visual odometry for monocular and multicamera systems. IEEE Transactions on Robotics, 33(2), 249–265.
Gallego, G., Delbruck, T., Orchard, G., Bartolozzi, C., Taba, B., Censi, A., et al. (2019). Event-based vision: A survey. arXiv:1904.08405.
Gallego, G., Forster, C., Mueggler, E., & Scaramuzza, D. (2015). Event-based camera pose tracking using a generative event model. arXiv:1510.01972.
Gallego, G., Lund, J. E. A., Mueggler, E., Rebecq, H., Delbruck, T., & Scaramuzza, D. (2018). Event-based, 6-DOF camera tracking from photometric depth maps. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(10), 2402–2412.
Gallego, G., Rebecq, H., & Scaramuzza, D. (2018). A unifying contrast maximization framework for event cameras, with applications to motion, depth, and optical flow estimation. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3867–3876).
Gallego, G., & Scaramuzza, D. (2017). Accurate angular velocity estimation with an event camera. IEEE Robotics and Automation Letters, 2(2), 632–639.
Gehrig, D., Rebecq, H., Gallego, G., & Scaramuzza, D. (2018). Asynchronous, photometric feature tracking using events and frames. In European conference on computer vision (ECCV) (pp. 766–781).
Harris, C., & Stephens, M. (1988). A combined corner and edge detector. In Proceedings of the fourth Alvey vision conference (Vol. 15, pp. 147–151).
Kim, H., Handa, A., Benosman, R., Ieng, S.-H., & Davison, A. J. (2014). Simultaneous mosaicing and tracking with an event camera. In British machine vision conference (BMVC).
Klein, G., & Murray, D. (2009). Parallel tracking and mapping on a camera phone. In IEEE/ACM international symposium on mixed and augmented reality (ISMAR).
Kogler, J., Sulzbachner, C., Humenberger, M., & Eibensteiner, F. (2011). Address-event based stereo vision with bio-inspired silicon retina imagers. In Advances in theory and applications of stereo vision (pp. 165–188). InTech.
Kueng, B., Mueggler, E., Gallego, G., & Scaramuzza, D. (2016). Low-latency visual odometry using event-based feature tracks. In IEEE international conference on intelligent robots and systems (IROS) (pp. 16–23).
Lagorce, X., Meyer, C., Ieng, S.-H., Filliat, D., & Benosman, R. (2015). Asynchronous event-based multikernel algorithm for high-speed visual features tracking. IEEE Transactions on Neural Networks and Learning Systems, 26(8), 1710–1720.
Lichtsteiner, P., Posch, C., & Delbruck, T. (2008). A 128\(\times \)128 120 dB 15 \(\mu \)s latency asynchronous temporal contrast vision sensor. IEEE Journal of Solid-State Circuits, 43(2), 566–576.
Lucas, B. D., & Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. In International joint conference on artificial intelligence (IJCAI) (pp. 674–679).
Maqueda, A. I., Loquercio, A., Gallego, G., García, N., & Scaramuzza, D. (2018). Event-based vision meets deep learning on steering prediction for self-driving cars. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 5419–5427).
Mueggler, E., Bartolozzi, C., & Scaramuzza, D. (2017). Fast event-based corner detection. In British machine vision conference (BMVC).
Mueggler, E., Huber, B., & Scaramuzza, D. (2014). Event-based, 6-DOF pose tracking for high-speed maneuvers. In IEEE international conference on intelligent robots and systems (IROS) (pp. 2761–2768). Event camera animation: https://youtu.be/LauQ6LWTkxM?t=25.
Mueggler, E., Rebecq, H., Gallego, G., Delbruck, T., & Scaramuzza, D. (2017). The event-camera dataset and simulator: Event-based data for pose estimation, visual odometry, and SLAM. The International Journal of Robotics Research, 36(2), 142–149.
Munda, G., Reinbacher, C., & Pock, T. (2018). Real-time intensity-image reconstruction for event cameras using manifold regularisation. International Journal of Computer Vision, 126(12), 1381–1393.
Mur-Artal, R., Montiel, J. M. M., & Tardós, J. D. (2015). ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 31(5), 1147–1163.
Ni, Z., Bolopion, A., Agnus, J., Benosman, R., & Régnier, S. (2012). Asynchronous event-based visual shape tracking for stable haptic feedback in microrobotics. IEEE Transactions on Robotics, 28(5), 1081–1089.
Ni, Z., Ieng, S.-H., Posch, C., Régnier, S., & Benosman, R. (2015). Visual tracking using neuromorphic asynchronous event-based cameras. Neural Computation, 27(4), 925–953.
Rebecq, H., Gallego, G., Mueggler, E., & Scaramuzza, D. (2018). EMVS: Event-based multi-view stereo—3D reconstruction with an event camera in real-time. International Journal of Computer Vision, 126(12), 1394–1414.
Rebecq, H., Horstschaefer, T., & Scaramuzza, D. (2017). Real-time visual-inertial odometry for event cameras using keyframe-based nonlinear optimization. In British machine vision conference (BMVC).
Rebecq, H., Horstschäfer, T., Gallego, G., & Scaramuzza, D. (2017). EVO: A geometric approach to event-based 6-DOF parallel tracking and mapping in real-time. IEEE Robotics and Automation Letters, 2(2), 593–600.
Rebecq, H., Ranftl, R., Koltun, V., & Scaramuzza, D. (2019). Events-to-video: Bringing modern computer vision to event cameras. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3857–3866).
Reinbacher, C., Graber, G., & Pock, T. (2016). Real-time intensity-image reconstruction for event cameras using manifold regularisation. In British machine vision conference (BMVC).
Rosten, E., & Drummond, T. (2006). Machine learning for high-speed corner detection. In European conference on computer vision (ECCV) (pp. 430–443).
Scheerlinck, C., Barnes, N., & Mahony, R. (2018). Continuous-time intensity estimation using event cameras. In Asian conference on computer vision (ACCV).
Tedaldi, D., Gallego, G., Mueggler, E., & Scaramuzza, D. (2016). Feature detection and tracking with the dynamic and active-pixel vision sensor (DAVIS). In International conference on event-based control, communication and signal processing (EBCCSP).
Vasco, V., Glover, A., & Bartolozzi, C. (2016). Fast event-based Harris corner detection exploiting the advantages of event-driven cameras. In IEEE international conference on intelligent robots and systems (IROS).
Vidal, A. R., Rebecq, H., Horstschaefer, T., & Scaramuzza, D. (2018). Ultimate SLAM? Combining events, images, and IMU for robust visual SLAM in HDR and high speed scenarios. IEEE Robotics and Automation Letters, 3(2), 994–1001.
Zhou, H., Yuan, Y., & Shi, C. (2009). Object tracking using SIFT features and mean shift. Computer Vision and Image Understanding, 113(3), 345–352.
Zhu, A. Z., Atanasov, N., & Daniilidis, K. (2017). Event-based feature tracking with probabilistic data association. In IEEE international conference on robotics and automation (ICRA) (pp. 4465–4470).
Zhu, A. Z., Thakur, D., Ozaslan, T., Pfrommer, B., Kumar, V., & Daniilidis, K. (2018). The multivehicle stereo event camera dataset: An event camera dataset for 3D perception. IEEE Robotics and Automation Letters, 3(3), 2032–2039.
Acknowledgements
This work was supported by the DARPA FLA program, the Swiss National Center of Competence in Research Robotics, through the Swiss National Science Foundation, and the SNSF-ERC starting grant.
Additional information
Communicated by Vittorio Ferrari.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original version of this article was revised due to an error in the footnote of Table 7.
Multimedia Material: A supplemental video for this work is available at https://youtu.be/ZyD1YPW1h4U.
A Appendix
A.1 Objective Function Comparison Against ICP-Based Method (Kueng et al. 2016)
As mentioned in Sect. 4, one of the advantages of our method is that data association between events and the tracked feature is implicitly established by the pixel-to-pixel correspondence of the compared patches (2) and (3). This means that we do not have to estimate it explicitly, as was done in Kueng et al. (2016) and Zhu et al. (2017), which saves computational resources and prevents false associations that would yield bad tracking behavior. To illustrate this advantage, we compare the cost function profiles of our method and of Kueng et al. (2016) (ICP), which minimizes the alignment error (Euclidean distance) between two 2D point sets: \(\{\mathbf {p}_i\}\) from the events (data) and \(\{\mathbf {m}_j\}\) from the Canny edges (model),
\[ \{\mathtt {R}, \mathbf {t}\} = \arg \min _{\mathtt {R},\mathbf {t}} \sum _{(\mathbf {p}_i,\mathbf {m}_j) \in \mathrm {Matches}} b_i \left\| \mathtt {R}\mathbf {p}_i + \mathbf {t} - \mathbf {m}_j \right\| ^2. \qquad (16) \]
Here, \(\mathtt {R}\) and \(\mathbf {t}\) are the alignment parameters and \(b_i\) are weights. At each step, the association between events and model points is done by assigning each \(\mathbf {p}_i\) to the closest point \(\mathbf {m}_j\) and rejecting matches that are too far apart (more than 3 pixels). By varying the parameter \(\mathbf {t}\) around the estimated value while keeping \(\mathtt {R}\) fixed, we obtain a slice of the cost function profile. The resulting cost function profiles for our method (7) and for ICP (16) are shown in Fig. 18.
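The slicing procedure described above can be sketched in a few lines. This is a minimal illustration, not the implementation of Kueng et al. (2016): the function names `icp_cost` and `cost_slice`, the squared-distance form of the cost, the unit weights, and the toy point sets are all assumptions made for the example. Only the nearest-neighbor association, the 3-pixel rejection threshold, and the idea of varying the translation with the rotation fixed come from the text.

```python
import numpy as np

def icp_cost(events, model, R, t, max_dist=3.0, weights=None):
    """Weighted sum of squared distances after nearest-neighbor association.

    events: (N, 2) data points p_i, model: (M, 2) model points m_j.
    Matches farther apart than max_dist pixels are rejected.
    """
    if weights is None:
        weights = np.ones(len(events))            # assumption: b_i = 1
    moved = events @ R.T + t                      # R p_i + t for every event
    # Pairwise distances between moved events and model points, shape (N, M).
    d = np.linalg.norm(moved[:, None, :] - model[None, :, :], axis=2)
    nearest = d.min(axis=1)                       # closest model point per event
    keep = nearest <= max_dist                    # reject matches that are too far
    return float(np.sum(weights[keep] * nearest[keep] ** 2))

def cost_slice(events, model, R, t_est, offsets):
    """Cost along x-translation offsets around t_est, with R kept fixed."""
    return np.array([icp_cost(events, model, R, t_est + np.array([dx, 0.0]))
                     for dx in offsets])

# Toy scene (hypothetical): the model has two vertical edges 6 pixels apart,
# and the events come from the left edge only.
ys = np.arange(0.0, 10.0)
left_edge = np.column_stack([np.zeros_like(ys), ys])
right_edge = np.column_stack([np.full_like(ys, 6.0), ys])
model = np.vstack([left_edge, right_edge])
events = left_edge.copy()

offsets = np.linspace(-3.0, 9.0, 49)              # step of 0.25 pixel
profile = cost_slice(events, model, np.eye(2), np.zeros(2), offsets)
```

In this toy scene the profile reaches zero twice, once at the correct offset (0 pixels) and once at 6 pixels, where every event has been re-associated with the wrong edge. The association changes with the alignment parameter, which is the mechanism behind the multiple minima discussed for the ICP cost.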
For simple black and white scenes (first row of Fig. 18), all events generated belong to strong edges. In contrast, for more complex, highly-textured scenes (second row), events are generated more uniformly in the patch. Our method clearly shows a convex cost function in both situations. In contrast, the cost function of Kueng et al. (2016) exhibits several local minima and very broad basins of attraction, making exact localization of the optimal registration parameters challenging. The broadness of the basin of attraction, together with the multitude of local minima, can be explained by the fact that the data association changes with each alignment parameter. This means that several alignment parameters may lead to a partial overlap of the point sets, resulting in a suboptimal solution.
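For comparison, a pixel-to-pixel patch cost can be sketched as follows. This is a simplified illustration rather than the objective (7) itself: it assumes a normalized squared difference between an observed brightness-increment patch and a predicted one of the form \(-\nabla L \cdot \mathbf {v}\, \Delta \tau \), restricts the alignment to integer translations, and synthesizes the "observed" patch from the same model instead of accumulating real events. The names `predicted_increment`, `patch_cost` and `blob` are hypothetical.

```python
import numpy as np

def predicted_increment(L, v, dtau=1.0):
    """Predicted brightness increment -grad(L) . v * dtau at every pixel."""
    gy, gx = np.gradient(L)                # derivatives along rows (y) and columns (x)
    return -(gx * v[0] + gy * v[1]) * dtau

def normalized(P):
    """Scale a patch to unit L2 norm (left unchanged if it is all zeros)."""
    n = np.linalg.norm(P)
    return P / n if n > 0 else P

def patch_cost(dL_observed, dL_predicted):
    """Squared difference of the normalized patches, compared pixel by pixel."""
    return float(np.sum((normalized(dL_observed) - normalized(dL_predicted)) ** 2))

# Toy frame (hypothetical): a smooth Gaussian blob on a 41 x 41 patch.
yy, xx = np.mgrid[0:41, 0:41]

def blob(cx, cy, sigma=3.0):
    return np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2.0 * sigma ** 2))

v = np.array([1.0, 0.0])                   # assumed image-point velocity (pixels per dtau)
true_shift = 2
# Stand-in for the event patch: generated from the model at the true shift.
observed = predicted_increment(blob(20 + true_shift, 20), v)

shifts = np.arange(-3, 8)
profile = np.array([patch_cost(observed, predicted_increment(blob(20 + s, 20), v))
                    for s in shifts])
best = int(shifts[np.argmin(profile)])
```

No matching step appears anywhere in this sketch: each pixel of one patch is compared with the same pixel of the other, so the association cannot change as the alignment parameter varies. In this toy setup the profile decreases toward the true shift and increases afterwards within the sampled range.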
To show how non-smooth cost profiles affect tracking performance, we show the feature tracks in the last column of Fig. 18. The ground truth derived from KLT is marked in green. Our tracker (in blue) is able to follow the ground truth with high accuracy. On the other hand, the method of Kueng et al. (2016) (in red) exhibits jumping behavior, leading to early divergence from the ground truth.
Cite this article
Gehrig, D., Rebecq, H., Gallego, G. et al. EKLT: Asynchronous Photometric Feature Tracking Using Events and Frames. Int J Comput Vis 128, 601–618 (2020). https://doi.org/10.1007/s11263-019-01209-w
DOI: https://doi.org/10.1007/s11263-019-01209-w