Published by De Gruyter (O), September 13, 2018

The Stixel world – A comprehensive representation of traffic scenes for autonomous driving

German title: Die Stixel Welt – Eine universelle Repräsentation von Straßenszenen für autonomes Fahren (The Stixel world: a universal representation of road scenes for autonomous driving)
  • Lukas Schneider, Michael Hafner and Uwe Franke

Abstract

Autonomous vehicles and sophisticated driver assistance systems use stereo vision to perceive their environment in 3D. Next-generation automotive stereo vision systems will deliver at least two million 3D points. To cope with this huge amount of data in real time, we developed a medium-level representation called the Stixel world. This representation condenses the relevant scene information by three orders of magnitude. Since traffic scenes are dominated by planar horizontal and vertical surfaces, our representation approximates the three-dimensional scene by means of thin planar rectangles called Stixels.

This survey paper summarizes the progress of the Stixel world. The evolution started with a rather simple representation based on a flat-world assumption. A major breakthrough was achieved by introducing deep learning, which allows rich semantic information to be incorporated. In its most recent form, the Stixel world encodes geometric, semantic and motion cues and can handle even the steepest roads in San Francisco.
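To make the scale of the reduction concrete, the following is a minimal illustrative sketch, not the authors' implementation. The `Stixel` field layout, the field names and the helper `reduction_factor` are hypothetical and chosen only to mirror the cues named in the abstract (geometry, semantics, motion). The only figures taken from the text are the two million 3D points and the reduction by three orders of magnitude.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stixel:
    """One thin planar rectangle (hypothetical field layout, for illustration)."""
    column: int           # image column of the left edge, in pixels
    width: int            # width in pixels
    v_top: int            # top image row
    v_bottom: int         # bottom image row
    depth_m: float        # geometric cue: distance from the camera in metres
    label: str            # semantic cue, e.g. "vehicle" or "road"
    vx_mps: float = 0.0   # motion cue: lateral velocity in m/s
    vz_mps: float = 0.0   # motion cue: longitudinal velocity in m/s


def reduction_factor(num_points: int, num_stixels: int) -> float:
    """Ratio between the raw 3D point count and the Stixel count."""
    return num_points / num_stixels


# Figures from the abstract: two million points, reduced by three orders of magnitude.
points_per_frame = 2_000_000
stixels_per_frame = points_per_frame // 10**3

example = Stixel(column=640, width=5, v_top=300, v_bottom=520,
                 depth_m=12.5, label="vehicle", vz_mps=-3.0)

print(stixels_per_frame)
print(reduction_factor(points_per_frame, stixels_per_frame))
print(example)
```

Under these assumptions, a frame of two million points corresponds to roughly two thousand Stixels, each of which carries only a handful of numbers and a class label.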

Summary (translated from the German abstract)

Autonomous vehicles and driver assistance systems use stereo camera systems to capture their surroundings in 3D. Typically, up to two million 3D points are obtained per image. To process this large amount of data, we developed a medium-level representation, the Stixel world, which reduces the relevant scene information by three orders of magnitude. Since road scenes are dominated by vertical and horizontal planar surfaces, the three-dimensional environment is approximated by thin rectangles called Stixels. This article summarizes the progress of the Stixel world, whose evolution began with a comparatively simple representation based on a flat-world assumption. A major breakthrough was achieved through the use of machine learning, which allows rich semantic information to be captured. In its most recent form, the Stixel world encodes the motion of objects in the scene in addition to geometric and semantic information. Moreover, it is able to correctly describe even the steepest roads, such as those in San Francisco.

About the authors

Lukas Schneider

Lukas Schneider is a computer vision and machine learning engineer in the Image Understanding Group at Daimler AG. He is currently pursuing his Ph.D. at the Computer Vision and Geometry Group at ETH Zurich. His research interests include stereoscopic and semantic scene understanding for intelligent vehicles.

Michael Hafner

Dr. Michael E. Hafner, as head of Automated Driving and Active Safety, is in charge of the development of automated and fully autonomous driving at Mercedes-Benz. Systems in his area of responsibility comprise current advanced driver assistance systems, various parking systems and vehicle dynamics functions. Additionally, his team develops highly and fully automated driving up to self-driving systems. In addition to chassis E/E, he controls the architecture and development of all E/E systems at Mercedes in his role as board member of the product group E/E.

Uwe Franke

Uwe Franke received the Ph.D. degree in electrical engineering from the Technical University of Aachen, Germany, in 1988 for his work on content-based image coding. Since 1989 he has been with Daimler Research and Development, where he has worked continuously on the development of vision-based driver assistance systems. He developed Daimler’s lane departure warning system introduced in 2000. Since 2000 he has been head of Daimler’s Image Understanding Group. The stereo technology developed by his group is the basis for the Mercedes-Benz stereo camera system introduced in 2013. His recent work focuses on image understanding for autonomous driving, in particular deep neural networks.

Received: 2018-03-06
Accepted: 2018-08-23
Published Online: 2018-09-13
Published in Print: 2018-09-25

© 2018 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 30.4.2024 from https://www.degruyter.com/document/doi/10.1515/auto-2018-0029/html