
Paper

Title: Transformers in Self-Supervised Monocular Depth Estimation with Unknown Camera Intrinsics

Authors: Arnav Varma ; Hemang Chawla ; Bahram Zonooz and Elahe Arani

Affiliation: Advanced Research Lab, NavInfo Europe, The Netherlands

Keyword(s): Transformers, Convolutional Neural Networks, Monocular Depth Estimation, Camera Self-calibration, Self-Supervised Learning.

Abstract: The advent of autonomous driving and advanced driver assistance systems necessitates continuous developments in computer vision for 3D scene understanding. Self-supervised monocular depth estimation, a method for pixel-wise distance estimation of objects from a single camera without the use of ground truth labels, is an important task in 3D scene understanding. However, existing methods for this task are limited to convolutional neural network (CNN) architectures. In contrast with CNNs that use localized linear operations and lose feature resolution across the layers, vision transformers process at constant resolution with a global receptive field at every stage. While recent works have compared transformers against their CNN counterparts for tasks such as image classification, no study exists that investigates the impact of using transformers for self-supervised monocular depth estimation. Here, we first demonstrate how to adapt vision transformers for self-supervised monocular depth estimation. Thereafter, we compare the transformer and CNN-based architectures for their performance on KITTI depth prediction benchmarks, as well as their robustness to natural corruptions and adversarial attacks, including when the camera intrinsics are unknown. Our study demonstrates how the transformer-based architecture, though lower in run-time efficiency, achieves comparable performance while being more robust and generalizable.
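To make the self-supervision signal described in the abstract concrete, the sketch below illustrates the photometric reprojection objective that methods of this kind commonly optimize: a target frame is reconstructed from a neighbouring source frame using the predicted depth and relative pose, and the discrepancy between the two drives training without ground-truth labels. This is a minimal PyTorch sketch of the generic loss, not the authors' implementation; the names photometric_loss and ssim and the 0.85 SSIM weighting are assumptions taken from common practice, and the depth, pose, and intrinsics networks studied in the paper are left out.

# Minimal sketch (assumed, not the authors' code) of the photometric
# reprojection loss commonly used in self-supervised monocular depth
# estimation: the target frame is compared against a source frame warped
# into the target view using predicted depth and pose.
import torch
import torch.nn.functional as F


def ssim(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Simplified SSIM over 3x3 windows, returned as a per-pixel dissimilarity map."""
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    mu_x = F.avg_pool2d(x, 3, 1, padding=1)
    mu_y = F.avg_pool2d(y, 3, 1, padding=1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, padding=1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, padding=1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, padding=1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return torch.clamp((1 - num / den) / 2, 0, 1)


def photometric_loss(target: torch.Tensor, reconstructed: torch.Tensor,
                     alpha: float = 0.85) -> torch.Tensor:
    """Weighted SSIM + L1 between the target frame and its reconstruction
    from a source frame; this is the standard self-supervision signal."""
    l1 = (target - reconstructed).abs().mean(1, keepdim=True)
    ssim_term = ssim(target, reconstructed).mean(1, keepdim=True)
    return (alpha * ssim_term + (1 - alpha) * l1).mean()


if __name__ == "__main__":
    # Toy usage: two random tensors stand in for the target image and the
    # source image warped into the target view via predicted depth and pose.
    target = torch.rand(1, 3, 192, 640)
    reconstructed = torch.rand(1, 3, 192, 640)
    print(photometric_loss(target, reconstructed).item())

In the paper's setting with unknown camera intrinsics, the warping that produces the reconstructed frame would additionally depend on camera parameters predicted by the network rather than on a fixed calibration.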

License: CC BY-NC-ND 4.0


Paper citation in several formats:
Varma, A.; Chawla, H.; Zonooz, B. and Arani, E. (2022). Transformers in Self-Supervised Monocular Depth Estimation with Unknown Camera Intrinsics. In Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2022) - Volume 4: VISAPP; ISBN 978-989-758-555-5; ISSN 2184-4321, SciTePress, pages 758-769. DOI: 10.5220/0010884000003124

@conference{visapp22,
author={Arnav Varma and Hemang Chawla and Bahram Zonooz and Elahe Arani},
title={Transformers in Self-Supervised Monocular Depth Estimation with Unknown Camera Intrinsics},
booktitle={Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2022) - Volume 4: VISAPP},
year={2022},
pages={758-769},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010884000003124},
isbn={978-989-758-555-5},
issn={2184-4321},
}

TY - CONF
JO - Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2022) - Volume 4: VISAPP
TI - Transformers in Self-Supervised Monocular Depth Estimation with Unknown Camera Intrinsics
SN - 978-989-758-555-5
IS - 2184-4321
AU - Varma, A.
AU - Chawla, H.
AU - Zonooz, B.
AU - Arani, E.
PY - 2022
SP - 758
EP - 769
DO - 10.5220/0010884000003124
PB - SciTePress