ABSTRACT
This demonstration is invited from the ISS 2023 paper track (https://doi.org/10.1145/3626476). Segmenting and determining the 3D bounding boxes of objects of interest in RGB videos is an important task for a variety of applications such as augmented reality, navigation, and robotics. Supervised machine learning techniques are commonly used for this, but they need training datasets: sets of images with associated 3D bounding boxes manually defined by human annotators using a labelling tool. However, precisely placing 3D bounding boxes with conventional 3D manipulation tools on a 2D interface is difficult. To alleviate that burden, we propose a novel technique with which 3D bounding boxes can be created by simply drawing 2D bounding rectangles on multiple frames of a video sequence showing the object from different angles. The method reconstructs a dense 3D point cloud from the video, selects the points belonging to the desired object by back-projecting the 2D rectangles, and computes a tightly fitting 3D bounding box around them. We show concrete application scenarios of our interface, including training dataset creation and editing 3D spaces and videos. An evaluation comparing our technique with a conventional 3D annotation tool shows that our method results in higher accuracy. We also confirm that the bounding boxes created with our interface have a lower variance, likely yielding more consistent labels and datasets.
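The core idea (select the 3D points whose projections fall inside the user-drawn 2D rectangle in every annotated frame, then fit a box around the survivors) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a pinhole camera model with known intrinsics `K` and extrinsics `(R, t)` per frame, and fits an axis-aligned box rather than an oriented one.

```python
import numpy as np

def project_points(points, K, R, t):
    """Project Nx3 world points to pixel coordinates (pinhole model)."""
    cam = (R @ points.T + t.reshape(3, 1)).T      # world -> camera frame
    uv = (K @ cam.T).T                            # camera -> image plane
    return uv[:, :2] / uv[:, 2:3]                 # perspective divide

def select_by_rectangles(points, views):
    """Keep points whose projection lies inside the 2D rectangle in EVERY view.

    views: list of (K, R, t, rect) with rect = (u_min, v_min, u_max, v_max).
    The multi-view intersection is what carves the object out of the cloud:
    clutter hidden behind the object in one view is pruned by another view.
    """
    mask = np.ones(len(points), dtype=bool)
    for K, R, t, (u0, v0, u1, v1) in views:
        uv = project_points(points, K, R, t)
        mask &= (uv[:, 0] >= u0) & (uv[:, 0] <= u1)
        mask &= (uv[:, 1] >= v0) & (uv[:, 1] <= v1)
    return points[mask]

def fit_aabb(points):
    """Tightly fitting axis-aligned box as (min corner, max corner)."""
    return points.min(axis=0), points.max(axis=0)
```

With two views at right angles, a point that projects inside the rectangle in one frame but outside it in the other is correctly rejected, which is why sparse multi-view rectangles suffice to isolate the object.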
Index Terms
- Interactive 3D Annotation of Objects in Moving Videos from Sparse Multi-view Frames