1 Introduction and Objectives

The advent of low-cost IR sensors with real-time RGBD (depth) data is stimulating new applications that use 3D knowledge to create new user experiences. Such 3D sensing and computation is demanding for mobile devices. We perform a rigorous study of a 3D tracking service to support mobile 3D applications, characterizing feasible battery life with ensembles of today's wearables, smartphones, tablets, and laptops. First, using a combined metric (lifetime-speed), we compare ensemble capability. Second, using a lifetime metric, we bound realistic application times on a variety of ensembles. Most are quite short: even at low frame rates, lifetimes are a few hours, and at 30 fps, a few minutes. Third, we explore cloud support, showing that WiFi-based support is possible but LTE-based support is not, because communication consumes too much energy. Finally, we assess the opportunity to improve lifetime by adapting resolution and frame rate, showing a potential 6-fold improvement.

2 Application and Performance Model, Data Comparison

2.1 SLAMBench: A 3D Perception Service

We describe an analytical model for SLAM [5, 10] computation and compare it to real-time system measurements. SLAM includes three major steps: de-noising the sensor depth data, aligning each frame with the scene model, and updating the model. In SLAMBench [13], used in our experiments, de-noising uses a bilateral filter with Gaussian weights; alignment uses an ICP [1] algorithm with point-to-plane matching [3, 14] and projective mapping [2]; and frames are integrated into the model with a Truncated Signed Distance Function [4], followed by raycasting to generate the updated model point cloud. Our analytical model estimates for the SLAMBench stages are summarized in Table 1 (Fig. 1).
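
To make the stage structure concrete, the sketch below computes per-frame FLOP estimates for each stage. The per-pixel and per-voxel coefficients and the default in-view voxel count are illustrative placeholders, not the calibrated values from Table 1.

```python
# Minimal sketch of a per-frame FLOP model for the SLAMBench stages.
# All coefficients are illustrative placeholders, not the values in Table 1.

def slam_flops_per_frame(width, height,
                         bilateral_radius=2,
                         icp_iterations=10,
                         voxels_in_view=2_000_000):
    pixels = width * height
    window = (2 * bilateral_radius + 1) ** 2

    # De-noising: bilateral filter, a few FLOPs per pixel per window element.
    denoise = pixels * window * 6
    # Alignment: point-to-plane ICP with projective association,
    # a few tens of FLOPs per pixel per iteration.
    align = pixels * icp_iterations * 30
    # Integration: TSDF update touches every model voxel in the field of view.
    integrate = voxels_in_view * 10
    # Raycasting: one ray per pixel, marching through the in-view volume.
    raycast = pixels * 50

    return {"denoise": denoise, "align": align,
            "integrate": integrate, "raycast": raycast,
            "total": denoise + align + integrate + raycast}

print(slam_flops_per_frame(512, 424))
```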

Fig. 1. Acquisition + SLAM + Rendering Pipeline

Table 1. A model for simultaneous localization and mapping (floating-point operations/frame)

Many scene-dependent factors affect the precise computation count for 3D modelling. For example, the number of model voxels in the field of view determines how many voxels must be updated during model integration and raycasting, and thus affects the integration and raycasting computation counts. While a full analysis of such scene dependence is beyond the scope of this paper, we provide a simple approximation that applies scaling factors to the major elements (see subpoint 2 in Table 1). These factors are based on offline analysis of the specific experiments used in this paper; an online method is a good area for future work.
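
A minimal sketch of this approximation, assuming hypothetical in-view voxel fractions rather than our offline-derived factors, might look as follows:

```python
# Sketch of the scene-dependence approximation: scale the volume-dependent
# stages (integration, raycasting) by an offline-estimated fraction of model
# voxels in the field of view. The fractions are hypothetical placeholders.

SCENE_FACTORS = {"Near-Slow": 0.35, "Far-Fast": 0.55}  # fraction of voxels in view

def scene_adjusted_flops(stage_flops, scene):
    """stage_flops: dict of per-stage FLOPs/frame assuming the full model volume."""
    f = SCENE_FACTORS[scene]
    adjusted = dict(stage_flops)
    for stage in ("integrate", "raycast"):
        if stage in adjusted:
            adjusted[stage] *= f
    return adjusted

print(scene_adjusted_flops({"denoise": 5e8, "align": 2e9,
                            "integrate": 1e9, "raycast": 8e8}, "Far-Fast"))
```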

Fig. 2. Model and measured instruction counts (Far-Fast)

Fig. 3. Model and measured instruction counts (Near-Slow)

2.2 Basic Characterization and Data Comparison

In this section, we present measurements from a range of experiments, comparing them to our analytical model. We collected \(512 \times 424\) depth images at 30 fps using a Microsoft Kinect V2 sensor moving along a 2-meter track. To explore a key dimension of computational challenge, we vary the distance (sensor to scene) from 1.5 m (Near) to 2 m (Far) as well as the rate of camera movement from 0.06 m/s (Slow) to 0.2 m/s (Fast).

Timing and instruction counts are collected on an Intel i5-3350P CPU. We use downrezing (reducing the depth resolution) and frame subsetting so that we can compare a range of frame rates and sensor resolutions with the same experimental data. This data is presented in Figs. 2 and 3. To compare measured results with the model, we convert the model's floating-point operation counts to instructions using an average ratio of 2:1 based on overall observed averages. The model captures the key features of the required computation and matches measurements well; for example, linear growth with increasing pixels/frame and frame rate is captured clearly (see Figs. 2 and 3).
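
The sketch below illustrates the downrezing, frame subsetting, and FLOP-to-instruction conversion used for this comparison; the block-averaging downrez and the direction of the 2:1 ratio are assumptions of the sketch, not a description of our exact tooling.

```python
import numpy as np

def downrez(depth, factor):
    """Reduce depth resolution by an integer factor via block averaging (assumed scheme)."""
    h, w = depth.shape
    h, w = h - h % factor, w - w % factor
    blocks = depth[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def subset_frames(frames, capture_fps=30, target_fps=10):
    """Keep every k-th frame to emulate a lower sensor frame rate."""
    step = capture_fps // target_fps
    return frames[::step]

def flops_to_instructions(flops, ratio=2.0):
    # Assumption: roughly two floating-point operations per retired instruction.
    return flops / ratio
```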

3 Feasibility of Deploying 3D Perception: Ensembles and Lifetimes

3.1 Lifetime-Speed Metric

We explore the execution of a 3D service, SLAMBench, that performs simultaneous localization and mapping (SLAM), essential for 3D-aware navigation, rendering, or simply localization in future 3D applications. We take advantage of SLAMBench's partitioned structure, mapping its six stages to a variety of device ensembles. Our goal is to understand the capabilities and lifetimes of all interesting combinations of wearables, smartphones, tablets, and laptops (specifications in Table 2). To assess the usability of the service over a period of time, we define two metrics:

$$\begin{aligned} LS = \text {Lifetime-Speed product} = \# \text {frames computable} \times \text {maximum tracking rate} \end{aligned}$$
$$\begin{aligned} L = \text {Lifetime} = (\# \text {frames computable} / \text {maximum tracking rate}) / 3600 \end{aligned}$$
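
These two metrics can be transcribed directly as below; the battery capacity and per-frame energy and time figures in the example are hypothetical inputs, not measured device data.

```python
# Direct transcription of the LS and L metrics; input numbers are illustrative only.

def ls_and_lifetime(battery_joules, energy_per_frame_j, seconds_per_frame):
    frames_computable = battery_joules / energy_per_frame_j
    max_tracking_rate = 1.0 / seconds_per_frame            # frames per second
    ls = frames_computable * max_tracking_rate              # lifetime-speed product
    lifetime_hours = (frames_computable / max_tracking_rate) / 3600.0
    return ls, lifetime_hours

# Example (hypothetical): a 40 kJ battery, 5 J and 0.1 s per frame.
print(ls_and_lifetime(40_000, 5.0, 0.1))
```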
Table 2. Compute, battery and weight specs for device classes

3.2 Single Devices and Ensembles

We compare devices and ensembles using the LS metric. First, Fig. 4 compares single devices at two sensing resolutions. As expected, lower resolution significantly increases lifetime, and the largest devices have the longest lifetimes.

Fig. 4. LS metrics for single-device ensembles (wearable, smartphone, tablet, and laptop); several blue bars are too small to see.

Next we consider two-device ensembles in Fig. 5. With two devices, the decomposition of SLAMBench across the devices matters. Our results show that lifetime is greatest for deployments that place the computationally intense stages on the larger device, and for lower sensing resolution. Three-device ensemble data is presented in Fig. 6; the decomposition is even more complex, but the best configurations share the same property: the computationally intense stages run on the largest devices.

Overall, Figs. 4, 5, and 6 show that with appropriate computation mapping, LS is determined mostly by the largest device. For example, at resolution \(512 \times 424\) in Fig. 5, the maximum LS values achieved are 10,000 for the smartphone, 70,000 for the tablet, and 200,000 for the laptop. The best LS is achieved when the computation maps to the largest (heaviest) device.
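
The mapping search behind these figures can be sketched as follows. The stage costs and device specifications are hypothetical, and the simplifying assumptions (the slowest device sets the tracking rate, the first exhausted battery ends the run) belong to this sketch rather than to our measurement methodology.

```python
from itertools import product

# Hypothetical per-frame stage costs (FLOPs) and device specs.
STAGE_FLOPS = {"acquire": 1e6, "denoise": 5e8, "track": 2e9,
               "integrate": 1e9, "raycast": 8e8, "render": 3e8}
DEVICES = {
    "smartphone": {"gflops": 10, "battery_j": 20_000,  "flops_per_j": 2e9},
    "laptop":     {"gflops": 80, "battery_j": 200_000, "flops_per_j": 4e9},
}

def evaluate(mapping):
    """LS for one stage-to-device mapping."""
    load = {d: 0.0 for d in DEVICES}
    for stage, dev in mapping.items():
        load[dev] += STAGE_FLOPS[stage]
    used = [d for d in load if load[d] > 0]
    rate = min(DEVICES[d]["gflops"] * 1e9 / load[d] for d in used)      # fps limit
    frames = min(DEVICES[d]["battery_j"] * DEVICES[d]["flops_per_j"] / load[d]
                 for d in used)                                          # frames until a battery drains
    return frames * rate

best = max((dict(zip(STAGE_FLOPS, assign))
            for assign in product(DEVICES, repeat=len(STAGE_FLOPS))),
           key=evaluate)
print(best)
```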

Fig. 5.
figure 5

LS metrics for two-device ensembles (Configs with \({<}2\) stages on largest device omitted)

Fig. 6.
figure 6

LS metrics for three-device ensembles (Configs with \({>}2\) stages on smallest and \({<}2\) stages on largest device omitted)

Fig. 7. Lifetime at best frame rate. Achieved best frame rates labeled at top

3.3 Lifetimes for Weight-Comparable Ensembles

For mobile users, the dominant factor in whether to take a device along may be its weight. Our results show that larger devices have greater capability, but at a cost in portability. To see whether the larger devices are better only because of their greater size (bang for the gram), Figs. 7 and 8 consider the best configuration for lifetime in each weight class. Interestingly, while laptops and tablets are the most capable, the smartphone is the most efficient, providing the greatest lifetime for its weight. However, this holds only at low performance (1 fps); its lifetime falls to a few minutes at 30 fps.

Table 3. Energy/bit for various network technologies [6, 9]
Fig. 8. Lifetime normalized by weight at best frame rate. Achieved best frame rates labeled at top

3.4 Communication Limits

For ensembles, distributing the 3D service computation means that data must be transmitted between devices. Here we examine the energy cost of that communication within an ensemble, comparing it to the energy expended on computation (Fig. 9). In nearly all cases, computation energy dominates; communication energy is manageable for Bluetooth and WiFi. However, LTE is too expensive for an ensemble 3D service, with communication consuming over \(90\,\%\) of the total energy. This suggests that even with advances in LTE efficiency, cloud-based SLAM, or even partially cloud-based SLAM, is unlikely to be viable (Table 3).
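
The comparison can be sketched with per-bit energies in the spirit of Table 3; the numbers below are rough illustrative values, not the cited measurements [6, 9].

```python
# Sketch: communication energy share per frame for different networks.
# Per-bit energies and the per-frame compute energy are illustrative placeholders.

ENERGY_PER_BIT_J = {"bluetooth": 1e-7, "wifi": 5e-8, "lte": 1e-6}

def comm_energy_share(bits_per_frame, compute_j_per_frame, network):
    comm_j = bits_per_frame * ENERGY_PER_BIT_J[network]
    return comm_j / (comm_j + compute_j_per_frame)

# A raw 512x424 16-bit depth frame is ~3.5 Mbit before any compression.
for net in ENERGY_PER_BIT_J:
    print(net, round(comm_energy_share(512 * 424 * 16, 2.0, net), 2))
```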

Fig. 9. Energy distribution of communication and computation vs. network. Resolution: \(512 \times 424\); other configurations show a similar but slightly lower communication share.

4 Adaptive Control to Reduce Computation

4.1 Single Frame Rate and Resolution

Others have explored novel and customized data structures to reduce the computation cost of 3D model building and tracking [7, 8, 11, 12, 15, 16]. Point-cloud based reconstruction typically has high levels of redundancy, enabling robust reconstruction; the ICP algorithm can therefore support SLAM at lower resolutions and frame rates. As a baseline for both computation and accuracy, we use the highest-resolution frames (\(512 \times 424\)) and the maximum achievable frame rate. We require that the mean absolute trajectory error (ATE) stay within \(10\,\%\) of the best possible, and show the resulting lifetime improvement in Fig. 10. By picking the best rate and resolution, the lifetime of a range of ensembles can be increased by up to 6-fold.
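
A sketch of this single-choice oracle is given below; the candidate table is hypothetical and stands in for offline SLAMBench runs that would supply measured ATE and lifetime values.

```python
# Sketch of the oracle: among all (resolution, frame-rate) pairs, keep those whose
# mean ATE stays within 10% of the baseline and pick the longest lifetime.
# The rows below are illustrative, not measured results.

CANDIDATES = [
    # (width, height, fps, mean_ate_m, lifetime_hours)
    (512, 424, 30, 0.020, 0.3),
    (256, 212, 30, 0.021, 1.1),
    (128, 106, 10, 0.022, 1.8),
    (64,  53,  10, 0.035, 2.4),
]

def best_single_choice(candidates, baseline_ate, slack=0.10):
    ok = [c for c in candidates if c[3] <= baseline_ate * (1 + slack)]
    return max(ok, key=lambda c: c[4]) if ok else None

print(best_single_choice(CANDIDATES, baseline_ate=0.020))
```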

Fig. 10. Potential benefit of a best single frame rate and resolution (Oracle), Near-Slow (left) and Far-Fast (right). Resolution is reduced 16x and 64x and frame rate slightly, keeping mean tracking error within \(10\,\%\). Ensembles are the best for that number of devices (\(L\), \(T:0-1, L:2-5\), and \(S:0-0, T:1-1, L:2-5\) for Near-Slow; \(L\), \(T:0-2, L:3-5\), and \(S:0-0, T:1-2, L:3-5\) for Far-Fast)

Interestingly, these results appear to be consistent over a range of movement speeds and scene distances. Even in Far-Fast, 6-fold improvements are possible by using low resolution. In fact, the results are nearly as good as for Near-Slow.

4.2 Best Collection of Frames – Rate and Resolution

While the previous comparison assumed a single fixed, optimal choice, there is much opportunity to adapt at a finer temporal scale. To understand the potential of per-frame adaptive control of resolution and frame rate, we search exhaustively for the best combinations of resolution and frame rate in a 3-segment movement pattern. We first collect depth images with a Microsoft Kinect V2 camera moving at 0.5 m/s (Superfast) for a 4-second movement experiment. Second, we split the collected depth images into three segments, dividing equally by distance along the movement track. Finally, we consider all possible resolution and frame rate combinations for each segment and compute the mean ATE. Each set of choices produces both a total data volume and an ATE, and becomes a point in the 2D scatterplots shown in Figs. 11 and 12. Our results show a remarkable dynamic range of over 300-fold at close to the same ATE.
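
The exhaustive search can be sketched as follows; run_slam_segments is a hypothetical placeholder for re-running SLAMBench on the re-sampled depth stream, and the resolution and frame-rate menus are assumptions of the sketch.

```python
from itertools import product

# Sketch of the per-segment exhaustive sweep: each of the three segments of the
# 4 s track independently chooses a (resolution, fps) pair, yielding a data
# volume and (via the placeholder evaluator) a mean ATE.

RESOLUTIONS = [(512, 424), (256, 212), (128, 106), (64, 53)]
FRAME_RATES = [30, 15, 10, 5, 2, 1]
BITS_PER_PIXEL = 16
SEGMENT_SECONDS = 4.0 / 3

def data_volume_bits(choices):
    return sum(w * h * BITS_PER_PIXEL * fps * SEGMENT_SECONDS
               for (w, h), fps in choices)

def sweep(run_slam_segments):
    points = []
    for choices in product(product(RESOLUTIONS, FRAME_RATES), repeat=3):
        mean_ate = run_slam_segments(choices)   # placeholder SLAMBench evaluation
        points.append((data_volume_bits(choices), mean_ate))
    return points  # scatterplot points as in Figs. 11 and 12
```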

Fig. 11.
figure 11

Minimum mean ATE achieved at given data rate for Far-Superfast with adaptive control. Optimal adaptive control can be a big win (400 Mbits/1.2 Mbits = 333-fold).

Fig. 12.
figure 12

Minimum mean ATE achieved at given total frames for Far-Superfast with adaptive control

Our results show that the choice of adaptation matters a great deal: low-data-size adaptive control can achieve either very high or very low mean ATE. But the results are encouraging for adaptation, because there are low-data-size control choices that match the best mean ATE (see Figs. 11 and 12). For example, 1,200 kbits over the 4 s experiment is only about 40 KB/s, yet delivers close to the best mean ATE. Likewise, a small fraction of the frames (40 out of 120, or 10 fps) achieves close to the best mean ATE even for this high-speed motion. In short, if an intelligent adaptive controller can choose close to the optimum, a remarkably small amount of data suffices while producing near-lowest mean ATE.

The smaller data – resolution and frame rate – also dramatically reduces the computation required. To assess the potential benefit, we compute the computation cost savings of two simple adaptive control algorithms: (1) fixed frame rate, adapting resolution based on the mean depth of the point cloud, and (2) fixed resolution, adapting frame rate based on the tracked sensor velocity. These results are shown in Fig. 13.
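
Minimal sketches of the two controllers are given below; the thresholds and the direction of the depth-to-resolution mapping are assumptions for illustration, not tuned values from our experiments.

```python
# Controller (1): fixed frame rate, resolution keyed to mean scene depth.
# The mapping direction (closer scene -> heavier downrez) is an assumption of this sketch.
def resolution_from_depth(mean_depth_m):
    if mean_depth_m < 1.0:
        return (64, 53)     # 64x downrezed
    if mean_depth_m < 2.0:
        return (128, 106)   # 16x downrezed
    return (256, 212)

# Controller (2): fixed resolution, frame rate keyed to tracked sensor velocity.
# Faster motion needs more frequent alignment to keep ICP converging.
def frame_rate_from_velocity(speed_m_s):
    if speed_m_s > 0.3:
        return 30
    if speed_m_s > 0.1:
        return 10
    return 2
```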

Fig. 13. Computation cost saving ratio (tracking at \(512 \times 424\) resolution and 30 fps as baseline) for different adaptive control algorithms; configurations that do not achieve competitive ATE are omitted.

Our results show that choosing the better of these two adaptive control algorithms enables a 160-fold computation cost saving while achieving near-lowest ATE (see Fig. 13). This suggests that adaptive control is a promising way to save energy (communication and computation) while maintaining high tracking accuracy.

5 Summary and Future Work

Our study of a 3D perception service sheds insight into viable ensembles. At low frame rates, smartphones can support simple applications today. For fast motion, larger devices are required for peak compute speed and lifetime. Cloud support is not feasible. Adaptive frame rate and resolution is a promising approach to save energy. Future efforts should consider a broader range of devices and movements.