Abstract
Simulation provides substantial benefits for robotics and Human-Robot Interaction (HRI). This study investigates how sensor effects observed in the real domain can be modeled in simulation, and what role such models play in effective Sim2Real domain transfer for learned perception models. It considers both naive noise approaches, such as additive Gaussian and salt-and-pepper noise, and data-driven sensor effect models, each introduced into simulation to represent the capabilities and artifacts of the Microsoft Kinect sensor as seen on real-world systems. The benefit of each approach to modeling sensor effects in simulation is quantified by the object classification improvement it yields in the real domain. To address the study's hypotheses, user studies are conducted in which grounded language models are trained under each sensor effect modeling condition and then evaluated on the robot's interaction capabilities in the real domain. In addition to grounded language performance metrics, the user study evaluation includes surveys of participants' assessments of the robot's capabilities. Results from this pilot study show benefits to modeling sensor noise in simulation for Sim2Real domain transfer, and begin to explore the effects such models have on human-robot interaction.
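The naive noise augmentations mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation, only a common formulation of the two augmentations it names: additive zero-mean Gaussian noise and salt-and-pepper corruption applied to 8-bit images; the function names and default parameters are illustrative assumptions.

```python
import numpy as np

def add_gaussian_noise(image, sigma=10.0, rng=None):
    """Additive zero-mean Gaussian noise on an 8-bit image."""
    if rng is None:
        rng = np.random.default_rng()
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def add_salt_and_pepper(image, amount=0.02, rng=None):
    """Salt-and-pepper noise: flip a fraction `amount` of pixels to 0 or 255."""
    if rng is None:
        rng = np.random.default_rng()
    noisy = image.copy()
    mask = rng.random(image.shape[:2])          # one draw per pixel
    noisy[mask < amount / 2] = 0                # pepper: darken pixel
    noisy[mask > 1 - amount / 2] = 255          # salt: saturate pixel
    return noisy
```

Applied to simulated RGB frames before training, such augmentations serve as the "naive" baseline against which data-driven sensor effect models are compared.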
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Berlier, A.J., Bhatt, A., Matuszek, C. (2023). Augmenting Simulation Data with Sensor Effects for Improved Domain Transfer. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13806. Springer, Cham. https://doi.org/10.1007/978-3-031-25075-0_52
Print ISBN: 978-3-031-25074-3
Online ISBN: 978-3-031-25075-0