Abstract
The progress in the field of 3D video, particularly depth maps, is driving the emergence of technologies such as augmented, virtual, and mixed reality, which have a wide range of applications in smart cities, intelligent transportation, AI-enabled farms, healthcare, education, industry, and more. Additionally, the future development of the Internet of Things (IoT) depends heavily on incorporating 3D vision and depth perception into machines such as autonomous cars, robots, and drones, so that they can perceive their surroundings much as humans do. However, traditional compression methods that focus only on texture are not suitable for efficiently handling the large volume of depth maps, owing to the distinct characteristics of texture and depth. To tackle this challenge, we propose a model for compressing depth maps. Our approach combines a learnt variable-rate method with a conditional quality-controllable autoencoder. The model consists of an encoder that automatically extracts features from depth maps using an optimized Convolutional Neural Network. The latter consists of an initial layer that uses predetermined wedgelet filters, followed by a VGG19 model. Additionally, we use a technique for classifying image styles based on Learnt Deep Correlation Features in order to learn deep features that distinguish depth maps from texture images. Our model's objective is to optimize a multi-term loss function that preserves the accuracy of depth discontinuities in the reconstructed output while also ensuring high-quality synthesis. By capturing and preserving deep features specific to depth maps, our end-to-end network achieves better rate-distortion (R/D) compression performance than related methods and the depth-oriented 3D-HEVC standard.
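The encoder's first layer uses predetermined wedgelet filters, i.e. fixed (non-learnt) kernels that split a patch into two half-planes and thus respond strongly to the sharp discontinuities typical of depth maps. As a purely illustrative sketch (not the authors' actual filter bank or network), the following NumPy snippet builds such a half-plane filter bank and applies it to a toy depth map containing one vertical depth discontinuity:

```python
import numpy as np

def wedgelet_filters(size=7, n_orientations=8):
    """Predetermined wedgelet-style filters: each splits the patch
    into two half-planes (+1 / -1) along a given orientation."""
    ys, xs = np.mgrid[:size, :size]
    c = (size - 1) / 2.0
    filters = []
    for k in range(n_orientations):
        theta = np.pi * k / n_orientations
        # signed distance of each pixel to a line through the patch centre
        d = (xs - c) * np.cos(theta) + (ys - c) * np.sin(theta)
        f = np.where(d >= 0, 1.0, -1.0)
        f -= f.mean()                 # zero-mean, so flat regions give ~0 response
        filters.append(f)
    return np.stack(filters)          # shape: (n_orientations, size, size)

def conv2d_valid(img, kern):
    """Plain 'valid' 2-D correlation, written out for clarity."""
    H, W = img.shape
    k = kern.shape[0]
    out = np.empty((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + k, j:j + k] * kern)
    return out

# Toy depth map with a sharp vertical depth discontinuity
depth = np.zeros((32, 32))
depth[:, 16:] = 100.0

bank = wedgelet_filters()
responses = np.stack([conv2d_valid(depth, f) for f in bank])

# The vertically oriented filter responds most strongly to the vertical edge
best = int(np.argmax(np.abs(responses).max(axis=(1, 2))))
```

In the paper's model these fixed responses feed a VGG19 backbone; here the filter size, orientation count, and half-plane construction are illustrative assumptions chosen to make the edge-selectivity of a fixed wedgelet layer easy to verify.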
Data Availability
Not applicable.
Notes
Peak Signal-to-Noise Ratio
Mean Squared Error
Multi-Scale Structural SIMilarity
Funding
None.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Ethics Approval
Not applicable.
Competing Interests
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sebai, D., Sehli, M. & Ghorbel, F. End-to-End Variable-Rate Learning-Based Depth Compression Guided by Deep Correlation Features. J Sign Process Syst 96, 81–97 (2024). https://doi.org/10.1007/s11265-023-01906-3