Abstract
The progress in the field of 3D video, particularly depth maps, is driving the emergence of technologies such as augmented, virtual, and mixed reality, which have a wide range of applications in smart cities, intelligent transportation, AI-enabled farms, healthcare, education, industry, and more. Additionally, the future development of the Internet of Things (IoT) depends heavily on incorporating 3D vision and depth perception into machines such as autonomous cars, robots, and drones, so that they can perceive their surroundings much as humans do. However, traditional compression methods that focus only on texture are not suitable for efficiently handling the large volume of depth maps, owing to the distinct characteristics of texture and depth. To tackle this challenge, we propose a model for compressing depth maps. Our approach combines a learnt variable-rate method with a conditional quality-controllable autoencoder. The model consists of an encoder that automatically extracts features from depth maps using an optimized Convolutional Neural Network. The latter consists of an initial layer that uses predetermined wedgelet filters, followed by a VGG19 model. Additionally, we use a technique for classifying image styles based on Learnt Deep Correlation Features in order to learn deep features that distinguish depth maps from texture images. Our model's objective is to optimize a multi-term loss function that preserves the accuracy of depth discontinuities in the reconstructed output while also ensuring high-quality synthesis. By capturing and preserving deep features specific to depth maps, our end-to-end network achieves better rate-distortion (R/D) compression performance than related methods and the depth-oriented 3D-HEVC standard.
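The encoder's first layer uses predetermined wedgelet filters, i.e. fixed (non-learnt) kernels that split a patch into two half-planes and thus respond strongly to the sharp discontinuities typical of depth maps. As a purely illustrative sketch (not the authors' actual filter bank or network), the following NumPy snippet builds such a half-plane filter bank and applies it to a toy depth map containing one vertical depth discontinuity:

```python
import numpy as np

def wedgelet_filters(size=7, n_orientations=8):
    """Predetermined wedgelet-style filters: each splits the patch
    into two half-planes (+1 / -1) along a given orientation."""
    ys, xs = np.mgrid[:size, :size]
    c = (size - 1) / 2.0
    filters = []
    for k in range(n_orientations):
        theta = np.pi * k / n_orientations
        # signed distance of each pixel to a line through the patch centre
        d = (xs - c) * np.cos(theta) + (ys - c) * np.sin(theta)
        f = np.where(d >= 0, 1.0, -1.0)
        f -= f.mean()                 # zero-mean, so flat regions give ~0 response
        filters.append(f)
    return np.stack(filters)          # shape: (n_orientations, size, size)

def conv2d_valid(img, kern):
    """Plain 'valid' 2-D correlation, written out for clarity."""
    H, W = img.shape
    k = kern.shape[0]
    out = np.empty((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + k, j:j + k] * kern)
    return out

# Toy depth map with a sharp vertical depth discontinuity
depth = np.zeros((32, 32))
depth[:, 16:] = 100.0

bank = wedgelet_filters()
responses = np.stack([conv2d_valid(depth, f) for f in bank])

# The vertically oriented filter responds most strongly to the vertical edge
best = int(np.argmax(np.abs(responses).max(axis=(1, 2))))
```

In the paper's model these fixed responses feed a VGG19 backbone; here the filter size, orientation count, and half-plane construction are illustrative assumptions chosen to make the edge-selectivity of a fixed wedgelet layer easy to verify.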
Data Availability
Not applicable.
Notes
Peak Signal-to-Noise Ratio
Mean Squared Error
Multi-Scale Structural SIMilarity
Funding
None.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Ethics Approval
Not applicable.
Competing Interests
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sebai, D., Sehli, M. & Ghorbel, F. End-to-End Variable-Rate Learning-Based Depth Compression Guided by Deep Correlation Features. J Sign Process Syst 96, 81–97 (2024). https://doi.org/10.1007/s11265-023-01906-3