
End-to-End Variable-Rate Learning-Based Depth Compression Guided by Deep Correlation Features

Journal of Signal Processing Systems

Abstract

Progress in the field of 3D video, particularly depth maps, is driving the emergence of technologies such as augmented, virtual, and mixed reality, which have a wide range of applications in smart cities, intelligent transportation, AI-enabled farms, healthcare, education, industry, and more. Additionally, the future development of the Internet of Things (IoT) heavily depends on incorporating 3D vision and depth perception into machines such as autonomous cars, robots, and drones, so that they can perceive their surroundings much as humans do. However, traditional compression methods that focus only on texture are not suitable for efficiently handling the large volume of depth maps, due to the distinct characteristics of texture and depth. To tackle this challenge, we propose a model for compressing depth maps. Our approach combines a learning-based variable-rate method with a conditional quality-controllable autoencoder. The model consists of an encoder that automatically extracts features from depth maps using an optimized Convolutional Neural Network, composed of an initial layer of predetermined wedgelet filters followed by a VGG19 model. Additionally, we use a technique for classifying image styles based on Learnt Deep Correlation Features in order to learn deep features that distinguish depth maps from texture images. Our model's objective is to optimize a loss function with multiple terms, which preserves the accuracy of depth discontinuities in the reconstructed output while also ensuring high-quality synthesis. By capturing and preserving deep features specific to depth maps, our end-to-end network achieves better rate-distortion (R/D) compression performance than related methods and the depth-oriented 3D-HEVC standard.
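To make the encoder description above concrete, the following is a minimal PyTorch sketch of the front-end it outlines: a fixed bank of wedgelet-like filters applied to the depth map, followed by a VGG19 feature extractor and a Gram-matrix layer producing deep correlation features. The class name, filter count and size, the 1x1 adaptation layer, and the normalization are illustrative assumptions, not the authors' exact design, and the variable-rate autoencoder and multi-term loss are not shown.

```python
# Hedged sketch of the abstract's encoder front-end: fixed wedgelet-like
# filters -> VGG19 features -> deep correlation (Gram) features.
# All hyperparameters below are assumptions for illustration only.
import torch
import torch.nn as nn
import torchvision.models as models


def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """Channel-wise correlation (Gram) matrix of a feature map, the kind of
    descriptor used by deep-correlation-feature approaches."""
    b, c, h, w = features.shape
    f = features.reshape(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)


class WedgeletVGGEncoder(nn.Module):
    def __init__(self, num_wedgelets: int = 16):
        super().__init__()
        # First layer: predetermined (frozen) oriented filters standing in
        # for the wedgelet filter bank; one input channel for the depth map.
        self.wedgelet = nn.Conv2d(1, num_wedgelets, kernel_size=7,
                                  padding=3, bias=False)
        self.wedgelet.weight.requires_grad_(False)  # fixed, not learned
        # Map the wedgelet responses to the 3 channels VGG19 expects.
        self.adapt = nn.Conv2d(num_wedgelets, 3, kernel_size=1)
        # VGG19 convolutional trunk (pretrained weights would be loaded
        # here in practice).
        self.vgg = models.vgg19().features

    def forward(self, depth: torch.Tensor) -> torch.Tensor:
        x = self.wedgelet(depth)      # wedgelet responses
        x = self.adapt(x)             # adapt to 3 channels
        feats = self.vgg(x)           # deep VGG19 features
        return gram_matrix(feats)     # deep correlation features


# Example: one 256x256 depth map -> correlation feature matrix.
encoder = WedgeletVGGEncoder()
depth_map = torch.rand(1, 1, 256, 256)
print(encoder(depth_map).shape)  # torch.Size([1, 512, 512])
```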




Data Availability

Not applicable.

Notes

  1. https://tinyurl.com/5n92wz32

  2. Peak Signal-to-Noise Ratio

  3. Mean Squared Error

  4. Multi-Scale Structural SIMilarity
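Notes 2 and 3 expand the distortion metrics referenced in the evaluation; the following minimal NumPy sketch (an illustration, not the authors' evaluation code) shows how PSNR is derived from MSE for 8-bit depth maps.

```python
# Relation between the metrics in notes 2-3: PSNR computed from MSE for
# 8-bit depth maps (peak value 255). Illustrative only.
import numpy as np


def psnr(reference: np.ndarray, reconstructed: np.ndarray,
         peak: float = 255.0) -> float:
    mse = np.mean((reference.astype(np.float64)
                   - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```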


Funding

None.

Author information


Corresponding author

Correspondence to Dorsaf Sebai.

Ethics declarations

Ethics Approval

Not applicable.

Competing Interests

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sebai, D., Sehli, M. & Ghorbel, F. End-to-End Variable-Rate Learning-Based Depth Compression Guided by Deep Correlation Features. J Sign Process Syst 96, 81–97 (2024). https://doi.org/10.1007/s11265-023-01906-3
