Compressive sampling and adaptive dictionary learning for the packet loss recovery in audio multimedia streaming

Ciaramella, Angelo; Gianfico, Marco; Giunta, Giulio

doi:10.1007/s11042-015-3002-x

Published: 07 November 2015

Multimedia Tools and Applications, volume 75, pages 17375–17392 (2016)

Angelo Ciaramella¹,
Marco Gianfico¹ &
Giulio Giunta¹

Abstract

In this work, a scheme based on a compressive sampling technique and a fast dictionary learning approach for reconstructing audio content in multimedia streaming is introduced. Audio streaming data are encapsulated in different packets by means of an interleaving technique. The compressive sampling technique is used to reconstruct audio information in case of lost packets, with a sparsifying basis provided by a greedy adaptive dictionary learning algorithm. In order to assess the performance of the methodology, several experiments on speech and musical audio signals are presented.
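
The pipeline summarized above (interleave samples into packets, then reconstruct a frame with missing packets by sparse recovery) can be illustrated with a rough sketch. This is not the authors' implementation: the fixed DCT basis stands in for the learned dictionary, the random-permutation interleaver and the orthogonal matching pursuit solver are assumptions, and the names `dct_basis`, `interleave`, and `omp` are hypothetical helpers introduced only for this example. The test frame is synthetic and exactly sparse, which real audio is not.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal cosine basis (columns are atoms). Stand-in for a learned dictionary."""
    k = np.arange(n)
    t = (np.arange(n) + 0.5) / n
    psi = np.sqrt(2.0 / n) * np.cos(np.pi * np.outer(t, k))
    psi[:, 0] /= np.sqrt(2.0)
    return psi

def interleave(n, n_packets, rng):
    """Assumed interleaver: scatter sample indices pseudo-randomly over packets."""
    return np.array_split(rng.permutation(n), n_packets)

def omp(A, y, max_atoms, tol=1e-10):
    """Greedy sparse recovery (orthogonal matching pursuit) of y ~ A @ coef."""
    residual = y.copy()
    support, sol = [], None
    norms = np.linalg.norm(A, axis=0)
    for _ in range(max_atoms):
        if np.linalg.norm(residual) <= tol:
            break
        corr = np.abs(A.T @ residual) / norms
        if support:
            corr[support] = 0.0  # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        # Refit all selected atoms by least squares, then update the residual.
        sol, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ sol
    coef = np.zeros(A.shape[1])
    if support:
        coef[support] = sol
    return coef

rng = np.random.default_rng(0)
n, n_packets, k = 256, 8, 5  # frame length, packets per frame, sparsity (all assumed)

# Synthetic frame that is exactly k-sparse in the basis.
psi = dct_basis(n)
s_true = np.zeros(n)
atoms = rng.choice(n, size=k, replace=False)
s_true[atoms] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)
frame = psi @ s_true

# Sender: interleave samples into packets. Channel: drop two packets.
packets = interleave(n, n_packets, rng)
lost = {2, 5}
received = np.sort(np.concatenate(
    [p for i, p in enumerate(packets) if i not in lost]))

# Receiver: the surviving samples act as compressive measurements of the frame.
y = frame[received]
s_hat = omp(psi[received, :], y, max_atoms=2 * k)
frame_hat = psi @ s_hat

max_err = np.max(np.abs(frame_hat - frame))
print(f"received {received.size}/{n} samples, max reconstruction error {max_err:.2e}")
```

Because the interleaver spreads each packet's samples across the whole frame, a lost packet removes scattered samples rather than one contiguous gap, and the surviving samples behave like random measurements of the frame. That is the setting in which sparse recovery is expected to work well.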

Notes

The software is available on request. The source and reconstructed recordings of the experiments are available at the public websites http://goo.gl/wVZBMz and http://goo.gl/o4UPIh.
See Appendix A and the websites http://goo.gl/wVZBMz and http://goo.gl/o4UPIh for further details on the content of the audio signals.
The recordings of the results can be downloaded and listened to at the websites http://goo.gl/wVZBMz and http://goo.gl/o4UPIh.

Acknowledgments

This work was partially funded by the “Sostegno alla ricerca individuale per il triennio 2015-2017” project of the University of Naples “Parthenope”.

Author information

Authors and Affiliations

Department of Science and Technology, University of Naples Parthenope, Centro Direzionale Isola C4, 80143, Naples, Italy
Angelo Ciaramella, Marco Gianfico & Giulio Giunta

Corresponding author

Correspondence to Angelo Ciaramella.

Appendix A

In the following, we report the basic features of the audio signals used in the experiments.

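A.1 Data set D₁

In this experiment, we consider a male voice (wav format, Mono, 44100 Hz, 32-bit float, 32 seconds). The sentence used to construct the dictionary is:

“Dopo ebbi questa visione, una porta era aperta nel cielo e la voce che prima avevo udita come di tromba parlare con me mi dice: sali quassù e ti mostrerò ciò che deve accadere dopo queste cose, e all’istante fui rapito in spirito ed ecco in cielo c’era un trono e sul trono uno seduto e il seduto era simile nell’aspetto a gemma di diaspro e cornalina e intorno al trono c’era l’arcobaleno simile nell’aspetto...”.

English translation: “After this I had this vision: a door was open in heaven, and the voice I had heard before, speaking to me like a trumpet, says to me: come up here and I will show you what must happen after these things. And at once I was caught up in the spirit, and behold, in heaven there was a throne, and on the throne one seated, and the one seated was similar in appearance to jasper and carnelian, and around the throne there was a rainbow similar in appearance...”.

The test audio signal, also a male voice (wav format, Mono, 44100 Hz, 32-bit float, 4.5 seconds), is:

“Temere Dio è sapienza, astenersi dal male è intelligenza”.

English translation: “To fear God is wisdom, to shun evil is understanding”.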

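A.2 Data set D₂

In this experiment, we consider a female voice (wav format, Mono, 44100 Hz, 32-bit float, 35.29 seconds).

The sentence used to construct the dictionary is:

“Ungete uno stampo rotondo con una noce di burro e cospargetelo bene con un poco di pan grattato, tagliate a pezzi i fegatini, l’animella scottata (risata) e pelata in precedenza e i granelli di pollo lasciando poi rosolare il tutto in un poco di burro infine cospargete con pepe e sale nero, no pepe nero e sale. Per la pasta disponete la farina a fontana sulla spianatoia, unite 4 uova e lavorate il composto con le mani versando anche un poco d’acqua.”.

English translation: “Grease a round mould with a knob of butter and sprinkle it well with a few breadcrumbs; cut into pieces the chicken livers, the sweetbread, blanched (laughter) and peeled beforehand, and the chicken testicles (granelli), then let everything brown in a little butter; finally sprinkle with pepper and black salt, no, black pepper and salt. For the pasta, arrange the flour in a well on the pastry board, add 4 eggs and work the mixture with your hands, also pouring in a little water.”.

The test audio signal, also a female voice (wav format, Mono, 44100 Hz, 32-bit float, 4.1 seconds), is:

“C’era una volta un re seduto sul sofa”.

English translation: “Once upon a time there was a king sitting on the sofa”.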

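A.3 Data set D₃

In this experiment, we consider a male voice (wav format, Mono, 44100 Hz, 32-bit float, 33.4 seconds).

The sentence used to construct the dictionary is:

“Mi hanno detto di parlare un pò più piano perchè non si capisce niente. Mi hanno detto di vestire un poco meglio perchè sembro un deficiente, e allora mi son detto parlerò più piano e vestirò un pò più elegante. Sono andato in un negozio ed ho comprato un capo molto appariscente. Questa qua e la volta buona che riesco ad integrarmi in società, quante volte lo diceva mammà”.

English translation: “They told me to speak a little more slowly because nothing can be understood. They told me to dress a little better because I look like an idiot, and so I told myself I will speak more slowly and dress a little more elegantly. I went into a shop and bought a very flashy item of clothing. This is the time I finally manage to fit into society, as mum used to say so many times”.

The test audio signal, also a male voice (wav format, Mono, 44100 Hz, 32-bit float, 10.5 seconds), is:

“Trentatre trentini entrarono in treno tutti e trentatre trotterellando. Mi mi (risate di fondo)”.

English translation: “Thirty-three people from Trento got on the train, all thirty-three trotting along. Mi mi (laughter in the background)”.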

A.4 Data set D₄

In this experiment, we consider a female voice (wav format, Mono, 44100 Hz, 32-bit float, 6.5 seconds).

The sentence used to construct the dictionary is:

“Ciao questa sera stiamo facendo questa cosa, ciao.”.

English translation: “Hi, this evening we are doing this thing, bye.”.

The test audio signal, also a female voice (wav format, Mono, 44100 Hz, 32-bit float, 5.7 seconds), is:

“Mi hanno detto che per far passare il mal di testa devo mettere...”.

English translation: “They told me that to get rid of a headache I have to put...”.

A.5 Data set D₅

In this experiment, we consider a female voice (wav format, Mono, 44100 Hz, 32-bit float, 5.7 seconds).

The sentence used to construct the dictionary is:

“Mi hanno detto che per far passare il mal di testa devo mettere...”.

English translation: “They told me that to get rid of a headache I have to put...”.

The test audio signal, also a female voice (wav format, Mono, 44100 Hz, 32-bit float, 6.5 seconds), is:

“Ciao questa sera stiamo facendo questa cosa, ciao”.

English translation: “Hi, this evening we are doing this thing, bye”.

A.6 Data set D₆

In this experiment, we consider a male voice (wav format, Mono, 44100 Hz, 32-bit float, 32 seconds). The sentence used to construct the dictionary is the same as in data set D₁.

The test audio signal, a female voice (wav format, Mono, 44100 Hz, 32-bit float, 4.1 seconds), is:

“C’era una volta un re seduto sul sofa”.

English translation: “Once upon a time there was a king sitting on the sofa”.

About this article

Cite this article

Ciaramella, A., Gianfico, M. & Giunta, G. Compressive sampling and adaptive dictionary learning for the packet loss recovery in audio multimedia streaming. Multimed Tools Appl 75, 17375–17392 (2016). https://doi.org/10.1007/s11042-015-3002-x

Received: 31 December 2014
Revised: 27 August 2015
Accepted: 09 October 2015
Published: 07 November 2015
Issue Date: December 2016
DOI: https://doi.org/10.1007/s11042-015-3002-x
