
Compressive sampling and adaptive dictionary learning for the packet loss recovery in audio multimedia streaming

Multimedia Tools and Applications

Abstract

In this work, a scheme for reconstructing audio content in multimedia streaming, based on compressive sampling and a fast dictionary learning approach, is introduced. Audio streaming data are encapsulated in different packets by means of an interleaving technique. When packets are lost, compressive sampling is used to reconstruct the missing audio information, with the sparsifying basis provided by a greedy adaptive dictionary learning algorithm. To assess the performance of the methodology, several experiments on speech and musical audio signals are presented.
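To make the pipeline concrete, the following is a minimal sketch, not the authors' software (which, as noted below, is available on request): it assumes an illustrative frame length, packet count, and sparsity level, replaces the greedy adaptive dictionary of the paper with a random normalized dictionary, and models the loss of one interleaved packet as a regular subsampling of a frame. The surviving samples are treated as compressive measurements and the frame is recovered by orthogonal matching pursuit.

```python
# Minimal sketch of packet-loss recovery via compressive sampling (illustrative
# values throughout; the paper's greedy adaptive dictionary is replaced here by
# a random normalized dictionary purely to show the recovery pipeline).
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
frame_len, n_atoms, sparsity = 256, 512, 20

# Stand-in sparsifying dictionary (one unit-norm atom per column).
D = rng.standard_normal((frame_len, n_atoms))
D /= np.linalg.norm(D, axis=0)

# Synthetic audio frame that is exactly sparse in D.
coeffs = np.zeros(n_atoms)
support = rng.choice(n_atoms, sparsity, replace=False)
coeffs[support] = rng.standard_normal(sparsity)
frame = D @ coeffs

# Interleaving: consecutive samples are assigned to different packets, so losing
# one packet removes a regular subset of samples instead of a contiguous burst.
n_packets, lost_packet = 4, 2
observed = (np.arange(frame_len) % n_packets) != lost_packet

# Compressive-sampling view: the surviving samples are measurements
# y = Phi @ frame, where Phi selects the observed rows of the identity matrix.
Phi = np.eye(frame_len)[observed]
y = Phi @ frame

# Sparse recovery of the coefficients from the measurements, then resynthesis.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=sparsity, fit_intercept=False)
omp.fit(Phi @ D, y)
frame_rec = D @ omp.coef_

print("relative reconstruction error:",
      np.linalg.norm(frame - frame_rec) / np.linalg.norm(frame))
```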


Notes

  1. The software is available on request. The source and reconstructed recordings of the experiments are available at the public websites http://goo.gl/wVZBMz and http://goo.gl/o4UPIh.

  2. See Appendix A and the websites http://goo.gl/wVZBMz and http://goo.gl/o4UPIh for further details on the content of the audio signals.

  3. The recordings of the results can be downloaded and listened to at the websites http://goo.gl/wVZBMz and http://goo.gl/o4UPIh.


Acknowledgments

This work was partially funded by the “Sostegno alla ricerca individuale per il triennio 2015-2017” project of the University of Naples “Parthenope”.

Author information


Corresponding author

Correspondence to Angelo Ciaramella.

Appendix A

In the following, we report the basic features of the audio signals used in the experiments.
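Purely as an illustration of the data format described below, the short sketch that follows assumes a hypothetical file name and frame length and shows how one of these mono, 44100 Hz, 32-bit float WAV recordings could be loaded and cut into fixed-length frames before dictionary learning.

```python
# Illustrative loading/framing of one recording (hypothetical file name and
# frame length; the recordings are described as mono, 44100 Hz, 32-bit float).
import numpy as np
from scipy.io import wavfile

rate, signal = wavfile.read("dataset_D1_dictionary.wav")  # hypothetical name
assert rate == 44100 and signal.ndim == 1                 # mono, 44.1 kHz

frame_len = 256                                           # illustrative choice
n_frames = len(signal) // frame_len
frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
print(frames.shape)  # (number of frames, samples per frame)
```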

A.1 Data set D1

In this experiment, we consider a male voice (wav format, Mono, 44100 Hz, 32-bit float, 32 seconds). The sentence for constructing the dictionary is:

“Dopo ebbi questa visione, una porta era aperta nel cielo e la voce che prima avevo udita come di tromba parlare con me mi dice: sali quassù e ti mostrerò ciò che deve accadere dopo queste cose, e all’istante fui rapito in spirito ed ecco in cielo c’era un trono e sul trono uno seduto e il seduto era simile nell’aspetto a gemma di diaspro e cornalina e intorno al trono c’era l’arcobaleno simile nell’aspetto...”.

The testing audio signal, a male voice (wav format, Mono, 44100 Hz, 32-bit float, 4.5 seconds), is:

“Temere Dio è sapienza, astenersi dal male è intelligenza”.

A.2 Data set D2

In this experiment, we consider a female voice (wav format, Mono, 44100 Hz, 32-bit float, 35.29 seconds).

The sentence for constructing the dictionary is:

“Ungete uno stampo rotondo con una noce di burro e cospargetelo bene con un poco di pan grattato, tagliate a pezzi i fegatini, l’animella scottata (risata) e pelata in precedenza e i granelli di pollo lasciando poi rosolare il tutto in un poco di burro infine cospargete con pepe e sale nero, no pepe nero e sale. Per la pasta disponete la farina a fontana sulla spianatoia, unite 4 uova e lavorate il composto con le mani versando anche un poco d’acqua.”.

The testing audio signal, with a female voice (wav format, Mono, 44100 Hz, 32-bit float, 4.1 seconds), is:

“C’era una volta un re seduto sul sofa”.

A.3 Data set D3

In this experiment, we consider a male voice (wav format, Mono, 44100 Hz, 32-bit float, 33.4 seconds).

The sentence for constructing the dictionary is:

“Mi hanno detto di parlare un pò più piano perchè non si capisce niente. Mi hanno detto di vestire un poco meglio perchè sembro un deficiente, e allora mi son detto parlerò più piano e vestirò un pò più elegante. Sono andato in un negozio ed ho comprato un capo molto appariscente. Questa qua e la volta buona che riesco ad integrarmi in società, quante volte lo diceva mammà”.

The testing audio signal, a male voice (wav format, Mono, 44100 Hz, 32-bit float, 10.5 seconds), is

“Trentatre trentini entrarono in treno tutti e trentatre trotterellando. Mi mi (risate di fondo)”.

A.4 Data set D4

In this experiment, we consider a female voice (wav format, Mono, 44100 Hz, 32-bit float, 6.5 seconds).

The sentence for constructing the dictionary is:

“Ciao questa sera stiamo facendo questa cosa, ciao.”.

The testing audio signal, a female voice (wav format, Mono, 44100 Hz, 32-bit float, 5.7 seconds), is

“Mi hanno detto che per far passare il mal di testa devo mettere...”.

A.5 Data set D5

In this experiment, we consider a female voice (wav format, Mono, 44100 Hz, 32-bit float, 5.7 seconds).

The sentence for constructing the dictionary is:

“Mi hanno detto che per far passare il mal di testa devo mettere...”.

The testing audio signal, a female voice (wav format, Mono, 44100 Hz, 32-bit float, 6.5 seconds), is

“Ciao questa sera stiamo facendo questa cosa, ciao”.

A.6 Data set D6

In this experiment, we consider a male voice (wav format, Mono, 44100 Hz, 32-bit float, 32 seconds). The sentence for constructing the dictionary is the same as that of data set D1.

The testing audio signal, a female voice (wav format, Mono, 44100 Hz, 32-bit float, 4.1 seconds), is

“C’era una volta un re seduto sul sofa”.


Cite this article

Ciaramella, A., Gianfico, M. & Giunta, G. Compressive sampling and adaptive dictionary learning for the packet loss recovery in audio multimedia streaming. Multimed Tools Appl 75, 17375–17392 (2016). https://doi.org/10.1007/s11042-015-3002-x
