Real-Time Lossy Audio Signal Reconstruction Using Novel Sliding Based Multi-instance Linear Regression/Random Forest and Enhanced CGPANN

Neural Processing Letters

Abstract

This paper proposes a novel neuroevolutionary algorithm, the Enhanced Cartesian Genetic Programming evolved Artificial Neural Network (ECGPANN), as a real-time predictor of lost signal samples. Unlike the traditional Cartesian Genetic Programming evolved Artificial Neural Network (CGPANN), the proposed algorithm uses a bi-chromosomal architecture instead of a single chromosome to evolve topology, weights and architecture in parallel. This modification makes it suitable for finding globally optimal solutions when predicting both periodic and aperiodic lost samples at run time. Sliding Window based Multi-instance Linear Regression (SW-MLR) and Sliding Window based Multi-instance Random Forest (SW-MRF) prediction algorithms are also used to reconstruct multiple missing samples. Because SW-MLR and SW-MRF are trained with a fixed number of inputs and outputs, they cannot handle random signal loss, where the number of output estimates needed varies at run time. ECGPANN, by contrast, can produce a variable number of outputs in real time. Experimental results demonstrate the efficacy of ECGPANN for both single- and multi-sample loss with fixed periodic and aperiodic noise using the sliding window technique. The SNR improvement ranges from 20 to 37 dB for periodic noise and from 31 to 44 dB for aperiodic noise, for signals with 16.6–50% of samples missing. Compared with the traditional CGPANN, ECGPANN improves prediction accuracy by 4–5% on average. The proposed ECGPANN model achieves a mean absolute error (MAE) of 0.051 (speech), 0.015 (guitar) and 0.038 (flute) for signals with 16.6% of samples lost or corrupted, and an MAE of 0.066 (speech), 0.020 (guitar) and 0.049 (flute) for signals with 50% of samples lost or corrupted.
The networks are trained and tested on speech signals and evaluated on music signals to assess generality. ECGPANN performs consistently better regardless of signal type and, in contrast to SW-MLR and SW-MRF, remains robust as the number of missing samples changes. Its ability to predict a randomly varying number of missing samples makes it applicable in real time.
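To make the sliding-window idea and the reported metrics concrete, the following is a minimal Python sketch, not the authors' implementation. It uses an ordinary least-squares fit as a stand-in for a fixed input/output sliding-window regressor, and it does not implement ECGPANN. The frame layout (`n_in` received samples followed by `n_out` lost samples, so 5 + 1 gives 16.6% loss), the synthetic sine test signal, the zero-filling of lost samples, and all function names are assumptions made for illustration only.

```python
import numpy as np

def make_windows(signal, n_in, n_out):
    """Pair each run of n_in samples with the n_out samples that follow it."""
    count = len(signal) - n_in - n_out + 1
    X = np.array([signal[i:i + n_in] for i in range(count)])
    Y = np.array([signal[i + n_in:i + n_in + n_out] for i in range(count)])
    return X, Y

def fit_linear(X, Y):
    """Least-squares fit of Y ~ [X, 1] @ W; one output column per lost sample."""
    A = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    W, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return W  # shape (n_in + 1, n_out)

def reconstruct(lossy, W, n_in, n_out):
    """Fill the last n_out samples of every frame from its first n_in samples."""
    frame = n_in + n_out
    rec = lossy.copy()
    for start in range(0, len(rec) - frame + 1, frame):
        known = rec[start:start + n_in]
        rec[start + n_in:start + frame] = np.append(known, 1.0) @ W
    return rec

def mae(clean, estimate):
    """Mean absolute error between the clean and estimated signals."""
    return float(np.mean(np.abs(clean - estimate)))

def snr_db(clean, estimate):
    """Signal-to-noise ratio in dB; the tiny constant avoids division by zero."""
    noise = np.sum((clean - estimate) ** 2) + 1e-20
    return float(10.0 * np.log10(np.sum(clean ** 2) / noise))

# Assumed setup: 1 lost sample in every 6 (16.6% loss), synthetic sine input.
n_in, n_out = 5, 1
clean = np.sin(2 * np.pi * 0.05 * np.arange(1200))
train, held_out = clean[:600], clean[600:]

W = fit_linear(*make_windows(train, n_in, n_out))

# Simulate periodic loss by zeroing the last n_out samples of each frame.
lossy = held_out.copy()
for k in range(n_out):
    lossy[n_in + k::n_in + n_out] = 0.0

rec = reconstruct(lossy, W, n_in, n_out)
print("MAE:", mae(held_out, rec))
print("SNR improvement (dB):", snr_db(held_out, rec) - snr_db(held_out, lossy))
```

Because `W` has a fixed shape of `(n_in + 1, n_out)`, this kind of model always emits exactly `n_out` estimates per frame, which illustrates why a regressor trained on fixed input/output sizes does not directly fit random loss patterns where the gap length changes at run time.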

Notes

  1. https://onlinelibrary.wiley.com/doi/pdf/10.1002/9781119164746.app2.

  2. https://www.kaggle.com/zhousl16/solo-audio.

Author information

Corresponding author

Correspondence to Nadia Masood Khan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Khan, N.M., Khan, G.M. Real-Time Lossy Audio Signal Reconstruction Using Novel Sliding Based Multi-instance Linear Regression/Random Forest and Enhanced CGPANN. Neural Process Lett 53, 227–255 (2021). https://doi.org/10.1007/s11063-020-10379-5

  • DOI: https://doi.org/10.1007/s11063-020-10379-5
