Overlapped Speech Detection Using AM-FM Based Time-Frequency Representations

Baghel, Shikha; Prasanna, S. R. M.; Guha, Prithwijit

doi:10.1007/978-3-031-20980-2_4

Shikha Baghel,
S. R. M. Prasanna &
Prithwijit Guha

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13721)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

Overlapped speech contains the simultaneous speech of multiple speakers. Its presence is one of the main sources of error for speaker diarization, speech recognition, and speaker recognition systems. Most existing works use magnitude-spectrum-based features for overlap detection. This work focuses on detecting overlapped speech by exploring the instantaneous phase and amplitude information of the speech signal. Phase characteristics are captured by the Instantaneous Frequency Spectrogram (IFSpec), while Teager-Kaiser Energy Operator (TEO) based pyknograms represent the instantaneous amplitude. Features are learned automatically from the IF spectrogram and the TEO-based pyknogram using a Fully-Convolutional Neural Network (F-CNN). The approach is evaluated on the SSC corpus, which has previously been used for this task. A significant performance improvement is observed when both representations are combined in an early fusion framework, which indicates that the two feature representations carry complementary information. Classification is performed over three segment durations, i.e., 1 s, 500 ms, and 250 ms, to analyze the effect of segment duration on overlap detection. The effect of the speaker genders present in overlapped speech is also studied.
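
The two quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it only shows the textbook discrete Teager-Kaiser Energy Operator and an instantaneous frequency estimate taken from the phase of the analytic signal. The function names, the 8 kHz sampling rate, and the 500 Hz test tone are assumptions chosen for illustration; the sketch does not build the IFSpec or pyknogram representations or the F-CNN.

```python
import numpy as np

def teager_kaiser_energy(x):
    """Discrete TEO: psi[n] = x[n]^2 - x[n-1] * x[n+1] (interior samples only)."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def analytic_signal(x):
    """Analytic signal via the FFT: keep DC, double positive bins, zero negative bins."""
    x = np.asarray(x, dtype=float)
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0          # Nyquist bin is kept once
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def instantaneous_frequency(x, fs):
    """Instantaneous frequency in Hz from the unwrapped analytic phase."""
    phase = np.unwrap(np.angle(analytic_signal(x)))
    return np.diff(phase) * fs / (2.0 * np.pi)

# Hypothetical test signal: a 500 Hz tone sampled at 8 kHz (values are assumptions).
fs = 8000
amp, f0 = 0.5, 500.0
n = np.arange(800)
tone = amp * np.cos(2 * np.pi * f0 * n / fs)

teo = teager_kaiser_energy(tone)
inst_freq = instantaneous_frequency(tone, fs)
```

For a pure tone the operator output is constant and depends on both amplitude and frequency, while the phase-derived estimate recovers the tone frequency, which is why the two are often treated as amplitude-related and frequency-related views of the same signal.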

Author information

Authors and Affiliations

Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati, 781039, Assam, India
Shikha Baghel & Prithwijit Guha
Department of Electrical Engineering, Indian Institute of Technology Dharwad, Dharwad, 580011, India
S. R. M. Prasanna

Corresponding author

Correspondence to Shikha Baghel.

Editor information

Editors and Affiliations

Indian Institute of Technology Dharwad, Dharwad, India
S. R. Mahadeva Prasanna
St. Petersburg Federal Research Center of the Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Koneru Lakshmaiah Education Foundation, Vaddeswaram, India
K. Samudravijaya
KIIT Group of Colleges, Gurugram, India
Shyam S. Agrawal

About this paper

Cite this paper

Baghel, S., Prasanna, S.R.M., Guha, P. (2022). Overlapped Speech Detection Using AM-FM Based Time-Frequency Representations. In: Prasanna, S.R.M., Karpov, A., Samudravijaya, K., Agrawal, S.S. (eds) Speech and Computer. SPECOM 2022. Lecture Notes in Computer Science, vol 13721. Springer, Cham. https://doi.org/10.1007/978-3-031-20980-2_4

DOI: https://doi.org/10.1007/978-3-031-20980-2_4
Published: 10 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20979-6
Online ISBN: 978-3-031-20980-2
eBook Packages: Computer Science, Computer Science (R0)