Abstract
Overlapped speech contains the simultaneous speech of multiple speakers. The presence of overlapped speech is one of the main sources of error for speaker diarization, speech recognition, and speaker recognition systems. Most existing works use magnitude-spectrum-based features for overlap detection. This work focuses instead on detecting overlapped speech by exploring the instantaneous phase and amplitude information of the speech signal. Phase characteristics are captured by the Instantaneous Frequency Spectrogram (IFSpec), while pyknograms based on the Teager-Kaiser Energy Operator (TEO) are used to represent instantaneous amplitude. Features are learned automatically from the IF spectrogram and the TEO-based pyknogram using a Fully Convolutional Neural Network (F-CNN). The approach is evaluated on the SSC corpus, which has previously been used for this task. A significant performance improvement is observed when both representations are combined in an early fusion framework, which indicates that the two representations carry complementary information. Classification is performed on segments of three durations, i.e., 1 s, 500 ms, and 250 ms, to analyze the effect of segment duration on overlap detection. The effect of the speakers' gender in overlapped speech is also studied.
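The abstract relies on two signal-level primitives: the Teager-Kaiser energy operator, whose discrete form is Ψ[x(n)] = x(n)² − x(n−1)·x(n+1), and the instantaneous frequency, i.e. the rate of change of the phase of the analytic signal. The following is a minimal, standard-library-only sketch of these primitives, plus a helper that cuts a signal into fixed-duration segments such as 1 s, 500 ms, or 250 ms. It is an illustration under stated assumptions, not the authors' pipeline: all function names are hypothetical, the analytic signal is computed with a naive O(N²) DFT for readability, and the construction of the actual pyknogram and IFSpec (filterbank or STFT framing) and the F-CNN are not shown.

```python
import cmath
import math


def teager_kaiser(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]**2 - x[n-1] * x[n+1].

    Only interior samples have both neighbours, so the result has len(x) - 2 values.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def analytic_signal(x):
    """Analytic signal via a naive O(N^2) DFT: keep DC (and Nyquist for even N),
    double the positive-frequency bins, zero the negative ones, then invert."""
    N = len(x)
    X = [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
         for k in range(N)]
    h = [0.0] * N
    h[0] = 1.0
    half = (N + 1) // 2  # first index past the strictly positive-frequency bins
    for k in range(1, half):
        h[k] = 2.0
    if N % 2 == 0:
        h[N // 2] = 1.0  # the Nyquist bin is shared, so it is kept once
    Z = [X[k] * h[k] for k in range(N)]
    return [sum(Z[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def instantaneous_frequency(x, fs):
    """Instantaneous frequency in Hz from the wrapped phase increment of the
    analytic signal between consecutive samples (len(x) - 1 values).
    Valid while the true frequency stays below fs / 2."""
    z = analytic_signal(x)
    return [cmath.phase(z[n + 1] * z[n].conjugate()) * fs / (2 * math.pi)
            for n in range(len(z) - 1)]


def split_into_segments(x, fs, seg_dur_s):
    """Non-overlapping segments of seg_dur_s seconds; a trailing partial
    segment is dropped (an assumption made for this sketch)."""
    seg_len = int(round(fs * seg_dur_s))
    if seg_len <= 0:
        raise ValueError("segment duration is shorter than one sample")
    return [x[i:i + seg_len] for i in range(0, len(x) - seg_len + 1, seg_len)]
```

For a pure tone A·cos(Ωn), the discrete operator evaluates to exactly A²·sin²(Ω) at every interior sample, which is why the TEO is read as tracking amplitude and frequency jointly; for a tone that completes a whole number of cycles in the analysis window, the phase-increment estimate returns its frequency in Hz. One common form of early fusion is to stack the two time-frequency maps as input channels of the network; how the authors combine them is not detailed in this abstract.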
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Baghel, S., Prasanna, S.R.M., Guha, P. (2022). Overlapped Speech Detection Using AM-FM Based Time-Frequency Representations. In: Prasanna, S.R.M., Karpov, A., Samudravijaya, K., Agrawal, S.S. (eds) Speech and Computer. SPECOM 2022. Lecture Notes in Computer Science, vol 13721. Springer, Cham. https://doi.org/10.1007/978-3-031-20980-2_4
DOI: https://doi.org/10.1007/978-3-031-20980-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20979-6
Online ISBN: 978-3-031-20980-2
eBook Packages: Computer Science, Computer Science (R0)