Abstract
As the popularity of multi-functional communication devices grows, traditional audio conferencing may now involve heterogeneous teleconferencing devices, including POTS phones, VoIP phones, dual-mode smart phones, and so on. During a multi-party audio conference involving heterogeneous devices, a video conference may be held concurrently among the subset of devices capable of processing video streams, to improve the conferencing experience. In such a scenario, the need arises to synchronize circuit-switched audio streams with packet-switched video streams. While the problem of audio-video synchronization has been extensively investigated in related work, existing solutions are limited to synchronization in packet-data networks and hence are not applicable in the target environment. In this work, we consider the problem of supporting such an overlay video conference among dual-mode phones. We first transform the audio-video synchronization problem into the problem of synchronizing circuit-switched and packet-switched audio streams. We then propose an end-to-end solution for audio synchronization that is transparent to the heterogeneous network protocol suites involved. We investigate synchronization algorithms based on digital speech processing using different acoustic features of the speech signal in the waveform, cepstrum, and spectrum domains. We evaluate the effectiveness of the different algorithms under various impairments, including codec distortion, line noise, packet loss, and overlapping utterances. Evaluation results show a promising direction for using DSP-based algorithms to address the synchronization problem across heterogeneous telephony systems.
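As a rough illustration of what waveform-domain synchronization between two audio streams can look like, the sketch below estimates their relative delay by normalized cross-correlation. It is not the algorithm evaluated in this work: the function name `estimate_lag`, the `max_lag` search bound, and the synthetic test signals are all assumptions made for illustration, and the cepstrum- and spectrum-domain features mentioned above are not covered.

```python
import math
import random


def estimate_lag(ref, other, max_lag):
    """Return the lag in samples at which `other` best matches `ref`.

    Hypothetical helper, not taken from the paper. A positive result
    means `other` is delayed relative to `ref`. Each candidate lag is
    scored by normalized cross-correlation over the overlapping samples,
    so a constant gain difference between the two streams does not
    affect the result.
    """
    best_lag = 0
    best_score = float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        num = 0.0
        energy_ref = 0.0
        energy_other = 0.0
        for i, x in enumerate(ref):
            j = i + lag
            if 0 <= j < len(other):
                y = other[j]
                num += x * y
                energy_ref += x * x
                energy_other += y * y
        if energy_ref > 0.0 and energy_other > 0.0:
            score = num / math.sqrt(energy_ref * energy_other)
            if score > best_score:
                best_score = score
                best_lag = lag
    return best_lag


# Synthetic stand-ins for the two audio paths (assumed data, not speech):
# `delayed` is `reference` shifted by 5 samples, attenuated, with mild noise.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(400)]
delayed = [0.0] * 5 + [0.5 * x + rng.gauss(0.0, 0.01) for x in reference]
delayed = delayed[:400]

print(estimate_lag(reference, delayed, max_lag=20))
```

The exhaustive search over lags keeps the sketch dependency-free; for real sample rates an FFT-based correlation would be the more practical choice, and impairments such as codec distortion or overlapping utterances would call for more robust features than raw samples.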
Copyright information
© 2009 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Lin, HP., Hsieh, HY. (2009). On Using Digital Speech Processing Techniques for Synchronization in Heterogeneous Teleconferencing. In: Bartolini, N., Nikoletseas, S., Sinha, P., Cardellini, V., Mahanti, A. (eds) Quality of Service in Heterogeneous Networks. QShine 2009. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 22. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10625-5_43
DOI: https://doi.org/10.1007/978-3-642-10625-5_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10624-8
Online ISBN: 978-3-642-10625-5
eBook Packages: Computer Science, Computer Science (R0)