On Using Digital Speech Processing Techniques for Synchronization in Heterogeneous Teleconferencing

  • Conference paper
Quality of Service in Heterogeneous Networks (QShine 2009)

Abstract

As the popularity of multi-functional communication devices grows, traditional audio conferencing may now involve heterogeneous teleconferencing devices, including POTS phones, VoIP phones, dual-mode smart phones, and so on. During a multi-party audio conference involving heterogeneous devices, a video conference may be held concurrently among the subset of devices capable of processing video streams, in order to improve the conferencing experience. In such a scenario, the circuit-switched audio streams must be synchronized with the packet-switched video streams. While the problem of audio-video synchronization has been extensively investigated in related work, existing solutions are limited to synchronization in packet-data networks and hence are not applicable in the target environment. In this work, we consider the problem of supporting such an overlay video conference among dual-mode phones. We first transform the audio-video synchronization problem into the problem of synchronizing circuit-switched and packet-switched audio streams. We then propose an end-to-end solution for audio synchronization that is transparent to the heterogeneous network protocol suites involved. We investigate synchronization algorithms based on digital speech processing that use different acoustic features of the speech signal in the waveform, cepstrum, and spectrum domains. We evaluate the effectiveness of the different algorithms under various impairments, including codec distortion, line noise, packet loss, and overlapping utterances. The evaluation results show that DSP-based algorithms are a promising direction for addressing the synchronization problem across heterogeneous telephony systems.
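The abstract does not spell out the synchronization algorithms themselves, so the following is only a minimal sketch of the general idea in the waveform domain: estimating the relative delay between two copies of the same speech signal by searching for the lag that maximizes their normalized cross-correlation. The function names (`estimate_lag`, `make_test_signal`), the `max_lag` parameter, and the synthetic noise-like test signal are assumptions made for illustration and are not taken from the paper. Cepstrum- or spectrum-domain variants would compare per-frame feature vectors instead of raw samples.

```python
import math
import random


def estimate_lag(reference, delayed, max_lag):
    """Return the lag in samples at which `delayed` best matches `reference`.

    A positive result means `delayed` lags behind `reference`.
    Illustrative only: an exhaustive search over normalized cross-correlation,
    not the algorithm evaluated in the paper.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        dot = ref_energy = del_energy = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(delayed):
                d = delayed[j]
                dot += r * d
                ref_energy += r * r
                del_energy += d * d
        # Skip lags where one side of the overlap is silent (zero energy).
        if ref_energy > 0.0 and del_energy > 0.0:
            score = dot / math.sqrt(ref_energy * del_energy)
            if score > best_score:
                best_lag, best_score = lag, score
    return best_lag


def make_test_signal(n, seed=0):
    """Hypothetical stand-in for a speech segment: seeded uniform noise."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    clean = make_test_signal(400)
    rng = random.Random(1)
    # Simulate the second path: 37 samples of delay, attenuation, mild noise.
    received = [0.0] * 37 + [0.8 * s + rng.gauss(0.0, 0.05) for s in clean]
    print(estimate_lag(clean, received, max_lag=80))
```

Normalizing by the energy of the overlapping segments keeps the score insensitive to the gain difference between the two paths. Real speech is far more self-similar than the noise-like test signal used here, which is one reason a practical system would need the more robust features and impairment handling the abstract describes.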

Copyright information

© 2009 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Lin, HP., Hsieh, HY. (2009). On Using Digital Speech Processing Techniques for Synchronization in Heterogeneous Teleconferencing. In: Bartolini, N., Nikoletseas, S., Sinha, P., Cardellini, V., Mahanti, A. (eds) Quality of Service in Heterogeneous Networks. QShine 2009. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 22. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10625-5_43

  • DOI: https://doi.org/10.1007/978-3-642-10625-5_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10624-8

  • Online ISBN: 978-3-642-10625-5

  • eBook Packages: Computer Science, Computer Science (R0)
