Standard audio format encapsulation (SAFE)

Telecommunication Systems

Abstract

One characteristic that distinguishes speaker recognition (identification, verification, classification, tracking, etc.) from other biometrics is that it is designed to operate with devices and over channels that were created for other technologies and functions. That characteristic supports broad, inexpensive, and speedy deployments. The explosion of mobile devices has exacerbated the mismatch problem and the challenges for interoperability. This paper presents a detailed proposal for interoperability that supports all types of audio interchange operations while, at the same time, limiting the audio formats to a small set of widely-used, open standards. We call this proposal Standard Audio Format Encapsulation (SAFE). The SAFE proposal has been incorporated into speaker-recognition data interchange draft standards by the M1 (biometrics) committee of ANSI/INCITS and ISO/IEC JTC1/SC37 project 19794-13 (Voice data).



Author information

Correspondence to Homayoon Beigi.

About this article

Cite this article

Beigi, H., Markowitz, J.A. Standard audio format encapsulation (SAFE). Telecommun Syst 47, 235–242 (2011). https://doi.org/10.1007/s11235-010-9315-1

  • DOI: https://doi.org/10.1007/s11235-010-9315-1
