Abstract
One characteristic that distinguishes speaker recognition (identification, verification, classification, tracking, etc.) from other biometrics is that it is designed to operate with devices and over channels that were created for other technologies and functions. That characteristic supports broad, inexpensive, and speedy deployments. However, the explosion of mobile devices has exacerbated the resulting device and channel mismatch problem and the challenges for interoperability. This paper presents a detailed proposal for interoperability that supports all types of audio interchange operations while limiting the audio formats to a small set of widely used, open standards. We call this proposal Standard Audio Format Encapsulation (SAFE). The SAFE proposal has been incorporated into the draft speaker-recognition data interchange standards of the ANSI/INCITS M1 (biometrics) committee and of ISO/IEC JTC1/SC37 project 19794-13 (Voice data).
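The abstract's central idea, accepting audio only in a small set of widely used, open formats, can be illustrated with a minimal sketch of a whitelist check on incoming files. The sketch assumes three container formats (WAV, FLAC, and Ogg), recognized by their standard leading signature bytes; this set and the function names `identify_format` and `accept_audio` are hypothetical and are not taken from the SAFE specification.

```python
from typing import Optional

# Hypothetical whitelist for illustration only; the formats SAFE actually
# admits are defined in the paper and the draft standards, not here.
# Each entry maps a format name to a predicate over the file's leading bytes.
ALLOWED_SIGNATURES = {
    # RIFF/WAVE: "RIFF", a 4-byte little-endian size, then "WAVE".
    "wav": lambda b: len(b) >= 12 and b[0:4] == b"RIFF" and b[8:12] == b"WAVE",
    # FLAC streams begin with the 4-byte marker "fLaC".
    "flac": lambda b: b[0:4] == b"fLaC",
    # Every Ogg page begins with the capture pattern "OggS".
    "ogg": lambda b: b[0:4] == b"OggS",
}


def identify_format(data: bytes) -> Optional[str]:
    """Return the whitelisted format whose signature matches, or None."""
    for name, matches in ALLOWED_SIGNATURES.items():
        if matches(data):
            return name
    return None


def accept_audio(data: bytes) -> str:
    """Accept a blob only if it carries one of the whitelisted signatures."""
    fmt = identify_format(data)
    if fmt is None:
        raise ValueError("audio format not in the allowed set")
    return fmt


# Usage: a minimal WAV header prefix is accepted; an ID3-tagged blob is not.
wav_prefix = b"RIFF" + (36).to_bytes(4, "little") + b"WAVE"
print(accept_audio(wav_prefix))
```

A signature check identifies only the container; a fuller validator would also inspect codec and stream parameters, since, for example, an Ogg stream may carry Vorbis or another codec.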
About this article
Cite this article
Beigi, H., Markowitz, J.A. Standard audio format encapsulation (SAFE). Telecommun Syst 47, 235–242 (2011). https://doi.org/10.1007/s11235-010-9315-1