Abstract
This paper examines acoustic features for estimating the perceptual similarity between voices. First, we extract acoustic features that carry speaker individuality from the speech of 36 speakers. Second, we compute the distance between the extracted features using either a Gaussian Mixture Model (GMM) or Dynamic Time Warping (DTW), and rank the speech samples by this physical similarity. Separately, a second ranking of the same samples is obtained from subjects' judgments of perceptual similarity. We evaluate each physical feature by Spearman's rank correlation coefficient between the two rankings. The results show that the DTW distance computed on the high STRAIGHT cepstrum is the best feature for estimating perceptual similarity.
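The evaluation outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The Euclidean local cost, the path-length normalisation, the function names and the toy data are all assumptions made for illustration, and random vectors stand in for STRAIGHT cepstrum frames.

```python
import numpy as np


def dtw_distance(a, b):
    """DTW distance between two feature sequences of shape (frames, dims).

    Assumptions: Euclidean local cost and normalisation by (n + m).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # Local cost: Euclidean distance between every pair of frames.
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # Accumulated cost with a padded border of infinities.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[n, m] / (n + m)


def average_ranks(x):
    """Ranks starting at 1; tied values share their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


# Toy data (hypothetical): one reference voice and five comparison voices.
rng = np.random.default_rng(0)
reference = rng.normal(size=(20, 4))
candidates = [
    reference + 0.1 * k * rng.normal(size=reference.shape) for k in range(1, 6)
]

# Physical ranking: DTW distance from the reference to each candidate.
physical = [dtw_distance(reference, c) for c in candidates]
# Perceptual ranking: hypothetical listener ranks (1 = most similar).
perceptual = [1, 2, 3, 4, 5]

rho = spearman(physical, perceptual)
print(f"Spearman rho between physical and perceptual rankings: {rho:.3f}")
```

A value of rho close to 1 would mean that the physical distance orders the voices in the same way as the listeners do, which is the criterion the paper uses to compare features.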
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Adachi, Y., Kawamoto, S., Morishima, S., Nakamura, S. (2007). Acoustic Features for Estimation of Perceptional Similarity. In: Ip, H.HS., Au, O.C., Leung, H., Sun, MT., Ma, WY., Hu, SM. (eds) Advances in Multimedia Information Processing – PCM 2007. PCM 2007. Lecture Notes in Computer Science, vol 4810. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77255-2_33
DOI: https://doi.org/10.1007/978-3-540-77255-2_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77254-5
Online ISBN: 978-3-540-77255-2
eBook Packages: Computer Science, Computer Science (R0)