Acoustic Features for Estimation of Perceptional Similarity

Adachi, Yoshihiro; Kawamoto, Shinichi; Morishima, Shigeo; Nakamura, Satoshi

doi:10.1007/978-3-540-77255-2_33


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4810)

Included in the following conference series:

Pacific-Rim Conference on Multimedia


Abstract

This paper examines acoustic features for estimating the perceptional similarity between speech samples. First, we extract acoustic features that carry speaker individuality from the speech of 36 speakers. Second, we calculate the distance between the extracted features using a Gaussian Mixture Model (GMM) or Dynamic Time Warping (DTW), and sort the speech samples by this physical similarity. Separately, the same samples are sorted by perceptional similarity as judged by human subjects. We evaluate each physical feature by Spearman's rank correlation coefficient between the two orderings. The results show that the DTW distance with high STRAIGHT cepstrum is the optimum feature for estimating perceptional similarity.
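The evaluation procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the one-dimensional toy feature sequences, and the listener ranks are all hypothetical, a plain Euclidean frame distance is assumed, and the GMM-based distance is not shown. The sketch computes a DTW distance between feature sequences and then compares the resulting ordering with a perceptual ordering using Spearman's rank correlation coefficient.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumes Euclidean distance between frames and the basic step pattern
    (insertion, deletion, match); both are illustrative choices.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def average_ranks(values):
    """1-based ranks of the values; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    sx = math.sqrt(sum((p - mx) ** 2 for p in rx))
    sy = math.sqrt(sum((q - my) ** 2 for q in ry))
    return cov / (sx * sy)


# Hypothetical toy data: one reference utterance and three candidates,
# each a sequence of 1-dimensional "feature vectors".
reference = [[0.0], [1.0], [2.0]]
candidates = [
    [[0.0], [1.0], [1.0], [2.0]],  # same contour, different timing
    [[0.5], [1.5], [2.5]],         # shifted contour
    [[3.0], [3.0], [3.0]],         # flat contour
]
physical = [dtw_distance(reference, c) for c in candidates]
perceptual = [1, 2, 3]  # made-up listener ranks (1 = most similar)
rho = spearman(physical, perceptual)
```

A value of `rho` close to 1 would indicate that the physical distance orders the candidates the same way the listeners do, which is the criterion the abstract uses to compare features.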



Author information

Authors and Affiliations

ATR Spoken Language Communication Research Laboratories, 2-2-2 Keihanna, Science City, Kyoto, 619-0288, Japan
Yoshihiro Adachi, Shinichi Kawamoto, Shigeo Morishima & Satoshi Nakamura
Science and Engineering, Waseda University, 3-4-1 Okubo Shinjuku-ku Tokyo, 169-8555, Japan
Yoshihiro Adachi


Editor information

Horace H.-S. Ip, Oscar C. Au, Howard Leung, Ming-Ting Sun, Wei-Ying Ma, Shi-Min Hu


Cite this paper

Adachi, Y., Kawamoto, S., Morishima, S., Nakamura, S. (2007). Acoustic Features for Estimation of Perceptional Similarity. In: Ip, H.HS., Au, O.C., Leung, H., Sun, MT., Ma, WY., Hu, SM. (eds) Advances in Multimedia Information Processing – PCM 2007. PCM 2007. Lecture Notes in Computer Science, vol 4810. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77255-2_33


DOI: https://doi.org/10.1007/978-3-540-77255-2_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77254-5
Online ISBN: 978-3-540-77255-2
eBook Packages: Computer Science, Computer Science (R0)
