ABSTRACT
Automatic discrimination between speech and music signals has become an active research topic in recent years. Classifying audio into categories such as speech or music is an important component of many multimedia document retrieval systems. Several approaches have previously been used to discriminate between speech and music data. In this paper, we propose using the mean and variance of the discrete wavelet transform coefficients in addition to features previously used for audio classification. We use a Multi-Layer Perceptron (MLP) neural network as the classifier. Our initial tests show encouraging results that indicate the viability of our approach.
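The paper does not give implementation details for the wavelet features. As a minimal illustrative sketch, the mean and variance of the detail coefficients at each decomposition level could be computed as below. The Haar wavelet, the three-level depth, and the function names are assumptions for illustration, not the paper's actual configuration.

```python
import math

def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform.
    Returns (approximation, detail) coefficient lists."""
    approx, detail = [], []
    for i in range(0, len(signal) - 1, 2):
        a, b = signal[i], signal[i + 1]
        approx.append((a + b) / math.sqrt(2))   # low-pass (average) part
        detail.append((a - b) / math.sqrt(2))   # high-pass (difference) part
    return approx, detail

def mean_var(xs):
    """Mean and population variance of a coefficient list."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, v

def dwt_features(frame, levels=3):
    """Mean/variance of the detail coefficients at each level,
    giving a compact 2*levels feature vector per audio frame."""
    features = []
    for _ in range(levels):
        frame, detail = haar_dwt(frame)
        features.extend(mean_var(detail))
    return features
```

Such a feature vector (possibly concatenated with the other features mentioned above) would then be fed to the MLP classifier.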
Index Terms
- Automatic classification of speech and music using neural networks