research-article

MapReduce based for speech classification

Authors:

Quang Trung Nguyen,

The Duy Bui

SoICT '16: Proceedings of the 7th Symposium on Information and Communication Technology

Pages 87 - 91

https://doi.org/10.1145/3011077.3011090

Published: 08 December 2016

Abstract

Speech classification is one of the most important problems in speech processing and spoken word recognition. Although there have been many studies on the classification of speech signals, the results remain limited in both accuracy and vocabulary size. As the vocabulary grows very large, speech classification becomes increasingly difficult. Today, several frameworks support working with big data. One of these is a data mining utility that can perform supervised classification on very large amounts of data, usually referred to as big data, on a distributed infrastructure by using the MapReduce framework on Hadoop clusters. This tool implements four classification approaches: Random Forest, Naïve Bayes, Decision Trees, and Support Vector Machines (SVM). All of these approaches require input data of the same size, so the input data must be quantized before use, which decreases accuracy in the classification stage. In this paper, we propose an implementation of Local Naïve Bayes Nearest Neighbor based on the Hadoop framework, which accepts input data of different sizes and works well with very large training sets.
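
The abstract does not describe how the classifier is split into map and reduce steps, so the following is only a minimal in-memory sketch of one plausible decomposition, not the authors' Hadoop implementation. It follows the Local Naïve Bayes Nearest Neighbor decision rule as published by McCann and Lowe: each query descriptor looks up its k+1 nearest training descriptors, uses the distance to the (k+1)-th as a background distance, and adds the difference to the totals of the classes seen among the first k. The function names (`map_descriptor`, `reduce_scores`, `classify`), the brute-force neighbor search, and the toy 2-D descriptors are all assumptions made for illustration.

```python
from collections import defaultdict
import heapq


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_descriptor(descriptor, training, k):
    """Map step (hypothetical): emit (label, dist_to_class - background_dist).

    `training` is a list of (descriptor, label) pairs and must hold
    more than k entries so that a (k+1)-th neighbor exists.
    """
    # k+1 nearest training descriptors over all classes, sorted ascending.
    nearest = heapq.nsmallest(
        k + 1, ((sq_dist(descriptor, vec), label) for vec, label in training)
    )
    background = nearest[k][0]  # distance to the (k+1)-th neighbor
    closest = {}
    for dist, label in nearest[:k]:
        # The list is sorted, so the first hit per label is its minimum.
        closest.setdefault(label, dist)
    return [(label, dist - background) for label, dist in closest.items()]


def reduce_scores(pairs):
    """Reduce step (hypothetical): sum the emitted values per class label."""
    totals = defaultdict(float)
    for label, value in pairs:
        totals[label] += value
    return dict(totals)


def classify(query_descriptors, training, k):
    """Classify one sample given any number of descriptors."""
    pairs = []
    for descriptor in query_descriptors:
        pairs.extend(map_descriptor(descriptor, training, k))
    totals = reduce_scores(pairs)
    # The class with the smallest accumulated distance wins.
    return min(totals, key=totals.get)
```

Because scores are accumulated per descriptor, a sample with three descriptors and a sample with one descriptor go through the same code path, which illustrates why this family of classifiers does not need fixed-size, quantized input. On a real cluster the brute-force search would presumably be replaced by an index sharded across nodes.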

Index Terms

MapReduce based for speech classification
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
  2. Parallel computing methodologies
    1. Parallel algorithms
      1. MapReduce algorithms

Published In

SoICT '16: Proceedings of the 7th Symposium on Information and Communication Technology

December 2016

442 pages

ISBN:9781450348157

DOI:10.1145/3011077

General Chairs: Nguyen Manh Hung (NTT University, Vietnam), Huynh Quyet Thang (HUST, Vietnam)

Program Chairs: Luc De Raedt (KULeuven, Belgium), Yves Deville (UCLouvain, Belgium), Marc Bui (EPHE, France), Truong Thi Dieu Linh (HUST, Vietnam)

Publications Chairs: Dinh Viet Sang (HUST, Vietnam), Nguyen Hong Phuong (HUST, Vietnam), Nguyen Thi Oanh (HUST, Vietnam)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SoICT '16

SoICT '16: Seventh International Symposium on Information and Communication Technology

December 8 - 9, 2016

Ho Chi Minh City, Vietnam

Acceptance Rates

SoICT '16 Paper Acceptance Rate 58 of 132 submissions, 44%;

Overall Acceptance Rate 147 of 318 submissions, 46%
