A novel method to represent speech signals
Introduction
The major objective of speech coding, an important application of speech processing, is to represent the signal with a minimum number of bits while maintaining perceptual quality. Speech coding makes it possible to use bandwidth efficiently when the signal is transmitted and to store it efficiently on a variety of magnetic and optical media [7]. Over the decades, a variety of speech coding methods have been proposed and developed, such as LPC, CELP, RELP, VSELP, PCM, DPCM, ADPCM, sub-band coding, transform coding and adaptive transform coding [4], [10], [15], [18], [19], as well as projection pursuit techniques [8], [9], [13]. Beyond coding, numerous signal representation methods are used for compression, recognition, classification and secure communication. These include frequency-domain, time-domain and transform-domain techniques, as well as fuzzy logic and artificial neural network techniques [5], [6], [10], [11], [16], [17]. In the late 1990s, new methods of signal representation by means of special functions, so-called “signature functions”, were introduced [1], [2], [3], [12].
The main idea of these methods was to represent signals via pre-defined signature functions obtained directly from the source. In this view, the human vocal tract is considered a unique source; in other words, all speech signals constitute a specific family of signals, the “Family of the Speech Signals”. Similarly, any sort of music belongs to a family of signals, the so-called “Family of the Music Signals”, and so on.
In the techniques of [1], [2], [3], [12], signature functions were created experimentally on an ad hoc basis. In this work, however, a statistical method that exploits the quasi-stationary behavior of speech signals is proposed to generate the signature functions. To this end, we ran several thousand experiments analyzing speech signals with the signal representation method. In the experiments, each signal piece was divided into small frames (Fig. 1). For each frame, the correlation matrix was constructed and its eigenvalues and eigenvectors were computed. The eigenvalues of each frame were sorted, and the eigenvector associated with the largest eigenvalue was stored for further evaluation. These experiments produced a large store of eigenvectors, similar to a data warehouse. Using a comparison algorithm, eigenvectors with similar shapes were then eliminated. As a result, for the family of speech signals, we ended up with a data set containing only 15 or 16 distinct eigenvector shapes. These vectors are collected in a new set with only 15 or 16 elements. In this approach, each eigenvector is regarded as a time sequence, and its continuous form is called a “signature function”. These time sequences, or signature functions, are then used to model the signals. In the modeling process, each frame of the speech signal is represented by only one signature function multiplied by a coefficient. Therefore, each frame is represented by an index number, which identifies a pre-defined signature function (or signature sequence), together with a coefficient. Hence, a substantial compression rate is achieved. In the following section, the generation of the pre-defined signature sequences or signature functions is explained. Based on our experimental results, some selected signature functions for speech modeling are depicted. Examples are given to show the practical implementation of the new method.
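The generation procedure described above (frame, correlation matrix, dominant eigenvector, elimination of similar shapes) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the Toeplitz autocorrelation form of the correlation matrix, the power-iteration eigen-solver, the cosine-similarity comparison and the 0.95 threshold are all assumptions, and the function names `split_frames`, `correlation_matrix`, `dominant_eigenvector` and `build_signature_set` are hypothetical.

```python
import math
import random


def split_frames(signal, frame_length):
    """Cut a sampled signal into consecutive, non-overlapping frames.

    A trailing partial frame is dropped.
    """
    last_start = len(signal) - frame_length
    return [signal[i:i + frame_length] for i in range(0, last_start + 1, frame_length)]


def correlation_matrix(frame):
    """Toeplitz matrix R[j][k] = r(|j - k|) built from the biased autocorrelation
    estimate r(m) = (1/N) * sum_n x[n] * x[n + m] (an assumed construction)."""
    n = len(frame)
    r = [sum(frame[t] * frame[t + m] for t in range(n - m)) / n for m in range(n)]
    return [[r[abs(j - k)] for k in range(n)] for j in range(n)]


def dominant_eigenvector(matrix, iterations=500, seed=0):
    """Power iteration for the unit-norm eigenvector of the largest eigenvalue
    of a symmetric positive semi-definite matrix."""
    n = len(matrix)
    rng = random.Random(seed)  # random start avoids being orthogonal to the target
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(row[k] * v[k] for k in range(n)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero matrix: any unit vector is an eigenvector
        v = [x / norm for x in w]
    # v and -v describe the same shape; pick a canonical sign.
    return v if sum(v) >= 0 else [-x for x in v]


def build_signature_set(frames, threshold=0.95):
    """Keep the dominant eigenvector of each frame's correlation matrix unless its
    shape nearly matches (|cosine| >= threshold) one already in the set."""
    signatures = []
    for frame in frames:
        if not any(frame):
            continue  # a silent frame carries no shape information
        v = dominant_eigenvector(correlation_matrix(frame))
        # All vectors have unit norm, so the inner product is the cosine similarity.
        if all(abs(sum(a * b for a, b in zip(v, s))) < threshold for s in signatures):
            signatures.append(v)
    return signatures
```

Power iteration is used only to keep the sketch dependency-free; a library routine for symmetric eigenproblems would normally be preferred, and power iteration converges slowly when the two largest eigenvalues are close.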
The proposed method of modeling speech signals is expected to be useful for speech coding, for efficient storage at high compression rates and for transmission.
A statistical method to generate signature functions
In this method, a quasi-stationary signal observed over a long period of time is divided into “frames”, as shown in Fig. 1. Assume that $N_T$ samples are equally spaced over this long but finite interval. Then, the sampled signal is given by

$$x(n) = \sum_{i=1}^{N_T} x_i\,\delta(n-i)$$

Here $\delta(n)$ is the unit impulse and $x_i$ is the height of the $i$-th sample. Let us assume that the long signal train is divided into $N_F$ equal-length frames with $N$ samples each. Then, the time sequence of each frame can be represented by a
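In hypothetical notation (frame index $j$, frame length $N$, number of frames $N_F$, sample heights $x_n$), the frame vectors, a per-frame correlation matrix and the associated eigenproblem can be written as below. The Toeplitz autocorrelation form of $\mathbf{R}_j$ is an assumption; the text above does not state how the correlation matrix is estimated.

$$
\mathbf{X}_j = \left[x_{(j-1)N+1},\ x_{(j-1)N+2},\ \dots,\ x_{jN}\right]^{\mathsf{T}}, \qquad j = 1, \dots, N_F
$$

$$
r_j(m) = \frac{1}{N} \sum_{n=1}^{N-m} x_{(j-1)N+n}\, x_{(j-1)N+n+m}, \qquad m = 0, \dots, N-1
$$

$$
\mathbf{R}_j = \left[\, r_j(|k-l|) \,\right]_{k,l=1}^{N}, \qquad \mathbf{R}_j\, \boldsymbol{\phi} = \lambda\, \boldsymbol{\phi}
$$

Under this assumption, the eigenvector $\boldsymbol{\phi}$ belonging to the largest eigenvalue of $\mathbf{R}_j$ is the candidate signature sequence contributed by frame $j$.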
Selection of frame length by means of hearing quality test “MOS”
In this section, we present our experimental results for selecting the optimum frame length using the above algorithm. First, several speech pieces from 10 male and 10 female speakers were recorded at an 8 kHz sampling rate. The speakers were given random Turkish texts to read. For each speaker, correlation matrices with different frame lengths were computed, and the corresponding eigenvectors (or signature sequences) were then generated. In this phase of the
Summary of the speech reconstruction process and discussion on the compression ability of the new technique
- The most important conclusion of this research work is that any speech signal can be modeled by means of a pre-defined signature sequence set that contains only 15 (or 16) different signal shapes or waveforms, each constructed with $N$ samples (with $N = 40$ as one of the settings studied).
- It is shown that any speech frame $\mathbf{X}$ consisting of $N$ samples can be expressed as $\mathbf{X} \approx C\,\boldsymbol{\phi}_K$, where $\boldsymbol{\phi}_K$ is pulled from the signature sequence set as the member that yields the minimum value for the
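Choosing the signature that yields the minimum error can be made concrete if we assume a squared-error criterion and unit-norm signatures. Writing $\mathbf{X}$ for a frame, expanding $\lVert \mathbf{X} - C\boldsymbol{\phi}_k \rVert^2 = \lVert\mathbf{X}\rVert^2 - 2C\langle \mathbf{X},\boldsymbol{\phi}_k\rangle + C^2$ shows that the best coefficient is $C = \langle \mathbf{X},\boldsymbol{\phi}_k\rangle$ and the remaining error is $\lVert\mathbf{X}\rVert^2 - C^2$, so the best index maximizes $|\langle \mathbf{X},\boldsymbol{\phi}_k\rangle|$. The hypothetical `encode_frame` below is a sketch of that rule; the two-sample signatures are toy values, not the signature set of the paper.

```python
import math


def encode_frame(frame, signatures):
    """Pick the index K and coefficient C that minimize ||frame - C * phi_K||^2.

    Assumes each signature phi_k has unit Euclidean norm. For a fixed k the
    optimal coefficient is C = <frame, phi_k> and the remaining error is
    ||frame||^2 - C^2, so the best k has the largest |<frame, phi_k>|.
    """
    best_k, best_c = 0, 0.0
    for k, phi in enumerate(signatures):
        c = sum(x * p for x, p in zip(frame, phi))
        if abs(c) > abs(best_c):
            best_k, best_c = k, c
    return best_k, best_c


def squared_error(frame, phi, c):
    """Energy of the residual frame - c * phi."""
    return sum((x - c * p) ** 2 for x, p in zip(frame, phi))


# Toy example: three unit-norm two-sample "signatures".
s = 1.0 / math.sqrt(2.0)
signatures = [[1.0, 0.0], [0.0, 1.0], [s, s]]
frame = [3.0, 3.2]
k, c = encode_frame(frame, signatures)
```

In this toy case the diagonal signature wins, because the frame is almost proportional to it; only the index and one coefficient need to be stored for the frame.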
Examples
In this section, 10 different Turkish sentences were read by 20 speakers, ten male and ten female. Each sentence was sampled at 8 kHz and recorded on a computer. Then, using the pre-defined signature sequence sets given for the two frame lengths and depicted in Figs. 2 and 3, respectively, the original sentences were reconstructed. The hearing quality of the reconstructed sentences was evaluated by 20 listeners. Ten of these listeners were
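The reconstruction used in these examples amounts to an encoder that codes every frame as a pair (K, C) and a decoder that concatenates the scaled signatures. The sketch below is illustrative only: the function names are hypothetical, and it assumes unit-norm signatures of the frame length, non-overlapping frames and an unquantized coefficient, none of which is confirmed for the actual experiments.

```python
def encode_frame(frame, signatures):
    """Least-squares (index, coefficient) pair, assuming unit-norm signatures."""
    dots = [sum(x * p for x, p in zip(frame, phi)) for phi in signatures]
    k = max(range(len(dots)), key=lambda i: abs(dots[i]))
    return k, dots[k]


def encode_signal(signal, signatures):
    """Cut the signal into frames the length of one signature (a trailing
    partial frame is dropped) and code each frame as (K, C)."""
    n = len(signatures[0])
    return [encode_frame(signal[i:i + n], signatures)
            for i in range(0, len(signal) - n + 1, n)]


def reconstruct(codes, signatures):
    """Decoder: concatenate C * phi_K for each received (K, C) pair."""
    out = []
    for k, c in codes:
        out.extend(c * p for p in signatures[k])
    return out
```

Only `reconstruct` is needed at the receiver, provided it holds the same pre-defined signature set as the transmitter.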
Comparative results
For the first frame length, the average MOS of our method computed from Table 1 is 2.76. Similarly, for the second frame length, the average MOS of the newly proposed method is 2.5, as specified by Table 2. At the corresponding transmission rates, it is fair to compare the hearing quality of the proposed method with that of “FS1015 LPC-10E” at 2.4 kb/s, for which the MOS is given as 2.6 [14]. This means that the hearing quality of our proposed
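The transmission rate of such a coder follows from simple arithmetic: frames per second times bits per frame, where each frame carries a signature index and a coefficient. The sketch below uses hypothetical parameters (in particular a 6-bit coefficient) purely to show the calculation; it does not reproduce the rates behind Tables 1 and 2, and `transmission_rate` and `compression_ratio` are illustrative names.

```python
import math


def transmission_rate(sample_rate_hz, frame_length, set_size, coeff_bits):
    """Bit rate when each frame is sent as (signature index, quantized coefficient).

    The index needs ceil(log2(set_size)) bits; coeff_bits is a hypothetical
    quantizer resolution for the coefficient C, not a value from the paper.
    """
    frames_per_second = sample_rate_hz / frame_length
    index_bits = math.ceil(math.log2(set_size))
    return frames_per_second * (index_bits + coeff_bits)


def compression_ratio(rate_bps, sample_rate_hz=8000, pcm_bits=8):
    """Ratio against plain PCM at pcm_bits per sample (64 kb/s for 8-bit, 8 kHz)."""
    return sample_rate_hz * pcm_bits / rate_bps


rate = transmission_rate(8000, 40, 16, coeff_bits=6)
```

With these assumed numbers (40-sample frames at 8 kHz, 16 signatures, 6-bit coefficient), there are 200 frames/s at 10 bits each, i.e. 2000 b/s, or 32:1 against 64 kb/s PCM; a set of 15 signatures still needs a 4-bit index, so the rate is the same.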
Conclusion
In this paper, a novel method to model speech signals is presented by means of so-called pre-defined signature sequences, each consisting of $N$ samples. The pre-defined signature sequences are collected in a set called the “signature sequence set”. The set is generated by employing the statistical properties of speech signals. It is shown that any speech sequence (or frame) $\mathbf{X}$ consisting of $N$ samples can be expressed as $\mathbf{X} \approx C\,\boldsymbol{\phi}_K$. In this representation, $\boldsymbol{\phi}_K$ is pulled from the pre-defined
Acknowledgements
The authors would like to thank the reviewers of this paper for their helpful and constructive comments, which were guiding and motivating. Useful discussions with Prof. Dr. Erdal Panayirci, Dr. Umit Guz and Hakan Gurkan of ISIK University, Istanbul, Turkey are also acknowledged.
References
- et al., A fuzzy decision strategy for topic identification and dynamic selection of language models, Signal Process. (EURASIP) (June 2000).
- R. Akdeniz, A.M. Karas, B.S. Yarman, Turkish speech coding by signature base sequences, Proceedings of the...
- R. Akdeniz, B.S. Yarman, Temel Tanim Dizileri ile Konuşma Kodlama [Speech coding with signature base sequences], Proceedings of the SIU'98 -6. Sinyal İşleme ve...
- R. Akdeniz, B.S. Yarman, Generation of optimum signature base sequences for speech signals, Proceedings...
- et al., Speech Coding: A Computer Laboratory Textbook (1996).
- et al., A training algorithm for statistical sequence recognition with applications to transition-based speech recognition, IEEE Signal Process. Lett. (July 1996).
- et al., Discrete-Time Processing of Speech Signals (1993).
- et al., Matching pursuits sinusoidal speech coding, IEEE Trans. Speech Audio Process. (September 2003).
- et al., Projection pursuit regression, J. Am. Stat. Assoc. (1981).