Bidirectional Alignment of Glottal Pulse Length Sequences for the Evaluation of Pitch Detection Algorithms

Ferrer, Carlos A.; Guillén, Reinier Rodríguez; Nöth, Elmar

doi:10.1007/978-3-030-33904-3_67

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11896)

Included in the following conference series:

Iberoamerican Congress on Pattern Recognition

Abstract

This paper describes a problem in a previously reported Dynamic Time Warping (DTW) procedure for aligning reference and detected glottal pulse length sequences, aimed at the evaluation of Pitch Detection Algorithms (PDAs) in pathological voices. Because the existing method aligns only the detected sequence to the reference one, it tends to overestimate the failure of the PDA. A solution is presented that performs a bidirectional alignment, reducing the differences present in the final comparison. The proposal is evaluated on both synthetic and real voice signals by running three well-known PDAs, and the magnitude of the error reduction is reported, along with comments on the factors that may influence its value. The alignment variant introduced in this paper allows a fairer comparison of PDA performance.

1 Introduction

In human oral communication, the sounds where the vocal cords vibrate show a quasi-periodic pattern in their acoustic waveforms. The origin of this periodicity is found in the alternating opening and closing of the vocal folds at the glottis, allowing a pulse-like flow of air coming from the lungs to travel forward and create sound [1]. The determination of glottal pulse boundaries is a common problem in several speech processing tasks, whether aimed at healthy speech (e.g. singing [2], fluent speaking [3, 4] and similar uses) or pathological speech [5]. The methods used for determining the pulse locations are specific types of Pitch Detection Algorithms (PDAs), working on a cycle-by-cycle basis [6].

When facing pathological voices, PDA performance degrades [7, 8] due to the higher levels of periodicity perturbation present. There are dozens of PDAs to choose from, and selecting a particular approach should be based on thorough performance comparisons [9,10,11,12]. Key to this evaluation is the choice of the PDA performance measures.

The most commonly used measure of PDA performance compares the variability of the pulse durations in the detected sequence T_d(n) with the variability of the known reference sequence T_r(n). Variability of the pulse duration contour is denoted as jitter in the voice measurement literature [13], and several expressions have been proposed to measure it [14], a representative one being:

$$ \alpha = \frac{1}{N - 1}\sum\limits_{n = 1}^{N - 1} {\frac{{\left| {T\left( {n + 1} \right) - T\left( n \right)} \right|}}{{0.5 * \left( {T\left( {n + 1} \right) + T\left( n \right)} \right)}}} * 100 $$

(1)
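As an illustration, Eq. (1) can be sketched in a few lines of Python. The function name `jitter_alpha` and the example pulse lengths are hypothetical and chosen only for this sketch; the indexing is zero-based, so the sum runs over the N − 1 consecutive pairs of the sequence.

```python
def jitter_alpha(T):
    """Jitter of Eq. (1), in percent, for a sequence of pulse lengths T."""
    if len(T) < 2:
        raise ValueError("at least two pulse lengths are required")
    total = 0.0
    for n in range(len(T) - 1):
        difference = abs(T[n + 1] - T[n])      # |T(n+1) - T(n)|
        pair_mean = 0.5 * (T[n + 1] + T[n])    # 0.5 * (T(n+1) + T(n))
        total += difference / pair_mean
    return 100.0 * total / (len(T) - 1)


# Hypothetical pulse lengths in samples, alternating by two samples.
T_example = [100, 102, 100, 102]
alpha_example = jitter_alpha(T_example)

# A perfectly periodic contour has no jitter.
alpha_periodic = jitter_alpha([147, 147, 147, 147])
```

In this example every consecutive pair differs by 2 samples around a mean of 101, so each term of the sum equals 2/101 and the result is just under 2%.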

The difference |α_r − α_d|, where α_r and α_d are the values obtained when using T_r(n) and T_d(n) in Eq. (1), is widely reported as a measure of PDA accuracy [8,9,10,11,12]. However, this practice has been heavily criticized [7, 15, 16], since equal variability of two sequences does not imply that the sequences are equal. An alternative approach, suggested in [7], is to use an inter-sequence variability measure, termed β as a successor to α in Eq. (1). The expression for β used in [7] was slightly modified in [15] to closely resemble α, as:

$$ \beta = \frac{1}{N}\sum\limits_{n = 1}^{N} {\frac{{\left| {T_{d} \left( n \right) - T_{r} \left( n \right)} \right|}}{{T_{r} \left( n \right)}}} * 100 $$

(2)
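The criticism of |α_r − α_d| can be made concrete with a small Python sketch. The function names and the two sequences are hypothetical; the sequences are built so that they have identical jitter while differing at every index, which β in Eq. (2) detects and the α difference does not.

```python
def jitter_alpha(T):
    """Jitter of Eq. (1), in percent."""
    terms = [abs(b - a) / (0.5 * (a + b)) for a, b in zip(T[:-1], T[1:])]
    return 100.0 * sum(terms) / len(terms)


def beta(T_d, T_r):
    """Inter-sequence variability of Eq. (2), in percent."""
    if len(T_d) != len(T_r) or not T_r:
        raise ValueError("sequences must be non-empty and of equal length")
    terms = [abs(d - r) / r for d, r in zip(T_d, T_r)]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical contours: same alternating pattern, shifted by one pulse.
T_r = [100, 102, 100, 102]
T_d = [102, 100, 102, 100]

alpha_gap = abs(jitter_alpha(T_r) - jitter_alpha(T_d))  # no difference seen
beta_value = beta(T_d, T_r)                             # difference is seen
```

Here `alpha_gap` is zero although no detected pulse length matches its reference, whereas `beta_value` is close to 2%.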

However, the internal functioning of the PDAs frequently produces sequences T_d which cannot be related to T_r by simply iterating through the pulse index n. A Dynamic Time Warping (DTW) of both glottal pulse length sequences, T_r(n) and T_d(n), was proposed in [15] to correct these misalignments, for a better evaluation of the performance of the PDAs. The DTW itself produced several new performance measures by counting the occurrences of the different types of misalignment. The DTW procedure described in [15] did detect the different types of misalignment between both sequences but, as will be shown in the next section, the value of β it produced failed to represent the one corresponding to the aligned sequences.

In this paper we describe the problem present in the reported DTW, introduce the modifications required to solve it, and perform experiments showing the relevance of the correction.

2 Existing DTW Procedure: Problem and Solution Proposed

A flowchart depicting the DTW procedure proposed in [15] is shown in the left panel of Fig. 1. The algorithm starts from the sequences of reference and detected pulse positions, P_r(n) and P_d(n), from which the sequences of pulse lengths T_r(n) and T_d(n) are obtained by a difference (discrete derivative) operation. The additional measures of performance produced by the DTW algorithm are the number of significant/gross errors (GE) between both aligned sequences, the number of pulse insertions (PI) and deletions (PD), and the number of contour shifts to the left (SL) or to the right (SR) of the detected pulse boundaries P_d(n) as compared to the reference, hand-marked contour P_r(n). A copy of the detected pulse length contour is made to store the dynamically aligned contour, labeled T_dAl.
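The first step, obtaining pulse lengths from pulse positions, can be sketched as follows. The helper name `pulse_lengths` and the position values (in samples) are hypothetical; only the difference operation and the copy into T_dAl come from the description above.

```python
def pulse_lengths(P):
    """Discrete derivative of pulse positions: T(n) = P(n+1) - P(n)."""
    return [P[n + 1] - P[n] for n in range(len(P) - 1)]


# Hypothetical pulse positions in samples.
P_r = [0, 147, 294, 441, 588]   # reference marks
P_d = [0, 146, 295, 441, 589]   # detected marks

T_r = pulse_lengths(P_r)
T_d = pulse_lengths(P_d)

# Working copy that will hold the dynamically aligned detected contour.
T_dAl = list(T_d)
```

A sequence of K pulse positions yields K − 1 pulse lengths, so both length contours here have four elements.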

The DTW procedure works by incrementally moving through the contours by using separate vector indexes for both, namely n_r and n_d. Both indexes compose a bi-dimensional grid of points (n_r, n_d) where the best path for alignment is searched for using some heuristic constraints.

Regarding the Global Constraints for the DTW, the alignment of length values (Ts) can only be performed if position values (Ps) are actually related in time. If for a particular position this is not met, then a pitch mark deletion (PD) or insertion (PI) occurred. Both PD and PI counts are metrics available from the DTW procedure. The conditions for the occurrence of PD and PI are:

$$ \begin{array}{*{20}c} {PD\text{? = }P_{d} \left( {n_{d} - 1} \right)\;\text{ > }\;P_{r} \left( {n_{r} + 1} \right)} \\ {PI\text{? = }P_{d} \left( {n_{d} + 1} \right)\;\text{ < }\;P_{r} \left( {n_{r} - 1} \right)} \\ \end{array} $$

(3)
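A literal transcription of the two conditions in Eq. (3) is sketched below. The function name, the zero-based indexing and the example positions are assumptions of this sketch; how the grid indexes are advanced between checks is defined by the flowchart in Fig. 1 and is not reproduced here.

```python
def global_constraints(P_r, P_d, n_r, n_d):
    """Evaluate the PD and PI conditions of Eq. (3) at grid point (n_r, n_d).

    Indexes are zero-based here (an assumption of this sketch). Both
    neighbours n - 1 and n + 1 must exist, so border points are rejected
    instead of letting Python wrap negative indexes around.
    """
    if not (1 <= n_r <= len(P_r) - 2 and 1 <= n_d <= len(P_d) - 2):
        raise IndexError("grid point needs neighbours on both sides")
    pulse_deleted = P_d[n_d - 1] > P_r[n_r + 1]    # PD? in Eq. (3)
    pulse_inserted = P_d[n_d + 1] < P_r[n_r - 1]   # PI? in Eq. (3)
    return pulse_deleted, pulse_inserted


# Hypothetical positions in samples.
P_r = [0, 100, 200, 300, 400]
P_d_missing = [0, 100, 300, 400, 500]   # the mark near 200 was not detected
P_d_extra = [0, 50, 100, 200, 300]      # an extra mark appears near 50

related = global_constraints(P_r, P_r, 2, 2)           # marks related in time
deletion = global_constraints(P_r, P_d_missing, 1, 3)  # PD condition met
insertion = global_constraints(P_r, P_d_extra, 3, 1)   # PI condition met
```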

If Global Constraints are not violated, Local Constraints are checked to determine whether the length contours require alignment, either a shift to the left (SL) or to the right (SR) in the (n_r, n_d) path. A first check is whether the contours differ at their current positions in the grid (n_r, n_d), for which the occurrence of a Gross Error (GE) is evaluated. An auxiliary large-difference/error function between T_r(n_r) and T_d(n_d) for arbitrary displacements a and b, respectively, was defined in [15] for determining the presence of GE, SL and SR. This Boolean function D(a,b) checks if the difference between the contours at certain positions (displaced a and b with respect to the current positions n_r and n_d, respectively) exceeds a given threshold Thr:

$$ D\left( {a,b} \right)\text{?} = \left| {T_{r} \left( {n_{r} + a} \right) - T_{d} \left( {n_{d} + b} \right)} \right|\text{ > }Thr $$

(4)

The presence or absence of a GE is given by checking for D at the current coordinates of the search grid:

$$ GE\text{?} = D(0,0)\text{?} $$

(5)

And then, the need to perform an SL or SR is given by the expressions in (6), where logical complements have been denoted by bars above the respective terms:

$$ \begin{array}{*{20}c} {SL\text{? = }\left( {\overline{{D\text{?}\left( { - 1,0} \right)}} \;{\& }\;\overline{{D\left( {0,1} \right)}} } \right)\,\;{\& }\;\left( {\overline{{D\text{?}\left( {0, - 1} \right){\& }\overline{{D\left( {1,0} \right)}} }} } \right)} \\ {SR\text{? = }\left( {\overline{{D\text{?}\left( {0, - 1} \right)}} \;{\& }\;\overline{{D\left( {1,0} \right)}} } \right)\;{\& }\;\left( {\overline{{D\text{?}\left( { - 1,0} \right){\& }\overline{{D\left( {0,1} \right)}} }} } \right)} \\ \end{array} $$

(6)

The GE measure is common in short-term PDA comparisons, from where its name was taken [17, 18]. The value of Thr has been expressed in the literature either in time units [17, 19, 20] or in percentage of the average T_r [18, 21, 22]. The latter approach was used in [15], choosing a value of Thr equal to 3% of the average T_r.
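Eqs. (4) and (5), together with the threshold choice of 3% of the average T_r, can be sketched as below. The function names, the zero-based indexing and the example contours are assumptions of this sketch.

```python
def threshold(T_r, fraction=0.03):
    """Thr as a fraction of the average reference pulse length (3% in [15])."""
    return fraction * sum(T_r) / len(T_r)


def large_difference(T_r, T_d, n_r, n_d, a, b, thr):
    """Boolean D(a, b) of Eq. (4) around the current grid point (n_r, n_d)."""
    return abs(T_r[n_r + a] - T_d[n_d + b]) > thr


def gross_error(T_r, T_d, n_r, n_d, thr):
    """GE? of Eq. (5): D evaluated with no displacement."""
    return large_difference(T_r, T_d, n_r, n_d, 0, 0, thr)


# Hypothetical pulse length contours in samples.
T_r = [100, 100, 100, 100]
T_d = [100, 102, 110, 100]

thr = threshold(T_r)                           # 3% of 100 samples
small_gap = gross_error(T_r, T_d, 1, 1, thr)   # |100 - 102| is below Thr
large_gap = gross_error(T_r, T_d, 2, 2, thr)   # |100 - 110| exceeds Thr
```

With an average reference length of 100 samples the threshold is 3 samples, so a 2-sample deviation is tolerated and a 10-sample deviation counts as a gross error.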

After all Global and Local Constraints have been checked, and the indexes n_r and n_d properly modified, T_dAl is correspondingly modified (if needed) to better match T_r, and the algorithm proceeds to check the contours at the next values of n_r and n_d. The DTW ends whenever either contour reaches its end as indexed by the current values of n_r and n_d. The alignment performed on T_dAl consists in replacing its value at index n_r with the value of T_d(n_d).

2.1 Modification to Make Alignment Bidirectional

The alignment performed in [15] could, however, be considered incomplete because it is unilateral: only T_d is modified to correspond to T_r. As a result, there will be indexes in T_dAl with no corresponding aligned values in T_r. In this paper we slightly modify the DTW procedure so that the alignment occurs bilaterally: from T_r indexes to T_dAl (as in the original) but also from T_d indexes to an aligned version of T_r, namely T_rAl. In this way, T_dAl and T_rAl are better suited as aligned detected and reference contours for evaluating an inter-contour variability measure than the previously used pair T_dAl and T_r. The modified flow diagram of the DTW procedure used here is shown in the right panel of Fig. 1. There are two main differences between the alignments performed by the two methods, represented in the left and right panels:

First, the alignment of the contours occurs in this case not after all constraint checks have been performed (“Alignment” section in left panel), but within the actions taken for a particular condition producing a modification of either index n_r or n_d (i.e. the occurrence of a PI, PD, SL or SR in right panel).
Second, the alignment is not performed here by assignment, but by suppression of the value without a corresponding element in the companion contour. The element to suppress in each case is pointed to by the ^ sign within the square encasing the actions corresponding to the particular condition (right panel).

In short, the element [n_d − (SL + PI)] in T_dAl is suppressed every time a PI or SL occurs, while the element [n_r − (SR + PD)] in T_rAl is suppressed every time a PD or SR occurs. These actions are in line with the classical ‘insertion’ and ‘deletion’ operations defined for string sequences in [23]. The third operation (‘change’) is still present when a GE occurs without an SL or SR condition met.
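The effect of suppression on β can be illustrated with a toy example. This is a minimal sketch and not the authors' implementation: the event detection of Fig. 1 is not reproduced, the index to suppress is supplied by hand, and truncating the unaligned contours to a common length is an assumption made only so that Eq. (2) can be applied to them.

```python
def beta(T_d, T_r):
    """Inter-sequence variability of Eq. (2), in percent."""
    if len(T_d) != len(T_r) or not T_r:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(abs(d - r) / r for d, r in zip(T_d, T_r)) / len(T_r)


def suppress(contour, indexes):
    """Copy of a contour without the elements at the given indexes."""
    dropped = set(indexes)
    return [value for i, value in enumerate(contour) if i not in dropped]


# Hypothetical contours in samples: a spurious mark splits the third
# reference pulse in two, i.e. one pulse insertion (PI).
T_r = [100, 100, 100, 100, 100]
T_d = [100, 100, 50, 50, 100, 100]

# Unaligned comparison, truncated to the common length (assumption).
common = min(len(T_d), len(T_r))
beta_unaligned = beta(T_d[:common], T_r[:common])

# Suppress the detected element that has no companion in the reference.
# Index 3 is supplied by hand here; in the procedure it follows from the PI.
T_dAl = suppress(T_d, [3])
T_rAl = suppress(T_r, [])   # no PD or SR occurred, nothing to suppress
beta_aligned = beta(T_dAl, T_rAl)
```

After suppression, the single remaining mismatch (50 against 100) is the ‘change’ case, and the variability drops from 20% to 10% in this toy example.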

3 Experiments Performed

Both alignment procedures were evaluated by applying them to reference and detected contours corresponding to synthetic and real signals. Synthetic signals allow introducing periodicity perturbations of any magnitude desired, while real signals serve as validation, in case the synthesis procedure were to be contested.

Three well-known PDAs were used, so that different performances are to be expected and tested. Among them are the ones included by default in the freely available Praat system [24], namely the Peak Picking (P-P) and the Cross Correlation (C-C) based methods. Praat is used as a reference in the evaluation of other systems and methods in [9, 12, 25,26,27,28,29], among many other studies. A third PDA evaluated is the Super-Resolution (S-R) method described in [30], which has shown very good results elsewhere [7, 15, 18]. The internal implementation of these PDAs is outside the scope of this paper; however, they are a representative sample of the best performing and most frequently used PDAs, as the previous references show.

3.1 Synthetic Signals

Synthetic signals were obtained according to the source filter model [31]. The standard pulse duration T_E was chosen to correspond to a frequency of 150 Hz, with sampling frequency of 22050 Hz. A single glottal pulse waveform was generated according to the polynomial model type C in [32], with rising time of 0.33 T_E and falling time of 0.09 T_E, which reportedly produced the most natural-sounding synthesis.

A vocal tract configuration corresponding to the vowel “a” was used, with resonances defined by the frequencies, amplitudes and bandwidths already used in [7, 15, 30, 33,34,35]. Periodicity perturbations were introduced at seven levels, ranging from quasi-periodic to highly aperiodic signals, incorporating random jitter and shimmer (pulse amplitude variability) values per individual pulse up to the maximum percentage given in Table 1. Additive noise was added at the signal-to-noise ratio (SNR), expressed in dB, also given in the table.

Table 1. Values of the individual perturbations per level, combined in the synthetic signals.

A total of 300 contiguous pulses, corresponding to roughly two seconds of signal, are synthesized for each perturbation level, with the reference pulse boundaries P_r(n) available from the synthesis procedure. More details on the synthesis procedure can be found in the references provided.
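The timing figures quoted for the synthesis can be checked with a short sketch, which also shows one way reference boundaries could be generated from perturbed pulse lengths. The uniform distribution of the jitter, the 2% maximum and the function names are assumptions of this sketch; the actual per-level values are those of Table 1 and the actual procedure is the one in the cited references.

```python
import random

FS = 22050       # sampling frequency in Hz (from the text)
F0 = 150         # nominal fundamental frequency in Hz (from the text)
N_PULSES = 300   # pulses per synthetic signal (from the text)

T_E = FS / F0                 # standard pulse duration in samples
rise_samples = 0.33 * T_E     # rising time of the glottal pulse
fall_samples = 0.09 * T_E     # falling time of the glottal pulse
duration_s = N_PULSES / F0    # nominal signal duration in seconds


def jittered_lengths(n, t_nominal, max_jitter_pct, rng):
    """Pulse lengths deviating from t_nominal by at most max_jitter_pct.

    The uniform distribution is an assumption made for illustration.
    """
    scale = max_jitter_pct / 100.0
    return [t_nominal * (1.0 + scale * rng.uniform(-1.0, 1.0)) for _ in range(n)]


def positions_from_lengths(T, start=0.0):
    """Cumulative sum of pulse lengths, giving the pulse boundaries P(n)."""
    P = [start]
    for t in T:
        P.append(P[-1] + t)
    return P


rng = random.Random(0)                               # fixed seed, repeatable
T_r = jittered_lengths(N_PULSES, T_E, 2.0, rng)      # 2% is a hypothetical level
P_r = positions_from_lengths(T_r)
```

At 22050 Hz and 150 Hz the standard pulse lasts exactly 147 samples, and 300 such pulses last two seconds, in line with the figure given above. The 300 pulse lengths yield 301 boundaries.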

3.2 Real Signals

The hand-marked pulse positions from the 29 pathological signals used in [15] were available for this paper as P_r(n). The acoustic waveforms are available from the Massachusetts Eye and Ear Infirmary Database [36]. The hand-marking includes 4695 pulse markers, with average F0 of 161 Hz, close to the value in the synthetic signals.

3.3 Evaluation

Both alignment procedures, as depicted in left and right panels of Fig. 1, were applied to the available reference P_r(n) and detected P_d(n) resulting from the different PDAs. In the case of the synthetic signals, 100 realizations were obtained for each level, to report averaged measures.

All methods were programmed in MATLAB R2017, and a script calling Praat’s PDAs was executed within the MATLAB environment. Three results are reported for β values. The first is the value obtained for non-aligned contours, denoted simply as β. The second, denoted β_O, is the value obtained from the pair of sequences T_dAl and T_r, following the procedure in [15] shown in the left panel of Fig. 1. Finally, the third value, denoted β_C, is obtained by applying Eq. (2) to the sequences T_dAl and T_rAl, obtained according to the corrections described here and represented in the right panel of Fig. 1.

4 Results and Discussion

4.1 Synthetic Signals

As mentioned in the description of the synthetic signals, 100 realizations of the two-second synthesis were obtained. The detected pulse sequences were aligned to the reference ones, and the 100 β values corresponding to each level per PDA were averaged. These values are shown in Table 2.

Table 2. Values of the three alternatives for β obtained for the three PDAs on synthetic signals

A first conspicuous result is that the S-R PDA greatly outperforms both Praat-based variants. In fact, at the most degraded level the P-P variant did not report an average value, since some sequences returned empty from the Praat system, and for the 5th and 6th perturbation levels the variability was higher than 100%. However, PDA performance is not the main subject of analysis here; the focus is the effect of our proposed modifications on the value of β_C.

With respect to this issue, there is a manifest tendency for the values to decrease from the unaligned β, to the aligned β_O, to our alignment proposal β_C. This tendency is of course stronger for the worst PDAs, where the occurrence of misalignments is expected to be higher. But even for the almost-unaffected S-R PDA, the values of β_C are always equal to or lower than β_O.

4.2 Real Signals

The average values for the three variants of β obtained for the 29 signals are shown in Table 3, with rows corresponding to each PDA considered.

Table 3. Averaged values of the three alternatives for β obtained for the three PDAs on the 29 hand-marked signals

The tendency to obtain reduced inter-contour variability as we move from the non-aligned variant to the original alignment described in [15] and then to the corrections of the alignment described here is manifest for the three PDAs. Here again the Praat-based PDAs show the poorest performance, reinforcing the need to consider tweaking their default settings. However, PDA performance is not our objective in this paper. A more interesting result is the fact that the reduction is more noticeable in the C-C PDA. It seems that the previous β and β_O values were more affected by insertions and deletions of pulses in C-C than by the magnitudes of the gross errors, the opposite of the P-P variant, which shows a smaller reduction towards β_C.

5 Conclusions

A correction to the DTW procedure proposed in [15] has been described, which allows a more realistic comparison of PDA performance in terms of β, the variability between the reference and detected pulse length contours that can be credited completely to the functioning of the PDA.

The other measures of performance produced by the DTW algorithm (GE, PI, PD, SR & SL) are unaffected by the corrections made, so any previous results obtained with the uncorrected DTW method hold for all these measures. The ability of the correction to reduce the described inflation of β was shown in both real and synthetic signals. The modification introduced is simple to implement, and the authors will make the code public in the near future.

References

1. Carding, P.N., Mathieson, L.: Voice and speech production. In: Gleeson, M. (ed.) Scott Brown’s Otorhinolaryngology, Head & Neck Surgery, vol. II, 7th edn, pp. 2164–2169. Hodder Education, London (2008)
2. Babacan, O., Drugman, T., D’Alessandro, N., Henrich, N., Dutoit, T.: A quantitative comparison of glottal closure instant estimation algorithms on a large variety of singing sounds. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2013, pp. 1702–1706 (2013)
3. Rao, K.S., Vuppala, A.K.: Non-uniform time scale modification using instants of significant excitation and vowel onset points. Speech Commun. 55(6), 745–756 (2013)
4. Rao, K.S., Maity, S., Reddy, V.R.: Pitch synchronous and glottal closure based speech analysis for language recognition. Int. J. Speech Technol. 16(4), 413–430 (2013)
5. Deshpande, P.S., Manikandan, M.S.: Effective glottal instant detection and electroglottographic parameter extraction for automated voice pathology assessment. IEEE J. Biomed. Heal. Inform. 22(2), 398–408 (2018)
6. Linder, R., Albers, A.E., Hess, M., Pöppl, S.J., Schönweiler, R.: Artificial neural network-based classification to screen for dysphonia using psychoacoustic scaling of acoustic voice features. J. Voice 22(2), 155–163 (2008)
7. Parsa, V., Jamieson, D.G.: A comparison of high precision F0 extraction algorithms for sustained vowels. J. Speech Lang. Hear. Res. 42(1), 112–126 (1999)
8. Veprek, P., Scordilis, M.S.: Analysis, enhancement and evaluation of five pitch determination techniques. Speech Commun. 37(3–4), 249–270 (2002)
9. Dejonckere, P.H., Schoentgen, J., Giordano, A., Fraj, S., Bocchi, L., Manfredi, C.: Validity of jitter measures in non-quasi-periodic voices. Part I: perceptual and computer performances in cycle pattern recognition. Logop. Phoniatr. Vocology 36(March), 70–77 (2011)
10. Manfredi, C., Giordano, A., Schoentgen, J., Fraj, S., Bocchi, L., Dejonckere, P.H.: Validity of jitter measures in non-quasi-periodic voices. Part II: the effect of noise. Logop. Phoniatr. Vocology 36(2), 78–89 (2011)
11. Dejonckere, P.H., Giordano, A., Schoentgen, J., Fraj, S., Bocchi, L., Manfredi, C.: To what degree of voice perturbation are jitter measurements valid? a novel approach with synthesized vowels and visuo-perceptual pattern recognition. Biomed. Sig. Process. Control 7(1), 37–42 (2012)
12. Manfredi, C., Giordano, A., Schoentgen, J., Fraj, S., Bocchi, L., Dejonckere, P.H.: Perturbation measurements in highly irregular voice signals: Performances/validity of analysis software tools. Biomed. Sig. Process. Control 7(4), 409–416 (2012)
13. Baken, R.J., Orlikoff, R.F.: Clinical Measurement of Speech and Voice, 2nd edn. Cengage Learning, Boston (2000)
14. Buder, E.H.: Acoustic analysis of voice quality: a tabulation of algorithms 1902–1990. In: Kent, R.D., Ball, M.J. (eds.) Voice Quality Measurement, pp. 119–244. Singular, San Diego (2000)
15. Ferrer, C., Torres, D., Hernández-Díaz, M.E.: Using dynamic time warping of T0 contours in the evaluation of cycle-to-cycle pitch detection algorithms. Pattern Recogn. Lett. 31(6), 517–522 (2010)
16. Tsanas, A., Zañartu, M., Little, M.A., Fox, C.M., Ramig, L.O., Clifford, G.D.: Robust fundamental frequency estimation in sustained vowels: detailed algorithmic comparisons and information fusion with adaptive Kalman filtering. J. Acoust. Soc. Am. 135(5), 2885–2901 (2014)
17. Rabiner, L.R., Cheng, M.J., Rosenberg, A.E., McGonegal, C.A.: A comparative performance study of several pitch detection algorithms. Acoust. Speech Sig. Process. IEEE Trans. 24(5), 399–418 (1976)
18. Bagshaw, P.C., Miller, S.M., Jack, M.A.: Enhanced pitch tracking and the processing of F0 contours for computer aided intonation teaching. In: 3rd European Conference on Speech Communication and Technology EUROSPEECH 1993, pp. 1003–1006 (1993)
19. Wise, J.D., Caprio, J.R., Parks, T.W.: Maximum likelihood pitch estimation. IEEE Trans. Acoust. 24(5), 418–423 (1976)
20. Shahnaz, C., Zhu, W.P., Ahmad, M.O.: Robust pitch estimation at very low SNR exploiting time and frequency domain cues. In: ICASSP, IEEE International Conference Acoustics, Speech, and Signal Processing – Proceedings, vol. I, no. February 2005 (2005)
21. Nakatani, T., Irino, T.: Robust and accurate fundamental frequency estimation based on dominant harmonic components. J. Acoust. Soc. Am. 116(6), 3690–3700 (2004)
22. de Cheveigné, A., Kawahara, H.: YIN, a fundamental frequency estimator for speech and music. J. Acoust. Soc. Am. 111(4), 1917–1930 (2002)
23. Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. J. ACM 21(1), 168–173 (1974)
24. Boersma, P.: Praat, a system for doing phonetics by computer. Glot Int. 5(9/10), 5 (2002)
25. Amir, O., Wolf, M., Amir, N.: A clinical comparison between two acoustic analysis softwares: MDVP and Praat. Biomed. Sig. Process. Control 4(3), 202–205 (2009)
26. Maryn, Y., Corthals, P., De Bodt, M.S., Van Cauwenberge, P., Deliyski, D.D.: Perturbation measures of voice: a comparative study between multi-dimensional voice program and praat. Folia Phoniatr. Logop. 61(4), 217–226 (2009)
27. Hanschmann, H., Gärtner, S., Berger, R.: Comparability of computer-supported concurrent voice analysis. Folia Phoniatr. Logop. 67(1), 8–14 (2015)
28. Burris, C., Vorperian, H.K., Fourakis, M., Kent, R.D., Bolt, D.M.: Quantitative and descriptive comparison of four acoustic analysis systems: vowel measurements. J. Speech Lang. Hear. Res. 57(1), 26–45 (2014)
29. Hagmüller, M., Kubin, G.: Poincaré pitch marks. Speech Commun. 48(12), 1650–1665 (2006)
30. Medan, Y., Yair, E., Chazan, D.: Super resolution pitch determination of speech signals. IEEE Trans. Signal Process. 39(1), 40–48 (1991)
31. Fant, G.: Acoustic Theory of Speech Production, 1st edn. Mouton, The Hague (1960)
32. Rosenberg, A.E.: Effect of glottal pulse shape on the quality of natural vowels. J. Acoust. Soc. Am. 49(2B), 583–590 (1971)
33. Ferrer, C.A., González, E., Hernández-Díaz, M.E.: Evaluation of time and frequency domain-based methods for the estimation of harmonics-to-noise-ratios in voice signals. In: Martínez-Trinidad, J.F., Carrasco Ochoa, J.A., Kittler, J. (eds.) CIARP 2006. LNCS, vol. 4225, pp. 406–415. Springer, Heidelberg (2006). https://doi.org/10.1007/11892755_42
34. Ferrer, C., González, E., Hernández-Díaz, M.E., Torres, D., Del Toro, A.: Removing the influence of shimmer in the calculation of harmonics-to-noise ratios using ensemble-averages in voice signals. EURASIP J. Adv. Sig. Process. 2009(1), 784379 (2009). https://doi.org/10.1155/2009/784379
35. Ferrer, C., Hernández-Díaz, M.E., González, E.: Using waveform matching techniques in the measurement of shimmer in voiced signals. In: Interspeech 2007: 8th Annual Conference of the International Speech Communication Association, pp. 2436–2439 (2007)
36. Disordered Voice Database v1.03. Kay Elemetrics Corp. (1994)

Acknowledgements

This work was partially supported by an Alexander von Humboldt Foundation Fellowship granted to one of the authors (Ref 3.2-1164728-CUB-GF-E).

Author information

Authors and Affiliations

Informatics Research Center, Central University “Marta Abreu” de Las Villas, Santa Clara, Cuba
Carlos A. Ferrer & Reinier Rodríguez Guillén
Pattern Recognition Lab, Friedrich Alexander University Erlangen-Nuremberg, Erlangen, Germany
Carlos A. Ferrer & Elmar Nöth

Corresponding author

Correspondence to Carlos A. Ferrer.

Editor information

Editors and Affiliations

Uppsala University, Uppsala, Sweden
Ingela Nyström
University of Information Science, Havana, Cuba
Yanio Hernández Heredia
University of Information Science, Havana, Cuba
Vladimir Milián Núñez

Cite this paper

Ferrer, C.A., Guillén, R.R., Nöth, E. (2019). Bidirectional Alignment of Glottal Pulse Length Sequences for the Evaluation of Pitch Detection Algorithms. In: Nyström, I., Hernández Heredia, Y., Milián Núñez, V. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2019. Lecture Notes in Computer Science(), vol 11896. Springer, Cham. https://doi.org/10.1007/978-3-030-33904-3_67

DOI: https://doi.org/10.1007/978-3-030-33904-3_67
Published: 22 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33903-6
Online ISBN: 978-3-030-33904-3
eBook Packages: Computer Science, Computer Science (R0)
