Glottal Gap tracking by a continuous background modeling using inpainting

Abstract

The visual examination of the vibration patterns of the vocal folds is an essential method to understand the phonation process and diagnose voice disorders. However, a detailed analysis of phonation based on this technique requires a manual or semi-automatic segmentation of the glottal area, which is difficult and time consuming. This work presents a quasi-automatic framework to accurately segment the glottal area, introducing several techniques not previously explored in the state of the art. The method takes advantage of the possibility of a minimal user intervention for those cases where the automatic computation fails. The presented method shows a reliable delimitation of the glottal gap, achieving average improvements of 13% and 18% with respect to two other approaches found in the literature, while reducing the error of wrongly detected total closure instants. Additionally, the results suggest that the proposed set of validation guidelines can be used to standardize the criteria of accuracy and efficiency of segmentation algorithms.

References

  1. Bohr C, Kräck A, Dubrovskiy D, Eysholdt U, Švec J, Psychogios G, Ziethe A, Döllinger M (2014) Spatiotemporal analysis of high-speed videolaryngoscopic imaging of organic pathologies in males. J Speech Lang Hear Res 57(4):1148–1161

  2. Voigt D, Döllinger M, Braunschweig T, Yang A, Eysholdt U, Lohscheller J (2010) Classification of functional voice disorders based on phonovibrograms. Artif Intell Med 49(1):51–59

  3. Döllinger M, Lohscheller J, Švec JG, McWhorter A, Kunduk M (2011) Support vector machine classification of vocal fold vibrations based on phonovibrogram. InTech

  4. Unger J, Lohscheller J, Reiter M, Eder K, Betz CS, Schuster M (2015) A noninvasive procedure for early-stage discrimination of malignant and precancerous vocal fold lesions based on laryngeal dynamics analysis. Cancer Res 75(1):31–39

  5. Herbst CT, Lohscheller J, Švec JG, Henrich N, Weissengruber G, Fitch WT (2014) Glottal opening and closing events investigated by electroglottography and super-high-speed video recordings. J Exp Biol 217(6):955–963

  6. Švec JG, Schutte HK (1996) Videokymography: high-speed line scanning of vocal fold vibration. J Voice 10:201–5

  7. Walker J, Murphy P (2007) A review of glottal waveform analysis. In: Progress in nonlinear speech processing. Springer, Berlin, pp 1–21

  8. Lohscheller J, Toy H, Rosanowski F, Eysholdt U, Döllinger M (2007) Clinically evaluated procedure for the reconstruction of vocal fold vibrations from endoscopic digital high-speed videos. Med Image Anal 11(4):400–413

  9. Karakozoglou S-Z, Henrich N, d’Alessandro C, Stylianou Y (2011) Automatic glottal segmentation using local-based active contours and application to glottovibrography. Speech Commun 54(5):641–654

  10. Lohscheller J, Eysholdt U (2008) Phonovibrography: mapping high-speed movies of vocal fold vibrations into 2-D diagrams for visualizing and analyzing the underlying laryngeal dynamics. IEEE Trans Med Imaging 27(3):300–309

  11. Yan Y, Du G, Zhu C, Marriott G (2012) Snake based automatic tracing of vocal-fold motion from high-speed digital images. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 593–596

  12. Skalski A, Zielinski T, Deliyski D (2008) Analysis of vocal folds movement in high speed videoendoscopy based on level set segmentation and image registration. In: International conference on signals and electronic systems, ICSES, pp 223–226

  13. Mehta DD, Deliyski DD, Quatieri TF, Hillman RE (2011) Automated measurement of vocal fold vibratory asymmetry from high-speed videoendoscopy recordings. J Speech Lang Hear Res 54(1):47–54

  14. Chen J, Gunturk BK, Kunduk M (2013) Glottis segmentation using dynamic programming. In: Proceedings of SPIE medical imaging: image processing, vol 8669, p 86693L

  15. Moukalled HJ, Deliyski DD, Schwarz RR, Wang S (2009) Segmentation of laryngeal high-speed videoendoscopy in temporal domain using paired active contours. In: Sixth international workshop on models and analysis of vocal emissions for biomedical applications, MAVEBA, pp 137–140

  16. Demeyer J, Dubuisson T, Gosselin B, Remacle M (2009) Glottis segmentation with a high-speed glottography: a fully automatic method. In: 3rd advanced voice function assessment international workshop, pp 113–116

  17. Elidan G, Elidan J (2012) Vocal folds analysis using global energy tracking. J Voice 26:760–768

  18. Andrade-Miranda G, Godino-Llorente JI, Moro-Velázquez L, Gómez-García JA (2015) An automatic method to detect and track the glottal gap from high speed videoendoscopic images. BioMedical Engineering OnLine 14(1):1–29

  19. Lee JS, Kim E, Sung MW, Kim KH, Sung MY, Park KS (2001) A method for assessing the regional vibratory pattern of vocal folds by analysing the video recording of stroboscopy. Med Biol Eng Comput 39(3):273–278

  20. Osma-Ruiz V, Godino-Llorente JI, Sáenz-Lechón N, Fraile R (2008) Segmentation of the glottal space from laryngeal images using the watershed transform. Comput Med Imaging Graph 32:193–201

  21. Gloger O, Lehnert B, Schrade A, Volzke H (2015) Fully automated glottis segmentation in endoscopic videos using local color and shape features of glottal regions. IEEE Trans Biomed Eng 62:795–806

  22. Oh J, Hwang S, Lee J, Tavanapong W, Wong J, de Groen PC (2007) Informative frame classification for endoscopy video. Med Image Anal 11(2):110–127

  23. Mallick S, Zickler T, Belhumeur P, Kriegman D (2006) Specularity removal in images and videos: a PDE approach. In: Computer Vision – ECCV 2006, vol 3951 of Lecture Notes in Computer Science. Springer, Berlin, pp 550–563

  24. Paris S, Kornprobst P, Tumblin J, Durand F (2009) Bilateral filtering: Theory and applications. Foundations and Trends in Computer Graphics and Vision 4(1):1–73

  25. Yilmaz A, Javed O, Shah M (2006) Object tracking: a survey. ACM Computing Surveys, vol. 38

  26. Telea A (2004) An image inpainting technique based on the fast marching method. J Graphics, GPU, Game Tools 9(1):23–34

  27. Ridler TW, Calvard S (1978) Picture thresholding using an iterative selection method. IEEE Trans Syst Man Cybern 8:630–632

  28. Birkholz P (2016) GlottalImageExplorer – an open source tool for glottis segmentation in endoscopic high-speed videos of the vocal folds. In: Jokisch O (ed) Studientexte zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung 2016. TUDpress, Dresden

  29. Zhang H, Fritts JE, Goldman SA (1996) A survey on evaluation methods for image segmentation. Pattern Recognit 29:1335–1346

  30. Ko T, Ciloglu T (2014) Automatic segmentation of high speed video images of vocal folds. J Appl Math 2014:16

  31. Taha AA, Hanbury A (2015) Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool. BMC Medical Imaging 15(1):1–28

Acknowledgements

This work has been funded by the Spanish Ministry of Economy and Competitiveness under grant TEC2012-38630-C04-01 and by the Spanish Ministry of Education under grant PRX15/00385.

Author information

Corresponding author

Correspondence to Gustavo Andrade-Miranda.

Appendix

In order to assess the glottal segmentation, different metrics have been proposed in the literature: DICE and an area error [21]; a multipoint scale comparison [9]; mean square error [30]; and the tracking and comparison of some points of interest [8]. However, it is unreasonable to expect all the metrics to be valid for the glottal segmentation problem, since each metric is sensitive to different properties of the segmentation and can thus discover different types of error [31]. Hence, there is a need for standardizing not only the metrics to be used but also the definition of each metric. In order to overcome this problem, we propose a complete objective framework to evaluate the glottal segmentation.

The initial metrics computed in this study can be categorized, depending on their nature and definition, as overlap based, pair-counting based, information theory based, probabilistic based, and spatial distance based. For a detailed description of all the metrics enumerated next, refer to [31].

The first group computes the degree of overlap between two segmentations. To this group belong the DICE or overlap index, Jaccard (JAC), true positive rate (TPR) or sensitivity, true negative rate (TNR) or specificity, F1-score (FMS), false positive rate (FPR), false negative rate (FNR), positive predictive value or precision (PPV), global consistency error (GCE), and object-level consistency error (OCE).
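As an illustration, the overlap-based scores can all be derived from the per-pixel confusion counts of the two binary masks. The following is a minimal Python sketch, assuming Boolean NumPy masks of identical size (the function name and the epsilon guard are ours, not part of the original method):

    import numpy as np

    def overlap_metrics(gt, seg):
        """Overlap-based metrics between a ground-truth and an automatic mask.

        gt, seg: Boolean arrays of identical shape; True marks a glottis pixel.
        """
        tp = np.logical_and(gt, seg).sum()    # glottis pixels correctly found
        tn = np.logical_and(~gt, ~seg).sum()  # background correctly rejected
        fp = np.logical_and(~gt, seg).sum()   # background labeled as glottis
        fn = np.logical_and(gt, ~seg).sum()   # glottis pixels missed
        eps = 1e-12                           # guard against empty masks
        return {
            "DICE": 2 * tp / (2 * tp + fp + fn + eps),
            "JAC": tp / (tp + fp + fn + eps),
            "TPR": tp / (tp + fn + eps),      # sensitivity
            "TNR": tn / (tn + fp + eps),      # specificity
            "PPV": tp / (tp + fp + eps),      # precision
            "FPR": fp / (fp + tn + eps),
            "FNR": fn / (fn + tp + eps),
        }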

The second group measures the similarity between clusterings. One of its important properties is that it is not based on labels, and thus it can be used to evaluate clusterings as well as classifications. The metrics implemented in this work were the Rand Index (RI) and the Adjusted Rand Index (ARI).
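As a sketch, both indices can be obtained by flattening the masks into per-pixel label vectors, here with scikit-learn (our choice of library; the paper does not state an implementation):

    import numpy as np
    from sklearn.metrics import adjusted_rand_score, rand_score

    def pair_counting_metrics(gt, seg):
        """RI and ARI between two binary masks, treating each pixel as a sample."""
        y_true = np.asarray(gt, dtype=int).ravel()
        y_pred = np.asarray(seg, dtype=int).ravel()
        return {"RI": rand_score(y_true, y_pred),
                "ARI": adjusted_rand_score(y_true, y_pred)}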

The third group computes a measure of the information content of each segmentation. The variation of information (VI) figure of merit belongs to this group: it measures the amount of information lost and gained when changing from one segmentation to the other.
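For reference, VI can be written as VI(X; Y) = H(X) + H(Y) − 2I(X; Y), where H is the entropy of a segmentation's label distribution and I the mutual information between the two. A minimal per-pixel sketch (the helper and function names are ours):

    import numpy as np
    from sklearn.metrics import mutual_info_score

    def variation_of_information(gt, seg):
        """VI between two binary masks, in nats, over per-pixel labels."""
        x = np.asarray(gt, dtype=int).ravel()
        y = np.asarray(seg, dtype=int).ravel()

        def entropy(v):
            p = np.bincount(v) / v.size   # empirical label distribution
            p = p[p > 0]
            return -(p * np.log(p)).sum()

        return entropy(x) + entropy(y) - 2 * mutual_info_score(x, y)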

The fourth group comprises metrics defined as functions of statistics calculated from the pixels in the overlapped regions of the segmentations. The metrics included are Cohen's kappa coefficient (KAP) and the area under the ROC curve (AUC).
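Both are available in scikit-learn; a sketch assuming a hard ground-truth mask and a per-pixel score map for the automatic segmentation (with a hard mask instead of scores, the AUC collapses to a single operating point):

    import numpy as np
    from sklearn.metrics import cohen_kappa_score, roc_auc_score

    def probabilistic_metrics(gt, seg_scores, threshold=0.5):
        """KAP on the thresholded mask and AUC on the per-pixel scores."""
        y_true = np.asarray(gt, dtype=int).ravel()
        scores = np.asarray(seg_scores, dtype=float).ravel()
        y_pred = (scores >= threshold).astype(int)
        return {"KAP": cohen_kappa_score(y_true, y_pred),
                "AUC": roc_auc_score(y_true, scores)}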

Lastly, the metrics based on spatial distance are defined as functions of the Euclidean distances between the pixels that belong to the ground truth and the pixels of the automatic segmentation. The metrics used in this category are the Hausdorff distance (HD), the average Hausdorff distance (AHD), and the Pratt index.
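A sketch of the three distance-based scores over two sets of contour coordinates; α = 1/9 is the conventional constant of the Pratt figure of merit, and the symmetric averaging used here for AHD is one of several conventions in use:

    import numpy as np
    from scipy.spatial.distance import cdist

    def distance_metrics(gt_pts, seg_pts, alpha=1.0 / 9.0):
        """HD, AHD, and Pratt index between (N, 2) arrays of contour pixels."""
        d = cdist(seg_pts, gt_pts)              # all pairwise Euclidean distances
        d_seg = d.min(axis=1)                   # each seg pixel -> nearest gt pixel
        d_gt = d.min(axis=0)                    # each gt pixel -> nearest seg pixel
        hd = max(d_seg.max(), d_gt.max())       # Hausdorff distance
        ahd = (d_seg.mean() + d_gt.mean()) / 2  # symmetric average Hausdorff
        pratt = np.sum(1.0 / (1.0 + alpha * d_seg ** 2)) / max(len(gt_pts),
                                                               len(seg_pts))
        return {"HD": hd, "AHD": ahd, "Pratt": pratt}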

However, not all of these 18 metrics were finally considered to objectively evaluate the segmentation accuracy in this study. To be part of the good metrics set, a metric has to fulfil the following properties:

  • Contour accuracy: The segmentations have to provide boundary delimitations that are as exact as possible. The metrics that are more sensitive to point positions, such as the distance-based ones, are more suitable to evaluate this aspect.

  • Degree of overlapping: The segmentations have to provide a correct location of the segmented object (alignment between segmentation and ground truth). This aspect is important to correctly rank the instants of total closure. The metrics suitable for this property are the overlap-based ones.

  • Complex boundary: The segmentations deal with irregular shapes, so the metrics that are sensitive to pixel positions are more suitable to evaluate the final results. The most suitable metrics are the distance-based ones.

  • Background dominates: All the metrics based on a true negative factor (pixels correctly detected as background) have to be avoided. Such metrics are biased by the ratio between the number of foreground pixels (glottis) and the number of background pixels (the surrounding laryngeal structures), producing a class imbalance when the background represents the largest part of the image, as occurs in the glottal segmentation.

  • Over- and under-segmentation penalization: The metrics have to penalize over- and under-segmentation equally. Thus, metrics such as VI, FPR, PPV, and TPR have to be avoided.

  • High class imbalance: When the segmentation process produces small regions, metrics with chance adjustment, such as KAP and ARI, are recommended.

  • Outlier sensitivity: Sometimes, automatic segmentations contain outliers in the form of a small set of pixels outside the target area. Outlier sensitivity describes metrics that penalize such outliers.

Since no single metric fulfils all the properties at the same time, a combination of more than one metric is necessary. A good practice is also to reject metrics with similar definitions, in order to avoid redundant information. Table 6 lists the metrics in rows and the guidelines to be followed in columns. The three last columns summarize the results of the three trials: μ represents the average accuracy over the 760 images analyzed, and ϵ close counts how many times an image was ranked 0. A zero ϵ close can be understood as a non-overlapping or non-segmented image, which is related to the error introduced at the closed instants.

Table 6 Summary of the 18 metrics with the selection guidelines. Each row corresponds to one metric; the first seven columns correspond to the properties evaluated for membership in the good metrics set, and the last three columns give the average values of each metric for the three assessments

With the aforementioned in mind, Table 6 shows a check (✓) to denote a metric that is recommended for the corresponding property; a cross (X) denotes that the metric is not recommended; and empty cells denote neutrality. The good metrics are those that have at least one check and no crosses. The metrics that satisfy this condition are highlighted in yellow: DICE, JAC, FMS, ARI, KAP, AHD, and Pratt. In order to avoid redundancy, JAC and FMS were excluded because they provide a ranking similar to that of the DICE coefficient (JAC and FMS are derived from the DICE equation). Following the same criterion, AHD and Pratt are both based on distance errors, so one of them may be excluded. Since AHD does not rank the similarity between segmentations on a bounded scale (as Pratt does, between 0 and 1), we consider it less intuitive, and it was also excluded.

Thus, the metrics that best suit the guidelines are one based on overlapping (DICE), one based on pair-counting (ARI), one based on probabilistic means (KAP), and one based on distance (Pratt).

Lastly, in order to verify the concordance between the metrics, pairwise Pearson's correlation coefficients were calculated. The 760 ranks obtained for each metric were correlated with one another and are reported in Table 7. The results show a strong correlation between the metrics; hence, we chose DICE and Pratt as the good metrics.

Table 7 Pearson’s correlation coefficients among the good metrics. Correlations correspond to MaN vs. InP, MaN vs. SrG, and MaN vs. SnW trials
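A minimal sketch of how such a correlation table can be computed, assuming each metric's 760 per-image scores are stacked row-wise (the variable and function names are ours):

    import numpy as np

    def pairwise_pearson(scores_by_metric):
        """Pairwise Pearson correlations among metric score vectors.

        scores_by_metric: dict mapping a metric name to its per-image scores
        (e.g., 760 values per metric). Returns the metric names and the
        symmetric correlation matrix, one row/column per metric.
        """
        names = list(scores_by_metric)
        data = np.vstack([scores_by_metric[n] for n in names])
        return names, np.corrcoef(data)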

Cite this article

Andrade-Miranda, G., Godino-Llorente, J.I. Glottal Gap tracking by a continuous background modeling using inpainting. Med Biol Eng Comput 55, 2123–2141 (2017). https://doi.org/10.1007/s11517-017-1652-8
