Abstract
Lip segmentation is a fundamental component of a range of applications, including automatic lip reading, emotion recognition, and biometric speaker identification. The first step in lip segmentation is to apply a colour transform that enhances the contrast between the lips and the surrounding skin; however, there is much debate among researchers as to which transform is best suited to this task. This article therefore presents the most comprehensive study to date, evaluating 33 colour transforms for lip segmentation: 21 channels from seven colour space models (RGB, HSV, YCbCr, YIQ, CIEXYZ, CIELUV and CIELAB) and 12 additional colour transforms, 8 of which were designed specifically for lip segmentation. The comparison is extended to determine the best transform for segmenting the oral cavity. Histogram intersection and Otsu's discriminant are used to quantify and compare the transforms. The results for lip–skin segmentation validate the experimental approach: 11 of the top 12 transforms are used for lip segmentation in the literature. The importance of selecting the correct transform is demonstrated by an up to threefold increase in segmentation accuracy. Hue-based transforms, including pseudo hue and hue domain filtering, perform best for lip–skin segmentation, with the hue component of HSV achieving the highest accuracy of 93.85%. The a* component of CIELAB performs best for lip–oral cavity segmentation, while pseudo hue and the LUX transform perform reasonably well for both lip–skin and lip–oral cavity segmentation.
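To make the evaluation pipeline concrete, the sketch below computes a pseudo-hue transform (taken here as R/(R+G), a common formulation in the lip-segmentation literature), the histogram intersection between lip-pixel and skin-pixel distributions, and an Otsu threshold. This is a minimal NumPy sketch under those assumptions, not the authors' implementation; the function names, the epsilon guard, and the ground-truth masks in the usage comments are hypothetical.

```python
import numpy as np

def pseudo_hue(rgb):
    """Pseudo hue R/(R+G): emphasises the reddish lips against skin.

    rgb: float array of shape (H, W, 3) with values in [0, 1].
    """
    r, g = rgb[..., 0], rgb[..., 1]
    return r / (r + g + 1e-8)  # small epsilon avoids division by zero

def histogram_intersection(a, b, bins=256):
    """Overlap of two normalised histograms; lower means better separated."""
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    pa, _ = np.histogram(a, bins=bins, range=(lo, hi))
    pb, _ = np.histogram(b, bins=bins, range=(lo, hi))
    pa = pa / pa.sum()
    pb = pb / pb.sum()
    return np.minimum(pa, pb).sum()

def otsu_threshold(values, bins=256):
    """Threshold maximising Otsu's between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)            # probability of class 0 at each split
    mu = np.cumsum(p * centers)  # cumulative mean at each split
    mu_t = mu[-1]                # global mean
    # Between-class variance for every candidate split:
    # sigma_b^2(t) = (mu_t * w0(t) - mu(t))^2 / (w0(t) * (1 - w0(t)))
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * w0 - mu) ** 2 / (w0 * (1 - w0))
    sigma_b = np.nan_to_num(sigma_b)  # ends of the range divide by zero
    return centers[np.argmax(sigma_b)]

# Example usage with hypothetical ground-truth annotations:
#   ph = pseudo_hue(img)  # img: (H, W, 3) floats in [0, 1]
#   overlap = histogram_intersection(ph[lip_mask], ph[skin_mask])
#   lips = ph > otsu_threshold(ph.ravel())
```

Under this scheme, a transform that separates lips from skin well yields a small histogram intersection, and Otsu's discriminant rewards the same bimodality when choosing a segmentation threshold.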

Cite this article
Gritzman, A.D., Rubin, D.M. & Pantanowitz, A. Comparison of colour transforms used in lip segmentation algorithms. SIViP 9, 947–957 (2015). https://doi.org/10.1007/s11760-014-0615-x