Abstract
We present a method for producing dense active appearance models (AAMs) suitable for video-realistic synthesis. To this end, we estimate a joint alignment of all training images from a set of pairwise registrations, and we ensure that these pairwise registrations are computed only between similar images. This is achieved by defining a graph on the image set whose edge weights correspond to registration errors, and computing a bounded diameter minimum spanning tree of that graph. Pairwise registrations are computed with dense optical flow, and we introduce a flow refinement method that aligns small-scale texture. Further, given the registration of the training images, vertices are added to the AAM mesh to minimise the error between the observed flow fields and the flow fields interpolated between the AAM mesh points. We demonstrate a significant improvement in model compactness.
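The tree-selection step can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the paper's implementation: `weights` is assumed to be a symmetric matrix of pairwise registration errors (the toy numbers below are invented, not data from the paper), and the tree is built with a centre-based greedy heuristic. This is a Prim-like construction that never lets a vertex sit more than `max_diameter // 2` edges from a chosen centre, so no tree path exceeds `max_diameter` edges. Because the exact bounded diameter minimum spanning tree problem is NP-hard for most diameter bounds, a heuristic of this kind is typical, but the paper's own heuristic may differ. All function names (`centre_based_bdmst`, `best_bdmst`, `tree_diameter`, `path_to_centre`) are hypothetical.

```python
import heapq
from collections import deque
from typing import Dict, List, Optional

# parent[v] is v's neighbour on the way to the centre; None marks the centre.
Parent = Dict[int, Optional[int]]


def centre_based_bdmst(weights: List[List[float]], centre: int, max_diameter: int) -> Parent:
    """Greedy spanning tree with every vertex at most max_diameter // 2 edges
    from `centre`, so any tree path has at most max_diameter edges.
    weights[i][j] is the (assumed symmetric) registration error of images i, j."""
    if max_diameter < 2:
        raise ValueError("max_diameter must be at least 2")
    n = len(weights)
    max_depth = max_diameter // 2
    parent: Parent = {centre: None}
    depth = {centre: 0}
    # Candidate edges (error, new_vertex, tree_vertex). Only tree vertices that
    # still have spare depth ever contribute candidates.
    heap = [(weights[centre][v], v, centre) for v in range(n) if v != centre]
    heapq.heapify(heap)
    while len(parent) < n:
        _, v, u = heapq.heappop(heap)
        if v in parent:  # stale entry: v was already attached by a cheaper-or-equal edge
            continue
        parent[v] = u
        depth[v] = depth[u] + 1
        if depth[v] < max_depth:
            for x in range(n):
                if x not in parent:
                    heapq.heappush(heap, (weights[v][x], x, v))
    return parent


def tree_weight(weights: List[List[float]], parent: Parent) -> float:
    """Total registration error over the tree's edges."""
    return sum(weights[v][p] for v, p in parent.items() if p is not None)


def best_bdmst(weights: List[List[float]], max_diameter: int) -> Parent:
    """Try every image as the centre and keep the lightest tree."""
    trees = [centre_based_bdmst(weights, c, max_diameter) for c in range(len(weights))]
    return min(trees, key=lambda t: tree_weight(weights, t))


def tree_diameter(parent: Parent) -> int:
    """Longest path (in edges) of the tree, via two breadth-first searches."""
    adj = {v: [] for v in parent}
    for v, p in parent.items():
        if p is not None:
            adj[v].append(p)
            adj[p].append(v)

    def farthest(src: int):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for x in adj[u]:
                if x not in dist:
                    dist[x] = dist[u] + 1
                    queue.append(x)
        far = max(dist, key=dist.get)
        return far, dist[far]

    end, _ = farthest(next(iter(parent)))
    return farthest(end)[1]


def path_to_centre(parent: Parent, v: int) -> List[int]:
    """Chain of images whose pairwise registrations link v to the centre."""
    path = [v]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


if __name__ == "__main__":
    # Toy errors: neighbouring images are similar (error 1), all others differ (10).
    w = [[0 if i == j else (1 if abs(i - j) == 1 else 10) for j in range(5)] for i in range(5)]
    tree = best_bdmst(w, max_diameter=4)
    print(tree, tree_diameter(tree), path_to_centre(tree, 0))
```

In a full pipeline, each image would then presumably be aligned to the centre by composing the pairwise optical-flow fields along `path_to_centre(tree, v)`. Bounding the diameter keeps those chains short, limiting how many registration errors can accumulate, while the greedy edge choice favours low-error pairs, that is, similar images.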

Acknowledgments
We would like to thank Iain Waugh and Norbert Braunschweiler for allowing us to model their faces. We would also like to thank everyone in the Speech Technology Group at Toshiba Research Europe for their help with the visual text-to-speech component of this work.
Cite this article
Anderson, R., Stenger, B. & Cipolla, R. Using Bounded Diameter Minimum Spanning Trees to Build Dense Active Appearance Models. Int J Comput Vis 110, 48–57 (2014). https://doi.org/10.1007/s11263-013-0661-9