Elsevier

Pattern Recognition

Volume 44, Issue 6, June 2011, Pages 1225-1234
Sketch recognition by fusion of temporal and image-based features

https://doi.org/10.1016/j.patcog.2010.11.006

Abstract

The increasing availability of pen-based hardware has recently resulted in a parallel growth in sketch-based user interfaces. Sketch-based user interfaces aim to combine the expressive power of free-hand sketching with the processing power of computers. Most sketch-based systems require intelligent ink processing capabilities, which makes the development of robust sketch recognition algorithms a primary concern in the field. So far, the research in sketch recognition has produced various independent approaches to recognition, each of which uses a particular kind of information (e.g., geometric and spatial constraints, image-based features, temporal stroke-ordering patterns). These methods were designed in isolation as stand-alone algorithms, and there has been little work treating various recognition methods as alternative sources of information that can be combined to increase sketch recognition accuracy. In this paper, we focus on two such methods and fuse an image-based method with a time-based method in an attempt to combine the knowledge of how objects look (image data) with the knowledge of how they are drawn (temporal data). In the course of combining spatial and temporal information, we also introduce a mathematically well-founded fusion method for combining recognizers. Our combination method can be used for isolated sketch recognition as well as full diagram recognition. Our evaluation with two databases shows that fusing image-based and temporal features yields higher recognition rates. These results are the first to confirm the complementary nature of image-based and temporal recognition methods for full sketch recognition, which has long been suggested, but never supported by data.

Introduction

Sketching is a natural way of expressing and sharing ideas. It allows us to succinctly convey concepts on paper. These qualities of sketching have caught the attention of many graphics application designers, who have started exploring graphics applications that can take advantage of intelligent sketch-based interfaces. In addition, the increasing availability of Tablet PCs and other hardware that supports pen-based interaction has led to increased interest in interactive graphics applications that can interpret hand-drawn sketches.

At the core of these interactive sketch-based graphics applications lies the sketch recognition technology. Given a hand-drawn sketch, sketch recognition can informally be defined as the task of finding groups of ink in the sketch that represent individual objects (segmentation), and then determining the class of the object represented by each ink group (object recognition). So far, researchers have attempted to address both issues within recognition frameworks that mainly differ by the particular kind of information used.

For example, some authors assumed simple definitions of drawings and treated icons as gestures. This group of work uses distinguishing global features extracted from single or multiple strokes for object recognition [1], [2].

Others preferred to define objects using geometric and spatial constraints [3], [4], [5], [6], [7]. These constraint-based approaches are founded on cognitive science studies which suggest that, when shown a symbol, people attend preferentially to certain geometric features (e.g., a rectangle is formed by two pairs of lines of equal length, and the lines meet at a 90° angle).

Other authors have taken a more computer-vision-like approach to recognition and formulated image-based algorithms that use image features such as pixel intensities and intensity histograms [8], [9], [10].

A fourth class of recognition algorithms is based on the temporal stroke-ordering patterns that people naturally use while drawing diagrams [11], [12], [13], [14]. These time-based approaches are motivated by the observation that when people sketch objects, they use highly characteristic drawing orders (e.g., when drawing a stick figure, most people draw the head first, and then the body, the legs and the arms, in that order). Hence the stroke-ordering patterns in sketches can be used for sketch recognition.

So far, research efforts have mostly focused on getting the best recognition accuracy with any one of the approaches listed above (gesture, constraint, image, and time-based approaches). Relatively little effort has been spent exploring how various recognition methods can be used as individual sources of information and combined to boost sketch recognition accuracy. Specifically, the issue of how temporal recognition methods can be combined with others for segmenting and recognizing complete sketches has not been studied.

This paper is a step in this direction. We focus on combining image-based and time-based recognition methods. We have three main contributions:

  • Drawing upon results from combining classifiers, we choose a set of combination methods and evaluate them for combining image-based and time-based recognizers.

  • We describe a mathematically well-founded classifier combination method for full sketch recognition (i.e., continuous sketch recognition).

  • Using two databases, we show that fusing image-based and temporal features yields better recognition rates compared to using either method alone. These results not only show the virtues of combining multiple recognition methods, but are also the first to show the complementary nature of image and time-based methods for full sketch recognition, which has long been suggested, but never supported by data.

In the rest of this paper, we first describe an image-based recognition algorithm that uses Zernike moments, and a time-based sketch recognition algorithm that uses Hidden Markov Models. In Section 4, we describe five methods for classifier fusion that are subsequently used for fusing image-based and time-based features for isolated symbol recognition. In Section 5, we describe how isolated symbol recognizers can be combined using dynamic programming to simultaneously segment and recognize entire sketches with many symbols. In the evaluation section, first we evaluate the performance of the five classifier fusion methods for isolated symbol recognition using two different databases. Then, we report recognition accuracies of image-based, time-based, and combined recognition methods for recognizing full sketches. We also report the runtime for our recognition and preprocessing algorithms. We conclude with related work and a summary of future research directions.

Image-based recognition method: Zernike moments

Although there are many image-based recognition methods, we adopt one based on Zernike moments, which has been demonstrated to be a simple and effective method for sketch recognition. Our use of Zernike moment features for sketch recognition is based on work by Hse et al. [9], and we refer the reader to this work for the details of feature extraction using Zernike features.

Zernike moments work with bitmap image representations, where the input is represented by a function f(x,y), which is equal to 1 if
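To make the idea of moment features on a bitmap concrete, the following is a minimal sketch of Zernike moment magnitudes computed from a binary image. It uses the standard textbook definition of the radial polynomial and a simple pixel-centre discretization; it is not the feature extraction pipeline of Hse et al. [9], and the function names (`radial_poly`, `zernike_moment`, `zernike_features`) and the square list-of-rows bitmap layout are assumptions made for illustration.

```python
import cmath
import math


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_nm(rho); assumes |m| <= n and n - |m| even."""
    m = abs(m)
    total = 0.0
    for s in range((n - m) // 2 + 1):
        num = (-1) ** s * math.factorial(n - s)
        den = (math.factorial(s)
               * math.factorial((n + m) // 2 - s)
               * math.factorial((n - m) // 2 - s))
        total += num / den * rho ** (n - 2 * s)
    return total


def zernike_moment(bitmap, n, m):
    """Discrete approximation of the Zernike moment A_nm of a square binary bitmap."""
    size = len(bitmap)
    acc = 0j
    for row in range(size):
        for col in range(size):
            if not bitmap[row][col]:
                continue  # only ink pixels (value 1) contribute
            # Map pixel centres into the square [-1, 1] x [-1, 1].
            x = (2 * col + 1) / size - 1
            y = (2 * row + 1) / size - 1
            rho = math.hypot(x, y)
            if rho > 1.0:
                continue  # Zernike polynomials are defined on the unit disc only
            theta = math.atan2(y, x)
            # Multiply by the complex conjugate of the basis function V_nm.
            acc += radial_poly(n, m, rho) * cmath.exp(-1j * m * theta)
    pixel_area = (2 / size) ** 2
    return (n + 1) / math.pi * acc * pixel_area


def zernike_features(bitmap, max_order):
    """Magnitudes |A_nm| for all valid (n, m) with n <= max_order and m >= 0."""
    feats = []
    for n in range(max_order + 1):
        for m in range(n % 2, n + 1, 2):
            feats.append(abs(zernike_moment(bitmap, n, m)))
    return feats
```

Taking magnitudes discards the phase term, which is what makes the resulting feature vector insensitive to rotations of the symbol. How the bitmap is rasterized, scaled, and centred before this step is not covered by this sketch.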

Time-based recognition method: hidden Markov model (HMM)

Existing time-based methods use either HMMs or Dynamic Bayesian Networks, which generalize HMMs. For our purposes, both approaches are essentially equivalent, hence we use an HMM-based approach.
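As a rough illustration of how an HMM can score a stroke sequence, the sketch below implements the standard forward algorithm for a discrete-observation HMM and picks the class whose model assigns the highest likelihood. The two toy models, their probabilities, and the encoding of strokes as symbols (0 for a line, 1 for an arc) are hypothetical and chosen only for the example; they are not taken from the paper.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm.

    start[s]     : probability of starting in state s
    trans[p][s]  : probability of moving from state p to state s
    emit[s][o]   : probability of emitting symbol o in state s
    """
    if not obs:
        return 1.0  # an empty sequence is trivially explained
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)


def classify(obs, models):
    """Return the label of the model with the highest likelihood for obs."""
    return max(models, key=lambda label: forward_likelihood(obs, *models[label]))


# Hypothetical models: (start, trans, emit). Symbol 0 = line, symbol 1 = arc.
MODELS = {
    # Head (arc) first, then body and limbs (lines).
    "stick_figure": ([1.0, 0.0],
                     [[0.0, 1.0], [0.0, 1.0]],
                     [[0.1, 0.9], [0.9, 0.1]]),
    # A single state that almost always emits lines.
    "rectangle": ([1.0], [[1.0]], [[0.95, 0.05]]),
}

# An arc followed by two lines scores about 0.729 under "stick_figure"
# and about 0.045 under "rectangle".
best = classify([1, 0, 0], MODELS)
```

For long stroke sequences, the products above underflow, so a practical version would work in log space or rescale `alpha` at each step.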

Fusion of the Zernike moments and HMM methods

The goal of combining multiple information sources is to achieve superior performance by exploiting the redundant and complementary nature of the information that the different sources provide.
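As a generic illustration of combining two recognizers, the sketch below applies fixed combination rules from the classifier combination literature (the sum, product, max, and min rules discussed by Kittler et al. in the reference list) to per-class posterior estimates. This excerpt does not name the five fusion methods evaluated in the paper, so these rules should be read as examples only; the function name `fuse`, the class labels, and the probability values are hypothetical.

```python
import math


def fuse(posteriors, rule="product"):
    """Combine per-class posteriors from several recognizers with a fixed rule.

    posteriors: list of dicts mapping class label -> probability, one dict
                per recognizer, all sharing the same labels.
    Returns a dict of renormalized combined scores.
    """
    labels = list(posteriors[0])
    if rule == "product":
        combined = {c: math.prod(p[c] for p in posteriors) for c in labels}
    elif rule == "sum":
        combined = {c: sum(p[c] for p in posteriors) for c in labels}
    elif rule == "max":
        combined = {c: max(p[c] for p in posteriors) for c in labels}
    elif rule == "min":
        combined = {c: min(p[c] for p in posteriors) for c in labels}
    else:
        raise ValueError(f"unknown rule: {rule}")
    total = sum(combined.values())
    if total == 0:
        raise ValueError("all combined scores are zero")
    return {c: v / total for c, v in combined.items()}


# Hypothetical outputs of an image-based and a time-based recognizer.
image_based = {"resistor": 0.6, "capacitor": 0.4}
time_based = {"resistor": 0.3, "capacitor": 0.7}

fused = fuse([image_based, time_based], rule="product")
decision = max(fused, key=fused.get)
```

Here the image-based recognizer alone would pick "resistor", but the stronger evidence from the time-based recognizer changes the fused decision to "capacitor" under both the sum and the product rule.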

The ways in which the information sources can be combined vary depending on the context of the work, and various communities have come up with different taxonomies that emphasize different aspects of the combination operation. For example, in the context of machine learning, one can talk

Segmentation and recognition of full diagrams

A major problem in sketch recognition is segmentation: partitioning a sketch into groups of ink that represent individual symbols. Knowing the correct segmentation of a sketch immensely simplifies recognition. Therefore, some methods force users to explicitly indicate when they finish drawing each symbol. This makes the system less usable and defeats the main motivation behind sketch-based interfaces.
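The introduction states that isolated symbol recognizers are combined using dynamic programming to segment and recognize a sketch simultaneously. The following is a minimal sketch of that general idea, not the paper's actual formulation. It assumes that each symbol is drawn as a contiguous run of strokes in temporal order, that a symbol spans at most `max_group` strokes, and that a hypothetical callback `score_group(start, end)` returns the best label and its log-confidence for strokes `start` to `end - 1`.

```python
import math


def segment(n_strokes, score_group, max_group=4):
    """Find the highest-scoring partition of strokes 0..n_strokes-1 into symbols.

    score_group(start, end) -> (label, log_confidence) for strokes[start:end].
    Returns (groups, total_log_score), where groups is a list of
    (start, end, label) tuples in drawing order.
    """
    best = [-math.inf] * (n_strokes + 1)  # best[j]: best score for first j strokes
    back = [None] * (n_strokes + 1)       # back[j]: (start, label) of last group
    best[0] = 0.0
    for end in range(1, n_strokes + 1):
        for start in range(max(0, end - max_group), end):
            label, log_conf = score_group(start, end)
            candidate = best[start] + log_conf
            if candidate > best[end]:
                best[end] = candidate
                back[end] = (start, label)
    # Trace the stored decisions back from the last stroke.
    groups = []
    end = n_strokes
    while end > 0:
        if back[end] is None:
            raise ValueError("no valid segmentation found")
        start, label = back[end]
        groups.append((start, end, label))
        end = start
    groups.reverse()
    return groups, best[n_strokes]


# Hypothetical recognizer output for a three-stroke sketch.
TOY_SCORES = {
    (0, 2): ("resistor", math.log(0.9)),
    (2, 3): ("wire", math.log(0.8)),
}


def toy_score(start, end):
    # Any stroke group the toy recognizer does not know gets a low confidence.
    return TOY_SCORES.get((start, end), ("unknown", math.log(0.1)))


groups, total = segment(3, toy_score)
```

On the toy input, the best partition groups the first two strokes as a resistor and the third as a wire. In a combined system, `score_group` is where the fused image-based and time-based confidence for a candidate stroke group would enter.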

One could argue that sketch segmentation should not be considered separate from individual

Evaluation of the proposed methods

Our evaluation included measuring the accuracy of isolated object and complete sketch recognition. We also measured the time required for processing each additional stroke, including the time required to fragment the stroke, and the amount of time required for updating the recognition and segmentation hypotheses for each added stroke.

Related work

In this paper, we fused an image-based method with a time-based method in an attempt to combine the knowledge of how objects look (appearance) with the knowledge of how they are drawn (stroke orderings). We focused on appearance and stroke orderings because they not only represent conceptually different aspects of sketching, but they have also been shown to aid recognition individually. However, our combination method is general and any method producing probabilistic confidence values can be

Future work

The HMM-based method can be improved using sophisticated features computed from ink groups or image patches. Hence, features do not necessarily have to be primitive-based. For example, carefully designed features based on shape contexts [51], congealing [52] or other local descriptors can be used. Furthermore, feature engineering and feature selection techniques, which are outside the scope of our contribution here, can be used to boost accuracy.

As discussed in Section 3.4, one drawback of the

Summary

In this paper, we presented a framework for fusing an image-based method with a time-based method in an attempt to combine the knowledge of how objects look (image data) with the knowledge of how they are drawn (temporal data). This is unlike most existing approaches, which focus on one kind of feature only. We presented evaluation results for two databases illustrating that combining classifiers yields higher recognition accuracies, and confirmed the complementary nature of image-based and

References (54)

  • C. Alvarado et al., SketchREAD: a multi-domain sketch recognition engine
  • J.-P. Valois, M. Cote, M. Cheriet, Online recognition of sketched electrical diagrams, in: ICDAR ’01, September 10–13,...
  • S. Mac et al., Eager interpretation of on-line hand-drawn structured documents: the DALI methodology, Pattern Recognition (2009)
  • A. Hall, C. Pomm, P. Widmayer, A combinatorial approach to multi-domain sketch recognition, in: Eurographics Workshop...
  • L. Kara, T. Stahovich, An image-based trainable symbol recognizer for sketch-based interfaces, in: AAAI Fall Symposium...
  • H. Hse, A.R. Newton, Sketched symbol recognition using Zernike moments, in: Proceedings of International Conference on...
  • M. Oltmans, Envisioning sketch recognition: a local feature based approach to recognizing informal sketches, Ph.D....
  • S. Simhon, G. Dudek, Sketch interpretation and refinement using statistical models, in: Eurographics Symposium on...
  • W. Jiang, Z.-X. Sun, HMM-based on-line multi-stroke sketch recognition, in: Proceedings of 2005 International...
  • D. Anderson, C. Bailey, M. Skubic, Hidden Markov model symbol recognition for sketch-based interfaces, in: AAAI Fall...
  • T.M. Sezgin et al., HMM-based efficient sketch recognition
  • L. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE (1989)
  • L. Rabiner et al., Fundamentals of Speech Recognition (1993)
  • J. Kittler et al., On combining classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence (1998)
  • L. Xu et al., Methods of combining multiple classifiers and their applications to handwriting recognition, IEEE Transactions on Systems, Man and Cybernetics (1992)
  • A. Rahman et al., Multiple classifier decision combination strategies for character recognition: a review, International Journal on Document Analysis and Recognition (2003)
  • C.-C. Chang, C.-J. Lin, LIBSVM: A Library for Support Vector Machines, 2001. Software available at...
Tevfik Metin Sezgin graduated summa cum laude with Honors from Syracuse University in 1999. He completed his MS in the Artificial Intelligence Laboratory at Massachusetts Institute of Technology in 2001. He received his PhD in 2006 from Massachusetts Institute of Technology. He subsequently moved to University of Cambridge, and joined the Rainbow group at the University of Cambridge Computer Laboratory as a Postdoctoral Research Associate. Dr. Sezgin is currently an Assistant Professor in the College of Engineering at Koç University, Istanbul. His research interests include intelligent human–computer interfaces, multimodal sensor fusion, and HCI applications of machine learning. Dr. Sezgin is particularly interested in applications of these technologies in building intelligent pen-based interfaces. He currently leads the Intelligent User Interfaces at Koç University.

1 Part of this work was completed when the authors were at the University of Cambridge.