Sketch recognition by fusion of temporal and image-based features
Introduction
Sketching is a natural way of expressing and sharing ideas. It allows us to succinctly convey concepts on paper. These qualities have caught the attention of many application designers, who have begun exploring graphics applications that can take advantage of intelligent sketch-based interfaces. In addition, the increasing availability of Tablet PCs and other hardware that supports pen-based interaction has led to increased interest in interactive graphics applications that can interpret hand-drawn sketches.
At the core of these interactive sketch-based graphics applications lies the sketch recognition technology. Given a hand-drawn sketch, sketch recognition can informally be defined as the task of finding groups of ink in the sketch that represent individual objects (segmentation), and then determining the class of the object represented by each ink group (object recognition). So far, researchers have attempted to address both issues within recognition frameworks that mainly differ by the particular kind of information used.
For example, some authors assumed a simplified definition of drawings and treated icons as gestures. This line of work uses distinguishing global features extracted from single or multiple strokes for object recognition [1], [2].
Others preferred to define objects using geometric and spatial constraints [3], [4], [5], [6], [7]. These constraint-based approaches are founded on cognitive science studies which suggest that, when shown a symbol, people attend preferentially to certain geometric features (e.g., a rectangle is formed by two pairs of lines of equal length, and the lines meet with a 90° angle).
Other authors have taken a more computer-vision-like approach to recognition and formulated image-based algorithms that use image features such as pixel intensities, and intensity histograms [8], [9], [10].
A fourth class of recognition algorithms are based on the temporal stroke-ordering patterns that are naturally used while drawing diagrams [11], [12], [13], [14]. The motivation for these time-based approaches is based on the observation that when people sketch objects, they use highly characteristic drawing orders (e.g., when drawing a stick figure, most people draw the head first, and then respectively draw the body, the legs and the arms). Hence the stroke-ordering patterns in sketches can be used for sketch recognition.
So far, research efforts have mostly focused on getting the best recognition accuracy with any one of the approaches listed above (gesture, constraint, image, and time-based approaches). Relatively little effort has been devoted to exploring how various recognition methods can serve as individual sources of information and be combined to boost sketch recognition accuracy. In particular, the issue of how temporal recognition methods can be combined with others for segmenting and recognizing complete sketches has not been studied.
This paper is a step in this direction. We focus on combining image-based and time-based recognition methods. We have three main contributions:
- Drawing upon results from the classifier combination literature, we choose a set of combination methods and evaluate them for combining image-based and time-based recognizers.
- We describe a mathematically well-founded classifier combination method for full sketch recognition (i.e., continuous sketch recognition).
- Using two databases, we show that fusing image-based and temporal features yields better recognition rates than using either method alone. These results not only show the virtues of combining multiple recognition methods, but are also the first to show the complementary nature of image-based and time-based methods for full sketch recognition, which has long been suggested but never supported by data.
In the rest of this paper, we first describe an image-based recognition algorithm that uses Zernike moments, and a time-based sketch recognition algorithm that uses Hidden Markov Models. In Section 4, we describe five methods for classifier fusion that are subsequently used for fusing image-based and time-based features for isolated symbol recognition. In Section 5, we describe how isolated symbol recognizers can be combined using dynamic programming to simultaneously segment and recognize entire sketches with many symbols. In the evaluation section, first we evaluate the performance of the five classifier fusion methods for isolated symbol recognition using two different databases. Then, we report recognition accuracies of image-based, time-based, and combined recognition methods for recognizing full sketches. We also report the runtime for our recognition and preprocessing algorithms. We conclude with related work and a summary of future research directions.
Image-based recognition method: Zernike moments
Although there are many image-based recognition methods, we adopt one based on Zernike moments, which has been demonstrated to be a simple and effective method for sketch recognition. Our use of Zernike moment features for sketch recognition follows the work of Hse et al. [9], and we refer the reader to that work for the details of feature extraction.
Zernike moments work with bitmap image representations, where the input is represented by a function f(x,y), which is equal to 1 if the pixel at (x,y) lies on the ink trace and 0 otherwise.
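To make the representation concrete, the following is a minimal NumPy sketch of how rotation-invariant Zernike moment magnitudes |A_nm| can be computed from such a binary image. This is a hedged illustration of the general technique, not the feature extraction pipeline of Hse et al.; the grid size and maximum order are arbitrary choices.

```python
import numpy as np
from math import factorial

def zernike_magnitudes(img, max_order=4):
    """Rotation-invariant Zernike moment magnitudes |A_nm| of a square
    binary image, with pixels mapped onto the unit disk."""
    n_pix = img.shape[0]
    ys, xs = np.mgrid[0:n_pix, 0:n_pix]
    x = (2.0 * xs - (n_pix - 1)) / (n_pix - 1)   # map columns to [-1, 1]
    y = (2.0 * ys - (n_pix - 1)) / (n_pix - 1)   # map rows to [-1, 1]
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    f = np.where(rho <= 1.0, img.astype(float), 0.0)  # keep ink inside the disk
    dA = (2.0 / (n_pix - 1)) ** 2                # area of one pixel cell
    feats = {}
    for n in range(max_order + 1):
        for m in range(n + 1):
            if (n - m) % 2:                      # n - |m| must be even
                continue
            # radial polynomial R_nm(rho)
            R = np.zeros_like(rho)
            for s in range((n - m) // 2 + 1):
                c = ((-1) ** s * factorial(n - s)
                     / (factorial(s) * factorial((n + m) // 2 - s)
                        * factorial((n - m) // 2 - s)))
                R = R + c * rho ** (n - 2 * s)
            # A_nm = (n+1)/pi * sum of f(x,y) * conj(V_nm) * dA;
            # the magnitude |A_nm| is invariant to rotations of the image
            A = (n + 1) / np.pi * np.sum(f * R * np.exp(-1j * m * theta)) * dA
            feats[(n, m)] = abs(A)
    return feats
```

For a fully filled unit disk the magnitude |A_00| comes out close to 1, and rotating the input image leaves all magnitudes unchanged, which is what makes these moments attractive as orientation-independent shape descriptors for sketched symbols.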
Time-based recognition method: hidden Markov model (HMM)
Existing time-based methods use either HMMs or Dynamic Bayesian Networks, which generalize HMMs. For our purposes, both approaches are essentially equivalent, hence we use an HMM-based approach.
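A minimal sketch of the core computation behind such a recognizer: one HMM per symbol class, with classification by forward-algorithm likelihood. The two-state left-to-right models, the discrete two-symbol "stroke primitive" alphabet, and all parameter values below are toy assumptions for illustration, not the models trained in this work.

```python
import numpy as np

def log_forward(obs, log_pi, log_A, log_B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the forward algorithm in log space for stability."""
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # alpha_j(t) = logsumexp_i(alpha_i(t-1) + log A[i, j]) + log B[j, o]
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return np.logaddexp.reduce(alpha)

NEG = -1e9  # stands in for log(0)

# Two-state left-to-right model: start in state 0, may move on to state 1.
log_pi = np.array([0.0, NEG])
log_A = np.array([[np.log(0.7), np.log(0.3)],
                  [NEG,         0.0]])

# Toy emissions over a 2-symbol stroke alphabet {0: "arc", 1: "line"}:
# class A tends to draw arcs first, class B draws lines first.
log_B_a = np.log(np.array([[0.9, 0.1],
                           [0.1, 0.9]]))
log_B_b = np.log(np.array([[0.1, 0.9],
                           [0.9, 0.1]]))

def classify(obs):
    # pick the class whose HMM assigns the observed stroke order
    # the higher likelihood
    ll_a = log_forward(obs, log_pi, log_A, log_B_a)
    ll_b = log_forward(obs, log_pi, log_A, log_B_b)
    return "A" if ll_a > ll_b else "B"
```

An arcs-then-lines drawing order such as [0, 0, 1, 1] is assigned to class A precisely because the model rewards the characteristic stroke ordering, which is the intuition behind time-based sketch recognition.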
Fusion of the Zernike moments and HMM methods
The goal of combining multiple information sources is to achieve superior performance by exploiting the redundant and complementary nature of the information they provide.
The ways in which the information sources can be combined vary depending on the context of the work, and various communities have come up with different taxonomies that emphasize different aspects of the combination operation. For example, in the context of machine learning, one can talk
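As a concrete illustration of the kind of fixed, mathematically simple combination rules common in this setting, the sketch below fuses the posterior vectors of two classifiers. These sum, product, max, and min rules are standard choices from the classifier-combination literature, offered here as a hedged example rather than the specific five methods evaluated in this paper.

```python
import numpy as np

def combine(p1, p2, rule="product"):
    """Fuse two classifiers' posterior vectors over the same classes
    with a fixed combination rule, then renormalise."""
    stacked = np.stack([np.asarray(p1, float), np.asarray(p2, float)])
    fused = {"sum": stacked.sum(axis=0),      # average-style voting
             "product": stacked.prod(axis=0), # assumes independent sources
             "max": stacked.max(axis=0),
             "min": stacked.min(axis=0)}[rule]
    return fused / fused.sum()                # back to a distribution
```

For example, fusing [0.6, 0.4] (say, from an image-based recognizer) with [0.7, 0.3] (from a time-based one) under the product rule yields a sharper posterior favouring the first class, reflecting the agreement of the two sources.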
Segmentation and recognition of full diagrams
A major problem in sketch recognition is segmentation: partitioning a sketch into groups of ink that represent individual symbols. Knowing the correct segmentation of a sketch immensely simplifies recognition. Therefore, some methods force users to explicitly indicate when they have finished drawing each symbol, which makes the system less usable and defeats the main motivation behind sketch-based interfaces.
One could argue that sketch segmentation should not be considered separate from individual object recognition.
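The stroke-level dynamic program underlying joint segmentation and recognition can be sketched as follows. Here `score` is a hypothetical stand-in for the log-score an isolated-symbol recognizer assigns to a candidate ink group; the recursion itself is the standard optimal-partition DP.

```python
def best_segmentation(n_strokes, score):
    """Partition strokes 0..n_strokes-1 into contiguous groups so that
    the total group score is maximal. score(i, j) is the log-score of
    strokes i..j-1 forming a single symbol."""
    NEG = float("-inf")
    best = [0.0] + [NEG] * n_strokes   # best[j]: best score of strokes 0..j-1
    back = [0] * (n_strokes + 1)       # back[j]: start index of the last group
    for j in range(1, n_strokes + 1):
        for i in range(j):
            s = best[i] + score(i, j)
            if s > best[j]:
                best[j], back[j] = s, i
    # walk the backpointers to recover the segment boundaries
    segments, j = [], n_strokes
    while j > 0:
        segments.append((back[j], j))
        j = back[j]
    return best[n_strokes], segments[::-1]
```

With a toy scorer that strongly prefers the stroke groups [0, 2) and [2, 5), the DP recovers exactly that partition; in a real system the scorer would be the fused image/time recognizer evaluated on each candidate group.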
Evaluation of the proposed methods
Our evaluation included measuring the accuracy of isolated object and complete sketch recognition. We also measured the time required for processing each additional stroke, including the time required to fragment the stroke, and the amount of time required for updating the recognition and segmentation hypotheses for each added stroke.
Related work
In this paper, we fused an image-based method with a time-based method in an attempt to combine the knowledge of how objects look (appearance) with the knowledge of how they are drawn (stroke orderings). We focused on appearance and stroke orderings because they not only represent conceptually different aspects of sketching, but have also been shown to aid recognition individually. However, our combination method is general, and any method producing probabilistic confidence values can be incorporated.
Future work
The HMM-based method can be improved using more sophisticated features computed from ink groups or image patches; features do not necessarily have to be primitive-based. For example, carefully designed features based on shape contexts [51], congealing [52], or other local descriptors can be used. Furthermore, feature engineering and feature selection techniques, which are outside the scope of our contribution here, can be used to boost accuracy.
As discussed in Section 3.4, one drawback of the
Summary
In this paper, we presented a framework for fusing an image-based method with a time-based method in an attempt to combine the knowledge of how objects look (image data) with the knowledge of how they are drawn (temporal data). This is unlike most existing approaches, which focus on one kind of feature only. We presented evaluation results for two databases illustrating that combining classifiers yields higher recognition accuracies, and confirmed the complementary nature of image-based and time-based methods.
References (54)
- et al., Iconic and multi-stroke gesture recognition, Pattern Recognition (2009)
- et al., LADDER, a sketching language for user interface developers, Computers and Graphics (2005)
- et al., Multimodal human–computer interaction: a survey, Computer Vision and Image Understanding (2007)
- et al., Adaptive binary tree for fast SVM multiclass classification, Neurocomputing (2009)
- et al., Sketch recognition in interspersed drawings using time-based graphical models, Computers and Graphics (2008)
- et al., Combining geometry and domain knowledge to interpret hand-drawn diagrams, Computers and Graphics (2005)
- et al., On-line hand-drawn electric circuit diagram recognition using 2D dynamic programming, Pattern Recognition (2009)
- et al., Combining statistical and structural approaches for handwritten character description, Image and Vision Computing (1999)
- et al., Combining diverse on-line and off-line systems for handwritten text line recognition, Pattern Recognition (2009)
- Specifying gestures by example, SIGGRAPH Computer Graphics (1991)
- SketchREAD: a multi-domain sketch recognition engine
- Eager interpretation of on-line hand-drawn structured documents: the DALI methodology, Pattern Recognition
- HMM-based efficient sketch recognition
- A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE
- Fundamentals of Speech Recognition
- On combining classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence
- Methods of combining multiple classifiers and their applications to handwriting recognition, IEEE Transactions on Systems, Man and Cybernetics
- Multiple classifier decision combination strategies for character recognition: a review, International Journal on Document Analysis and Recognition
Tevfik Metin Sezgin graduated summa cum laude with Honors from Syracuse University in 1999. He completed his MS in the Artificial Intelligence Laboratory at Massachusetts Institute of Technology in 2001. He received his PhD in 2006 from Massachusetts Institute of Technology. He subsequently moved to University of Cambridge, and joined the Rainbow group at the University of Cambridge Computer Laboratory as a Postdoctoral Research Associate. Dr. Sezgin is currently an Assistant Professor in the College of Engineering at Koç University, Istanbul. His research interests include intelligent human–computer interfaces, multimodal sensor fusion, and HCI applications of machine learning. Dr. Sezgin is particularly interested in applications of these technologies in building intelligent pen-based interfaces. He currently leads the Intelligent User Interfaces at Koç University.
1. Part of this work was completed when the authors were at the University of Cambridge.