Elsevier

Computers & Graphics

Volume 42, August 2014, Pages 14-30

Technical Section
An efficient, classification-based approach for grouping pen strokes into objects

https://doi.org/10.1016/j.cag.2014.03.003

Highlights

  • Developed a novel 2-step algorithm for grouping pen strokes.

  • Classifying single pen strokes first makes grouping more efficient and effective.

  • We have developed an accurate multi-class, single-stroke classifier.

  • Our approach to grouping is unique in its formulation as a classification task.

  • The 2-step process enables a simple classifier to achieve high grouping accuracy.

Abstract

Objects in freely drawn sketches often have no spatial or temporal separation, making object identification difficult. We present a two-step stroke-grouping algorithm that first classifies individual strokes according to the type of object to which they belong, and then groups strokes with like classifications into clusters representing individual objects. The first step facilitates clustering by naturally separating the strokes, and both steps fluidly integrate spatial and temporal information. Our single-stroke classifier has comparable accuracy to an existing state-of-the-art single-stroke classifier on text vs. non-text classification, and is significantly more efficient. Furthermore, our classifier is also suitable for applications with more than two classes of strokes. Our approach to grouping is unique in its formulation as an efficient classification task rather than, for example, an expensive search task. In experiments on several types of sketches, our grouping method performed accurately, correctly grouping up to 92% of the ink, with up to 79% of the shapes being perfectly clustered.

Introduction

One of the most difficult challenges in sketch understanding is clustering the strokes into distinct objects. Often there are no clear spatial or temporal boundaries between the objects in a freely drawn sketch, and the primary clue that a group of strokes is supposed to be grouped together is that they form a meaningful shape. This is the inherent chicken-and-egg problem for sketch recognition — shapes cannot be recognized until their strokes have been grouped together, but the strokes cannot be grouped until the shapes have been recognized.

The clustering problem is so challenging that many existing recognition systems avoid it by placing constraints on the way users draw. For example, some systems require the user to provide explicit cues, such as button clicks or pauses, to demarcate each object (e.g., [14]); others require each symbol to be drawn with a single stroke (e.g., [26], [33], [25], [13]) or a temporally contiguous sequence of strokes (e.g., [8]). While these constraints aid recognition, they do not generally match the way people naturally draw [2].

To solve the problem of simultaneous grouping and recognition, Kara et al.'s [17] mark-group-recognize technique relies on “marker symbols”: symbols that can be accurately and inexpensively extracted from a continuous stream of pen strokes, and that tend to separate the remaining symbols. While this approach is efficient, it is limited to domains that have effective markers.

Other recent work has focused on the problem of single-stroke classification. Jain et al. [16] use stroke length and stroke curvature to distinguish text from non-text in online documents. Qi et al. [23] present a method for using conditional random fields to classify strokes in organizational chart diagrams as either connectors or boxes. Addressing a similar problem, Bishop et al. [5], Patel et al. [21], and Bhat and Hammond [4] present methods that integrate shape and temporal information for classifying individual strokes as either text or drawing strokes. Wang et al. [32] improve on Bishop et al.'s method. Indermühle et al. [15] distinguish text from non-text using a top-down approach that segments a document into regions of text and non-text. They also developed a bottom-up approach that examines the neighborhoods of individual pixels and considers connected components. More recently, Delaye and Liu [7] used conditional random fields to jointly model local, spatial, and temporal information to achieve accurate discrimination of text from non-text. Finally, Blagojevic and Plimmer [6] present an approach in which pen strokes are characterized with quantitative features and a classifier is used to distinguish text from non-text.

The goal of most previous single-stroke classification techniques is to identify the text strokes so they can be sent to a character recognizer, while the shape strokes (i.e., strokes comprising graphic objects) are left ungrouped. Our approach goes further, grouping the shape strokes as well. We aim to achieve accurate grouping in domains for which marker symbols do not exist and single-stroke classification is more involved. We present a two-stage clustering algorithm that first classifies pen strokes into different classes of objects, and then groups strokes with like classifications into clusters representing individual objects. Fig. 1 illustrates our approach. In the first step of processing (Fig. 1a), individual strokes are classified as belonging to text, gate, or wire objects. This classification spatially and temporally separates individual objects of the same class — as in Fig. 1b, which shows only strokes classified as gates — making the strokes easier to cluster. Once our technique has grouped the strokes in a sketch into distinct objects, there are many existing sketch recognizers that can be used to recognize them (e.g., [18], [20], [14], [3], [25]).

While our grouping technique is the primary contribution of the present work, we also make important contributions to stroke-level classification. First, while previous approaches to single-stroke classification were applied only to two-way classification (usually text vs. non-text), our approach is accurate on both three-way and four-way classification of text and different types of graphics. Second, for two-way classification, our approach achieves accuracy comparable to that of a state-of-the-art approach, and is significantly more efficient. Third, the separation between objects that results from our single-stroke classification technique enables our novel formulation of the grouping problem as an inexpensive classification task.

Section snippets

Related work

In addition to the work described above, a growing body of free-sketch recognition research involves simultaneous stroke grouping and symbol recognition. Some grouping techniques rely directly on geometric properties of the strokes. For example, Saund et al. [28] decompose a sketch into sequences of contiguous line segments corresponding to line art, and “blobs” of dense ink corresponding to text. They use Gestalt principles to group these objects into larger structures. The approach is

Single-stroke classification

Our goal at this stage is to classify strokes into general categories to facilitate stroke grouping. We use a feature-based machine-learning approach with a standard classification algorithm and a feature set that extends the set presented in Patel et al. [21]. Our classifier uses AdaBoost with decision trees and is trained using WEKA [9]. Specifically, the classifier is AdaBoostM1 using 10 iterations, a seed of 1, no resampling, and a weight threshold of 100. The base classifier is a pruned J48
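To make the boosting step concrete, the following is a minimal sketch of AdaBoost.M1-style training for a two-class problem, written in plain Python. It is not WEKA's implementation and not the authors' code: one-level decision stumps stand in for the pruned J48 trees, and the two features per stroke (a length and a curvature value) and all sample values are hypothetical. Only the 10-iteration default mirrors the setting quoted above.

```python
import math

def train_stump(X, y, w):
    """Return (error, feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for polarity in (1, -1):
                preds = [polarity if x[f] <= thr else -polarity for x in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, polarity)
    return best

def stump_predict(stump, x):
    _, f, thr, polarity = stump
    return polarity if x[f] <= thr else -polarity

def adaboost_m1(X, y, iterations=10):
    """Train an ensemble of (stump, vote_weight) pairs; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(iterations):
        stump = train_stump(X, y, w)
        err = stump[0]
        if err >= 0.5:          # weak learner no better than chance: stop
            break
        if err == 0:            # perfect weak learner: keep it and stop
            ensemble.append((stump, 1.0))
            break
        beta = err / (1.0 - err)
        # Shrink the weights of correctly classified examples, then renormalize,
        # so the next stump concentrates on the examples that were missed.
        w = [wi * (beta if stump_predict(stump, x) == yi else 1.0)
             for wi, x, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((stump, math.log(1.0 / beta)))
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * stump_predict(s, x) for s, alpha in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical feature vectors: [stroke length, curvature]; +1 = text, -1 = non-text.
X_demo = [[10, 3.0], [12, 2.5], [8, 4.0], [60, 0.5], [75, 0.2], [50, 1.0]]
y_demo = [1, 1, 1, -1, -1, -1]
model = adaboost_m1(X_demo, y_demo)
```

Calling `predict(model, [9, 3.2])` classifies a new feature vector with the weighted vote of the stumps. A multi-class variant, as needed for the three-way and four-way problems in the paper, would require a base learner that outputs class labels rather than a sign.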

Grouping

Classifying the individual pen strokes reduces the complexity of stroke grouping by decomposing the problem into smaller, easier problems, one for each class. However, even for the strokes in a single class, brute force grouping techniques, such as attempting to recognize all combinations of strokes, are still too expensive for interactive systems. Instead, we use a classifier to determine if each pair of strokes of the same class should be joined to form a cluster. If a stroke is joined with
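The pairwise formulation can be illustrated with a short sketch. It assumes, since the snippet is truncated here, that pairwise join decisions chain transitively into clusters, which is implemented with a union-find structure. The `Stroke` fields, the `should_join` rule, and its distance and time thresholds are hypothetical stand-ins for the trained pairwise classifier and are not taken from the paper.

```python
import math
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Stroke:
    points: list        # [(x, y), ...] sampled pen positions
    start_time: float   # seconds
    end_time: float     # seconds
    label: str          # class assigned by the single-stroke classifier

def min_distance(a, b):
    """Smallest point-to-point distance between two strokes."""
    return min(math.dist(p, q) for p in a.points for q in b.points)

def should_join(a, b, max_dist=20.0, max_gap=2.0):
    """Hypothetical stand-in for the pairwise classifier: join two strokes
    when they are close in both space and time."""
    gap = max(a.start_time, b.start_time) - min(a.end_time, b.end_time)
    return min_distance(a, b) <= max_dist and gap <= max_gap

def group_strokes(strokes):
    """Cluster stroke indices; only same-class pairs are ever compared."""
    parent = list(range(len(strokes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(strokes)), 2):
        if strokes[i].label == strokes[j].label and should_join(strokes[i], strokes[j]):
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(strokes)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

demo = [
    Stroke([(0, 0), (10, 0)], 0.0, 0.5, "gate"),
    Stroke([(10, 5), (0, 5)], 0.7, 1.2, "gate"),
    Stroke([(200, 200), (210, 200)], 1.5, 2.0, "gate"),
    Stroke([(5, 2), (100, 2)], 2.2, 3.0, "wire"),
]
groups = group_strokes(demo)
```

In this example the wire stroke passes close to the first gate but is never compared with it, because the class labels differ. Restricting comparisons to same-class pairs is what keeps the number of pairwise tests small.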

Datasets

We tested both our single-stroke and clustering classifiers on freely drawn sketches in three different domains: digital circuits (Table 2), family trees (Table 3), and solutions to statics problems (Table 4, Table 5). We collected eight digital circuit sketches from each of the 24 subjects for a total of 192 sketches. Half of these sketches were copied from a picture of a circuit, while the rest were synthesized from a logical equation. Additionally, half of the sketches were drawn on a Tablet

Single-stroke classification results

To perform grouping, we classify strokes into three or more classes. However, to benchmark our classifier, here we restrict it to two classes, text vs. non-text, and compare it to three state-of-the-art methods: the entropy method from [4], Microsoft InkAnalyzer®, which is a commercial product, and Blagojevic and Plimmer's Divider [6]. For comparisons with the entropy method and InkAnalyzer, we use all four of our sketch domains including digital circuits, family trees, and statics solutions

Grouping results

We trained all of the classifiers in a user-holdout fashion in which data from one subject was selected for testing and data from the other subjects was used for training. We trained classifiers for each of the four domains separately. Results are averaged across subjects.
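The user-holdout protocol described above can be sketched as a leave-one-subject-out loop. The function names and the `(subject_id, example)` data layout are assumptions made for illustration; `train_fn` and `score_fn` are placeholders for whatever training and scoring routines are in use.

```python
def user_holdout_splits(samples):
    """samples: list of (subject_id, example) pairs.
    Yields (held_out_subject, training_examples, test_examples)."""
    subjects = sorted({subject for subject, _ in samples})
    for held_out in subjects:
        train = [ex for subject, ex in samples if subject != held_out]
        test = [ex for subject, ex in samples if subject == held_out]
        yield held_out, train, test

def average_across_subjects(samples, train_fn, score_fn):
    """Train on all other subjects, score on the held-out one, and average."""
    scores = []
    for _, train, test in user_holdout_splits(samples):
        model = train_fn(train)
        scores.append(score_fn(model, test))
    return sum(scores) / len(scores)

demo_samples = [("s1", "a"), ("s1", "b"), ("s2", "c"), ("s3", "d")]
demo_splits = list(user_holdout_splits(demo_samples))
```

Because no subject contributes to both the training and the test set of a given fold, the averaged score reflects performance on users the classifier has not seen.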

The Inductive Pairwise Classifiers were trained with AdaBoostM1 and J48 decision trees in WEKA, using the same parameters as used for single-stroke classification. For the Thresholded Pairwise Classifier, training consists of

Single-stroke classification

Single-stroke classification is an important part of our method. It simplifies pairwise comparisons, leading to more efficient and accurate grouping. Our classifier uses adaptive boosting (AdaBoost) with decision trees and a set of features that extends previous efforts at single-stroke classification [21]. For the task of classifying text vs. non-text, our method performed better than the entropy method described in [4] and Microsoft's InkAnalyzer®.

Additionally, our approach had comparable

Conclusion

Grouping strokes in freely drawn sketches is so challenging that few recognition systems attempt it. Our work is a significant step toward solving this important problem. We have shown that separating pen strokes into different classes can make the grouping process both more efficient and more effective. We achieve the separation using an accurate multi-way single-stroke classifier.

Previous approaches to single-stroke classification were applied only to two-way classification (usually text vs.

Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant nos. 0729422 & 0735695.

References (34)

  • L. Gennari et al.

    Combining geometry and domain knowledge to interpret hand-drawn diagrams

    Comput Graph

    (2005)
  • Mark Hall et al.

    The WEKA data mining software: an update

    ACM SIGKDD Explorations Newsl

    (2009)
  • Hammond Tracy, Davis Randall. Tahuti: a geometrical sketch recognition system for UML class diagrams. In: AAAI spring...
  • Herold James, Stahovich Thomas F. Classyseg: a machine learning approach to automatic stroke segmentation. In:...
  • Herold James, Stahovich Thomas F. The one cent recognizer: a fast, accurate, and easy-to-implement handwritten gesture...
  • H. Hse et al.

    Recognition and beautification of multi-stroke symbols in digital ink

    Comput Graph

    (2005)
  • Indermühle Emanuel, Bunke Horst, Shafait Faisal, Breuel Thomas M. Text versus non-text distinction in online...

    This article was recommended for publication by Beryl Plimmer.
