Abstract
Gestural air-writing is the process of writing continuous characters or words in free space using hand or finger motion. It differs from traditional pen-based writing in that it contains no delimiting points to help demarcate valid writing segments. Thus, in gestural air-writing, detecting meaningful writing events in a continuous gestural sequence that also contains irrelevant writing movements is an intricate task which needs special attention. This paper presents an automatic method of gesture spotting and segmentation which identifies the meaningful air-written character segments confined within a continuous character pattern using a hybrid spatiotemporal and statistical feature set. A sliding window-based approach is employed for extracting the writing events from a continuous stream of hand-motion data, suppressing the superfluous idle data points. Consecutive writing events are then categorized into valid character segments and redundant ones. The performance of the proposed system is examined on a set of Assamese characters. Experimental results reveal that the proposed model achieves an overall segment error rate of 1.31%.
1 Introduction
Human Computer Interaction (HCI) platforms are in great demand in today’s world. In contrast to conventional human-machine interfacing tools, gestural air-writing now serves as an important substitute for natural and effortless HCI. It allows users to interact intuitively with computing devices by writing freely in an unrestricted and comfortable way [1].
However, gestural air-writing differs from conventional pen-based writing or 2D space handwriting in that the stroke flow is continuous, with no intermediate pauses between consecutive characters or between different segments of a character [2]. For example, an in-air handwritten Assamese character is generally completed in a single continuous stroke, and some intermediate repositioning movements connect the adjacent strokes of the character. The connecting links which occur between adjacent characters as well as within individual characters are termed ligatures or movement epentheses. The presence of these ligatures makes writing event detection and segmentation a challenging task. Moreover, these irrelevant connecting motions are diverse and vary widely depending upon users and their speed of articulation [1]. Therefore, the primary objective of this work is to implement a forward gestural character spotting and segmentation system which is user-independent and which can form an important constituent of effective HCI.
As a popular and intriguing research topic, in-air handwritten gesture spotting and segmentation has been addressed by several techniques that combine different algorithms. Chen et al. [3], Amma et al. [4] and Schick et al. [5] have utilized HMMs for modeling separate characters, after which word recognition is performed by concatenating character HMMs with a repositioning HMM that describes the translocation occurring between individual characters. Although the HMM is a well-established method for modeling continuous characters, model training becomes extensive as the number of character patterns increases, because an individual HMM is constructed per label. Other studies have considered alignment-based approaches for gesture spotting and recognition. For example, Frolova et al. [6] and Jin et al. [7] have used most probable longest common subsequence (MPLCS) and dynamic time warping (DTW) approaches for segmenting the potential character segments from gesture streams by measuring the similarity between the extracted features of a hand gesture and a predefined template in the database. Alignment-based methods become computationally complex when the range of patterns increases, as they require warping the temporal series against every template sequence in the database. Instead of HMMs or temporal alignment-based methods for gestural sequence modeling, the proposed work achieves the same task by employing a distinctive feature set which determines the terminal points of character segments, extracts them and models the variations in their shapes, thus avoiding the need for extensive training or matching. The survey also shows that many systems have adopted preambles such as manual gestures, physical buttons and other explicit signals for marking the beginning and end points of character fragments in a temporal character stream. Murata and Shin [8] and Ayachi et al. [9] have utilized gestural commands, while Amma et al. [4] have used manual keys for temporal character segmentation. In contrast, our proposed method does not require explicit preambles to specify the writing events in a continuous character sequence.
More specifically, the main contribution of this paper is to develop an efficient framework for gestural character spotting and segmentation by incorporating a two-stage approach. In the first stage, a window-based scheme is adopted to observe the fluctuations of a kinematic feature set and hence to spot the start and end points of different character segments existing inside a character pattern. In the second stage, spatiotemporal and statistical heuristics are applied to categorize the character segments obtained between terminal points as valid and ligature patterns.
The rest of the paper is organized as follows. In Sect. 2, the proposed methodology of air-written gesture spotting and segmentation is presented with a detailed description of the individual processes involved. Section 3 discusses the results obtained through experimental evaluation on a large dataset with dynamic variations. Finally, Sect. 4 concludes the paper and highlights some future scope of the present work.
2 Proposed System
The overall schematic block diagram of the proposed gestural air-writing detection and segmentation framework for spotting and identifying the valid strokes from a character sequence is shown in Fig. 1. The working of the first two modules of the proposed system, i.e. hand segmentation and hand tracking, is elaborated in [10].
This work mainly concentrates on the air-writing spotting and segmentation tasks. The detailed working methodology of these processes is described as follows.
2.1 Gestural Air-Writing Spotting and Segmentation
The process of air-writing detection and segmentation requires recognizing the relevant stroke segments in a continuous character pattern. For automatic spotting and segmentation of air-writing, we combine certain distinctive spatiotemporal features derived from hand tracking signals, and employ a window-based approach for modeling the statistical variability in character patterns. The detailed functioning of the air-writing spotting and segmentation modules is described in the following sections.
Air-Writing Spotting.
The workflow of the proposed air-writing detection module is shown in Fig. 2. Firstly, each sample of the hand trajectory is represented by positional coordinates pi = (xi, yi). Then, a set of motion features is derived and a sliding window is superimposed over this motion data to look for the presence of a writing event. The general characteristics of writing events include acute changes in writing direction and high average velocity in comparison to idling events. So, a window (of length L) is positioned at each designated frame by sliding it through a shift width (w), and the spotting algorithm observes the desired properties within these windows to determine the boundary points (frames) of all the writing segments inside a character pattern. Here, we have empirically selected window length of 30 frames (2 s) and shift width of 2 frames (0.13 s).
On the selected sliding windows, the following motion features are estimated:
- Average velocity
$$ \overline{v} = \frac{1}{L}\sum\limits_{i = 1}^{L} \left| v_{i} \right| $$(1)
- Average angular change in orientation
$$ \overline{\Delta \theta} = \sum\limits_{i = 1}^{L} \Delta \theta_{i} $$(2)
where L indicates the number of samples inside the window, and vi and Δθi denote the velocity and angular change in orientation of the i-th sample. These features are then normalized to reduce the effect of variations in writing style and speed. The normalized features are given by
$$ V^{\prime} = \frac{\overline{v}}{v_{\max}} $$(3)
$$ \Delta \theta^{\prime} = \frac{\overline{\Delta \theta}}{\Delta \theta_{\max}} $$(4)
where vmax and Δθmax denote the maximum velocity and angular change obtained within an observation window.
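The windowed motion features can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, velocity is taken as the per-frame displacement between consecutive samples, Δθi is taken as the wrapped absolute difference between consecutive headings, Eq. (2) is followed as written (a plain sum), and normalization divides by the maximum found inside the same window.

```python
import math

def window_features(window):
    """Return (V', dtheta') for one window of (x, y) samples (assumed definitions)."""
    speeds, turns = [], []
    prev_heading = None
    for (x0, y0), (x1, y1) in zip(window, window[1:]):
        dx, dy = x1 - x0, y1 - y0
        speeds.append(math.hypot(dx, dy))          # displacement per frame
        heading = math.atan2(dy, dx)
        if prev_heading is not None:
            diff = heading - prev_heading
            # wrap the heading difference to [-pi, pi] and keep its magnitude
            turns.append(abs(math.atan2(math.sin(diff), math.cos(diff))))
        prev_heading = heading
    v_bar = sum(speeds) / len(speeds) if speeds else 0.0   # Eq. (1)
    dtheta_bar = sum(turns)                                # Eq. (2) as written
    v_max = max(speeds, default=0.0)
    dtheta_max = max(turns, default=0.0)
    v_norm = v_bar / v_max if v_max > 0 else 0.0
    dtheta_norm = dtheta_bar / dtheta_max if dtheta_max > 0 else 0.0
    return v_norm, dtheta_norm

def sliding_features(points, length=30, shift=2):
    """Slide a window of `length` frames by `shift` frames over the trajectory."""
    return [(start, *window_features(points[start:start + length]))
            for start in range(0, len(points) - length + 1, shift)]
```

The defaults mirror the window length of 30 frames and shift width of 2 frames reported above; each returned tuple holds the window's starting frame and its two normalized features.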
In practice, it is observed that these feature values tend to increase while a character segment is being air-written, whereas towards the end of a writing segment they decrease and reach a minimum. However, since there is a large number of character patterns with wide variations in writing speed and shape, global threshold values for V′ and Δθ′ are not suitable for demarcating the start and end points of all the character patterns in the database. The computed features (V′ and Δθ′) for a sample are therefore compared with adaptively computed thresholds (TV and Tθ respectively) determined using the K-Means clustering algorithm. A trajectory sample is designated as the start point (S) or end point (E) of a writing segment based on whether the kinematic parameters (V′ and Δθ′) are greater or less than their respective adaptive thresholds (TV and Tθ). The resulting end points signify non-writing events and are thus eliminated from the character stream. Finally, the handwriting (HW) segments obtained between each pair of boundary points are separated out for further processing.
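The boundary-point rule can be illustrated with a short Python sketch. The text does not state how the two comparisons are combined, so the sketch assumes that a window counts as writing only when both V′ and Δθ′ exceed their thresholds; the function name and the (start, end) output format are likewise hypothetical.

```python
def spot_segments(features, t_v, t_theta):
    """Group consecutive 'writing' windows into (start, end) index pairs.

    features: list of (V', dtheta') values, one per window position.
    Assumption: writing requires BOTH features to exceed their thresholds.
    """
    segments = []
    start = None
    for i, (v, dtheta) in enumerate(features):
        writing = v > t_v and dtheta > t_theta
        if writing and start is None:
            start = i                           # start point S
        elif not writing and start is not None:
            segments.append((start, i - 1))     # end point E (inclusive)
            start = None
    if start is not None:                       # stream ended while writing
        segments.append((start, len(features) - 1))
    return segments
```

For instance, with both thresholds at 0.5, the hypothetical feature stream `[(0.1, 0.1), (0.8, 0.9), (0.9, 0.7), (0.2, 0.1), (0.7, 0.8), (0.1, 0.2)]` yields two HW segments, covering indices 1 to 2 and index 4.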
Algorithm for determination of adaptive thresholds (TV and Tθ) using K-Means clustering approach:
1. Choose a window Win having a length of 6 samples, and consider the feature values V′i and Δθ′i (i = 1, …, 6) inside this window.
2. Take two clusters CL1 and CL2 and initially assume their cluster centers (C1 and C2) as the 2nd and 4th feature points (i.e. V′2, Δθ′2 and V′4, Δθ′4).
3. Calculate the distance of all the feature points (V′i and Δθ′i) within Win from the cluster centers (C1 and C2) using Euclidean distance.
4. Assign each of the feature points to the cluster CL1 or CL2 with the nearest cluster center.
5. Update the cluster centers (C1 and C2) by taking the mean of all the feature points within the respective clusters (CL1 and CL2).
6. Repeat steps 3, 4 and 5 until the cluster centers C1 and C2 do not change.
7. Compute adaptive thresholds (TV and Tθ) by taking the mean of the converged cluster centers obtained in step 6.
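The seven steps above can be sketched in Python as follows. The function name, the `max_iter` safety cap, the tie-breaking rule (ties go to CL1) and the handling of an empty cluster (its center is left unchanged) are assumptions added for illustration; they are not specified in the text.

```python
import math

def _mean_point(points):
    """Component-wise mean of a list of (V', dtheta') points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def adaptive_thresholds(features, max_iter=100):
    """Two-cluster K-Means over one window of (V', dtheta') feature points.

    Returns (T_V, T_theta), the mean of the two converged cluster centers.
    """
    if len(features) < 4:
        raise ValueError("need at least 4 feature points to seed the centers")
    c1, c2 = features[1], features[3]            # step 2: 2nd and 4th points
    for _ in range(max_iter):                    # max_iter is a safety cap
        # steps 3-4: assign each point to the nearest center (Euclidean)
        cl1 = [p for p in features if math.dist(p, c1) <= math.dist(p, c2)]
        cl2 = [p for p in features if math.dist(p, c1) > math.dist(p, c2)]
        # step 5: update centers (an empty cluster keeps its old center)
        new_c1 = _mean_point(cl1) if cl1 else c1
        new_c2 = _mean_point(cl2) if cl2 else c2
        if new_c1 == c1 and new_c2 == c2:        # step 6: converged
            break
        c1, c2 = new_c1, new_c2
    # step 7: thresholds are the mean of the two converged centers
    return ((c1[0] + c2[0]) / 2, (c1[1] + c2[1]) / 2)
```

With a window holding three low-valued and three high-valued feature points, the two centers settle on the two groups and each threshold falls midway between them.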
Air-Writing Segmentation.
After successful extraction of HW segments, the next task is to classify them into valid and ligature patterns so that the ligatures can be suppressed and the valid patterns can be passed on for recognition of the overall character pattern. The flow diagram of the gestural air-writing segmentation module is shown in Fig. 3.
In our work, we are concerned with modeling the variations which occur in the shape of Assamese characters. To accomplish this task, a composite feature set is proposed comprising kurtosis, entropy and normalized length of the HW segment. These features are described as follows.
- Kurtosis (K): The kurtosis of a distribution is given by the ratio of the fourth moment (m4) to the square of the second moment (m2) [11]
$$ K(x) = \frac{m_{4}}{m_{2}^{2}} = \frac{m_{4}}{\left( \sigma^{2} \right)^{2}} = \frac{1}{N}\sum \left( \frac{x - \mu}{\sigma} \right)^{4} = E(z^{4}) $$(5)
where z = (x − µ)/σ is the standardized value, x represents the observation values, µ is the mean, σ is the standard deviation and E() is the expectation operator. The K value for a normal distribution is 3, and the K value for all other distributions reflects the deviation of the observations from a normal distribution. Equation (5) has a simpler interpretation and is given as [11]
$$ K(x) = \mathrm{var}(z^{2}) + 1 $$(6)
where var() is the variance. Accordingly, kurtosis is interpreted as the extent to which observations are dispersed away from the shoulders of a distribution (z² = 1, i.e. z = ±1).
In this study, we take the scaled inverse kurtosis (K′) as the first feature for character segmentation. It is given as
$$ K^{\prime} = \frac{c}{K(x)} $$(7)
where c is a scaling constant empirically taken as 6, and K(x) is the kurtosis determined using Eq. (6). This implies that if the observation values x of a character segment are more dispersed from the shoulders, then K < 3 and hence the K′ value will be higher than that of segments exhibiting less dispersion, which have K ≥ 3.
- Entropy (H): Entropy gives a measure of the uncertainty of a probability distribution and is given by [12]
$$ H = - \sum\limits_{i} P_{i} \log P_{i} $$(8)
where Pi denotes the probability of the observation values.
- Normalized segment length (L′): The normalized length of the HW segment is given by
$$ L^{\prime} = \frac{\left\| d \right\|}{h} $$(9)
where d is the total distance traversed by the HW segment and h is the height of a minimum-area bounding rectangle [13] encompassing the segment.
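The three features can be sketched in Python as follows. This is an illustration under several assumptions rather than the authors' code: the scaled inverse kurtosis is taken as c divided by K, the probabilities Pi are estimated from an equal-width histogram with a hypothetical bin count of 10, the natural logarithm is used (the text does not state the base), and the bounding-rectangle height is approximated by the axis-aligned extent in y instead of the minimum-area rectangle of [13].

```python
import math
from collections import Counter

def kurtosis(values):
    """K = m4 / m2^2 as in Eq. (5); undefined when all values are equal."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n
    m4 = sum((x - mu) ** 4 for x in values) / n
    return m4 / (m2 ** 2)

def scaled_inverse_kurtosis(values, c=6.0):
    """Assumed form of K': scaling constant c divided by the kurtosis."""
    return c / kurtosis(values)

def entropy(values, bins=10):
    """Shannon entropy (natural log) of an equal-width histogram of values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0              # avoid zero width
    counts = Counter(min(int((x - lo) / width), bins - 1) for x in values)
    n = len(values)
    return -sum((k / n) * math.log(k / n) for k in counts.values())

def normalized_length(points):
    """Path length divided by the axis-aligned height (a simplification)."""
    d = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    ys = [y for _, y in points]
    return d / (max(ys) - min(ys))
```

In the procedure described here, `kurtosis` and `entropy` would be applied to the α profile of a HW segment, while `normalized_length` takes the segment's (x, y) samples directly.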
So, in this procedure we firstly approximate every consecutive “n” points of the HW segment with a least square (LS) fitting ellipse [13] and compute its angular eccentricity values, which describe the variations in shape of the HW segments. The angular eccentricity of a point P(x,y) on an ellipse with semi-major axis of length a and semi-minor axis of length b is given by
From the α profile of a HW segment, the inverse kurtosis value (K′(α)) and the entropy of its probability distribution (H(α)) are calculated using Eqs. (7) and (8). Simultaneously, the normalized length of the HW segment (L′) is computed using Eq. (9). In gestural air-writing of Assamese characters, the valid character segments have more variation in α value than the ligatures, which are generally simple and have less variability. So, according to the concept of kurtosis, the valid HW segments will have a higher K′ value than the ligature segments. Likewise, the valid HW segments will have a higher amount of uncertainty in α values, i.e. more information content, and hence a higher entropy value than ligature segments. Further, the ligature segments, being simple in nature, will inherently have a shorter normalized length than valid segments. We consider a combination of three features (K′, H and L′) for character segmentation because, in cases where a single feature fails to capture the distinguishing traits of valid and ligature patterns, the scores from the remaining features aid in correct character segmentation. Thus, the final score of a HW segment is given by the weighted sum of kurtosis, entropy and normalized segment length
$$ \text{Score} = \omega_{1} K^{\prime} + \omega_{2} H + \omega_{3} L^{\prime} $$(11)
Here, the weights ω1, ω2 and ω3 are taken uniformly as 1.
The final scores of all the HW segments inside a character pattern are computed using Eq. (11), and the decision threshold (TD) is determined by taking the average of the two smallest scores. Suppose C = {C1, C2, C3, …, Cn} is a character pattern consisting of n HW segments; then a character segment Ci is classified as a valid (‘1’) or ligature (‘0’) pattern using the following decision equation
$$ C_{i} = \begin{cases} 1, & \text{Score}(C_{i}) > T_{D} \\ 0, & \text{otherwise} \end{cases} $$(12)
For characters consisting of only one HW segment, i.e. without any connecting links, only the kurtosis value (K) computed using Eq. (6) is taken as the discriminating feature for classifying it as a valid or ligature pattern. If the K value is less than 3, the character segment has wide variations and is designated as a valid pattern; otherwise it is classified as a ligature pattern.
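The score fusion and decision rule can be sketched in Python as follows. The function names are hypothetical, the strict comparison against TD is an assumption (the text does not say how ties are handled), and the feature values in the usage note are invented for illustration and are not taken from Table 1.

```python
def classify_segments(features, weights=(1.0, 1.0, 1.0)):
    """Label each HW segment of a multi-segment pattern as valid (1) or ligature (0).

    features: one (K', H, L') tuple per HW segment.
    Returns (labels, T_D), where T_D is the mean of the two smallest scores.
    """
    if len(features) < 2:
        raise ValueError("single-segment patterns use the kurtosis rule instead")
    # weighted sum of the three features, with uniform weights by default
    scores = [sum(w * f for w, f in zip(weights, feat)) for feat in features]
    two_smallest = sorted(scores)[:2]
    t_d = sum(two_smallest) / 2
    labels = [1 if s > t_d else 0 for s in scores]   # assumed strict inequality
    return labels, t_d

def classify_single_segment(k):
    """Rule for one-segment characters: valid when the kurtosis K is below 3."""
    return 1 if k < 3 else 0
```

For three segments with illustrative features `(3.0, 2.0, 4.0)`, `(1.0, 0.5, 1.0)` and `(2.8, 1.9, 3.5)`, the scores are roughly 9.0, 2.5 and 8.2, the threshold is about 5.35, and the middle segment is labeled as the ligature.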
3 Experimental Results and Discussions
We evaluate the performance of our proposed air-writing spotting and segmentation model on an Assamese character dataset. The experimental database consists of 46 Assamese characters, comprising 11 vowels, 10 numerals and the first 25 consonants. Each character is written 20 times by each of the 2 subjects, thus forming an overall mixed dataset of 1840 patterns. The database therefore encompasses dynamic variations in trajectory shape and duration of the character patterns. The video corpus of gestural characters is generated using a webcam with a frame rate of 15 frames/s and a resolution of 640 × 360. The following sections describe the results derived from our gestural air-writing detection and segmentation modules.
3.1 Air-Writing Spotting Results
In this section, we depict the profile distributions for the feature set (V′ and Δθ′) employed for gesture spotting and the corresponding writing event detection. Figure 4(a) and (b) show the distributions of normalized velocity (V′) and relative angular change in orientation (Δθ′) used for gesture spotting of a continuous character “ ”. It is observed that during the commencement and eventual writing of a HW segment the feature values increase, while towards the end of the writing event these values tend to decline and attain a minimum.
Figure 5 illustrates a few occurrences of the air-writing path of the character pattern “ ” along with the start and end points (S and E) of writing segments obtained by employing the proposed feature set and window-based analysis. At the end of gesture spotting, we obtain three HW segments for the pattern “ ”, which have to be classified as valid or ligature in the next stage.
3.2 Air-Writing Segmentation Results
Figure 6 shows the angular eccentricity (α) profile which outlines the variations in shape of all the three HW segments of the character pattern “ ”.
Table 1 highlights the feature values (K′, H and L′) for all three HW segments, their corresponding scores and the final decision. From Fig. 6 and Table 1, it is seen that the valid segments (1st and 3rd) have more variation in α values than the ligature segment, and hence have a higher inverse kurtosis value. Similarly, there is more uncertainty in the α values of valid segments, and hence they have higher entropy than ligatures. Further, it is observed that valid HW segments have greater normalized length than ligature patterns.
3.3 Performance Evaluation
We evaluate the performance of our proposed system by calculating the overall Segment Error Rate (SER). For a character pattern Ci, the SER is given by
$$ \text{SER}(C_{i}) = \frac{E}{N_{c}} $$(13)
where Nc is the total number of HW segments in the character pattern and E is the total number of segment errors in the character pattern, given by the sum of substitution (S) and deletion (D) errors [14]. Therefore, the overall SER is given by
$$ \text{SER} = \frac{1}{N}\sum\limits_{i = 1}^{N} \text{SER}(C_{i}) $$(14)
where N indicates the total number of character patterns in the database.
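The error-rate computation can be sketched in Python as follows. The sketch assumes that the per-pattern SER is the ratio of segment errors (substitutions plus deletions) to the number of HW segments, and that the overall SER is the mean of the per-pattern values over the N patterns; the function names and the example counts are hypothetical.

```python
def segment_error_rate(substitutions, deletions, n_segments):
    """Per-pattern SER: (S + D) / Nc, as a fraction."""
    return (substitutions + deletions) / n_segments

def overall_ser(patterns):
    """Mean per-pattern SER over all patterns.

    patterns: list of (S, D, Nc) tuples, one per character pattern.
    """
    return sum(segment_error_rate(s, d, nc) for s, d, nc in patterns) / len(patterns)
```

For four illustrative patterns `[(0, 0, 3), (1, 0, 2), (0, 0, 1), (0, 1, 4)]`, the per-pattern rates are 0, 0.5, 0 and 0.25, giving an overall SER of 0.1875; multiplying by 100 expresses it as a percentage.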
The composite SER of our proposed system with and without score fusion is presented in Table 2.
Comparing the results obtained using score fusion with those obtained from individual features shows a considerable improvement in performance when the features are combined. This is because some characters contain valid character segments with very little variation; in such cases the system fails to resolve the ambiguity and falsely interprets the segment as a ligature if only individual features are taken into consideration. Aggregating the three features into a single score therefore produces a lower SER than classification based on single features.
4 Conclusion
In this paper, we have formulated a vision-based forward air-writing detection and segmentation mechanism for determining the principal character segments from continuous character patterns. For gesture spotting and character segmentation, we have implemented a sliding window-based approach and applied a mixed feature set integrating spatiotemporal and statistical parameters. Experimental evaluation on continuous Assamese character patterns reveals that our proposed technique offers an overall segment error rate of around 1.3%, thereby demonstrating its effectiveness in extracting legitimate portions from a character sequence. In this study, the efficiency of the spotting and segmentation paradigm has been validated for the vowels, numerals and a few consonants of the Assamese script. Later, the remaining consonants and a few words shall be taken into consideration, and the database shall be extended with more samples from different participants. A challenging task for future work is the recognition of certain Assamese characters having similar trajectory patterns. In real-world scenarios, the proposed system in conjunction with a recognition module can function as a complementary modality for applications such as communication aids for hearing-impaired people, smart media controllers, game controllers and more.
References
1. Choudhury, A., Sarma, K.K.: Visual gesture-based character recognition systems for design of assistive technologies for people with special necessities. In: Handmade Teaching Materials for Students with Disabilities, pp. 294–315. IGI Global, Hershey (2019)
2. Agarwal, C., Dogra, D.P., Saini, R., Roy, P.P.: Segmentation and recognition of text written in 3D using leap motion interface. In: 3rd Proceedings on IAPR Asian Conference on Pattern Recognition (ACPR), pp. 539–543. IEEE, Kuala Lumpur (2015)
3. Chen, M., AlRegib, G., Juang, B.-H.: Air-writing recognition—Part I: modeling and recognition of characters, words, and connecting motions. IEEE Trans. Hum.-Mach. Syst. 46(3), 403–413 (2016)
4. Amma, C., Gehrig, D., Schultz, T.: Airwriting recognition using wearable motion sensors. In: Proceedings of Augmented Human International Conference. ACM (2010)
5. Schick, A., Morlock, D., Amma, C., Schultz, T., Stiefelhagen, R.: Vision-based handwriting recognition for unrestricted text input in mid-air. In: Proceedings of the 14th International Conference on Multimodal Interaction, pp. 217–220. ACM (2012)
6. Frolova, D., Stern, H., Berman, S.: Most probable longest common subsequence for recognition of gesture character input. IEEE Trans. Cybern. 43(3), 871–880 (2013)
7. Jin, L., Yang, D., Zhen, L.-X., Huang, J.-C.: A novel vision-based finger-writing character recognition system. J. Circ. Syst. Comput. 16(3), 421–436 (2007)
8. Murata, T., Shin, J.: Hand gesture and character recognition based on kinect sensor. Int. J. Distrib. Sens. Netw. 10(7), 278460 (2014)
9. Ayachi, N., Kejriwal, P., Kane, L., Khanna, P.: Analysis of the hand motion trajectories for recognition of air-drawn symbols. In: Proceedings of 5th International Conference on Communication Systems and Network Technologies, pp. 505–510. IEEE, Gwalior (2015)
10. Choudhury, A., Sarma, K.K.: A novel approach for gesture spotting in an Assamese gesture-based character recognition system using a unique geometrical feature set. In: Proceedings of 5th International Conference on Signal Processing and Integrated Networks (SPIN), pp. 98–104. IEEE, Noida (2018)
11. Liang, Z., Wei, J., Zhao, J., Liu, H., Li, B., Shen, J., Zheng, C.: The statistical meaning of kurtosis and its new application to identification of persons based on seismic signals. Sensors 8(8), 5106–5119 (2008)
12. Wang, Q.A.: Probability distribution and entropy as a measure of uncertainty. J. Phys. A: Math. Theor. 41(6), 065004 (2008)
13. Bradski, G., Kaehler, A.: Learning OpenCV: Computer Vision with the OpenCV Library. O’Reilly Media Inc., Sebastopol (2008)
14. Elmezain, M., Al-Hamadi, A., Sadek, S., Michaelis, B.: Robust methods for hand gesture spotting and recognition using hidden Markov models and conditional random fields. In: Proceedings of the 10th International Symposium on Signal Processing and Information Technology, pp. 131–136. IEEE, Luxor (2010)
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Choudhury, A., Sarma, K.K. (2019). A Two Stage Framework for Detection and Segmentation of Writing Events in Air-Written Assamese Characters. In: Deka, B., Maji, P., Mitra, S., Bhattacharyya, D., Bora, P., Pal, S. (eds) Pattern Recognition and Machine Intelligence. PReMI 2019. Lecture Notes in Computer Science, vol 11941. Springer, Cham. https://doi.org/10.1007/978-3-030-34869-4_63
DOI: https://doi.org/10.1007/978-3-030-34869-4_63
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34868-7
Online ISBN: 978-3-030-34869-4
eBook Packages: Computer Science, Computer Science (R0)