Abstract
Handwriting analysis is a standard forensics practice to assess the identity of a person from written documents. Forensic document examiners consider different features related to the motion and pressure of the hand, as well as the shape of the different characters and the spatial relationship among them. While examiners rely on standard protocols, documents are generally processed manually. This requires a significant amount of time and may lead to a subjective analysis which is difficult to replicate. Automated forensics tools to perform handwriting analysis from scanned documents are desirable to help examiners extract information in a more objective and replicable way. To this aim, in this paper we present GRAPHJ, a forensics tool for handwriting analysis. The tool has been designed to implement the forensics protocol employed by the “Reparto Investigazioni Scientifiche” (RIS) of Carabinieri. GRAPHJ allows the examiner to (1) automatically detect text lines as well as the different words within the document; (2) search for a specific character and detect its occurrences in the handwritten text; (3) measure different quantities related to the detected elements (e.g., character height and width) and (4) generate a report containing measurements, statistics and all parameters used during the analysis. The generation of the report helps to improve the repeatability of the whole process. We also present a set of experiments to assess the compliance of GRAPHJ with respect to conventional handwriting analysis methods. Given a set of handwritten documents, the experiments compare measurements and statistics produced by GRAPHJ to those obtained by an expert forensics examiner performing classic manual analysis.
1 Introduction
Forensic handwriting examination is the analytical process of detecting regularities and singularities of a handwritten text to assess the identity of the writer [1]. The analysis focuses on recognizing the fundamental shapes of the stroke, as well as the relative positions and sizes of letters and words. For handwriting analysis methods it is important to adopt a quantitative approach in order to limit any subjective evaluation due to the personal experience of the examiner.
With this goal in mind, many forensics experts make use of a graphometric-based approach which takes into account the different quantifiable features of handwritten text. Many authors have considered the fundamental principles and techniques of document examination [2,3,4,5]. However, handwriting analysis should be robust to variations in the writing process due to several intrinsic and extrinsic causes, such as different writing speeds, dissimulation, tiredness and available space. In this regard, it is well known that the study of character heights is valuable to identify the range of variability of the writer [6]. For instance, Morris [7] underlined the importance of the analysis of dimensional parameters and the comparison of absolute and relative quantities. This kind of analysis, considering speed, slope and style, can be used to identify an attempted forgery. Hayes [1] showed that dimensions reflect the range of finger and hand movements which are characteristic of individual expression, e.g., some people produce extremely small writing while others produce taller writing. Kelly and Lindblom [8] analyzed the ratio of lowercase to uppercase letters, showing that this value can be useful to identify the writer.
In this work, we present GRAPHJ, a tool for handwriting analysis. The proposed tool implements several algorithms to perform the analysis of handwritten documents which are currently considered in the protocol used by RIS - Carabinieri in Italy. For instance, GRAPHJ allows the examiner to detect text lines and words in the document and to search for all occurrences of a specific character. GRAPHJ is also designed to simplify and improve the documentation of the analysis process, by generating a report containing statistics and measurements. Our approach is similar to the one of Fabiańska et al. [9], but our tool supports automated detection of elements, in order to minimize the amount of required manual intervention. Aimed at assisting the examiner in analyzing digitized handwritten documents, GRAPHJ can be considered a multimedia forensics tool [10]. It should be noted that our approach is different from handwriting recognition [11], since we are not interested in transcribing the analyzed text into digital form. To validate GRAPHJ, we performed experiments comparing the produced measurements and statistics to those obtained with classic manual analysis performed by a forensics expert. A video demo of GRAPHJ is available at our web page http://iplab.dmi.unict.it/graphj/.
2 GRAPHJ
We developed GRAPHJ as a plugin for ImageJ [12], which is a standard framework to perform many specific image processing tasks. The developed plugin automates the standard procedures needed to analyze handwritten documents. The implemented algorithms perform three main tasks: (1) automated detection of elements (text lines and words); (2) automated search of instances of a given character; (3) automated measurement of quantities (e.g., distance between words and characters, height and width of characters). Furthermore, the examiner can manually intervene to adjust automated detections and measure other quantities such as absolute and relative heights. A typical workflow employed to perform handwriting analysis in GRAPHJ is shown in Fig. 1. The developed algorithms are detailed in the following sections.
2.1 Automated Search of Text Lines
According to the analysis protocol, a text line can in general be divided into three areas: a lower area, a median area, and a higher area, as illustrated in Fig. 2. Automated search of text lines is performed in two steps. First, all median areas are detected in the document. Second, a lower area and a higher area are identified for each detected median area.
The algorithm to search for median areas is illustrated in Fig. 3 and discussed in the following:
1. as a first step, the image is binarized (the resulting binary image is denoted by B) setting to 0 (black color) all pixels whose value exceeds a given threshold \(\mathcal {T}\) and setting to 1 (white color) all other pixels;
2. a per-row histogram (\(H_r\)) is created by counting the number of zero pixels contained in each pixel row of the binary image. The histogram contains a number of bins equal to the number of rows of the original image (i.e., the image height);
3. to detect the central lines of median areas, the algorithm considers all peaks of the histogram whose values are above a user-specified threshold \(s_1\). Threshold \(s_1\) is introduced to reduce the influence of noise in the search of median areas;
4. the algorithm then finds the starting and ending rows of each median area. Since histogram values are expected to decay gradually around the peak, this is done by searching for the nearest lower and higher rows whose value is over 1/4 of the value of the histogram at the given peak.
The complete procedure is reported in Algorithm 1, where: histRows(B) computes the per-row histogram of zero pixels as discussed above; findPeaks(\(H_r\)) finds the peaks (i.e., local maxima) of histogram \(H_r\) and returns both positions (indMax) and values (valMax). The algorithm returns a list of starting and ending row indexes of all detected median areas (ind).
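Since the listing of Algorithm 1 is not reproduced here, the steps above can be sketched in plain Python as follows. This is only an illustrative sketch, not the GRAPHJ implementation: the names `hist_rows`, `find_peaks` and `median_areas` mirror the functions mentioned in the text, the binary image is assumed to be a list of rows with 0 for black pixels, each area is grown from its peak while the histogram stays above 1/4 of the peak value (one possible reading of step 4), and the merging of overlapping areas is an added assumption.

```python
def hist_rows(B):
    """Per-row histogram: number of black (0) pixels in each row of B."""
    return [sum(1 for p in row if p == 0) for row in B]


def find_peaks(h):
    """Return (positions, values) of the local maxima of histogram h.

    A plateau of equal values counts as a single peak, reported at its
    middle index.
    """
    positions, values = [], []
    i, n = 0, len(h)
    while i < n:
        j = i
        while j + 1 < n and h[j + 1] == h[i]:
            j += 1  # extend over a plateau of equal values
        left_lower = i == 0 or h[i - 1] < h[i]
        right_lower = j == n - 1 or h[j + 1] < h[i]
        if left_lower and right_lower and h[i] > 0:
            positions.append((i + j) // 2)
            values.append(h[i])
        i = j + 1
    return positions, values


def median_areas(B, s1):
    """Return (start_row, end_row) pairs for the detected median areas."""
    h = hist_rows(B)
    ind = []
    for p, v in zip(*find_peaks(h)):
        if v <= s1:
            continue  # peak too small: treated as noise
        start = end = p
        # Grow the area while neighbouring rows stay above 1/4 of the peak.
        while start > 0 and h[start - 1] > v / 4:
            start -= 1
        while end < len(h) - 1 and h[end + 1] > v / 4:
            end += 1
        if ind and start <= ind[-1][1]:
            # Assumption (not in the paper): merge overlapping areas.
            ind[-1] = (min(start, ind[-1][0]), max(end, ind[-1][1]))
        else:
            ind.append((start, end))
    return ind
```

For a page whose rows contain 0, 1, 4, 8, 4, 1, 0, 0, 1, 6, 1, 0 black pixels and \(s_1 = 3\), this sketch yields two median areas, covering rows 2 to 4 and row 9.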

Once median areas are detected, the algorithm detects row indexes of higher and lower areas. This is done by looking for the nearest higher and lower empty rows. Such rows are easy to detect since they do not contain any black pixel, and hence histogram \(H_r\) has value equal to zero at those locations. If no empty rows can be found, the indexes of higher and lower areas are set to correspond to the starting and ending rows of the related median area. Algorithm 2 reports the procedure used to locate indexes of higher and lower areas. The algorithm returns a list of tuples of four values: starting row index of the higher area, starting row index of median area, ending row index of median area, ending row index of lower area.
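The search for higher and lower areas can be sketched as below. The function name `extend_areas` is hypothetical, and the sketch assumes that the histogram \(H_r\) is given as a list of per-row black-pixel counts and that the index of the empty row itself is used as the boundary; the paper does not state whether the empty row or its non-empty neighbour is returned.

```python
def extend_areas(h, median):
    """Attach higher and lower areas to each median area.

    h      -- per-row counts of black pixels (the histogram H_r)
    median -- list of (start_row, end_row) pairs of median areas
    Returns tuples (higher_start, median_start, median_end, lower_end).
    """
    result = []
    for start, end in median:
        # Nearest empty row above the median area; fall back to its start.
        top = next((r for r in range(start - 1, -1, -1) if h[r] == 0), start)
        # Nearest empty row below the median area; fall back to its end.
        bottom = next((r for r in range(end + 1, len(h)) if h[r] == 0), end)
        result.append((top, start, end, bottom))
    return result
```

When no empty row exists on either side, the tuple degenerates to the median area itself, which matches the fallback described in the text.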

2.2 Automated Detection of Words
Automated detection of words is performed starting from text lines detected in the binarized image B. The process works in two steps:
1. word boundaries are detected;
2. higher and lower areas are refined for each word.
The first step of the algorithm is illustrated in Fig. 4 and discussed in the following. Let L be a crop of a given text line obtained from the binary image B. A column histogram \(H_c\) counting the number of black pixels contained in each column of L is computed. Note that computation of \(H_c\) is similar to computation of \(H_r\). To find word boundaries, the algorithm searches for bins in \(H_c\) which contain zero values. Such bins represent columns of L not containing any black pixel. If the detected gap is larger than a given threshold \(s_2\), then the starting and ending column indexes of a new word are stored in a list. The algorithm eventually returns a list of tuples of starting and ending indexes \((i_s,i_e)\). The procedure is reported in Algorithm 3.
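A minimal sketch of this word-boundary search is given below. It is not the published Algorithm 3: the names `hist_cols` and `find_words` are illustrative, the line crop L is assumed to be a list of rows with 0 for black pixels, and a gap is measured as the number of consecutive empty columns between two inked columns.

```python
def hist_cols(L):
    """Per-column histogram: number of black (0) pixels in each column of L."""
    return [sum(1 for row in L if row[c] == 0) for c in range(len(L[0]))]


def find_words(L, s2):
    """Return (i_s, i_e) column indexes of the words found in line crop L.

    Two inked columns belong to different words when more than s2 empty
    columns separate them.
    """
    h = hist_cols(L)
    words = []
    start = None      # first inked column of the current word
    last_ink = None   # most recent inked column seen so far
    for c, count in enumerate(h):
        if count == 0:
            continue  # empty column: part of a gap
        if start is None:
            start = c
        elif c - last_ink - 1 > s2:
            words.append((start, last_ink))  # gap wide enough: close the word
            start = c
        last_ink = c
    if start is not None:
        words.append((start, last_ink))      # close the last word
    return words
```

Gaps of at most \(s_2\) columns, such as those between letters of the same word, are absorbed into the current word, so the threshold effectively separates inter-character spacing from inter-word spacing.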

Once word boundaries are obtained, median, higher and lower areas are detected for each word. This step is performed because words on the same text line may have different sizes and orientations. The procedure works on image crops w of words detected using Algorithm 3. To determine the orientation of a given word, its crop w is rotated by different angles \(\alpha \) sampled from the interval \([-N,N]\) with step k. For each rotated crop, a row histogram \(H_w\) is computed using function histRows and its maximum m is computed. The correct orientation is obtained by selecting the angle \(\alpha \) leading to the highest value of m. This arises from the observation that, if the word is aligned horizontally, histogram \(H_w\) will be strongly peaked. Once the correct orientation has been determined, median, higher and lower areas are detected using Algorithm 1 and Algorithm 2. The whole procedure to detect lower, median and higher areas for each word is reported in Algorithm 4, where function rotate(\(w,\alpha \)) rotates word w by \(\alpha \) degrees.
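The orientation search can be illustrated with the following sketch. It makes several assumptions that are not taken from the paper: `rotate` is a simple nearest-neighbour rotation about the crop centre that keeps the crop size and fills uncovered pixels with white, N and k are integers expressed in degrees, and `estimate_orientation` is a hypothetical name. The subsequent area detection on the rotated crop is omitted.

```python
import math


def hist_rows(w):
    """Number of black (0) pixels in each row of the binary crop w."""
    return [sum(1 for p in row if p == 0) for row in w]


def rotate(w, alpha):
    """Rotate crop w by alpha degrees about its centre.

    Nearest-neighbour sampling; the output has the same size as the input
    and pixels that fall outside the source are white (1).
    """
    rows, cols = len(w), len(w[0])
    cy, cx = (rows - 1) / 2, (cols - 1) / 2
    ca, sa = math.cos(math.radians(alpha)), math.sin(math.radians(alpha))
    out = [[1] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            # Inverse mapping: locate the source pixel of each output pixel.
            sx = ca * (x - cx) + sa * (y - cy) + cx
            sy = -sa * (x - cx) + ca * (y - cy) + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < cols and 0 <= iy < rows:
                out[y][x] = w[iy][ix]
    return out


def estimate_orientation(w, N, k):
    """Return the angle in [-N, N] (step k) giving the most peaked histogram."""
    best_alpha, best_m = 0, -1
    for alpha in range(-N, N + 1, k):
        m = max(hist_rows(rotate(w, alpha)))  # height of the tallest row bin
        if m > best_m:
            best_alpha, best_m = alpha, m
    return best_alpha
```

The sign of the returned angle depends on the rotation convention chosen in `rotate`; an implementation built on an image library would need to follow that library's convention instead.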

2.3 Automated Search of Characters
This algorithm searches for all occurrences of a specific character in the document. To this end, the examiner selects a bounding box around the desired character to define a template T. The algorithm then performs a sliding window search over the whole document to locate possible occurrences of the character. The size of the search window W is set equal to that of the template. To gain robustness to small rotations, additional candidates are generated by rotating the content of each search window by \(10^\circ \) and \(-10^\circ \). Each window is assigned a score \(S_W\) using the procedure outlined in Algorithm 5. Search windows with scores larger than a threshold set by the operator are retained as detected character instances.
The scoring function reported in Algorithm 5 counts the number of black pixels contained in template T which are present in window W.
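The scoring and the sliding window search can be sketched as follows. This is an illustration of the description above rather than the published Algorithm 5: the function names are hypothetical, images are lists of rows with 0 for black pixels, and the additional \(\pm 10^\circ \) rotated candidates are left out for brevity.

```python
def score(T, W):
    """Count the black pixels of template T that are also black in window W."""
    return sum(
        1
        for t_row, w_row in zip(T, W)
        for t, p in zip(t_row, w_row)
        if t == 0 and p == 0
    )


def search_character(B, T, threshold):
    """Slide a template-sized window over B and keep the high-scoring ones.

    Returns a list of (row, column, score) for every window whose score is
    strictly larger than the operator-defined threshold.
    """
    th, tw = len(T), len(T[0])
    hits = []
    for y in range(len(B) - th + 1):
        for x in range(len(B[0]) - tw + 1):
            W = [row[x:x + tw] for row in B[y:y + th]]
            s = score(T, W)
            if s > threshold:
                hits.append((y, x, s))
    return hits
```

A score of this form does not penalize extra ink inside the window, so densely inked regions can also score highly; this is one reason why the operator-defined threshold and manual review of the detections matter.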

2.4 Measures
GRAPHJ measures some quantities related to words and characters in an automated way. The considered quantities are currently used in the standard protocol. In particular, the tool implements two functions:
- automatic computation of the biaxial proportion and relative average;
- automatic computation of the side expansion and relative average.
Automatic Computation of the Biaxial Proportion and Relative Average. Biaxial proportions are the width and the height of the oval characters (see Fig. 5). To convert such measures from pixels to millimeters, we use the dedicated ImageJ functions. For each character, GRAPHJ computes the ratio \(\rho _i=\dfrac{w_i}{h_i}\), where \(w_i\) and \(h_i\) are the width and height of the \(i^{th}\) character, respectively.
Automatic Computation of the Side Expansion and Relative Average. The side expansion represents the distance between the characters of a word and the distance between words. Distances between words are easily computed from the starting and ending word indexes returned by Algorithm 3. Distances between characters are computed in a similar way. To avoid the influence of the lower and upper terminations of characters, these are removed before measuring. Figure 6 illustrates the computation of side expansions.
For each computed distance between characters (denoted as \(\mathcal {D}^{(C)}_j\)) and words (denoted as \(\mathcal {D}^{(W)}_j\)), GRAPHJ calculates the ratios \(\mathcal {D}^{(C)}_j/\overline{w}\) and \(\mathcal {D}^{(W)}_j/\overline{w}\), where \(\overline{w}=\frac{1}{n}\sum _{i=1}^{n} w_i\) is the average width of the n oval characters.
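As a small numerical illustration of the measures in this section, the sketch below computes the biaxial ratios \(\rho _i = w_i/h_i\) and normalizes each distance by the mean width of the oval characters. The second function rests on the assumption that each side-expansion ratio is a distance divided by \(\overline{w}\); the function names are hypothetical and all inputs are assumed to be already expressed in the same unit (e.g., millimeters).

```python
def biaxial_ratios(widths, heights):
    """Width-to-height ratio rho_i = w_i / h_i for each oval character."""
    return [w / h for w, h in zip(widths, heights)]


def side_expansion_ratios(distances, widths):
    """Distances divided by the mean width of the oval characters.

    Assumption: the ratio is D_j / mean(w); the same function applies to
    inter-character and inter-word distances.
    """
    mean_w = sum(widths) / len(widths)
    return [d / mean_w for d in distances]
```

Normalizing by the mean oval width makes the spacing values relative to the size of the writing, so that samples written at different scales can be compared.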
3 Experimental Analysis
GRAPHJ was tested on 10 different writing samples. Samples have been written voluntarily by 10 different right-handed subjects. All documents have been written in cursive writing and using similar ink and paper. Every subject was asked to write the same long paragraph of text which was dictated to him. The text included all letters of the Italian alphabet, as well as sentences with different length and complexity.
To compare GRAPHJ performance with standard analysis methods, each sample has been manually analyzed by a forensics expert of RIS. In particular, the examiner measured the heights of two groups of 40 different letters, analyzed sequentially with a precision of \(0.1\,\text{mm}\). In the first group, the examiner measured the height (U) of letters with an upper elongated stroke on the right or on the left side (e.g., “l”, “t”, “d”, “f”, ...). In the second group, the examiner measured the height of the body in the median zone (M) of letters without an elongated stroke (e.g., “a”, “c”, “o”, “m”, ...).
Table 1 shows the mean \(\mu \) and standard deviation \(\sigma \) for the two groups of letters. The table compares measurements performed by the forensics expert to those obtained using GRAPHJ on the 10 documents of the dataset. Table 2 reports the mean absolute percentage error of the measurements obtained on the two groups of letters over the 10 documents in the dataset. The results show that GRAPHJ measurements are consistent with those obtained by experts using standard manual techniques. The report generated by GRAPHJ helps to make the process repeatable.
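The paper does not spell out the formula behind the mean absolute percentage error, so the following sketch assumes the usual definition, with the expert's measurements taken as reference values. The function name `mape` is illustrative.

```python
def mape(reference, measured):
    """Mean absolute percentage error of `measured` with respect to `reference`.

    Assumes the common definition
        100 / n * sum(|measured_i - reference_i| / |reference_i|)
    and non-zero reference values.
    """
    if len(reference) != len(measured) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(m - r) / abs(r) for r, m in zip(reference, measured))
    return 100.0 * total / len(reference)
```

For example, with made-up reference heights of 2.0 and 4.0 mm and automated measurements of 2.2 and 3.6 mm, each value deviates by 10% from its reference and the resulting error is approximately 10%.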
4 Conclusion
We have presented GRAPHJ, an automated tool to aid the analysis of handwritten documents by forensics experts. The tool has been implemented as a plugin for ImageJ and automates many operations such as detection of elements (e.g., text lines, words and characters) and measurement of quantities (e.g., character height and width). Experiments show that analyses carried out using GRAPHJ are consistent with those obtained by forensics experts using standard manual techniques.
References
1. Allen, M.J.: Forensic handwriting examination: a definitive guide, Reed Hayes, Reed Hayes Publications, Honolulu, HI, 254 p. ($49.95), p. 104 (2008). ISBN: 0-9778415-0-2
2. Evett, I.W., Totty, R.N.: A study of the variation in the dimensions of genuine signatures. J. Forensic Sci. Soc. 25(3), 207–215 (1985)
3. Huber, R.A., Headrick, A.M.: Handwriting Identification: Facts and Fundamentals. CRC Press, Boca Raton (1999)
4. Abbey, S.E.: Natural variation and relative height proportions. Int. J. Forensic Doc. Examiners 5, 108–116 (1999)
5. Maciaszek, J.: Natural variation in measurable features of initials. Probl. Forensic Sci. 85, 25–39 (2011)
6. Koppenhaver, K.M.: Forensic Document Examination: Principles and Practice. Springer Science & Business Media, Heidelberg (2007)
7. Morris, R.: Forensic Handwriting Identification: Fundamental Concepts and Principles. Academic Press, Cambridge (2000)
8. Seaman Kelly, J., Lindblom, B.S.: Scientific Examination of Questioned Documents. CRC Press, Boca Raton (2006)
9. Fabiańska, E., Kukicki, M., Zador, G., Dziedzic, T., Bułka, D.: Graphlog-computer system supporting handwriting analysis. Probl. Forensic Sci. 68, 394–408 (2006)
10. Battiato, S., Giudice, O., Paratore, A.: Multimedia forensics: discovering the history of multimedia contents. In: Proceedings of the 17th International Conference on Computer Systems and Technologies 2016, pp. 5–16. ACM (2016)
11. Plamondon, R., Srihari, S.N.: Online and off-line handwriting recognition: a comprehensive survey. IEEE Trans. Pattern Anal. Mach. Intell. 22(1), 63–84 (2000)
12. Abràmoff, M.D., Magalhães, P.J., Ram, S.J.: Image processing with ImageJ. Biophotonics Int. 11(7), 36–42 (2004)
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Guarnera, L. et al. (2017). GRAPHJ: A Forensics Tool for Handwriting Analysis. In: Battiato, S., Gallo, G., Schettini, R., Stanco, F. (eds) Image Analysis and Processing - ICIAP 2017. ICIAP 2017. Lecture Notes in Computer Science, vol 10485. Springer, Cham. https://doi.org/10.1007/978-3-319-68548-9_54
DOI: https://doi.org/10.1007/978-3-319-68548-9_54
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68547-2
Online ISBN: 978-3-319-68548-9
eBook Packages: Computer Science, Computer Science (R0)