ABSTRACT
In this paper, we describe a novel approach to writer identification in offline handwriting using Latent Dirichlet Allocation (LDA). State-of-the-art methods for writer identification employ the traditional feature-classification paradigm, which provides little information about handwriting attributes such as writing style, a key component of any forensic analysis of handwriting. The problem is compounded by the lack of efficient rules for defining a writing style that captures writer-specific characteristics over a large dataset. We address this issue with a generative model, LDA, that automatically infers writing styles from a collection of handwritten documents without any predefined set of rules. Each writer is then represented as a distribution over multiple writing styles, and this representation is used to classify samples from unknown writers. We apply our approach to two feature sets: contour angle features, and structural and concavity features. Our experimental results show performance comparable to baseline systems and demonstrate the efficacy of LDA for learning multiple handwriting styles.
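The idea of representing each writer as a distribution over latent writing styles can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that handwriting features have already been quantized into integer codes (the toy codes below are hypothetical), fits LDA with a small collapsed Gibbs sampler, fits the unknown sample jointly with the known writers, and uses a nearest-neighbour comparison as a stand-in for whatever classifier the paper applies. The function names `fit_lda` and `l1_distance` and all parameter values are illustrative choices.

```python
import random


def fit_lda(docs, n_styles, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one style mixture per document."""
    rng = random.Random(seed)
    doc_style = [[0] * n_styles for _ in docs]                 # tokens per (document, style)
    style_word = [[0] * vocab_size for _ in range(n_styles)]   # tokens per (style, code)
    style_total = [0] * n_styles                               # tokens per style
    assign = []
    # Random initial style assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_styles)
            z_d.append(k)
            doc_style[d][k] += 1
            style_word[k][w] += 1
            style_total[k] += 1
        assign.append(z_d)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_style[d][k] -= 1
                style_word[k][w] -= 1
                style_total[k] -= 1
                # Resample its style from the conditional distribution.
                weights = [
                    (doc_style[d][j] + alpha)
                    * (style_word[j][w] + beta)
                    / (style_total[j] + vocab_size * beta)
                    for j in range(n_styles)
                ]
                k = rng.choices(range(n_styles), weights=weights)[0]
                assign[d][i] = k
                doc_style[d][k] += 1
                style_word[k][w] += 1
                style_total[k] += 1
    # Smoothed per-document style proportions.
    return [
        [(c + alpha) / (len(doc) + n_styles * alpha) for c in counts]
        for doc, counts in zip(docs, doc_style)
    ]


def l1_distance(p, q):
    """Sum of absolute differences between two distributions."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical toy data: each document is a list of quantized feature codes.
docs = {
    "writer_A": [0, 1, 2, 0, 1, 2, 0, 1, 0, 2] * 3,
    "writer_B": [3, 4, 5, 3, 4, 5, 4, 5, 3, 3] * 3,
    "unknown": [0, 2, 1, 1, 0, 2, 0, 0, 1, 2] * 3,
}
names = list(docs)
theta = fit_lda([docs[n] for n in names], n_styles=2, vocab_size=6)
mixtures = dict(zip(names, theta))

# Assign the unknown sample to the known writer with the closest style mixture.
known = ["writer_A", "writer_B"]
predicted = min(known, key=lambda n: l1_distance(mixtures[n], mixtures["unknown"]))
print(predicted, mixtures["unknown"])
```

Fitting the unknown sample together with the known writers keeps the sketch short; a practical system would more likely learn the styles from training data only and then infer the mixture of a new sample against the fixed model.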