Document Image Segmentation Using Deep Features

Jobin, K. V.; Jawahar, C. V.

doi:10.1007/978-981-13-0020-2_33


Part of the book series: Communications in Computer and Information Science (CCIS, volume 841)

Included in the following conference series:

National Conference on Computer Vision, Pattern Recognition, Image Processing, and Graphics

Abstract

This paper explores the effectiveness of deep features for document image segmentation. We model document image segmentation as a pixel labeling task in which each pixel of the document image is classified into one of a set of predefined labels, such as text, comments, decorations, and background. Our method first extracts deep features from superpixels of the document image. We then train an SVM classifier on these features and use it to segment the document image. We study two kinds of features: Fisher vector encoded convolutional layer features (FV-CNN) and fully connected layer features (FC-CNN). Experiments show that our method is effective and yields better segmentation results than popular approaches on benchmark handwritten datasets.
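
The pipeline outlined in the abstract (superpixels, one feature vector per superpixel, a classifier, and labels copied back to pixels) can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions made only for illustration: square grid cells stand in for a real superpixel algorithm, two hand-crafted statistics stand in for the FV-CNN and FC-CNN features, a small linear SVM trained by subgradient descent stands in for the SVM used in the paper, and the task is reduced to two labels (text and background) on synthetic pages. All function names are hypothetical.

```python
import numpy as np

def grid_superpixels(height, width, cell):
    """Label each pixel with the index of its square cell (crude superpixel stand-in)."""
    rows = np.arange(height) // cell
    cols = np.arange(width) // cell
    cells_per_row = (width + cell - 1) // cell
    return rows[:, None] * cells_per_row + cols[None, :]

def superpixel_features(image, segments):
    """Toy descriptor per superpixel: mean intensity and fraction of dark pixels.
    Assumption: this replaces the deep (FV-CNN / FC-CNN) features of the paper."""
    n_segments = int(segments.max()) + 1
    feats = np.zeros((n_segments, 2))
    for s in range(n_segments):
        values = image[segments == s]
        feats[s, 0] = values.mean()
        feats[s, 1] = (values < 0.5).mean()
    return feats

def fit_scaler(X, eps=1e-8):
    """Per-feature mean and scale used to standardise features before the SVM."""
    return X.mean(axis=0), X.std(axis=0) + eps

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularised hinge loss; y in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        active = y * (X @ w + b) < 1  # samples that violate the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def segment_page(image, cell, w, b, mean, scale):
    """Classify every superpixel, then copy its label to all of its pixels."""
    segments = grid_superpixels(image.shape[0], image.shape[1], cell)
    feats = (superpixel_features(image, segments) - mean) / scale
    superpixel_labels = np.where(feats @ w + b >= 0, 1, -1)
    return superpixel_labels[segments]

# Synthetic training page: left half "text" (dark, label +1), right half background (-1).
train_page = np.ones((16, 16))
train_page[:, :8] = 0.0
train_truth = np.where(train_page < 0.5, 1, -1)
train_segments = grid_superpixels(16, 16, cell=4)
X = superpixel_features(train_page, train_segments)
# Each superpixel takes the majority ground-truth label of its pixels.
y = np.array([1.0 if train_truth[train_segments == s].mean() > 0 else -1.0
              for s in range(int(train_segments.max()) + 1)])
mean, scale = fit_scaler(X)
w, b = train_linear_svm((X - mean) / scale, y)

# Synthetic test page with a different layout: top half text, bottom half background.
test_page = np.ones((16, 16))
test_page[:8, :] = 0.0
predicted = segment_page(test_page, 4, w, b, mean, scale)
```

Here `predicted` is a pixel-level label map with the same shape as the page. Classifying superpixels rather than individual pixels keeps the number of classifier inputs small, which matters when each input is an expensive deep feature.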

Author information

Authors and Affiliations

CVIT, IIIT-Hyderabad, Hyderabad, India
K. V. Jobin & C. V. Jawahar

Corresponding author

Correspondence to K. V. Jobin.

Editor information

Editors and Affiliations

Indian Institute of Technology Mandi, Mandi, Himachal Pradesh, India
Renu Rameshan
Indraprastha Institute of Information Technology, New Delhi, India
Chetan Arora
Indian Institute of Technology, New Delhi, India
Sumantra Dutta Roy

Cite this paper

Jobin, K.V., Jawahar, C.V. (2018). Document Image Segmentation Using Deep Features. In: Rameshan, R., Arora, C., Dutta Roy, S. (eds) Computer Vision, Pattern Recognition, Image Processing, and Graphics. NCVPRIPG 2017. Communications in Computer and Information Science, vol 841. Springer, Singapore. https://doi.org/10.1007/978-981-13-0020-2_33

DOI: https://doi.org/10.1007/978-981-13-0020-2_33
Published: 26 April 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-0019-6
Online ISBN: 978-981-13-0020-2
eBook Packages: Computer Science, Computer Science (R0)
