Abstract
This paper explores the effectiveness of deep features for document image segmentation. We model document image segmentation as a pixel labeling task in which each pixel of the document image is classified into one of a set of predefined labels, such as text, comments, decorations, and background. Our method first extracts deep features from superpixels of the document image. We then train an SVM classifier on these features and use it to segment the document image. We study two kinds of features: Fisher vector encoded convolutional layer features (FV-CNN) and fully connected layer features (FC-CNN). Experiments on benchmark handwritten datasets indicate that our method is effective and yields better segmentation results than popular approaches.
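The pipeline described above (superpixels, per-superpixel features, SVM, pixel labels) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several loudly simplifying assumptions: square grid blocks stand in for real superpixels, the mean and standard deviation of intensity stand in for FV-CNN or FC-CNN features, the problem is reduced to two classes (text versus background), and the linear SVM is trained with plain subgradient descent on the regularised hinge loss. All function names and the toy page are hypothetical.

```python
import math

def grid_superpixels(height, width, block):
    """Assign each pixel to a square block (assumed stand-in for superpixels)."""
    per_row = (width + block - 1) // block
    return [[(r // block) * per_row + (c // block) for c in range(width)]
            for r in range(height)]

def region_features(image, segments):
    """Per-region [mean, std] of pixel values (assumed stand-in for deep features)."""
    pixels = {}
    for row, seg_row in zip(image, segments):
        for value, seg in zip(row, seg_row):
            pixels.setdefault(seg, []).append(value)
    feats = {}
    for seg, values in pixels.items():
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        feats[seg] = [mean, math.sqrt(var)]
    return feats

def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_linear_svm(X, y, epochs=2000, lr=0.1, lam=1e-3):
    """Binary linear SVM: subgradient descent on the L2-regularised hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):              # t is +1 (text) or -1 (background)
            margin = t * score(w, b, x)
            w = [wi * (1 - lr * lam) for wi in w]   # regularisation step
            if margin < 1:                  # hinge loss is active for this sample
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b

def segment(image, block, w, b):
    """Classify every region, then paint the region label back onto its pixels."""
    segments = grid_superpixels(len(image), len(image[0]), block)
    feats = region_features(image, segments)
    label = {seg: 1 if score(w, b, f) > 0 else 0 for seg, f in feats.items()}
    return [[label[seg] for seg in row] for row in segments]

# Toy 8x8 page: blank left half (background), stroke-like pattern on the right (text).
SIZE, BLOCK = 8, 4
page = [[1.0 if c < SIZE // 2 else float((r + c) % 2) for c in range(SIZE)]
        for r in range(SIZE)]
truth = [[1 if c >= SIZE // 2 else 0 for c in range(SIZE)] for _ in range(SIZE)]

segments = grid_superpixels(SIZE, SIZE, BLOCK)
feats = region_features(page, segments)
# The mean of a 0/1 mask over a region is the fraction of text pixels in it.
text_fraction = region_features(truth, segments)
ids = sorted(feats)
X = [feats[i] for i in ids]
y = [1 if text_fraction[i][0] >= 0.5 else -1 for i in ids]

w, b = train_linear_svm(X, y)
predicted = segment(page, BLOCK, w, b)
print("matches ground truth:", predicted == truth)
```

Classifying regions rather than individual pixels keeps the number of SVM inputs small, which is the practical appeal of superpixel-based labeling. A multi-class setting such as text, comments, decorations, and background would need a multi-class SVM scheme (for example one-vs-rest), which this sketch omits.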
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Jobin, K.V., Jawahar, C.V. (2018). Document Image Segmentation Using Deep Features. In: Rameshan, R., Arora, C., Dutta Roy, S. (eds) Computer Vision, Pattern Recognition, Image Processing, and Graphics. NCVPRIPG 2017. Communications in Computer and Information Science, vol 841. Springer, Singapore. https://doi.org/10.1007/978-981-13-0020-2_33
DOI: https://doi.org/10.1007/978-981-13-0020-2_33
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-0019-6
Online ISBN: 978-981-13-0020-2
eBook Packages: Computer Science, Computer Science (R0)