ABSTRACT
Weakly-Supervised Semantic Segmentation (WSSS) has been widely studied as a way to reduce the expensive annotation costs of deep semantic segmentation models. While numerous studies have proposed novel approaches for generating pseudo-labels and demonstrated their effectiveness, they have been limited to natural scene images. A comprehensive exploration of WSSS for document images is increasingly needed, given the growing number of digitized historical document images and the importance of document image segmentation for successful information retrieval. Importantly, document images possess inherent characteristics that distinguish them from natural scene images and render conventional image-level labels unsuitable. Consequently, recent WSSS frameworks designed for natural scene images are of limited applicability. In this work, we propose a simple yet effective pseudo-label generation technique using content-adaptive geometric feature analysis. This approach enables training a segmentation model in a weakly-supervised manner without relying on image-level labels. Our method uses a Gravity-map, which highlights potential regions of interest without requiring a priori knowledge and serves as an initial coarse pixel-level label. The Gravity-map is then refined through simple binarization and noise removal to form a pseudo-label. Finally, a segmentation model is trained on the generated pseudo-labels. Experimental results on a publicly available historical document collection demonstrate that the proposed pseudo-label generation technique is a viable option for training a semantic segmentation model in the document image domain.
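The refinement step described in the abstract (binarization followed by noise removal) can be illustrated with a minimal sketch. The abstract specifies neither the thresholding rule nor the noise-removal procedure, so everything here is an assumption made for illustration: the function name `refine_to_pseudo_label`, the mean-value global threshold, the `min_area` parameter, and the use of small connected-component removal as the noise filter. The input is a hypothetical coarse score map standing in for a Gravity-map; how the Gravity-map itself is computed is not covered. The sketch is written in plain Python to stay dependency-free.

```python
from collections import deque


def refine_to_pseudo_label(score_map, threshold=None, min_area=20):
    """Binarize a coarse score map, then drop small connected components.

    score_map: list of equal-length rows of floats (a hypothetical
    stand-in for a Gravity-map). threshold and min_area are illustrative
    parameters, not values taken from the paper.
    """
    height, width = len(score_map), len(score_map[0])
    if threshold is None:
        # Assumption: a global threshold at the mean score.
        threshold = sum(map(sum, score_map)) / (height * width)

    # Step 1: binarization.
    binary = [[1 if v > threshold else 0 for v in row] for row in score_map]

    # Step 2: noise removal by discarding small 8-connected components.
    seen = [[False] * width for _ in range(height)]
    label = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected component.
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Keep the component only if it is large enough.
            if len(component) >= min_area:
                for y, x in component:
                    label[y][x] = 1
    return label


# Toy input: one 12 x 20 region of interest plus an isolated speck.
demo = [[0.0] * 32 for _ in range(32)]
for y in range(8, 20):
    for x in range(6, 26):
        demo[y][x] = 1.0
demo[2][2] = 1.0

pseudo = refine_to_pseudo_label(demo, min_area=20)
print(sum(map(sum, pseudo)))  # 240: the block survives, the speck is removed
```

In this toy case the resulting mask would play the role of the pseudo-label used to train the segmentation model. A different binarization rule or a morphological filter could be substituted without changing the overall shape of the pipeline.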