Original papers
Synthetic bootstrapping of convolutional neural networks for semantic plant part segmentation

https://doi.org/10.1016/j.compag.2017.11.040
Open access under a Creative Commons license

Highlights

  • This paper investigates convolutional neural networks on large agricultural datasets.

  • Synthetic dataset bootstrapping and empirical dataset fine-tuning are investigated.

  • Plant parts can be recognized on a per-pixel level.

  • We show that only a small annotated empirical dataset of 30 images is required.

  • A large synthetic dataset for bootstrapping improves performance.

Abstract

A current bottleneck of state-of-the-art machine learning methods for image segmentation in agriculture, e.g. convolutional neural networks (CNNs), is the requirement for large datasets manually annotated at the per-pixel level. In this paper, we investigated how related synthetic images can be used to bootstrap CNNs for successful learning, compared with other learning strategies. We hypothesised that a small manually annotated empirical dataset is sufficient for fine-tuning a synthetically bootstrapped CNN. Furthermore, we investigated (i) multiple deep learning architectures, (ii) the relationship between synthetic and empirical dataset size and part segmentation performance, (iii) the effect of post-processing using conditional random fields (CRF) and (iv) the generalisation performance on other related datasets. To this end, we performed 7 experiments using the Capsicum annuum (bell or sweet pepper) dataset, which contains 50 empirical and 10,500 synthetic images with 7 pixel-level annotated part classes. Results confirmed our hypothesis that only 30 empirical images were required to obtain the highest performance on all 7 classes (mean IoU = 0.40) when a CNN was bootstrapped on related synthetic data. Furthermore, we found optimal empirical performance when a VGG-16 network was modified to include à trous spatial pyramid pooling. Adding CRF post-processing improved performance only on the synthetic data. Training binary classifiers did not improve results. We found a positive correlation between dataset size and performance. For the synthetic dataset, learning stabilises at around 3000 images. Generalisation to other related datasets proved possible.
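The abstract reports segmentation quality as mean intersection over union (IoU) across the 7 part classes. As a hedged illustration only, the sketch below computes per-class IoU, defined as TP / (TP + FP + FN), and its mean from flattened per-pixel label maps. The function names are hypothetical, and the averaging convention is an assumption: classes that appear in neither the ground truth nor the prediction are skipped rather than counted as 0 or 1. The exact convention used for the reported 0.40 is not stated here.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    if len(y_true) != len(y_pred):
        raise ValueError("label map and prediction must have the same number of pixels")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_iou(cm: List[List[int]]) -> List[float]:
    """IoU_c = TP / (TP + FP + FN); NaN when class c is absent from both maps."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        denom = tp + fp + fn
        ious.append(tp / denom if denom > 0 else float("nan"))
    return ious


def mean_iou(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> float:
    """Average IoU over classes, skipping NaN entries (assumed convention)."""
    ious = per_class_iou(confusion_matrix(y_true, y_pred, num_classes))
    valid = [v for v in ious if v == v]  # v != v holds only for NaN
    return sum(valid) / len(valid) if valid else float("nan")


if __name__ == "__main__":
    # Toy 6-pixel example with 3 classes (not data from the paper).
    truth = [0, 0, 1, 1, 2, 2]
    pred = [0, 1, 1, 1, 2, 0]
    print(mean_iou(truth, pred, 3))  # 0.5
```

Here the per-class IoUs are 1/3, 2/3 and 1/2, giving a mean of 0.5. For a test set, one option is to concatenate the pixels of all images into a single confusion matrix before computing IoU; averaging per-image scores instead can give a different number, so the two conventions should not be compared directly.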

Keywords

Computer vision
Semantic segmentation
Synthetic dataset
Bootstrapping
Big data
