Review
Crowdsourcing for botanical data collection towards to automatic plant identification: A review

https://doi.org/10.1016/j.compag.2018.10.042

Highlights

  • A comprehensive survey of various crowdsourcing systems for botanical data collection.

  • Questionnaire-based evaluation with subjects of different expertise levels in botany.

  • Evaluation of different factors of deep learning-based plant identification methods.

Abstract

Nowadays, a number of crowdsourcing systems are available, with community-driven forums both contributing visual datasets of flora and assisting members in determining the species name of a given visual observation. However, the crowdsourcing problem has not been clearly analyzed, particularly in terms of providing data resources for establishing powerful vision-based plant identification. In this paper, we carry out a comprehensive survey of various crowdsourcing systems for botanical data collection. We first analyze six systems with respect to their focus, platforms, advantages and drawbacks. We then conduct questionnaire-based evaluations with a number of subjects having different levels of expertise in botany. The evaluation results show that (1) the current systems have been accepted by a large number of users and (2) automatic image-based plant identification plays an important role in attracting users to these systems. However, for these systems to be used worldwide, several issues still need to be addressed. One of these issues is improving automatic plant identification. To understand the factors that affect identification performance, we conducted several experiments with a state-of-the-art deep learning method on different datasets. The results of these experiments show the crucial role of crowdsourcing systems in collecting visual data for developing robust and effective plant identification.

Introduction

Crowdsourcing systems in the fields of biodiversity, botany and plant identification are becoming more and more important to different aspects of human life. They are collaborative spaces where members can share various kinds of information about a plant, such as its identification, its practical or potential uses, and its geography, ecology, chemistry and genetics. Over the last decade, various online platforms have been deployed. For instance, iNaturalist and iSpot are early communities where anyone can ask questions, contribute, and give answers or comments related to a plant species. Interest in these platforms is now increasing thanks to developments in mobile devices, storage devices and network bandwidth, as well as achievements in computer vision, machine learning and data collection. This new trend is named crowdsourcing biodiversity monitoring (Joly et al., 2016, Conrad and Hilchey, 2011). To make a crowdsourcing system more useful and acceptable to a large number of users, a review of existing systems is a natural step. Through this survey, the crowdsourcing problems in botanical data collection and in the plant identification task can be recognized. In addition, users’ behavior with current tools is evaluated. A list of open issues that must be resolved to establish a powerful tool for plant identification is provided, with evidence collected from recent research outcomes in this field.

In principle, the available crowdsourcing systems aim to attract participants from different regions and countries and to facilitate the identification of plants. The goal is to obtain data of good quality and in sufficient quantity (Jacobs, 2016, Devillers et al., 2007). Existing crowdsourcing systems are usually based on one of three social network structures (Silvertown et al., 2015), as shown in Fig. 1. In the first structure (Fig. 1a), the contributions of all participants have equal weight. In the second structure (Fig. 1b), a recognized expert connects and verifies the information contributed by the other users. The third structure (Fig. 1c) is based on the fact that no one can be an expert in the identification of all taxonomic groups: each person is assigned a different weight depending on their ability to contribute to the community and on the feedback from the community. In this study, we survey six representative plant data collection tools, which cover all three of these social network structures. They are systematically analyzed and evaluated with respect to their specific focus, advantages and drawbacks. To evaluate the use of these systems, we conduct a series of questionnaire-based surveys. The subjects who took part in the evaluations are divided into different levels of botanical expertise. Their answers are analyzed using statistical measures.
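To make the difference between the equal-weight structure (Fig. 1a) and the reputation-weighted structure (Fig. 1c) concrete, the following is a minimal sketch of how species-name votes on one observation might be aggregated. It is an illustration under stated assumptions, not the mechanism of any surveyed system: the function name `consensus`, the user names and the weight values are all hypothetical.

```python
from collections import defaultdict


def consensus(votes, weights=None):
    """Return (species, share) for the name with the largest weighted support.

    votes   -- list of (user, species_name) pairs for one observation
    weights -- optional dict mapping user -> weight; users not listed get 1.0
    """
    if not votes:
        raise ValueError("at least one vote is required")
    weights = weights or {}
    support = defaultdict(float)
    for user, species in votes:
        support[species] += weights.get(user, 1.0)
    total = sum(support.values())
    best = max(support, key=support.get)
    return best, support[best] / total


# Hypothetical votes on a single observation.
votes = [
    ("ann", "Quercus robur"),
    ("bob", "Quercus petraea"),
    ("cat", "Quercus petraea"),
]

# Structure (a): every contribution has equal weight.
equal = consensus(votes)

# Structure (c): hypothetical reputation weights earned from community feedback.
reputation = consensus(votes, {"ann": 5.0, "bob": 1.0, "cat": 1.0})
```

With equal weights the majority name wins, while under the assumed reputation weights the single highly rated contributor outweighs the two others. The expert-verified structure (Fig. 1b) is not modeled here; it would replace the vote by a decision of the recognized expert.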

This survey is also motivated by an open question: why are automatic plant identification systems still not widely used by the general public? Valuable reviews of automatic plant identification methods have recently been published (Cope et al., 2012, Wäldchen and Mäder, 2017). However, it is still unclear how the design of crowdsourcing systems affects the identification rate and, in particular, what role data collection tools play. In their study, Gaston and O’Neill (2004) proposed four suggestions for promoting the use of plant identification in practice. According to the authors, the research community should (1) overcome the difficulty of producing larger training datasets; (2) reduce the error rate; (3) scale up and (4) be able to detect novel species. The first suggestion directly involves data collection, while the other suggestions concern the algorithms proposed for plant identification. To provide evidence for these suggestions, we analyzed the following factors: the number of species to be identified, the number of training images for each species and the use of multiple organs for plant identification.
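The three factors named above can be read directly off a labeled image collection. The sketch below is only an illustration of that bookkeeping, assuming each image is described by a hypothetical `(species, organ)` pair; the function name `dataset_factors` and the sample records are invented for the example and are not data from the experiments.

```python
from collections import Counter, defaultdict


def dataset_factors(records):
    """Summarise a list of (species, organ) image labels.

    Returns the number of species, the image count per species,
    and the set of organs photographed for each species.
    """
    images_per_species = Counter(species for species, _ in records)
    organs_per_species = defaultdict(set)
    for species, organ in records:
        organs_per_species[species].add(organ)
    return (
        len(images_per_species),
        dict(images_per_species),
        dict(organs_per_species),
    )


# Hypothetical labels for four images.
records = [
    ("Acer campestre", "leaf"),
    ("Acer campestre", "flower"),
    ("Acer campestre", "leaf"),
    ("Bellis perennis", "flower"),
]
n_species, per_species, organs = dataset_factors(records)
```

A summary of this kind shows at a glance which species have few training images or only a single organ represented, which are the conditions the factors above are meant to probe.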

In Section 2 of this paper, we give a brief summary of the principles of crowdsourcing-based data collection as well as its issues. Section 3 analyzes the available plant data collection tools in detail, while Section 4 presents the evaluation results. Current achievements and open issues are discussed in Section 5. Section 6 draws some conclusions.

Section snippets

Principles of crowdsourcing for data collection

Crowdsourcing is a technique that aims to take contributions from a large group of people, especially an online community, where each person’s contribution combines with those of others to achieve a cumulative result (Robson, 2012, He and Wiggins, 2015, Saxton et al., 2013). It is a current trend, with many examples such as the Oxford English Dictionary and Wikipedia.

There are two different ways of collecting crowdsourced data (Joly et al., 2014, Jacobs, 2016). The first one is to automatically

An overview of the plant data collection

Plant information collection usually requires gathering several kinds of information for each observation. The most important is the species name. Knowing the species name gives access to other information such as geography, ecology and chemistry. Moreover, when the purpose is to collect data for developing an automatic plant identification system, images and other context information such as GPS coordinates are required. Collecting images of plants is time-consuming and laborious for the following reasons. The

The evaluation results on the current crowdsourcing systems

To gather feedback from users and understand how these systems are used, we conducted a survey on the two most widely used explicit systems: Pl@ntNet and iNaturalist. Pl@ntNet is representative of tools that support automatic plant identification, while iNaturalist relies on plant identification by experts and the community. Three groups of users were chosen for our survey. The first group, named G1, contains 26 second-year students in information technology. These students have good knowledge in

Discussion on current results and open issues

Based on the study, we can confirm that crowdsourcing is a sound approach to biodiversity data collection, making it possible to produce large amounts of data in a more timely and cheaper way than the conventional approach. This may open a new approach to ecological monitoring. However, several issues still need to be solved. They are listed below:

(a) A biodiversity crowdsourcing system supports in cross-country

In principle, a crowdsourcing system should be easier to use and attracts more participants

Conclusions

In this paper, we have studied and analyzed the use of crowdsourcing systems for botanical data collection. Six representative systems, including both implicit and explicit ones, have been presented and analyzed in detail. Moreover, a questionnaire-based evaluation has been carried out for the two most widely used systems, Pl@ntNet and iNaturalist. Our study shows that (1) the current systems have been accepted by a large number of users and (2) automatic plant identification based on images

Acknowledgments

This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under Grant No. 106.06-2018.23.

References (76)

  • Bonnet, P., Arbonnier, M., Grard, P., 2005. A graphic tool for the identification of west african savannas trees. In...
  • C.C. Conrad et al., 2011. A review of citizen science and community-based environmental monitoring: issues and opportunities. Environ. Monit. Assess.
  • Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L., 2009. Imagenet: A large-scale hierarchical image database. In: IEEE...
  • Deng, D.-P., Mai, G.-S., Chuang, T.-R., Lemmens, R., Shao, K.-T., 2014. Social web meets sensor web: From...
  • R. Devillers et al., 2007. Towards spatial data quality information analysis tools for experts assessing the fitness for use of spatial data. Int. J. Geograph. Inf. Sci.
  • A. Doan et al., 2011. Crowdsourcing systems on the world-wide web. Commun. Assoc. Comput. Mach. (ACM)
  • T.-B. Do et al. Plant identification using score-based fusion of multi-organ images.
  • K.J. Gaston et al., 2004. Automated species identification: why not? Philosoph. Trans. Roy. Soc. London B: Biol. Sci.
  • M.A.J. Ghasab et al., 2015. Feature decision-making ant colony optimization system for an automated recognition of plant species. Expert Syst. Appl.
  • M.M. Ghazi et al., 2017. Plant identification using deep neural networks via optimization of transfer learning parameters. Neurocomputing
  • Goëau, H., Bonnet, P., Joly, A., Bakic, V., Barthélémy, D., Boujemaa, N., Molino, J.-F., 2013. The imageclef 2013 plant...
  • H. Goëau et al. Pl@ntnet mobile app.

  • Goëau, H., Joly, A., Yahiaoui, I., Bakić, V., Verroust-Blondet, A., Bonnet, P., Barthélémy, D., Boujemaa, N., Molino,...
  • H. Goëau et al. Pl@ntnet mobile 2014: Android port and new features.

  • Goëau, H., Joly, A., Bonnet, P., Selmi, S., Molino, J.-F., Barthélémy, D., Boujemaa, N., 2014. Lifeclef plant...
  • Goëau, H., Bonnet, P., Joly, A., 2015. LifeCLEF Plant Identification Task 2015. In: CEUR-WS (Ed.), CLEF: Conference and...
  • Goëau, H., Bonnet, P., Joly, A., 2016. Plant identification in an open-world (lifeclef 2016), CLEF Working Notes...
  • Goëau, H., Bonnet, P., Joly, A., 2017. Plant identification based on noisy web data: the amazing performance of deep...
  • Goëau, H., Bonnet, P., Joly, A., 2018. Overview of expertlifeclef 2018: how far automated identification systems are...
  • R. Govaerts, 2001. How many species of seed plants are there? Taxon
  • A. He et al. Multi-organ plant identification with multi-column deep convolutional neural networks.

  • He, Y., Wiggins, A., 2015. Community-as-a-service: Data validation in citizen science. METHOD 2015 Workshop, pp....
  • A.-x. Hong et al., 2004. A flower image retrieval method based on roi feature. J. Zhejiang Univ.-Sci. A
  • T.-H. Hsu et al., 2011. An interactive flower image recognition system. Multimedia Tools Appl.
  • http://www.flowerchecker.com/ (retrieved...
  • http://www.gardenanswers.com/ (retrieved...
  • http://www.inaturalist.org/ (retrieved...
  • http://www.plantifier.com (retrieved...