Crowdsourcing for botanical data collection towards to automatic plant identification: A review
Introduction
Crowdsourcing systems for biodiversity, botany, and plant identification are becoming increasingly important in many aspects of human life. They are collaborative spaces where members share information about plants: their identification, practical and potential uses, geography, ecology, chemistry, genetics, and so on. Over the last decade, various online platforms have been deployed. For instance, iNaturalist and iSpot were among the first communities in which anyone can ask questions, contribute observations, and give answers or comments related to a plant species. Interest in these platforms is now growing thanks to advances in mobile devices, storage, and network bandwidth, as well as achievements in computer vision, machine learning, and data collection. This new trend is called crowdsourcing biodiversity monitoring (Joly et al., 2016, Conrad and Hilchey, 2011). To make crowdsourcing systems more useful and acceptable to a large number of users, a review of the existing ones is a natural step. Through this survey, the problems that crowdsourcing raises for botanical data collection and for the plant identification task can be recognized. In addition, users’ behavior with current tools is evaluated. A list of open issues that must be addressed to establish a powerful tool for plant identification is provided, supported by evidence from recent research in this field.
In principle, the available crowdsourcing systems aim to attract participants from different regions and countries and to facilitate the identification of plants. The goal is to obtain data of good quality and in sufficient quantity (Jacobs, 2016, Devillers et al., 2007). Existing crowdsourcing systems are usually based on one of three social network structures (Silvertown et al., 2015), as shown in Fig. 1. In the first (Fig. 1a), the contributions of all participants have equal weight. In the second (Fig. 1b), a recognized expert connects and verifies the information contributed by the other users. The third (Fig. 1c) is based on the fact that no one can be an expert in the identification of all taxonomic groups: each person is assigned a different weight depending on their ability to contribute to the community and on the feedback from the community. In this study, we survey six representative plant data collection tools, which together cover the three social network structures above. They are systematically analyzed and evaluated with respect to their specific focus, advantages, and drawbacks. To evaluate the use of these systems, we conduct a series of questionnaire-based surveys. The participants are divided into groups by level of botanical expertise, and their answers are analyzed with statistical measures.
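The difference between the three structures can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the function names, user names, and weight values are hypothetical and are not taken from iSpot, iNaturalist, or any other surveyed system. It shows how the same set of identifications could resolve to different species names under equal weighting (Fig. 1a), expert verification (Fig. 1b), and per-user weighting (Fig. 1c).

```python
from collections import defaultdict


def weighted_vote(votes, weights=None):
    """Return the species name with the largest total weight.

    votes:   list of (user, species) pairs.
    weights: optional dict mapping user -> weight. With no weights,
             every contribution counts equally (structure a).
             Users missing from the dict default to weight 1.0.
    """
    totals = defaultdict(float)
    for user, species in votes:
        w = 1.0 if weights is None else weights.get(user, 1.0)
        totals[species] += w
    return max(totals, key=totals.get)


def expert_verified(votes, expert):
    """Structure b: the designated expert's identification is final.

    Returns None when the expert has not yet reviewed the observation.
    """
    for user, species in votes:
        if user == expert:
            return species
    return None


# Hypothetical identifications proposed for one observation.
votes = [
    ("ann", "Rosa canina"),
    ("bob", "Rosa canina"),
    ("cam", "Rosa rugosa"),
]

# (a) Equal weight: the majority name wins.
equal_result = weighted_vote(votes)

# (b) Expert verification: "cam" is assumed to be the recognized expert.
expert_result = expert_verified(votes, expert="cam")

# (c) Per-user weights (assumed values standing in for reputation
#     earned from community feedback in this taxonomic group).
reputation = {"ann": 0.5, "bob": 0.5, "cam": 3.0}
weighted_result = weighted_vote(votes, reputation)

print(equal_result, expert_result, weighted_result, sep=" | ")
```

In this toy case the equal-weight structure returns the majority name, while the other two follow the single high-reputation contributor. Real systems would also need rules for ties and for how weights are earned, which are outside this sketch.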
This survey is also motivated by an open question: why are automatic plant identification systems still not widely used by the general public? Valuable reviews of automatic plant identification methods have recently been published (Cope et al., 2012, Wäldchen and Mäder, 2017). However, it remains unclear how the design of crowdsourcing systems, and in particular the role of data collection tools, affects the identification rate. Gaston and O’Neill (2004) made four suggestions for promoting the use of plant identification in practice. According to the authors, the research community should (1) overcome the difficulty of producing a larger training dataset; (2) reduce the error rate; (3) scale up; and (4) be able to detect novel species. The first suggestion directly involves data collection, while the others concern the algorithms proposed for plant identification. To provide evidence for these suggestions, we analyze the following factors: the number of species to be identified, the number of training images for each species, and the use of multiple organs for plant identification.
In Section 2 of this paper, we give a brief summary of the principles of crowdsourcing-based data collection and its issues. Section 3 analyzes the available plant data collection tools in detail, while Section 4 presents the evaluation results. Current achievements and open issues are discussed in Section 5. Section 6 concludes the paper.
Section snippets
Principles of crowdsourcing for data collection
Crowdsourcing is a technique for gathering contributions from a large group of people, especially an online community, in which each person’s contribution combines with those of others to achieve a cumulative result (Robson, 2012, He and Wiggins, 2015, Saxton et al., 2013). It is a widespread trend today, with examples such as the Oxford English Dictionary and Wikipedia.
There are two different ways of collecting crowdsourced data (Joly et al., 2014, Jacobs, 2016). The first one is to automatically
An overview of the plant data collection
Collecting plant information usually requires recording several pieces of information for each observation. The most important is the species name, since knowing it gives access to other information such as geography, ecology, and chemistry. Moreover, when the purpose is to collect data for developing an automatic plant identification system, images and other contextual information such as GPS coordinates are required. Collecting plant images is time-consuming and laborious for the following reasons. The
The evaluation results on the current crowdsourcing systems
To gather feedback from users and understand how these systems are used, we conducted a survey on the two most widely used explicit systems: Pl@ntNet and iNaturalist. Pl@ntNet represents tools supported by automatic plant identification, while iNaturalist relies on identification by experts and the community. Three groups of users were chosen for our survey. The first group, named G1, contains 26 second-year students in information technology. These students have good knowledge in
Discussion on current results and open issues
Based on this study, we can confirm that crowdsourcing is a promising trend for biodiversity data collection: it produces large amounts of data in a more timely and cheaper manner than the conventional approach. This may open a new approach to ecological monitoring. However, several issues still need to be solved. They are listed below:
(a) A biodiversity crowdsourcing system supports in cross-country
In principle, a crowdsourcing system should be easier to use and attract more participants
Conclusions
In this paper, we have studied and analyzed the use of crowdsourcing systems for botanical data collection. Six representative systems, including both implicit and explicit ones, have been presented and analyzed in detail. Moreover, a questionnaire-based evaluation has been carried out for the two most widely used systems, Pl@ntNet and iNaturalist. Our study shows that (1) the current systems have been accepted by a large number of users and (2) automatic plant identification based on images
Acknowledgments
This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under Grant No. 106.06-2018.23.
References (76)
Discovering and developing primary biodiversity data from social networking sites: A novel approach. Ecol. Informat. (2014)
et al. Plant species identification using digital morphometrics: A review. Expert Syst. Appl. (2012)
et al. Deep learning for plant identification using vein morphological patterns. Comput. Electron. Agric. (2016)
et al. A survey of image processing techniques for plant extraction and segmentation in the field. Comput. Electron. Agric. (2016)
et al. Interactive plant identification based on social image data. Ecol. Informat. (2014)
et al. Improved deep belief networks and multi-feature fusion for leaf identification. Neurocomputing (2016)
et al. Plant species identification using elliptic Fourier leaf shape analysis. Comput. Electron. Agric. (2006)
et al. Leaf recognition of woody species in Central Europe. Biosyst. Eng. (2013)
Affouard, A., Goeau, H., Bonnet, P., Lombardo, J.-C., Joly, A., 2017. Pl@ntNet app in the era of deep...
Angelova, A., Zhu, S., Lin, Y., Wong, J., Shpecht, C., 2012. Development and deployment of a large-scale flower...
A review of citizen science and community-based environmental monitoring: issues and opportunities. Environ. Monit. Assess.
Towards spatial data quality information analysis tools for experts assessing the fitness for use of spatial data. Int. J. Geograph. Inf. Sci.
Crowdsourcing systems on the world-wide web. Commun. Assoc. Comput. Mach. (ACM)
Plant identification using score-based fusion of multi-organ images
Automated species identification: why not? Philosoph. Trans. Roy. Soc. London B: Biol. Sci.
Feature decision-making ant colony optimization system for an automated recognition of plant species. Expert Syst. Appl.
Plant identification using deep neural networks via optimization of transfer learning parameters. Neurocomputing
Pl@ntNet mobile app
Pl@ntNet mobile 2014: Android port and new features
How many species of seed plants are there? Taxon
Multi-organ plant identification with multi-column deep convolutional neural networks
A flower image retrieval method based on ROI feature. J. Zhejiang Univ.-Sci. A
An interactive flower image recognition system. Multimedia Tools Appl.