Elsevier

Ecological Informatics

Volume 59, September 2020, 101107
A cluster-classification method for accurate mining of seasonal honey bee patterns

https://doi.org/10.1016/j.ecoinf.2020.101107

Abstract

Bees are the main pollinators of most wild plant species and insect-pollinated crops and are essential for both the maintenance of plant ecosystems and human food production. Among the crops used for human food, 75% depend on pollination. Uncertainty around beekeeping could jeopardize the economic value of pollination, and data on honey bee colony losses exist but have not been thoroughly and systematically analyzed to identify potential causal factors. Recognizing seasonal patterns in honey bee data can be useful for a number of purposes, such as observing swarming and forecasting colony absconding, especially for hives where resources are scarce. Here we propose a method to identify honey bee seasonal patterns. The main aim of identifying these patterns is to assist beekeepers in the management and maintenance of their hives and, additionally, to demonstrate that machine learning, in particular unsupervised learning, can detect seasonal honey bee patterns. We applied a clustering technique to two real datasets from HiveTool.net covering brood temperature, relative humidity, and beehive weight. Using a clustering validation index and the k-means algorithm, we found 6 coherent patterns related to seasons. From these patterns, we compared three well-known classification algorithms (Naive Bayes, k-NN, and Random Forest) to propose a high-accuracy classification model (hit rates up to 99.67%) that suggests seasonal honey bee patterns for remote monitoring computing systems.

Introduction

Among all animal pollinators, insects alone were valued at €153 billion for their contribution to the pollination of crops worldwide, representing 9.5% of the total value of world agricultural production used for human food in 2005 (Gallai et al., 2009). Bees are the most important group of pollinators (Brown et al., 2016), and the Western honey bee (Apis mellifera) is the species most commonly used for pollination purposes around the world (Gallai et al., 2009; Potts et al., 2016). Approximately 75% of crops around the world depend on insects in general for agricultural production of fruits and/or seeds (Ollerton et al., 2011; Potts et al., 2010). Recent works have shown a reduction in the number of pollinating species worldwide (Potts et al., 2016). In particular, honey bee populations have suffered mass deaths in some European regions and in North America due to Colony Collapse Disorder (CCD) and severe winters (Barron, 2015; Gil-Lebrero et al., 2017).

To detect abnormal states of honey bee colonies, such as a low adult bee population, a spotty brood pattern, or queen loss, it is usually necessary to open the beehives, remove the frames, and check them in a routine called an inspection. In addition to being invasive, a careful inspection stresses the colonies (Braga et al., 2020), which can put pollination services and honey production at risk. Bees can also be crushed when the frames and boxes are moved. Moreover, many colonies are kept in remote or distant rural apiaries, so inspections at such locations require long shifts. In this sense, remote monitoring of apiaries can assist beekeepers by adding valuable information on the bees' behavior without an invasive inspection (Kridi et al., 2016; Meikle et al., 2017; Murphy et al., 2016; Sánchez et al., 2015; Zacepins et al., 2017; Zogovic et al., 2017), as well as sparing the bees unnecessary stress or other non-productive activities.

Today, thanks to the sensor network and Internet of Things paradigms, beekeepers and researchers can remotely monitor bee colonies (Kridi et al., 2016; Meikle and Holst, 2015; Zogovic et al., 2017). Remote monitoring via wireless sensors is one of the most important characteristics of precision beekeeping (Zacepins et al., 2015), which basically involves beehive data collection, data analysis, and decision support in an apiary management context (Dineva and Atanasova, 2018). Once the sensors are installed in the hives, the apiary can be monitored without disturbance, even during periods when invasive inspections of the hives are contraindicated, such as during the winter (Meikle et al., 2017). However, little is yet known about the semantics of the data collected from the hives (Jacobs et al., 2017; Zacepins et al., 2015), such as which physical variables most affect the bees' behavior. Such knowledge would help to improve, for instance, the bee colonies' well-being and pollination results.

Here we propose a knowledge discovery method that uses clustering to extract seasonal honey bee patterns and then builds an accurate classification model from them. Based on this method, our goal is to answer the following central Research Question (RQ): “How to identify biologically relevant seasonal patterns of bee colonies from different hives even if they are from different apiaries?”

The main contribution of this paper is a method that combines clustering with data classification to detect and recognize seasonal honey bee patterns. To validate the proposed method, we used two periods of the year: the first corresponds to the spring and summer seasons (the bees' active period), and the second corresponds to the autumn and winter seasons in the northern hemisphere, the “quiet” period of the colonies (Kviesis and Zacepins, 2016).

These patterns are composed of temperature, humidity, and weight sensor measurements. To accomplish this, we established the following activities, as shown in Fig. 1:

  • i. obtained raw datasets of temperature, humidity, and weight of the bee colonies during a full one-year cycle;

  • ii. removed anomalies and normalized the data;

  • iii. split the colony datasets into subsets corresponding to the seasonal periods of a one-year cycle;

  • iv. defined the optimal number of seasonal patterns for each period;

  • v. collected the seasonal data patterns for each period;

  • vi. had an expert recognize and interpret each data pattern;

  • vii. labeled the dataset;

  • viii. split the dataset into training, test, and validation subsets;

  • ix. applied classification algorithms.
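Steps ii, viii, and ix can be illustrated with a minimal sketch in plain Python. This excerpt does not say how anomalies were removed, how the data were normalized, or how the subsets were sized, so everything here is an assumption made for illustration: a z-score rule for anomalies, min-max scaling, a 70/15/15 split, k-NN with k = 3, and the helper names themselves. The readings are made-up values, and the hand-assigned "active" and "quiet" labels merely stand in for cluster labels interpreted by an expert (steps iv to vii).

```python
import random
import statistics
from collections import Counter
from math import dist


def remove_anomalies(rows, z_max=3.0):
    """Drop rows where any feature lies more than z_max std devs from its mean."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stdevs = [statistics.pstdev(c) for c in cols]
    return [
        r for r in rows
        if all(s == 0 or abs(v - m) <= z_max * s
               for v, m, s in zip(r, means, stdevs))
    ]


def min_max_normalize(rows):
    """Rescale every feature to [0, 1] so no single unit dominates distances."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(r, lows, highs))
        for r in rows
    ]


def split_dataset(samples, train=0.7, test=0.15, seed=0):
    """Shuffle, then split into training, test, and validation subsets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_test = round(len(shuffled) * test)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])


def knn_predict(train_set, x, k=3):
    """Majority vote among the k nearest labeled samples (Euclidean distance)."""
    nearest = sorted(train_set, key=lambda s: dist(s[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Made-up readings: (brood temperature in °C, relative humidity in %, weight in kg).
raw = [(35.0 + 0.1 * i, 60.0 + 0.5 * i, 40.0 + 0.2 * i) for i in range(10)]
raw += [(20.0 + 0.1 * i, 75.0 + 0.5 * i, 25.0 + 0.2 * i) for i in range(10)]
raw.append((150.0, 60.0, 40.0))  # simulated sensor glitch

clean = remove_anomalies(raw)                                    # step ii
labels = ["active" if temp > 30 else "quiet" for temp, _, _ in clean]  # stand-in labels
samples = list(zip(min_max_normalize(clean), labels))           # step ii
train_set, test_set, val_set = split_dataset(samples)           # step viii
accuracy = sum(knn_predict(train_set, x) == y                    # step ix
               for x, y in test_set) / len(test_set)
```

The validation subset is left unused here; in a fuller pipeline it would typically be used to choose hyperparameters such as k before the final test evaluation.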

Section snippets

Material and methods

This section describes the methodological aspects of the research carried out concerning the tool used, data collection and preprocessing, machine learning strategies, as well as the analysis and detection of the bee colony states under study.

Results

Fig. 2 shows the evolution of the SSD for k = 2, 5, 10, 15, 20, and 24 for the first period in the Arnas dataset. In this example, convergence occurred at iteration 9 for k = 2. The k-means convergence behaves similarly for the other periods. The CH index was then calculated to determine the most appropriate number of partitions for each period (step 1.3 of Algorithm 1).
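The procedure described above can be sketched as follows: run k-means for several values of k, record the SSD at each iteration, and keep the k with the highest validation index. The sketch assumes that SSD means the sum of squared distances from each point to its assigned centroid and that the CH index is the Calinski-Harabasz index; neither is spelled out in this excerpt. The farthest-first seeding is used only to make the example reproducible and is not taken from the paper, and the points are made-up normalized (temperature, humidity, weight) triples forming three tight groups.

```python
from math import dist


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the point
    farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns (centroids, labels, SSD recorded at each iteration)."""
    centroids = farthest_first(points, k)
    labels, ssd_history = None, []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        ssd_history.append(sum(dist(p, centroids[c]) ** 2
                               for p, c in zip(points, new_labels)))
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels, ssd_history


def calinski_harabasz(points, labels, centroids):
    """Ratio of between-cluster to within-cluster dispersion; higher is better."""
    n, k = len(points), len(centroids)
    overall = tuple(sum(x) / n for x in zip(*points))
    within = sum(dist(p, centroids[c]) ** 2 for p, c in zip(points, labels))
    between = sum(labels.count(j) * dist(centroids[j], overall) ** 2 for j in range(k))
    return (between / (k - 1)) / (within / (n - k))


# Made-up data: three tight groups of six points each.
offsets = [(0.02, 0, 0), (-0.02, 0, 0), (0, 0.02, 0),
           (0, -0.02, 0), (0, 0, 0.02), (0, 0, -0.02)]
centers = [(0.1, 0.1, 0.1), (0.5, 0.9, 0.5), (0.9, 0.1, 0.9)]
points = [tuple(c + o for c, o in zip(center, off))
          for center in centers for off in offsets]

scores = {}
for k in range(2, 6):
    centroids, labels, ssd_history = kmeans(points, k)
    scores[k] = calinski_harabasz(points, labels, centroids)
best_k = max(scores, key=scores.get)  # k with the highest CH index
```

On this toy data the index peaks at the number of groups that were planted; on real hive data the curve is usually less clear-cut, which is one reason an expert interpretation of the resulting clusters remains useful.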

With the best prototypes of each k value defined, they were

Discussion

The obtained clusters were interpreted and validated by a beekeeping expert and against the knowledge available in articles on the subject. The expert also used plots of external temperature and humidity to support the interpretation of the clusters.

For the first period, from March to August 2017, the methodology returned 5 clusters for the beehive of the Arnas apiary. Their centroids are shown in Table 1, in the columns under the description “Arnas

Conclusion

Here we propose a method based on clustering and classification techniques for recognizing seasonal honey bee patterns that can be customized and integrated into a computer system for the remote monitoring of apiaries. This method takes into account three variables (internal temperature, internal humidity, and hive weight) and can be customized to include other time-window patterns on a weekly/monthly basis to detect, for instance, swarming, timing for seasonal management, and the incidence of

Acknowledgements

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) - Finance Code 001. Danielo G. Gomes and Breno Freitas thank the CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico) for its financial support [grant numbers #302934/2010-3, #310317/2019-3, #432585/2016-8, #129426/2018-0]. Joseph A. Cazier thanks the Healthy Hives 2020 grant from Project Apis m [grant number #16-0181].

References (39)

  • A.R. Braga et al. Applying the long-term memory algorithm to forecast thermoregulation capacity loss in honeybee colonies

  • A.R. Braga et al. A method for mining combined data from in-hive sensors, weather and apiary inspections to forecast the health status of honey bee colonies. Comput. Elect. Agric. (2020)

  • L. Breiman. Random forests. Mach. Learn. (2001)

  • M.J.F. Brown et al. A horizon scan of future threats and opportunities for pollinators and pollination. PeerJ (2016)

  • T. Cover et al. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory (1967)

  • K. Dineva et al. OSEMN process for working over data acquired by IoT devices mounted in beehives. Curr. Trends Nat. Sci. (2018)

  • D.W. Fitzgerald et al. Design and development of a smart weighing scale for beehive monitoring

  • S. Gil-Lebrero et al. Honey bee colonies remote monitoring system. Sensors (2017)

  • J. Han et al. Data Mining: Concepts and Techniques (2011)