Original papers
A supervised machine learning method to detect anomalous real-time broiler breeder body weight data recorded by a precision feeding system

https://doi.org/10.1016/j.compag.2021.106171

Highlights

  • A precision feeding system sometimes records anomalous real-time body weight data.

  • A supervised learning method was developed for anomaly detection.

  • Data distribution and features regarding the feeding activity were considered.

  • This method was more effective than 4 other common anomaly detection methods.

Abstract

A precision feeding (PF) system is an intelligent computer-controlled feeding system that can be used to feed individual broilers, breeders or layers automatically based on measuring real-time body weight (BW). Vast amounts of real-time BW data can be generated every day when birds visit a PF station. However, anomalous observations occur in the real-time BW data; they are caused by multiple birds entering the station at the same time, upward or downward variation in the scale measurement due to the movement of the bird, or a misread of the radio frequency identification tag. Known anomalous data should be removed because they have a negative impact on the interpretation of the data. Manually cleaning the anomalies is accurate, but it is time-consuming and labor-intensive. Statistical methods and unsupervised machine learning methods are effective in detecting anomalies only to some extent because they check only the data distribution. The current study reports a supervised machine learning method to detect anomalies in real-time BW recorded by the PF system. Real-time BW data of 5 broiler breeders from day 15 to 306 were checked and the anomalies were manually labeled. Variables regarding the statistical distribution of data and features regarding the feeding activity recorded by the PF system in each day were extracted from the dataset. Among the 4 machine learning algorithms tested, namely k-nearest neighbor (KNN), random forest classifier (RF), support vector machine (SVM), and artificial neural network (ANN), RF produced the highest F1 score (0.9712) and area under the precision-recall curve (0.9948). Compared with 4 other common anomaly detection methods, namely Z-scores, interquartile range (IQR), density-based spatial clustering of applications with noise (DBSCAN), and local outlier factor (LOF), RF had a higher average F1 score (0.9448), which indicated that RF was a more effective anomaly detection algorithm for this type of data.

Introduction

Applying computer technology has proved to be beneficial to animal agriculture. Hardware and software can be used to automatically monitor animal performance (Banhazi et al., 2012, Berckmans, 2014), making research and production less labor-intensive, while at the same time collecting big data that are helpful for interpreting and improving animal performance. A current example is a precision feeding (PF) system for poultry, which was developed at the University of Alberta (Zuidhof et al., 2017, Zuidhof et al., 2019). It is a sequential feeding system that aims to increase the body weight (BW) uniformity in a flock of birds by allocating the right amount of feed over several small meals each day to birds on an individual basis. Birds are individually weighed in the PF station, and then a decision is made within the system on whether or not to feed the bird based on comparing its real-time BW to the target BW. Birds frequently visit the PF station to gain access to feed, and BW data are recorded upon each visit (Fig. 1). Visit frequency of breeder pullets from 2 to 22 weeks of age varied from 28 to 138 visits per day (Zuidhof, 2018). These data are likely to be contaminated by occasional anomalous observations, which can be caused by multiple birds entering the station at the same time, upward or downward variation in the scale measurement due to the movement of the bird, or a misread of the radio frequency identification (RFID) tag. These anomalous observations can cause incorrect estimations of daily BW and daily BW gain. Statistical methods and unsupervised learning methods may be used to detect the anomalies in real-time BW. These methods are effective only to some extent, because they check only the data distribution and cannot distinguish reasonable variations in BW caused by the feeding activity of birds from unreasonable variations that the feeding activity cannot explain.
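
To make the distribution-only approach concrete, the following is a minimal sketch of two such statistical checks (Z-scores and the interquartile range) applied to one bird's readings for one day. The weights, the function names, and the thresholds are illustrative assumptions and are not taken from the study.

```python
import numpy as np


def zscore_flags(weights, threshold=3.0):
    """Flag readings whose absolute Z-score exceeds `threshold`."""
    weights = np.asarray(weights, dtype=float)
    std = weights.std()  # population standard deviation
    if std == 0:
        return np.zeros(weights.shape, dtype=bool)
    return np.abs(weights - weights.mean()) / std > threshold


def iqr_flags(weights, k=1.5):
    """Flag readings outside [Q1 - k*IQR, Q3 + k*IQR]."""
    weights = np.asarray(weights, dtype=float)
    q1, q3 = np.percentile(weights, [25, 75])
    iqr = q3 - q1
    return (weights < q1 - k * iqr) | (weights > q3 + k * iqr)


# Hypothetical one-day record for one bird, in grams. The 5200 g reading
# mimics two birds standing on the scale at the same time.
day_bw = [2510, 2525, 2540, 2530, 2560, 5200, 2575, 2590, 2580, 2600]

# With only 10 readings, |z| computed with the population standard deviation
# cannot exceed sqrt(10 - 1) = 3, so a lower cut-off is used for this example.
print(np.flatnonzero(zscore_flags(day_bw, threshold=2.5)))
print(np.flatnonzero(iqr_flags(day_bw)))
```

Both checks look only at how the readings are distributed. Neither has any information about whether the bird was fed between two readings, which is the limitation described above.
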
Removing the anomalies in the data manually is accurate because people can judge anomalous observations by considering data distribution and features regarding the feeding activity of individual birds recorded by the PF system; however, it is time-consuming and labor-intensive. In the current study, a supervised machine learning method was used to detect anomalies in real-time BW of individual birds recorded by the PF system, based on manually labeled data. Variables describing not only the statistical distribution of the data but also the feeding activity of individual birds were extracted from a dataset recorded by a PF system. Based on the labeled data, various machine learning algorithms were applied, and then the algorithm with the highest F1 score and area under the precision-recall curve (AUCPR) was selected for comparison with 4 other common anomaly detection methods.
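
As an illustration of what combining distribution variables with feeding-activity features could look like, here is a minimal pandas sketch. The column names (`bird_id`, `day`, `bw_g`, `feed_intake_g`) and the specific features are hypothetical; this preview does not list the variables actually used in the study.

```python
import pandas as pd


def daily_features(visits: pd.DataFrame) -> pd.DataFrame:
    """Add per-bird, per-day features to each visit record.

    Expects hypothetical columns: bird_id, day, bw_g, feed_intake_g,
    with rows already sorted by visit time.
    """
    grouped = visits.groupby(["bird_id", "day"])
    out = visits.copy()
    # Distribution features: where a reading sits within its own day.
    out["day_median"] = grouped["bw_g"].transform("median")
    out["day_std"] = grouped["bw_g"].transform("std")
    out["day_visits"] = grouped["bw_g"].transform("count")
    out["dev_from_median"] = out["bw_g"] - out["day_median"]
    # Feeding-activity feature (assumed): feed eaten so far that day, which
    # can explain a legitimate rise in body weight.
    out["cum_feed_g"] = grouped["feed_intake_g"].cumsum()
    return out


# Hypothetical visit log: the 620 g reading on day 15 is implausible.
visits = pd.DataFrame(
    {
        "bird_id": [1, 1, 1, 1],
        "day": [15, 15, 15, 16],
        "bw_g": [300.0, 310.0, 620.0, 320.0],
        "feed_intake_g": [5.0, 5.0, 0.0, 6.0],
    }
)
features = daily_features(visits)
print(features[["bw_g", "dev_from_median", "cum_feed_g"]])
```

Each row of the resulting table, paired with a manual anomaly label, would be one training example for a supervised classifier.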

Section snippets

Method

Fig. 2 illustrates the key steps for developing the machine learning method to detect anomalies. In the current study, Python 3.7.0 was used for all the data analysis work, including data preprocessing, feature engineering, algorithm selection, and comparison with other common anomaly detection methods. The Scikit-learn library 0.21.0 (Pedregosa et al., 2011) and the deep learning framework Keras (Kumar and Manjula, 2019) were used to implement the machine learning algorithms.
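
The algorithm-selection step can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' pipeline: the data are synthetic (12 features and roughly 10% positives are assumed values), the hyper-parameters are untuned defaults, the ANN built with Keras is omitted, and `average_precision_score` is used as a common estimate of the area under the precision-recall curve.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the manually labeled data (1 = anomaly).
X, y = make_classification(
    n_samples=2000, n_features=12, n_informative=6,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# KNN and SVM are distance-based, so their inputs are standardized first.
models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    anomaly_proba = model.predict_proba(X_test)[:, 1]
    scores[name] = {
        "f1": f1_score(y_test, model.predict(X_test)),
        "aucpr": average_precision_score(y_test, anomaly_proba),
    }

for name, result in scores.items():
    print(f"{name}: F1={result['f1']:.4f}  AUCPR={result['aucpr']:.4f}")
```

The stratified split keeps the anomaly proportion the same in the training and test sets, which matters when the positive class is rare.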

Results

Table 3 shows the evaluation of the 4 machine learning algorithms with optimized hyper-parameters. KNN had the highest precision (0.9746) and SVM had the highest recall (0.9917); however, RF had the highest F1 score (0.9712), which is the harmonic mean of precision and recall. In addition, Fig. 4 shows that the AUCPR of RF (0.9948) was higher than that of all other algorithms, indicating that RF was a more effective model for this imbalanced binary classification problem. Thus, RF was selected as the best algorithm
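
For reference, the standard definitions behind these metrics are given below, assuming that anomalies are treated as the positive class (TP, FP, and FN are the counts of true positives, false positives, and false negatives).

```latex
\[
\mathrm{precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}
           {\mathrm{precision} + \mathrm{recall}}
\]
```

Because the harmonic mean is pulled toward the smaller of its two inputs, an algorithm that leads on precision alone or on recall alone can still have a lower F1 score than one that balances the two.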

Discussion

The PF system recorded real-time broiler breeder BW in two dimensions: real-time BW and time. There were two characteristics for the recorded data: regularly shaped over a long period of time and irregularly scattered in one day (Fig. 1). Since the PF system fed each individual bird following a target BW curve that had a sigmoidal shape, real-time BW data of an individual bird throughout the trial (from day 15 to day 306) that were temporally sequenced can be described by a triphasic Gompertz

Conclusions

The current study was the first to propose a supervised machine learning method to detect anomalies in real-time BW data of broiler breeders collected by a PF system. Real-time BW data of 5 randomly selected broiler breeders were used in the current study. To detect the anomalous observations over the trial period (from day 15 to day 306), 12 variables considering statistical distribution of data and features regarding the feeding activity recorded by the PF system for each day were created

CRediT authorship contribution statement

Jihao You: Conceptualization, Methodology, Software, Validation, Formal analysis, Writing - original draft. Edmond Lou: Conceptualization, Resources, Supervision, Writing - review & editing. Mohammad Afrouziyeh: Data curation, Writing - review & editing. Nicole Zukiwsky: Data curation, Writing - review & editing. Martin J. Zuidhof: Conceptualization, Project administration, Funding acquisition, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The research data originated from a project financed by Alberta Agriculture and Forestry (Edmonton, Alberta). The authors would like to acknowledge the students and staff of the Poultry Research Center at the University of Alberta for technical support. The authors would also like to acknowledge the technical support from the AI-Supercomputing Hub at the University of Alberta.

References (32)

  • Breiman, L., 2001. Random forests. Machine learning 45, 5-32, doi 10.1023/A:...
  • Breunig, M.M., Kriegel, H.-P., Ng, R.T., Sander, J., 2000. LOF: identifying density-based local outliers. Proceedings...
  • Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., Niculae, V., Prettenhofer, P., Gramfort,...
  • Cain, M.K., et al., 2017. Univariate and multivariate skewness and kurtosis for measuring nonnormality: Prevalence, influence and estimation. Behavior Research Methods.
  • Čisar, P., et al., 2010. Skewness and kurtosis in function of selection of network traffic distribution. Acta Polytechnica Hungarica.
  • Cortes, C., et al., 1995. Support-vector networks. Machine learning.