
A survey of data mining techniques applied to agriculture

  • Review
  • Operational Research

Abstract

In this survey we present some of the most widely used data mining techniques in agriculture. Several of them, such as k-means, k-nearest neighbors (k-NN), artificial neural networks and support vector machines, are discussed, and an agricultural application of each is presented. Data mining in agriculture is a relatively new research field, and we believe that efficient data mining techniques can be developed and tailored to solve complex agricultural problems. At the end of the survey we give recommendations for future research in agriculture-related fields.
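To make the first technique named above concrete, here is a minimal Python sketch of k-means using the classic Lloyd iteration: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points, and repeat until no assignment changes. This is not the authors' implementation. The function names, the deterministic initialization with the first k points, and the toy two-feature "soil" data are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd-style k-means: alternate an assignment step and an update step."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: deterministic initialization with the first k points.
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels: List[int] = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no assignment changed, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[j] = tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                )
    return centroids, labels


# Hypothetical toy data: (soil pH, organic matter %) for six made-up samples.
samples = [(5.0, 1.0), (7.5, 3.0), (5.2, 1.1), (7.4, 3.2), (4.9, 0.9), (7.6, 2.9)]
centroids, labels = k_means(samples, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Seeding with the first k points keeps the sketch reproducible. In practice random or k-means++ seeding is more common, and because k-means only converges to a local optimum, the final clusters can depend on the starting centroids.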


Figs. 1–8



Acknowledgment

This research has been partially supported by NSF grants.

Author information

Corresponding author

Correspondence to Petraq Papajorgji.


About this article

Cite this article

Mucherino, A., Papajorgji, P. & Pardalos, P.M. A survey of data mining techniques applied to agriculture. Oper Res Int J 9, 121–140 (2009). https://doi.org/10.1007/s12351-009-0054-6


  • DOI: https://doi.org/10.1007/s12351-009-0054-6
