Abstract
This survey presents some of the most widely used data mining techniques in agriculture. We discuss k-means, k-nearest neighbor, artificial neural networks, and support vector machines, and present an agricultural application of each. Data mining in agriculture is a relatively new research field. In our opinion, efficient data mining techniques can be developed and tailored to solve complex agricultural problems. The survey closes with recommendations for future research directions in agriculture-related fields.
Acknowledgment
This research has been partially supported by NSF grants.
Cite this article
Mucherino, A., Papajorgji, P. & Pardalos, P.M. A survey of data mining techniques applied to agriculture. Oper Res Int J 9, 121–140 (2009). https://doi.org/10.1007/s12351-009-0054-6