Abstract
It is estimated that 90% of the world’s species have yet to be discovered and described. The main reason for the slow pace of new species description is that the science of taxonomy can be very laborious. To formally describe a new species, taxonomists have to manually gather and analyze data from large numbers of specimens and identify the smallest subset of external body characters that uniquely diagnoses the new species as distinct from all its known relatives. In this paper, we present an automated feature selection and classification scheme, using logistic regression with a controlled false discovery rate, to address this impediment to new species discovery. Unlike traditional taxonomic practice, our scheme automatically selects, from landmark-annotated specimen samples, the body shape features that unite populations within a species and distinguish among species. It also provides a probabilistic assessment of the classification accuracy achieved when the selected features are used to identify new species. We apply the scheme to a taxonomic problem involving species of suckers in the genus Carpiodes. The results confirm the necessity of feature selection for classifier design and provide additional insight into the suspicious specimens that have traditionally been misdiagnosed as C. carpio but are in fact closer to C. cyprinus. We also compare the classification accuracy of our scheme with that of several well-known machine learning algorithms, with and without feature selection.
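The abstract does not spell out how the false discovery rate is controlled, so the following is only a minimal sketch of one common choice, the Benjamini-Hochberg step-up procedure, applied to per-feature p-values. The function name `benjamini_hochberg`, the example p-values, and the level `q = 0.05` are illustrative assumptions and are not taken from the paper.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the sorted indices of features kept at FDR level q.

    Assumption: p_values[i] is a p-value for feature i, for example from
    a significance test on a logistic regression coefficient.
    """
    m = len(p_values)
    # Rank the p-values from smallest to largest, remembering positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * q.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    # Keep every feature ranked at or below the cutoff.
    return sorted(order[:cutoff])


# Hypothetical p-values for six shape features (made up for illustration).
p_values = [0.001, 0.20, 0.012, 0.65, 0.030, 0.90]
selected = benjamini_hochberg(p_values, q=0.05)
print(selected)
```

In a scheme of the kind described, the retained indices would then define the reduced feature set on which the classifier is fitted. Whether the paper applies this exact rule cannot be determined from the abstract alone.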
Notes
See http://life.bio.sunysb.edu/morph/ for more details.
Acknowledgments
This work was supported in part by grants from University of Mississippi, Tulane University, University of New Orleans, and US National Science Foundation (DEB-0237013 to HLB). The authors would like to thank Jinbo Bi for helpful discussions on 1-norm SVM.
Cite this article
Chen, Y., Huang, S., Chen, H. et al. Joint feature selection and classification for taxonomic problems within fish species complexes. Pattern Anal Applic 13, 23–34 (2010). https://doi.org/10.1007/s10044-009-0157-y
DOI: https://doi.org/10.1007/s10044-009-0157-y