Joint feature selection and classification for taxonomic problems within fish species complexes

  • Theoretical Advances
Pattern Analysis and Applications

Abstract

It is estimated that 90% of the world’s species are yet to be discovered and described. The main reason for the slow pace of new species description is that the science of taxonomy can be very laborious. To formally describe a new species, taxonomists have to manually gather and analyze data from large numbers of specimens and identify the smallest subset of external body characters that uniquely diagnoses the new species as distinct from all its known relatives. In this paper, we present an automated feature selection and classification scheme, based on logistic regression with a controlled false discovery rate, to address the taxonomic impediment to new species discovery. Unlike traditional taxonomic practice, our scheme automatically selects, from landmarked specimen samples, body shape features that unite populations within a species and distinguish among species. It also provides a probabilistic assessment of classification accuracy when the selected features are used to identify new species. We apply the scheme to a taxonomic problem involving species of suckers in the genus Carpiodes. The results confirm the necessity of feature selection for classifier design and provide additional insight into suspicious specimens that have traditionally been misdiagnosed as C. carpio but are in fact closer to C. cyprinus. We also compare the classification accuracy of our scheme with that of several well-known machine learning algorithms, with and without feature selection.
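The abstract mentions feature selection with a controlled false discovery rate, but this preview does not spell out the procedure. As a rough illustration only, the sketch below applies the standard Benjamini-Hochberg step-up rule to a list of per-feature p-values and returns the indices of the features it would keep. The function name `benjamini_hochberg`, the level `q`, and the p-values are all hypothetical and chosen for illustration; this is not the authors' implementation, and how they obtain p-values from the logistic regression fit is not stated here.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the sorted indices of features kept by the BH step-up rule.

    p_values: one p-value per candidate feature (hypothetical inputs).
    q: target false discovery rate (an assumed value, not from the paper).
    """
    m = len(p_values)
    # Feature indices ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose p-value
        # falls at or below its threshold rank * q / m.
        if p_values[idx] <= rank * q / m:
            k = rank
    # Keep the k features with the smallest p-values.
    return sorted(order[:k])


# Made-up p-values for six candidate shape features.
p_values = [0.001, 0.20, 0.012, 0.74, 0.03, 0.45]
selected = benjamini_hochberg(p_values, q=0.05)
print(selected)  # [0, 2]
```

With these made-up values, only the first and third features pass. The third-smallest p-value (0.03) exceeds its threshold of 0.025, and no larger p-value meets its own threshold, so the rule stops at two features.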

Notes

  1. See http://life.bio.sunysb.edu/morph/ for more details.

Acknowledgments

This work was supported in part by grants from the University of Mississippi, Tulane University, the University of New Orleans, and the US National Science Foundation (DEB-0237013 to HLB). The authors would like to thank Jinbo Bi for helpful discussions on the 1-norm SVM.

Author information

Corresponding author

Correspondence to Yixin Chen.

About this article

Cite this article

Chen, Y., Huang, S., Chen, H. et al. Joint feature selection and classification for taxonomic problems within fish species complexes. Pattern Anal Applic 13, 23–34 (2010). https://doi.org/10.1007/s10044-009-0157-y

  • DOI: https://doi.org/10.1007/s10044-009-0157-y
