Elsevier

Ecological Informatics

Volume 3, Issue 6, December 2008, Pages 387-396

Using neural networks to detect patterns in inter-specific data: An example from net-spinning caddisflies (Trichoptera: Annulipalpia)

https://doi.org/10.1016/j.ecoinf.2008.08.001

Abstract

We introduce neural networks (NNs) as a method for detecting patterns in, and visually comparing, multivariate inter-specific morphological data. Neural networks have relatively relaxed statistical assumptions, do not require a phylogeny, and can collapse multivariate data sets into two dimensions. The NN converts the multivariate data into vectors that are then plotted in two dimensions on a self-organizing map, which visually displays the hidden patterns in the data uncovered by the NN. We used a NN to study multivariate sexual dimorphism in 40 species of adult net-spinning caddisflies (Trichoptera: Annulipalpia) from North America. Using eight morphological traits of adult caddisflies, the NN accurately predicted phylogenetic structure (accuracy rate: family = 92%; genus = 82%; species = 72%) and sex (80%) based solely on morphology. Leg traits were most important in discriminating among families and between sexes, whereas antennal length and eye width were most important for predicting genus and species. Overlaying the self-organizing map on the phylogenetic tree indicated that sexual dimorphism is widespread among net-spinning caddisfly taxa. Our neural network approach can be used to detect patterns in inter-specific biological data from any set of organisms. Future work should aim to develop NNs as a tool in evolutionary biology.

Introduction

Over the past 20+ years, many statistical techniques have been developed to compare inter-specific data (i.e., phylogenetic comparative methods, PCMs). As currently employed, PCMs are extensions of linear statistical methods well known to ecologists and evolutionary biologists (e.g., ANOVA, linear regression) that allow data to be compared among closely related species without violating the assumptions of those standard techniques (Bell, 1989, Freckleton et al., 2002, Harvey and Pagel, 1991, Cheverud et al., 1985, Felsenstein, 1985, Garland et al., 1992, Grafen, 1989, Huelsenbeck et al., 2000, Lynch, 1991, Martins et al., 2002, Martins and Hansen, 1996). For example, ANOVA and regression require that each data point represent an independent observation. Direct comparisons among species using either method violate this assumption because species share evolutionary history and are therefore not independent observations (i.e., pseudo-replication). PCMs overcome the problem of non-independence by using phylogenetic information to transform the original data into a new data set (e.g., the independent contrasts of Felsenstein, 1985). Under the assumed model of evolution, the new, ‘phylogenetically corrected’ values are statistically independent and can be used in ANOVA and regression without violating the assumption of independence (Felsenstein, 1985, Grafen, 1989, Martins et al., 2002).
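For concreteness, the core step of Felsenstein's (1985) independent-contrasts method can be written as follows. It applies to two sister tips i and j with trait values x_i and x_j and branch lengths v_i and v_j, joined at node k whose own branch has length v_k. The notation is ours, and the formulas assume Brownian-motion evolution:

```latex
c_{ij} = \frac{x_i - x_j}{\sqrt{v_i + v_j}}, \qquad
x_k = \frac{x_i / v_i + x_j / v_j}{1/v_i + 1/v_j}, \qquad
v_k' = v_k + \frac{v_i v_j}{v_i + v_j}
```

In these formulas:

- c_ij is the standardized contrast.
- x_k is the estimated trait value at node k, treated as a new tip.
- v_k' is node k's branch length, lengthened to account for the uncertainty in estimating x_k.

Repeating the step down to the root yields n - 1 contrasts for n species. Under the model, these contrasts are independent and have equal variance.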

PCMs are the best available tool for comparing biological data among species and have accordingly become the standard statistical approach. However, PCMs have limitations, because the assumptions underlying them often cannot be met with the biological data at hand. Two problems arise. First, most PCMs require at least partial a priori information about phylogenetic relationships. For many taxa, phylogenetic information is inadequate or unavailable; in these cases, PCMs can give a distorted picture of inter-specific variation or cannot be applied at all. Second, PCMs make non-trivial assumptions that are rarely addressed (Martins, 2000). For example, assumptions are made about the type of evolutionary change being modelled (Felsenstein, 1985), the time since divergence between phylogenetic groups (Felsenstein, 1985, Grafen, 1989), and the underlying relationships among traits (Quader et al., 2004). These assumptions represent fundamental biological processes and therefore strongly influence the interpretation of PCM results. Yet they are rarely tested or discussed, and they are at the centre of current controversy over the use of PCMs (Martins, 2000). In this paper, we introduce neural networks for two roles. First, they can serve as a second-best alternative to PCMs when statistical or biological assumptions cannot be met, i.e., when phylogenies are inaccurate or missing or assumptions are violated. Second, they can serve as a complementary analysis that supplements the results of a PCM analysis. Below, we employ neural networks for pattern recognition, visualization, and simplification of multidimensional morphological data.
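A small example shows how strongly such assumptions can shape PCM output. The Python sketch below computes Felsenstein's (1985) independent contrasts for a hypothetical four-species tree ((A,B),(C,D)) under two sets of assumed branch lengths. The trait values and branch lengths are invented, and the names `Node`, `independent_contrasts`, and `four_taxon_tree` are our own illustrative helpers, not taken from any package. Only the assumed times since divergence differ between the two runs, yet every contrast changes.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    length: float                       # branch length to the parent (assumed divergence time)
    name: Optional[str] = None          # tip name; None for internal nodes
    children: Tuple["Node", ...] = ()   # exactly two children for an internal node


def independent_contrasts(node, traits):
    """Felsenstein (1985) contrasts on a fully bifurcating tree with positive branch lengths.

    Returns (estimated node value, adjusted branch length, list of standardized contrasts).
    """
    if not node.children:
        return traits[node.name], node.length, []
    left, right = node.children
    xi, vi, ci = independent_contrasts(left, traits)
    xj, vj, cj = independent_contrasts(right, traits)
    contrast = (xi - xj) / math.sqrt(vi + vj)      # standardized by its expected variance
    xk = (xi / vi + xj / vj) / (1 / vi + 1 / vj)   # weighted estimate of the node value
    vk = node.length + vi * vj / (vi + vj)         # extra variance from estimating xk
    return xk, vk, ci + cj + [contrast]


def four_taxon_tree(tip, internal):
    """((A,B),(C,D)): all tips share one branch length, both clades share another."""
    tips = [Node(tip, name) for name in "ABCD"]
    return Node(0.0, children=(Node(internal, children=tuple(tips[:2])),
                               Node(internal, children=tuple(tips[2:]))))


traits = {"A": 2.0, "B": 3.0, "C": 6.0, "D": 8.0}  # hypothetical trait values
_, _, equal = independent_contrasts(four_taxon_tree(1.0, 1.0), traits)
_, _, unequal = independent_contrasts(four_taxon_tree(0.5, 3.0), traits)
print([round(c, 3) for c in equal])    # [-0.707, -1.414, -2.598]
print([round(c, 3) for c in unequal])  # [-1.0, -2.0, -1.765]
```

With short tip branches and long internal branches, the within-clade contrasts grow and the deep contrast between clades shrinks. Any downstream regression on these values therefore inherits the branch-length assumption.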
Our goal is two-fold: 1) to provide comparative biologists with another potential tool for analyzing large, complex data sets; and 2) to stimulate research on the use of neural networks in evolutionary biology.

Neural networks (NNs) are made up of interconnected processing elements (i.e., “neurons”) that respond in parallel to a set of input signals (i.e., data). NNs have been used to explore biological systems mainly in two ways: 1) as models of biological nervous systems and 2) as data-analytic methods (Sarle, 1994). We suggest here that NNs can be a useful tool for pattern recognition and visualization of multivariate inter-specific morphological data, for three reasons. First, neural networks are highly robust to underlying data distributions and make no assumptions about the independence of data points or relationships among variables (Bishop, 1995). Second, in addition to describing inter-specific patterns of phenotypic variation, NNs reveal hidden structure within the data, such as phylogenetic affinities or sex, in their output. In this way, NNs use continuous variables to predict categorical membership (e.g., taxon, sex). Accordingly, NNs are widely used as categorization tools (Bishop, 1995, Haykin, 1999) and could help biologists place individuals or species within appropriate taxonomic groups. Finally, NNs are similar to other multivariate techniques such as principal component analysis (Sarle, 1994) and can be used to collapse complex multivariate data sets into two-dimensional space. The advantage is that complex phenotypic data (e.g., morphological geometry) can be collapsed and scaled so that taxa can be compared directly. To date, NNs have received limited application in biological categorization and pattern recognition (although see Dopazo and Carazo, 1997). As far as we are aware, no other studies have used NNs to examine continuous morphological data among closely related species.
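To make these ideas concrete, here is a minimal NumPy sketch of a Kohonen self-organizing map. It collapses multivariate trait vectors onto positions on a two-dimensional grid. It also includes one simple way to use the map as a classifier: each map unit takes the majority label of the training samples it wins, and new samples receive the label of their nearest labelled unit. The following choices are our assumptions for illustration, not the settings used in this study:

- the map size;
- the learning-rate and neighbourhood schedules;
- z-score standardization of traits;
- the labelling rule.

The data are synthetic: two hypothetical groups measured on eight traits.

```python
import numpy as np


def standardize(data):
    """Z-score each trait so traits measured on different scales weigh equally."""
    return (data - data.mean(axis=0)) / data.std(axis=0)


def train_som(data, rows=6, cols=6, epochs=50, lr0=0.5, seed=0):
    """Train a rectangular Kohonen map with online updates and a Gaussian neighbourhood."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    # (row, col) position of every map unit on the 2-D grid
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # initialise unit weight vectors from randomly chosen samples
    weights = data[rng.integers(0, n, size=rows * cols)].astype(float)
    sigma0 = max(rows, cols) / 2.0
    total, step = epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / total
            lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # neighbourhood shrinks over time
            x = data[i]
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
            h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2 * sigma**2))
            weights += lr * h[:, None] * (x - weights)  # pull BMU and neighbours toward x
            step += 1
    return weights, grid


def best_units(data, weights):
    """Index of the best-matching unit for each sample."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)


def label_units(units, labels):
    """Majority label of the training samples that land on each unit."""
    out = {}
    for u in np.unique(units):
        vals, counts = np.unique(labels[units == u], return_counts=True)
        out[int(u)] = vals[np.argmax(counts)]
    return out


def predict(data, weights, unit_labels):
    """Assign each sample the label of its nearest labelled unit."""
    ids = np.array(sorted(unit_labels))
    nearest = ids[best_units(data, weights[ids])]
    return np.array([unit_labels[int(u)] for u in nearest])


# Usage on synthetic data: two hypothetical groups ("F"/"M") measured on 8 traits.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (60, 8)), rng.normal(3.0, 1.0, (60, 8))])
y = np.array(["F"] * 60 + ["M"] * 60)
X = standardize(X)
order = rng.permutation(len(X))
train, test = order[:80], order[80:]

weights, grid = train_som(X[train], seed=0)
coords = grid[best_units(X, weights)]  # every sample collapsed to a 2-D map position
unit_labels = label_units(best_units(X[train], weights), y[train])
accuracy = np.mean(predict(X[test], weights, unit_labels) == y[test])
print(f"held-out accuracy: {accuracy:.2f}")
```

Both the initial weights and the order of sample presentation are random. Different seeds can therefore place the same groups in different regions of the map. Comparing several independent runs separates stable cluster membership from arbitrary map position.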

We used Kohonen self-organizing maps (a type of NN) to study patterns of multivariate sexual dimorphism in 40 species of adult caddisflies (Trichoptera: suborder Annulipalpia). The aquatic larvae of these species spin silken nets to capture food particles in streams and rivers of North America. We chose this group of caddisflies because qualitative studies suggest that different species might show alternative patterns of adult morphological variation between sexes (i.e., sexual dimorphism) (Betten, 1934, Ross, 1944), which in turn suggests that diverse mechanisms might underlie this variation. These reports indicate that adult traits involved in dispersal, mating, and oviposition (Deutsch, 1985, Gullefors and Petersson, 1993, Petersson, 1989, Petersson, 1990, Petersson and Solem, 1987) are sexually dimorphic in many species (wings, antennae, tibiae, eyes: Betten, 1934, Deutsch, 1985, Malicky, 1977, Ross, 1944). Although information on larval caddisfly ecology is extensive (Wiggins, 1996), few data are available on adult ecology, behaviour, and reproduction (Halat and Resh, 1997, Plague, 1999). We have previously quantified patterns of body size dimorphism in these species (Jannot and Kerans, 2003). Our goals are to: 1) demonstrate the use of NNs in biological pattern recognition and categorization; 2) quantify patterns of multivariate sexual dimorphism as a test case; and 3) provide a basis for generating hypotheses about the ecology and evolution of sexual dimorphism in North American caddisflies.

Section snippets

Specimens and measurements

We have described our taxonomic choices and methods in detail elsewhere (Jannot and Kerans, 2003); therefore, we provide only a brief synopsis here. Adult caddisflies preserved in ethanol were obtained from museums and collections in North America. When possible, we measured a minimum of 20 individuals per sex; however, sample sizes varied among species and sexes, depending upon availability (see Table 1, Table 2). All measurements and sex determinations were made with a zoom stereoscopic

Results

Even though the map represents real, consistently obtainable relationships between vectors in the data set, the procedure for obtaining the map is stochastic. Thus, the position of the clusters on the map can change even as the composition of clusters remains consistent. Therefore, it is useful to investigate the structure of several maps of the same data to discover what features of the Kohonen map are consistent. We examined five independent runs of the Kohonen network and each gave

Discussion

We have shown that Kohonen networks can provide useful information for inter-specific morphological pattern recognition. Kohonen networks accurately predicted the sex and taxonomic grouping of individual caddisflies and highlighted the importance of single morphological variables in predicting sex or taxon. Overlaying the Kohonen network maps onto our current phylogenetic hypothesis of caddisfly evolution provided striking patterns of

Acknowledgements

We would like to thank the museum curators who provided us with specimens (a list can be found in Jannot and Kerans, 2003). Earlier versions of this manuscript benefited from the comments of A. Boyko and two anonymous reviewers. J.E.J., O.A. and K.C. were supported by the NSF Cross-disciplinary Research at Undergraduate Institutions (CRUI) grant # DBI-0442412 to D. Whitman, D. Borst, S. A. Juliano, O. Akman.

References (36)

  • Martins, E.P., 2000. Adaptation and the comparative method. Trends Ecol. Evol.
  • Bell, G., 1989. A comparative method. Am. Nat.
  • Betten, C., 1934. The caddisflies of New York. N.Y. State Mus. Bull.
  • Bishop, C.M., 1995. Neural Networks for Pattern Recognition.
  • Cheverud, J.M., et al., 1985. The quantitative assessment of phylogenetic constraints in comparative analyses: sexual dimorphism in body weight among primates. Evolution.
  • Deutsch, W.G., 1985. Swimming modifications of adult female Hydropsychidae compared with other Trichoptera. Freshw. Invertebr. Biol.
  • Dopazo, J., et al., 1997. Phylogenetic reconstruction using an unsupervised growing neural network that adopts the topology of a phylogenetic tree. J. Mol. Evol.
  • Felsenstein, J., 1985. Phylogenies and the comparative method. Am. Nat.
  • Freckleton, R.P., et al., 2002. Phylogenetic analysis and comparative data: a test and a review of evidence. Am. Nat.
  • Garland, T., et al., 1992. Procedures for the analysis of comparative data using phylogenetically independent contrasts. Syst. Biol.
  • Grafen, A., 1989. The phylogenetic regression. Philos. Trans. R. Soc. Lond. B.
  • Gullefors, B., et al., 1993. Sexual dimorphism in relation to swarming and pair formation patterns in Leptocerid caddisflies (Trichoptera: Leptoceridae). J. Insect Behav.
  • Halat, K.M., Resh, V.H., 1997. Biological studies of adult Trichoptera: topics location and organisms examined. In:...
  • Harvey, P.H., et al., 1991. The Comparative Method in Evolutionary Biology.
  • Haykin, S., 1999. Neural Networks: A Comprehensive Foundation.
  • Huelsenbeck, J.P., et al., 2000. Accommodating phylogenetic uncertainty in evolutionary studies. Science.
  • Jannot, J.E., et al., 2003. Body size, sexual size dimorphism and Rensch's rule in adult hydropsychid caddisflies (Trichoptera: Hydropsychidae). Can. J. Zool.
  • Kjer, K.M., et al., 2001. Phylogeny of Trichoptera (caddisflies): characterization of signal and noise within multiple datasets. Syst. Biol.