Elsevier

Ecological Informatics

Volume 66, December 2021, 101473

Generalizing resemblance coefficients to accommodate incomplete data

https://doi.org/10.1016/j.ecoinf.2021.101473
Under a Creative Commons license
open access

Highlights

  • A general framework is provided to redefine resemblance coefficients for incomplete data.

  • Coefficients that consider double absences in abundance data are included.

  • An R function and a stand-alone Windows application are provided for potential users.

Abstract

Large ecological data matrices may be incomplete for various reasons, which prevents the use of standard multidimensional scaling (ordination) and cluster analysis packages. Although a few resemblance functions allow missing scores, most distance and similarity coefficients potentially applied in multivariate data analysis lack both a theoretical background and software support for incomplete data. We provide a general framework for a precise mathematical redefinition of a large set of resemblance functions originally developed for complete data sets with presence-absence (binary) or ratio-scale variables. Coefficients that consider double absences in abundance data are included. Potential problems with the use of these functions are discussed, with the conclusion that incompleteness of data would rarely, if ever, greatly influence the interpretability of ordinations and classifications. An R function described in the Appendix provides a link to R. We also provide a stand-alone Windows application for users of other computer programs. The new software will allow users of standard data analysis packages to perform multivariate analysis with a wide variety of resemblance coefficients even if the data are incomplete.
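
The abstract does not state the redefined formulas, so the following is only a minimal sketch of one widely used convention and should not be read as the authors' framework. It assumes pairwise deletion: a variable contributes to the resemblance of two objects only if it is observed in both. For Euclidean distance, the sum of squares is then scaled up in proportion to the number of variables used, which is the convention R's `dist()` follows. The function names and the use of `None` to mark a missing score are hypothetical choices made for illustration.

```python
import math

def euclidean_incomplete(x, y):
    """Euclidean distance over jointly observed variables.

    Assumption (not from the paper): missing scores are None, and the sum
    of squares is rescaled by (all variables / jointly observed variables).
    """
    if len(x) != len(y):
        raise ValueError("objects must have the same number of variables")
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not pairs:
        return None  # undefined: the two objects share no observed variable
    sum_sq = sum((a - b) ** 2 for a, b in pairs)
    return math.sqrt(sum_sq * len(x) / len(pairs))

def jaccard_incomplete(x, y):
    """Jaccard similarity for presence-absence data, skipping any variable
    that is missing (None) in either object. Double absences are ignored,
    as in the ordinary Jaccard coefficient."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of variables")
    pairs = [(bool(a), bool(b)) for a, b in zip(x, y)
             if a is not None and b is not None]
    joint_presences = sum(1 for a, b in pairs if a and b)
    mismatches = sum(1 for a, b in pairs if a != b)
    if joint_presences + mismatches == 0:
        return None  # undefined: no presence among jointly observed variables
    return joint_presences / (joint_presences + mismatches)

# Two objects described by four abundance variables, each with one gap.
d = euclidean_incomplete([1, None, 3, 5], [4, 2, None, 1])

# Two objects described by five presence-absence variables.
s = jaccard_incomplete([1, 1, 0, None, 0], [1, 0, 0, 1, None])
```

In the first call only the first and fourth variables are observed in both objects, so the sum of squares (9 + 16) is doubled before the square root is taken. Returning `None` when no variable is jointly observed is one possible design choice; it makes explicit that a resemblance matrix built this way may itself contain gaps that downstream ordination or clustering must handle.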

Keywords

Cluster analysis
Distance
Dissimilarity
Missing data
Ordination
Similarity
