Non-symmetric correspondence analysis with ordinal variables using orthogonal polynomials

doi:10.1016/j.csda.2006.12.040

Computational Statistics & Data Analysis

Volume 52, Issue 1, 15 September 2007, Pages 566-577

https://doi.org/10.1016/j.csda.2006.12.040

Abstract

Non-symmetrical correspondence analysis (NSCA) is a useful tool for graphically detecting the asymmetric relationship between two categorical variables. Most of the theory associated with NSCA does not distinguish between a two-way contingency table of ordinal variables and one of nominal variables. Typically, singular value decomposition (SVD) is used in classical NSCA for dimension reduction. A bivariate moment decomposition (BMD) for ordinal variables in contingency tables using orthogonal polynomials and generalized correlations is proposed. This method not only takes into account the ordinal nature of the two categorical variables, but also permits the detection of significant association in terms of location, dispersion and higher order components.

Introduction

Scientific investigations, including sensory evaluation experiments, market research and health evaluations, often collect data where variables are measured on an ordinal scale. For such data, the partition of Pearson's chi-squared statistic has been central to the study of symmetric association in ordinal two-way contingency tables (Lancaster, 1953; Best and Rayner, 1996; Beh, 1997, 2001). Correspondence analysis (CA) considers a partition of the chi-squared statistic to graphically describe the association of categorical variables (Greenacre, 1984; Lebart et al., 1984). When there exists a non-symmetric association between ordinal variables, an informative analysis can be based on the partition of the Goodman–Kruskal tau index. It is the decomposition of this statistic that lies at the heart of non-symmetric correspondence analysis (NSCA; D’Ambra and Lauro, 1989; Kroonenberg and Lombardo, 1999). In this paper we propose a special partition of the tau index (Goodman and Kruskal, 1954; Light and Margolin, 1971) using orthogonal polynomials (D’Ambra et al., 2002). It has the advantage that it takes into account the dependence relationship (if one exists) and the ordinal structure of the variables by considering a pre-defined set of scores to reflect this structure. For an analogous, although symmetric, analysis, orthogonal polynomials have been used to perform simple CA (Beh, 1997).

The methodology presented in this paper is referred to as doubly ordinal non-symmetric correspondence analysis (DONSCA). It is designed to allow the visualization of the dependence relationship between categories of a response and a predictor variable. Such a visualization is useful for identifying the structure of this relationship, and it does so through components that reflect sources of variation in terms of location (mean), dispersion (spread) and higher order moments. It can be used to identify important characteristics in the behavior of the response variable given the presence of a predictor variable. After a brief description of the tau index numerator and of classical NSCA in Section 2, a presentation of DONSCA will be given for the visual identification of dependence between ordinal variables (Section 3). In Section 4, distance measures and the interpretation of correspondence plots will be investigated. Two examples illustrating the application of the technique will be given in Section 5 and some final comments will be left for the conclusion.

Section snippets

Classic NSCA

Consider a two-way contingency table $N$ of dimension $I \times J$, formed from the $I$ and $J$ categories of variables $Y$ (response) and $X$ (predictor), respectively. Denote the matrix of joint relative frequencies by $P = (p_{ij})$ so that $\sum_{i = 1}^{I} \sum_{j = 1}^{J} p_{ij} = 1$. Also, define the diagonal matrix $D_{I}$ whose $(i, i)$th element $p_{i •}$ is the $i$th row marginal frequency. Similarly, let the $(j, j)$th element of the diagonal matrix $D_{J}$ of column marginal frequencies be $p_{• j}$. The conditional probability that an individual/unit is classified
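The notation above can be illustrated with a minimal Python sketch. The counts in `N` are hypothetical and are not taken from the paper. Because the snippet is truncated before the centred profiles are defined, the sketch assumes the standard NSCA quantity $π_{ij} = p_{ij} / p_{• j} - p_{i •}$ and the usual Goodman-Kruskal tau numerator $\sum_{i} \sum_{j} p_{• j} π_{ij}^{2}$. It then checks that an SVD of the column-weighted matrix recovers that numerator as the sum of squared singular values.

```python
import numpy as np

# Hypothetical 4 x 3 table of counts (rows: response Y, columns: predictor X).
N = np.array([[20, 10,  5],
              [10, 40, 10],
              [ 5, 10, 30],
              [ 5,  5, 10]], dtype=float)

P = N / N.sum()            # joint relative frequencies p_ij, summing to 1
p_row = P.sum(axis=1)      # row marginals p_i.
p_col = P.sum(axis=0)      # column marginals p_.j
D_I = np.diag(p_row)       # diagonal matrix of row marginals
D_J = np.diag(p_col)       # diagonal matrix of column marginals

# Assumption: centred column profiles pi_ij = p_ij / p_.j - p_i.
Pi = P / p_col - p_row[:, None]

# Numerator of the Goodman-Kruskal tau index and the index itself.
tau_num = np.sum(p_col * Pi**2)
tau = tau_num / (1.0 - np.sum(p_row**2))

# Classical NSCA: SVD of Pi with the columns weighted by sqrt(p_.j).
U, sing, Vt = np.linalg.svd(Pi @ np.sqrt(D_J), full_matrices=False)

print("tau numerator:", tau_num)
print("sum of squared singular values:", np.sum(sing**2))
print("tau:", tau)
```

The two printed totals agree up to rounding, which is the sense in which the SVD partitions the tau numerator across dimensions.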

Doubly ordered NSCA

The classical approach to NSCA described above is especially useful in cases where the predictor and response variables are nominal. However, many studies involve variables that are measured on an ordinal scale. When these situations arise, rather than decomposing $π_{ij}$ using SVD, one may consider the bivariate moment decomposition (BMD) $π_{ij} = \sum_{u = 1}^{I - 1} \sum_{v = 1}^{J - 1} z_{uv} a_{iu}^{*} b_{jv}^{*},$ where $a_{iu}^{*} = p_{i •}^{- 1 / 2} {\hat{a}}_{iu}$. The vectors ${\hat{a}}_{u}$ and $b_{v}^{*}$ are orthogonal polynomials of generic order $u$ and $v$ associated with the row
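A minimal Python sketch of a decomposition of this kind follows. It is an illustration under stated assumptions, not the authors' implementation: the counts are hypothetical, natural scores $1, 2, \ldots$ are assumed for both variables, the polynomials are built by a weighted QR step (Gram-Schmidt) rather than a recurrence formula, and the normalisation is chosen for convenience (row polynomials orthonormal under unit weights, column polynomials orthonormal under the weights $p_{• j}$). The exact scaling of $a_{iu}^{*}$ in the truncated snippet may differ. The helper `orthonormal_polynomials` is a hypothetical name.

```python
import numpy as np

def orthonormal_polynomials(scores, weights):
    """Polynomials in `scores`, orthonormal with respect to `weights`.

    Column k of the result is a polynomial of degree k; column 0 is constant.
    """
    s = np.asarray(scores, dtype=float)
    w = np.asarray(weights, dtype=float)
    V = np.vander(s, N=len(s), increasing=True)   # columns 1, s, s^2, ...
    Q, R = np.linalg.qr(np.sqrt(w)[:, None] * V)  # weighted Gram-Schmidt
    Q = Q * np.sign(np.diag(R))                   # fix the sign of each column
    return Q / np.sqrt(w)[:, None]

# Hypothetical 4 x 3 table of counts (rows: response, columns: predictor).
N = np.array([[20, 10,  5],
              [10, 40, 10],
              [ 5, 10, 30],
              [ 5,  5, 10]], dtype=float)
I, J = N.shape
P = N / N.sum()
p_row, p_col = P.sum(axis=1), P.sum(axis=0)

# Assumption: centred column profiles pi_ij = p_ij / p_.j - p_i.
Pi = P / p_col - p_row[:, None]

# Assumed normalisation: unit weights for rows, p_.j weights for columns.
A = orthonormal_polynomials(np.arange(1, I + 1), np.ones(I))
B = orthonormal_polynomials(np.arange(1, J + 1), p_col)

# Generalized correlations z_uv; the terms with u = 0 or v = 0 vanish.
Z = A.T @ Pi @ np.diag(p_col) @ B
Z_nontrivial = Z[1:, 1:]

# Rebuild pi_ij from the non-trivial polynomial terms only.
Pi_rebuilt = A[:, 1:] @ Z_nontrivial @ B[:, 1:].T
tau_num = np.sum(p_col * Pi**2)

print(np.round(Z_nontrivial, 4))
print("reconstruction error:", np.abs(Pi - Pi_rebuilt).max())
print("tau numerator:", tau_num, "sum of z_uv^2:", np.sum(Z_nontrivial**2))
```

Under these assumptions each $z_{uv}^{2}$ is one additive share of the tau numerator, so the entry with $u = v = 1$ can be read as the location-by-location (linear-by-linear) share and the entries with index 2 as dispersion terms.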

Distances in DONSCA

One of the primary reasons for considering NSCA when investigating the asymmetric association between the categorical variables is that a graphical summary (by way of the correspondence plot) of the data can be made. This plot allows the researcher to identify row and/or column categories that are relatively similar or different based on their proximity to one another. For the predictor (column) variable such comparisons can be made by observing the squared distances between the profiles of the
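As a hedged illustration of comparing predictor categories by distance, the Python sketch below computes squared Euclidean distances between column profiles $p_{ij} / p_{• j}$ for a hypothetical table and compares them with distances between full-dimensional column coordinates from the classical SVD. The unweighted Euclidean metric and the coordinate scaling `G` are assumptions based on standard NSCA practice, since the snippet is truncated before the paper's own definitions appear.

```python
import numpy as np

# Hypothetical 4 x 3 table of counts (rows: response, columns: predictor).
N = np.array([[20, 10,  5],
              [10, 40, 10],
              [ 5, 10, 30],
              [ 5,  5, 10]], dtype=float)
P = N / N.sum()
p_row, p_col = P.sum(axis=1), P.sum(axis=0)

# Column profiles: column j holds the conditional distribution of the
# response given predictor category j.
profiles = P / p_col
Pi = profiles - p_row[:, None]   # centring leaves profile differences unchanged

# Squared Euclidean distances between every pair of column profiles.
diff = profiles.T[:, None, :] - profiles.T[None, :, :]
d2_profiles = np.sum(diff**2, axis=2)

# Assumed column principal coordinates from the SVD of Pi D_J^{1/2}.
U, sing, Vt = np.linalg.svd(Pi * np.sqrt(p_col), full_matrices=False)
G = (Vt.T * sing) / np.sqrt(p_col)[:, None]

# Distances between the coordinates, using all available dimensions.
diff_g = G[:, None, :] - G[None, :, :]
d2_coords = np.sum(diff_g**2, axis=2)

print(np.round(d2_profiles, 4))
print("max discrepancy:", np.abs(d2_profiles - d2_coords).max())
```

When all dimensions are retained the two distance matrices coincide; in a two-dimensional plot the displayed distances only approximate them.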

Confidence circles for ordinal and classical NSCA

In the framework of classical CA, Lebart et al. (1984) demonstrated the usefulness of confidence circles for displaying CA results. Following the derivation of these circles, one can also apply them to ordinal symmetric and non-symmetric correspondence analysis (Beh and D’Ambra, 2007). The radii of the circles obtained using BMD are equivalent to those of Lebart et al. (1984), who considered SVD. However, because of the non-symmetric nature of the association between the row and column

Artificial contingency table

Suppose we consider the artificial two-way contingency table of Table 1. Assume that the categories of the row and column variables are ordered.

Table 1 has been constructed so that the $(2, c)$th element, associated with row 2 and column c, is very large relative to the other elements in the table. The $(3, b)$th cell frequency has been set to be relatively small.

By assuming that there exists an asymmetric relationship between the row and column categories of Table 1

Conclusion

Recent papers that describe the use of orthogonal polynomials for correspondence analysis have shown them to be an important tool for identifying sources of association that exist in two-way contingency tables with one ordinal variable (Beh, 2001) and two ordinal variables (Beh, 1997, 1998) or in three-way contingency tables (Beh and Davy, 1998, 1999; D’Ambra et al., 2006).

In this paper we have discussed the development of non-symmetric correspondence analysis using bivariate moment

References (24)

Agresti, A., 1990. Categorical Data Analysis.
Beh, E.J., 1997. Simple correspondence analysis of ordinal cross-classifications using orthogonal polynomials. Biometrical Journal.
Beh, E.J., 1998. A comparative study of scores for correspondence analysis with ordered categories. Biometrical Journal.
Beh, E.J., 2001. Partitioning Pearson's chi-squared statistic for singly ordered two-way contingency tables. Australian New Zealand J. Statist.
Beh, E.J., et al., 1998. Partitioning Pearson's chi-squared statistic for a completely ordered three-way contingency table. Australian New Zealand J. Statist.
Beh, E.J., et al., 1999. Partitioning Pearson's chi-squared statistic for a partially ordered three-way contingency table. Australian New Zealand J. Statist.
Beh, E.J., D’Ambra, L., 2007. Some interpretative tools for nominal and ordinal non symmetric correspondence analysis,...
Best, D.J., et al., 1996. Nonparametric analysis for doubly ordered two-way contingency tables. Biometrics.
D’Ambra, L., et al. Non-symmetrical correspondence analysis for three-way contingency table.
D’Ambra, L., Lombardo, R., 1993. Normalized non symmetrical correspondence analysis for three-way data sets. Bull....
D’Ambra, L., Lombardo, R., Amenta, P., 2002. Non symmetric correspondence analysis for ordered two-way contingency...
D’Ambra, L., et al., 2005. CATANOVA for two-way contingency tables with ordinal variables using orthogonal polynomials. Commun. Statist.

