Optimal scaling for survival analysis with ordinal data

doi:10.1016/j.csda.2017.05.008

Computational Statistics & Data Analysis

Volume 115, November 2017, Pages 155-171

https://doi.org/10.1016/j.csda.2017.05.008

Abstract

Medical and psychological studies often involve the collection and analysis of categorical data with nominal or ordinal category levels. Nominal categories have no ordering property, e.g. gender, with the two unordered categories male and female. Ordinal category levels, however, have an ordering, e.g. when subjects are classified according to their education level, often categorized as low, medium or high education. When analyzing survival data, two methods are currently available to include ordinal covariates in the Cox proportional hazards model. Dummy covariates can be used to indicate category memberships, as is usually done for nominal covariates. The estimated parameter for each category indicates the increase or decrease in risk of experiencing the event of interest compared to the reference category. Since these parameters are estimated independently of each other, the ordering property of the categories is lost in the process. To keep the ordinal property, integer values can be assigned to the covariate’s categories (e.g. low $=$ 0, medium $=$ 1, high $=$ 2), and the variable is included in the model as a numeric covariate. However, since the ordinal data are now interpreted as numeric data, the property of equal distances between consecutive categories is introduced. This assumption is too strict for this data type; distances between consecutive categories do not necessarily have to be equal. A method is described to include ordinal data in the Cox model. The method implements optimal scaling to find optimal quantifications for the ordinal category levels. These quantifications are chosen such that they preserve the categories’ ordering, and do not force equal distances between consecutive category levels. A simulation study is carried out to compare the performance of optimal scaling with the performance of the two currently used methods described above. Results show that the optimal scaling method improves the model fit if ordinal covariates are included in the model.

Introduction

In medical and psychological studies, many characteristics of patients are recorded, for example their gender, age, education level, weight, and socio-economic status. These characteristics can have different measurement levels, namely numeric or categorical. Numeric variables are measured on a continuous scale, like age and blood pressure. Categorical variables are not measured on a continuous scale; instead, subjects are assigned to one of a set of pre-defined category levels. There are two types of categorical data, nominal and ordinal. Category levels of nominal variables are unordered, while the categories of ordinal variables are ordered. Nominal variables seen in medical studies are, for example, gender, treatment group, and ethnicity. Gender has the two unordered categories male and female. Treatment groups may be defined as treatment A, B and C, or treatment vs. placebo, and these are usually unordered. Ethnicity can have several category levels, depending on the ethnicities of interest, but there is no ordering involved. Examples of ordinal categorical variables are education level, and scales like pain severity scales, Likert scales, or the modified Rankin Scale (mRS). Schools and diplomas may be categorized into low, medium and high education levels, which clearly have an ordering. Pain severity scales are used to get an indication of the intensity of a patient’s pain. Likert scales are used to measure how strongly people agree or disagree with a statement, e.g. with response options “strongly disagree”, “disagree”, “I don’t know”, “agree”, and “strongly agree”. The mRS is used to measure the degree of disability or dependence in daily activities of patients who suffer from neurological disabilities, e.g. caused by a stroke (van Swieten et al., 1988). A property of the ordered category levels in ordinal data is that the distances between consecutive category levels do not necessarily represent an equal degree of difference. For example, the mRS score ranges from 0 to 5, where 0 indicates no symptoms and 5 severe disability. There is only a slight difference between scores 0 and 1: from no symptoms (0) to no significant disability (1). However, the difference between scores 2 and 3 is large, since it indicates the transition from being functionally independent (2) to being functionally dependent (3).

Researchers may choose between analyzing a specific variable according to its measurement level and adjusting the scale for analysis. For example, the measurement level of age may be numeric (exact ages of patients are known), but researchers may decide to discretize the covariate and include the resulting age groups in the statistical models instead of the exact ages. Due to this discretization the analysis level is ordinal, while the measurement level was numeric.

In many statistical models a linear combination of predictor variables is used to predict an outcome or response variable. Examples of these types of models are the standard linear model, where the outcome is predicted directly from the linear combination of predictors; generalized linear models, in which the outcome is predicted from the linear model through a link function; and the Cox model in survival analysis, where the linear predictor is included in the hazard function. Models with linear predictors are directly applicable to variables that are analyzed on either a numeric or nominal level. Numeric variables are included in the model directly, with coefficients that indicate the increase or decrease in risk for every unit increase. For nominal data, $C_{k} - 1$ dummy variables are introduced, where $C_{k}$ represents the number of categories for variable $k$. The $C_{k} - 1$ estimated model parameters indicate the difference in risk between a category level and the reference level.

Complications arise for ordinal categorical data. Most of the literature on models with linear predictors does not discuss how to fit these models to ordinal data. Researchers usually use either the nominal or the numeric approach. In the nominal approach, dummy variables are introduced and the model is fitted in the same way as for nominal data. However, this method ignores the ordering of the ordinal category levels, since it assumes unordered (nominal) category levels. Therefore, it is not guaranteed that the linear predictor increases (or decreases) with each increase of category level. To keep the monotonicity, one can analyze the ordinal data using a numeric approach. In this case, each category is given an integer value (e.g. 0, 1, 2, etc.), and the variable is then included in the model as a numeric variable. By using the integer coding, equal distances between consecutive categories are assumed, although the distances are not necessarily equal in the data. Hence, unfortunately, neither of these two approaches respects the characteristics of ordinal categorical data, and neither is therefore suitable for analyzing this data type.
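The two codings discussed above can be illustrated with a small sketch. The data, the three-level covariate and the function names are hypothetical and chosen only for illustration; category 0 is assumed to be the reference level.

```python
import numpy as np

# Hypothetical ordinal covariate with C = 3 levels: 0 = low, 1 = medium, 2 = high.
levels = np.array([0, 2, 1, 1, 0, 2])
n_categories = 3

def dummy_coding(x, n_categories):
    """Return an (n, C - 1) indicator matrix; category 0 is the reference."""
    codes = np.arange(1, n_categories)  # the non-reference categories
    return (x[:, None] == codes[None, :]).astype(float)

def integer_coding(x):
    """Return an (n, 1) numeric column; this imposes equal spacing between levels."""
    return x.astype(float)[:, None]

X_dummy = dummy_coding(levels, n_categories)   # one free parameter per non-reference level
X_integer = integer_coding(levels)             # a single slope shared by all levels
```

With the dummy matrix, nothing ties the two coefficients to the order low < medium < high; with the integer column, the step from low to medium is forced to equal the step from medium to high.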

To analyze ordinal data, optimal scaling techniques have been developed (Gifi, 1990). This methodology provides an optimal nonlinear transformation of the category levels, such that the relation between the response and the predictors is optimal. In this way, the optimal scaling method turns qualitative data (ordered category levels) into quantitative data (numeric values). The resulting optimal quantifications can be treated as numeric data in the model. The nonlinear optimal quantifications are found by fitting a nonlinear monotone transformation on the original category values. The monotonicity restriction of the transformation guarantees that the ordering of the category levels is maintained and the nonlinearity enables unequal distances between consecutive category levels.

The optimal scaling method was first developed for simple linear models, but was extended to more complicated models that include a linear combination of predictors. In fact, optimal scaling can easily be included in any model that is fitted with a least squares algorithm, such as the regression model (Meulman and van der Kooij, 2016) and the principal components model (Linting et al., 2007; Meulman et al., 2004). Including the optimal scaling step results in an alternating least squares algorithm in which the loss function is iteratively minimized over the model parameters and the optimal scaling quantifications.
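The alternation can be sketched for the simplest case, a linear model with one ordinal predictor. This is a minimal illustration and not the authors' implementation: the monotone step is assumed to be a weighted pool-adjacent-violators fit, the quantifications are assumed to be standardized to weighted mean 0 and variance 1, and all function names and data are hypothetical.

```python
import numpy as np

def pava(values, weights):
    """Weighted non-decreasing least squares fit (pool adjacent violators)."""
    blocks = []  # each block holds [mean, total weight, number of elements]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Merge backwards while the last two blocks violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    return np.concatenate([np.full(n, m) for m, _, n in blocks])

def standardize(q, counts):
    """Scale quantifications to weighted mean 0 and variance 1.

    Assumes at least two distinct quantifications remain after pooling.
    """
    q = q - np.average(q, weights=counts)
    return q / np.sqrt(np.average(q ** 2, weights=counts))

def optimal_scaling_regression(x, y, n_categories, n_iter=20):
    """Alternate between the slope and monotone category quantifications."""
    y = y - y.mean()
    counts = np.bincount(x, minlength=n_categories).astype(float)
    cat_means = np.bincount(x, weights=y, minlength=n_categories) / counts
    q = np.arange(n_categories, dtype=float)  # start from integer coding
    for _ in range(n_iter):
        q = standardize(q, counts)
        z = q[x]
        beta = z @ y / (z @ z)                # least squares step for the slope
        q = pava(cat_means / beta, counts)    # monotone least squares step for q
    q = standardize(q, counts)
    z = q[x]
    return q, z @ y / (z @ z)

# Hypothetical data: four ordered levels with unequal true spacing.
rng = np.random.default_rng(0)
x = np.repeat(np.arange(4), 25)
y = np.array([0.0, 0.1, 2.0, 2.1])[x] + rng.normal(scale=0.1, size=x.size)
q, beta = optimal_scaling_regression(x, y, n_categories=4)
```

With a single predictor the alternation settles almost immediately; the loop is kept to show the structure that carries over to models with several predictors. The resulting `q` keeps the category order but places levels 1 and 2 much further apart than levels 0 and 1, which integer coding cannot do.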

The inclusion of optimal scaling is more complicated for models that are fitted with a maximum likelihood approach. This complexity may be the reason why optimal scaling is not yet used to analyze variables on an ordinal level in the Cox proportional hazards model in survival analysis, a model that is fitted by the maximum likelihood method. Currently, researchers include ordinal variables in the model by analyzing them on a nominal or numeric level, and in this way lose the ordering property or introduce equal distances between consecutive categories.

Our research focuses on optimal scaling in survival analysis, and in this article we show how the optimal scaling method can be incorporated in the Cox model. In Section 2 we first describe how ordinal data are currently included in a Cox model, and how optimal scaling is currently used for simple linear regression. In Section 3, a least squares approach to find the maximum likelihood estimator for the Cox model is described, and optimal scaling is incorporated in this algorithm. In Section 4, the performances of different approaches to fit the Cox model for ordinal data (nominal, numeric and optimal scaling) are compared in a simulation study. The simulation results show that the optimal scaling approach gives the most accurate model fit.

Section snippets

Current practice

In this section we will first describe in more detail the methods currently used to incorporate ordinal data in the Cox proportional hazards model. Then, we will discuss the basic principles of the optimal scaling method by showing an application to the simple linear model.

Optimal scaling in survival analysis

As mentioned above, the optimal scaling procedure can easily be implemented for models that are fitted using a least squares algorithm. For the Cox proportional hazards regression model used in survival analysis the parameters are not fitted with a least squares approach, but by maximizing the partial likelihood. Therefore, the optimal scaling procedure as described for the simple linear regression model cannot be implemented directly.
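For reference, the Cox model and its partial likelihood can be written in their standard textbook form, assuming no tied event times. The notation is introduced here for illustration and is not taken from the article: $h_{0}(t)$ is the baseline hazard, $x_{i}$ the covariate vector of subject $i$, $\delta_{i}$ the event indicator, and $R(t_{i})$ the set of subjects still at risk at time $t_{i}$.

```latex
h(t \mid x_i) = h_0(t)\,\exp\!\left(x_i^{\top}\beta\right),
\qquad
L(\beta) = \prod_{i:\,\delta_i = 1}
  \frac{\exp\!\left(x_i^{\top}\beta\right)}
       {\sum_{j \in R(t_i)} \exp\!\left(x_j^{\top}\beta\right)}
```

Since $L(\beta)$ is a product of ratios rather than a sum of squared residuals, maximizing it does not supply the least squares step that the alternating optimal scaling algorithm relies on.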

In this article we propose a least squares approach to fit

Simulation study

A large simulation study was done to investigate the performance of the optimal scaling method for survival analysis proposed in this article. The new method is compared with the two currently used methods: dummy and integer coding. To investigate the performance of the different methods, several scenarios were simulated. We investigated the effect of a non-linear monotone increasing set of model parameters $\beta_{Z_{0}}, \beta_{Z_{1}}, \dots, \beta_{Z_{C - 1}}$, different sample sizes, and different percentages of censored subjects.
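One scenario of this kind could be generated roughly as follows. This is a sketch under assumptions that are not stated in the text above: a constant (exponential) baseline hazard, independent exponential censoring, equally likely categories, and hypothetical parameter values. The function name and arguments are illustrative only.

```python
import numpy as np

def simulate_ordinal_survival(n, betas, censor_rate, baseline_rate=1.0, seed=0):
    """Simulate right-censored survival data with one ordinal covariate.

    betas[c] is the log hazard ratio of category c; a non-linear monotone
    increasing sequence mimics unequal distances between categories.
    """
    rng = np.random.default_rng(seed)
    betas = np.asarray(betas, dtype=float)
    z = rng.integers(0, len(betas), size=n)        # category level of each subject
    hazard = baseline_rate * np.exp(betas[z])      # constant hazard per subject
    event_time = rng.exponential(1.0 / hazard)     # exponential event times
    censor_time = rng.exponential(1.0 / censor_rate, size=n)
    time = np.minimum(event_time, censor_time)     # observed follow-up time
    event = (event_time <= censor_time).astype(int)  # 1 = event, 0 = censored
    return z, time, event

# Hypothetical scenario: four levels, unequal spacing, moderate censoring.
z, time, event = simulate_ordinal_survival(
    n=500, betas=[0.0, 0.1, 1.0, 1.2], censor_rate=0.5
)
censored_fraction = 1.0 - event.mean()
```

Under these assumptions, raising `censor_rate` raises the percentage of censored subjects, and changing `n` varies the sample size, which corresponds to the factors varied in the scenarios described above.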

Discussion

In many studies, categorical variables are collected and used as predictors to model an outcome of interest. Often, the category levels of the data have an ordering. In medical research many scales are used to assess the severity of a disease. Pain intensity, quality of life, and modified Rankin scales are just three among a broad range of scales. For many of these scales, it is expected that they have a monotone relation with the time to an event of interest. The modified Rankin Scale (mRS)

References (11)

Breslow, N.E. (1972). Contribution to the discussion of paper by D.R. Cox. J. R. Stat. Soc. Ser. B Stat. Methodol.
Gifi, A. (1990). Nonlinear Multivariate Analysis.
Klein, J.P. et al.
Kruskal, J.B. (1964). Nonmetric multidimensional scaling: a numerical method. Psychometrika.
Linting, M. et al. (2007). Nonlinear principal components analysis: introduction and application. Psychol. Methods.

There are more references available in the full text version of this article.

