A unifying model involving a categorical and/or dimensional reduction for multimode data

https://doi.org/10.1016/j.csda.2007.03.001

Abstract

A unifying model is presented that implies a categorical and/or dimensional reduction of one or several modes of a multiway data set. The model encompasses a broad range of (existing as well as to be developed) discrete, continuous, as well as hybrid discrete–continuous reduction models as special cases, which all imply a decomposition of the reconstructed data on the basis of quantifications of the different data modes and a linking array. An analysis of the objective or loss function associated with the model leads to two generic algorithmic strategies, the possibilities and limitations of which are the object of a subsequent discussion.

Introduction

Data that involve one or more sets of entities (or modes) with a large number of elements (experimental units, variables, time points, and so on) pose a major challenge for the data analyst. This is even more the case if the data pertain to more than two modes, that is, if they are multiway multimode in nature. The complexity of the information present in such data may be tremendous. In order to grasp it properly, the data analyst may wish to subject one or more of the data modes to a (simultaneous) reduction. Reduction is here to be understood either in a categorical sense, in that the elements of the reduced mode are grouped into a small number of clusters (which may or may not overlap, and which may or may not cover the full mode), or in a dimensional sense, in that the elements of the reduced mode are represented as points in a low-dimensional space. A simultaneous reduction can further be purely categorical, that is, categorical for all reduced modes; purely dimensional, that is, dimensional for all reduced modes; or hybrid, that is, categorical for some of the reduced modes and dimensional for the others. Purely categorical reduction models can be found in abundance in the clustering domain, examples including one-mode partitioning models (such as k-means type models and all kinds of one-mode mixture models, e.g., McLachlan and Chang, 2004), two-mode clustering (or biclustering) models (such as two-mode hierarchical and additive clustering models, Furnas, 1980, Gaul and Schader, 1996, and two-mode hierarchical classes models, De Boeck and Rosenberg, 1988, Van Mechelen et al., 1995), as well as their multimode generalizations (e.g., Ceulemans and Van Mechelen, 2005, Eckes and Orlik, 1994). Pure dimension reduction models can be found in abundance in the domain of component and factor analysis, examples including the standard two-mode principal component model and its multimode generalizations (such as PARAFAC/CANDECOMP and the family of N-mode Tucker models, e.g., Kroonenberg, 1983). Examples of hybrid models include various projection pursuit type clustering methods (e.g., Bock, 1987, Vichi and Kiers, 2001), cluster differences scaling (Heiser and Groenen, 1997), and cluster unfolding (De Soete and Heiser, 1993).
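For readers less familiar with the distinction, the following minimal sketch (not taken from the paper; the data and tools are purely illustrative) contrasts a categorical and a dimensional reduction of the row mode of an ordinary two-way data matrix, using off-the-shelf k-means and principal component routines.

```python
# Illustrative sketch only: two ways of reducing the row mode of a two-way
# data matrix, on hypothetical data and with standard off-the-shelf tools.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))          # 100 experimental units x 8 variables

# Categorical reduction: each row is assigned to one of 3 clusters, i.e.,
# it is represented by a row of a binary membership (indicator) matrix.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
membership = np.eye(3)[labels]         # 100 x 3 indicator matrix

# Dimensional reduction: each row is represented as a point in a
# low-dimensional (here two-dimensional) space of component scores.
scores = PCA(n_components=2).fit_transform(X)   # 100 x 2 coordinate matrix
```

In a hybrid reduction, some modes would receive a membership-type representation and others a coordinate-type representation.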

The family of categorical and dimensional reduction models for multimode data is clearly very large. Moreover, it is also fairly heterogeneous, both in terms of the mathematical structures implied by the different models and in terms of the principles and methods used in the associated data analysis. In the present paper, we will contribute to a clarification of this situation by introducing a unifying model that encompasses a broad range of (existing as well as to be developed) discrete, continuous, and hybrid reduction models as special cases. The proposed unifying model considerably extends the already very broad CANDCLUS and MUMCLUS models of Carroll and Chaturvedi (1995); this extension includes a much broader family of decomposition functions beyond (generalized) Cartesian products, room for various types of modeling constraints, and room for a possible addition of distributional assumptions. An analysis of the objective or loss function associated with the unifying model will further lead to two generic algorithmic strategies, the possibilities and limitations of which are the object of a subsequent discussion.

The remainder of this paper is organized as follows: In Section 2 we will introduce the type of data under study, along with a few associated concepts. In Section 3 we will introduce our unifying reduction model. The associated objective or loss function will be dealt with in Section 4 and the algorithmics in Section 5. Section 6 will present a general discussion.


Data

Data arrays can have different conceptual structures. In order to typify the various cases, Carroll and Arabie (1980) have introduced some terminology (which in turn relies on work by Tucker, 1964). To use this terminology, a data set is conceived as a mapping $D$ from a Cartesian product $S = S_1 \times S_2 \times \cdots \times S_N$ of $N$ sets $S_1, \ldots, S_N$ to some (typically univariate) domain $Y$: for any $N$-tuple $(s_1, s_2, \ldots, s_N)$ with $s_1 \in S_1, \ldots, s_N \in S_N$ a value $D(s_1, s_2, \ldots, s_N)$ from $Y$ is recorded. The total number $N$ of constituent (possibly
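As a small illustration of this terminology (a hypothetical example, not drawn from the paper), a three-way, three-mode data set can be stored as an array whose entry at position $(i_1, i_2, i_3)$ records the value $D(s_1, s_2, s_3)$ for the corresponding triple of elements.

```python
# Minimal sketch with invented mode elements: a three-way, three-mode data
# set viewed as a mapping D from S1 x S2 x S3 to the reals.
import numpy as np

S1 = ["person_a", "person_b"]          # mode 1: experimental units
S2 = ["anger", "joy", "fear"]          # mode 2: variables
S3 = ["t1", "t2"]                      # mode 3: time points

rng = np.random.default_rng(1)
D = rng.normal(size=(len(S1), len(S2), len(S3)))   # I1 x I2 x I3 array

# The value recorded for the triple (s1, s2, s3) is the corresponding entry.
value = D[S1.index("person_b"), S2.index("joy"), S3.index("t2")]
```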

Model

Assume a real-valued $(I_1 \times \cdots \times I_n \times \cdots \times I_N)$ $N$-way $N$-mode data array $D$ (i.e., a mapping $D: S_1 \times S_2 \times \cdots \times S_N \rightarrow \mathbb{R}$, with $\#S_n = I_n$) with entries $d_{i_1 \ldots i_n \ldots i_N}$. The unifying reduction model we propose for $D$ includes a deterministic heart and optional additional stochastic assumptions. We will now successively introduce both. Subsequently, we will discuss how various existing reduction models show up as special cases of the unifying model.
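As a concrete sketch of such a decomposition, the snippet below assumes that the decomposition rule $f$ is of the Tucker-type (generalized Cartesian product) form referred to in Proposition 1 below, with quantification matrices $A_n$ of size $I_n \times P_n$ and a linking (core) array $W$ of size $P_1 \times \cdots \times P_N$; the data and function names are illustrative, not the paper's.

```python
# Sketch under the assumption that f is a Tucker-type generalized Cartesian
# product: the reconstructed entry is a weighted sum of products of the mode
# quantifications, with the linking (core) array W holding the weights.
import numpy as np
from itertools import product

def reconstruct(A_list, W):
    """Reconstructed N-way array M with
    M[i1,...,iN] = sum over (p1,...,pN) of A1[i1,p1]*...*AN[iN,pN]*W[p1,...,pN]."""
    shape = tuple(A.shape[0] for A in A_list)
    M = np.zeros(shape)
    for idx in product(*(range(I) for I in shape)):
        for pidx in product(*(range(P) for P in W.shape)):
            M[idx] += W[pidx] * np.prod([A[i, p] for A, i, p in zip(A_list, idx, pidx)])
    return M

# Hypothetical three-mode example: quantification matrices A1, A2, A3 and core W.
rng = np.random.default_rng(2)
A1, A2, A3 = rng.normal(size=(4, 2)), rng.normal(size=(5, 3)), rng.normal(size=(3, 2))
W = rng.normal(size=(2, 3, 2))
M = reconstruct([A1, A2, A3], W)       # 4 x 5 x 3 reconstructed data array

# The same reconstruction written compactly with numpy's einsum.
M_check = np.einsum("ip,jq,kr,pqr->ijk", A1, A2, A3, W)
assert np.allclose(M, M_check)
```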

Criterion to be optimized in the data analysis

In the deterministic case, the objective or loss function $l$ to be minimized in the data analysis will typically be of the least $L_p$ type,
$$l(A_1,\ldots,A_N,W) = \sum_{i_1,\ldots,i_n,\ldots,i_N} \left| d_{i_1 \ldots i_n \ldots i_N} - f(A_1,\ldots,A_N,W)_{i_1 \ldots i_n \ldots i_N} \right|^p,$$
with, in case $p = 2$:
$$l(A_1,\ldots,A_N,W) = \sum_{i_1,\ldots,i_n,\ldots,i_N} \left[ d_{i_1 \ldots i_n \ldots i_N} - f(A_1,\ldots,A_N,W)_{i_1 \ldots i_n \ldots i_N} \right]^2.$$
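In code, the least-$L_p$ criterion can be sketched as follows (a minimal illustration, reusing the hypothetical reconstruct() helper from the previous sketch; $p = 2$ gives the ordinary least squares loss).

```python
# Minimal sketch of the least-Lp loss; D is the data array and M the
# reconstructed array f(A1, ..., AN, W), e.g. as produced by reconstruct().
import numpy as np

def lp_loss(D, M, p=2):
    """Sum over all cells of |d - m| ** p; p = 2 is the least squares criterion."""
    return np.sum(np.abs(D - M) ** p)

# e.g., with the illustrative sketch above:
# loss = lp_loss(D, reconstruct([A1, A2, A3], W), p=2)
```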

In the stochastic case, the objective function to be maximized will be the likelihood. In this regard, it may be useful to note that, for models of real-valued data, maximizing the likelihood is equivalent to minimizing the

Two propositions

A first proposition focuses on the core array $W$:

Proposition 1

If $W$ is real-valued and unconstrained, $f$ equals a generalized Cartesian product (3), and the loss function equals (15), then the conditionally optimal $W$, given component matrices $(A_1,\ldots,A_n,\ldots,A_N)$, can be expressed as a closed-form function of the component matrices, $g(A_1,\ldots,A_n,\ldots,A_N)$.

Proof

The conditionally optimal $W$ can be considered a set of regression weights in the prediction of the vectorized data on the basis of $P_1 \times \cdots \times P_n \times \cdots \times P_N$ predictor vectors that
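The following sketch illustrates this closed form under the same Tucker-type assumption for $f$ as in the earlier sketches: with the component matrices fixed, the vectorized reconstructed data equal a Kronecker-product design matrix times the vectorized core, so the conditionally optimal $W$ is obtained by ordinary least squares. All names and data are illustrative, not the paper's.

```python
# Sketch of the conditional update of W (Proposition 1), assuming a
# Tucker-type f: vec(M) = (A1 kron A2 kron ... kron AN) vec(W) in C order,
# so the optimal W given the A_n follows from ordinary least squares.
import numpy as np
from functools import reduce

def conditionally_optimal_W(D, A_list):
    """Least-squares optimal linking array W given fixed component matrices A_n."""
    # Design matrix: one column per cell (p1,...,pN) of W.
    X = reduce(np.kron, A_list)             # (prod I_n) x (prod P_n)
    y = D.reshape(-1)                       # vectorized data (C order)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w.reshape(tuple(A.shape[1] for A in A_list))

# Hypothetical check, using M, A1, A2, A3 from the earlier sketch:
# W_hat = conditionally_optimal_W(M, [A1, A2, A3])   # recovers W up to rounding
```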

Discussion

In this paper we introduced a novel unifying model for multimode data based on two key components: (a) the elements of each of the modes involved in the data are reduced either to points in a low-dimensional space or to elements of a limited set of (possibly overlapping) clusters, and (b) the connection between the dimensions and clusters to which each of the data modes is reduced is captured by a linking array. The reduction for each of the modes is further such that the coordinates or cluster

Acknowledgments

Work on this paper has been supported by the Fund for Scientific Research—Flanders (project G.0146.06) and by the Research Fund of K.U. Leuven (GOA/2005/04 and EF/05/007). The authors gratefully acknowledge Henk Kiers and Eva Ceulemans for their useful comments on a previous version of this manuscript.

References (32)

  • J. Schepers et al., Three-mode partitioning, Comput. Stat. Data Anal. (2006)
  • M. Vichi et al., Factorial k-means analysis for two-way data, Comput. Stat. Data Anal. (2001)
  • H.-H. Bock, Simultaneous clustering of objects and variables
  • H.-H. Bock, On the interface between cluster analysis, principal component analysis, and multidimensional scaling
  • R. Bro, Multi-way analysis in the food industry. Models, algorithms and applications. Unpublished doctoral dissertation (1998)
  • J.D. Carroll et al., Multidimensional scaling, Annu. Rev. Psychol. (1980)
  • J.D. Carroll et al., A general approach to clustering and multidimensional scaling of two-way, three-way, or higher-way data
  • E. Ceulemans et al., Hierarchical classes models for three-way three-mode binary data: interrelations and model selection, Psychometrika (2005)
  • E. Ceulemans et al., Tucker3 hierarchical classes analysis, Psychometrika (2003)
  • E. Ceulemans et al., Adapting the formal to the substantive: constrained Tucker3-HICLAS, J. Classification (2004)
  • C.H. Coombs, A Theory of Data (1964)
  • P. De Boeck et al., Hierarchical classes: model and data analysis, Psychometrika (1988)
  • W.S. DeSarbo et al., Three-way metric unfolding via alternating weighted least squares, Psychometrika (1985)
  • G. De Soete et al., A latent class unfolding model for analyzing single stimulus preference ratings, Psychometrika (1993)
  • T. Eckes et al., Three-mode hierarchical cluster analysis of three-way three-mode data
  • G.W. Furnas, Objects and their features: the metric analysis of two-class data. Unpublished doctoral dissertation (1980)