Geometric multidimensional scaling: A new approach for data dimensionality reduction
Introduction
Multidimensional scaling (MDS) is a well-known option for dimensionality reduction of multidimensional data and for visual analysis of the data [1], [2]. MDS remains the main method of this class and finds applications in different areas: traffic jam prediction [3], analysis of the evolution of EEG coherence networks [4], face recognition [5], analysis of regional economic development [6], and image graininess characterization [7]. These are only a few of many examples. Related fields include neural networks [8], intrinsic dimensionality [9], and relative mapping [10]. Multidimensional scaling is implemented in various data mining tools (see [11], [12]).
The multidimensional data set is an array of n-dimensional data points Xi = (xi1, ..., xin), i = 1, ..., m. Usually, a data point is the result of observation of some object or phenomenon that depends on n features or parameters. The dimensionality reduction problem looks for the coordinates of new points Yi = (yi1, ..., yid), i = 1, ..., m, in a lower-dimensional space (d < n). If d ≤ 3, the dimensionality reduction results may be presented visually for human decision. MDS tries to preserve the proximities dij between pairs of multidimensional points Xi and Xj as much as possible. The proximity dij can be the distance between a pair of points Xi and Xj, but other notions of proximity are available [1]. Distance is a proximity that indicates how dissimilar two objects Xi and Xj are.
An example of dimensionality reduction using MDS is presented in Fig. 1, which visualizes 9-dimensional breast cancer data (for details, see the description of the data in Dzemyda et al. [1]). A large number of the points, corresponding to the benign tumor data, are concentrated in one area on the right side of the picture, while the other points, corresponding to the malignant tumor data, are spread widely. This indicates a wide variety of malignant tumor cases.
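The input data for MDS is a symmetric m × m matrix of proximities. The Minkowski distance is a sufficiently general measure of the proximity between two points Xi and Xj:
$$d_q(X_i, X_j) = \left( \sum_{k=1}^{n} |x_{ik} - x_{jk}|^q \right)^{1/q}, \qquad (1)$$
where xik is the k-th coordinate of the point Xi.
Particular cases of (1) are the city-block or Manhattan distance (q = 1) and the Euclidean distance (q = 2).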
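As a minimal sketch of the Minkowski distance and of the resulting m × m proximity matrix, consider the following Python code. The function names `minkowski` and `proximity_matrix` and the sample points are illustrative assumptions and do not come from the original text.

```python
def minkowski(x, y, q=2.0):
    """Minkowski distance of order q between two equal-length points."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)


def proximity_matrix(points, q=2.0):
    """Symmetric m x m matrix of pairwise Minkowski distances."""
    m = len(points)
    return [[minkowski(points[i], points[j], q) for j in range(m)]
            for i in range(m)]


# Hypothetical sample data: three points with n = 2 features.
X = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
D_city = proximity_matrix(X, q=1)   # city-block (Manhattan) distances
D_eucl = proximity_matrix(X, q=2)   # Euclidean distances
```

For the first two sample points, the city-block distance is 3 + 4 = 7 and the Euclidean distance is 5, so the choice of q changes the proximities that MDS then tries to preserve.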
MDS looks for the coordinates of points Yi representing Xi in a lower-dimensional space by minimizing some target function, the stress function. Several variants of MDS with different stress functions have been proposed in the literature (see the review in Dzemyda et al. [1]), starting from least squares, with further variations that seek to make the resulting stress value less dependent on the absolute magnitude of the proximities (dissimilarities). The so-called normalized stress functions Stress-norm and Stress-1 have been suggested to decrease the dependence of the results on the size of the configuration [2]. In these functions, wij are optional a priori weights for the dissimilarities, and the distance between points Yi and Yj in the lower-dimensional space is Euclidean.
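In this paper, we use the raw stress function [13]:
$$S(Y_1, \dots, Y_m) = \sum_{i<j} \left( d_{ij} - d(Y_i, Y_j) \right)^2, \qquad (2)$$
where d(Yi, Yj) is the Euclidean distance between points Yi and Yj in the lower-dimensional space. It is shown in the next section that the Geometric MDS suggested in this paper does not depend on the scales of dissimilarities and, therefore, may use a much simpler stress function like (2).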
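The raw stress can be sketched in a few lines of Python. The sketch assumes the usual least-squares form, that is, the sum over pairs i < j of the squared difference between the dissimilarity dij and the Euclidean distance between Yi and Yj. The function name `raw_stress` and the sample data are hypothetical.

```python
import math


def raw_stress(D, Y):
    """Sum over pairs i < j of (d_ij - ||Y_i - Y_j||)^2 (assumed form)."""
    m = len(Y)
    return sum((D[i][j] - math.dist(Y[i], Y[j])) ** 2
               for i in range(m) for j in range(i + 1, m))


# Hypothetical dissimilarities of three points forming a 3-4-5 triangle.
D = [[0.0, 3.0, 4.0],
     [3.0, 0.0, 5.0],
     [4.0, 5.0, 0.0]]

Y_exact = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]   # d = 2, reproduces D
Y_line = [(0.0,), (3.0,), (-4.0,)]               # d = 1, distorts one pair

stress_exact = raw_stress(D, Y_exact)   # 0.0
stress_line = raw_stress(D, Y_line)     # 4.0
```

The planar configuration reproduces all three dissimilarities and has zero stress, while the configuration on a line keeps two of them and stretches the third from 5 to 7, which contributes (5 - 7)^2 = 4.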
In dimensionality reduction, the optimization problem is to find the optimal coordinates of the points Y1, ..., Ym:
$$\min_{Y_1, \dots, Y_m \in \mathbb{R}^d} S(Y_1, \dots, Y_m). \qquad (3)$$
The minimization problem (3) can be solved using local descent methods, e.g., quasi-Newton or conjugate gradient methods [14]. When 1 ≤ d < n, the stress function S( · ) often has many local minima. Therefore, local descent cannot guarantee a global minimum. Attempts at global minimization are computationally expensive and often find only a local minimum. This leads to the conclusion that the classical approaches [15], [16], [17] to stress minimization leave room for new findings in MDS theory.
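A toy illustration of this behaviour, under stated assumptions, is given below. Three points are placed on a line (d = 1) with hypothetical dissimilarities d12 = 1, d13 = 2, d23 = 1, and a plain fixed-step gradient descent (a stand-in for the quasi-Newton or conjugate gradient methods mentioned above) is started from two different configurations. The stress is taken as the sum of squared differences over pairs i < j; the names `raw_stress_1d` and `descend` are illustrative.

```python
def raw_stress_1d(y, D):
    """Sum over pairs i < j of (d_ij - |y_i - y_j|)^2 for points on a line."""
    m = len(y)
    return sum((D[i][j] - abs(y[i] - y[j])) ** 2
               for i in range(m) for j in range(i + 1, m))


def descend(y, D, step=0.1, iters=200):
    """Fixed-step gradient descent on the 1-D raw stress."""
    y = list(y)
    m = len(y)
    for _ in range(iters):
        grad = [0.0] * m
        for i in range(m):
            for j in range(m):
                diff = y[i] - y[j]
                if i != j and diff != 0.0:
                    # derivative of (d_ij - |y_i - y_j|)^2 with respect to y_i
                    grad[i] += -2.0 * (D[i][j] - abs(diff)) * diff / abs(diff)
        y = [yi - step * g for yi, g in zip(y, grad)]
    return y


# Hypothetical dissimilarities; the configuration (0, 1, 2) fits them exactly.
D = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 1.0],
     [2.0, 1.0, 0.0]]

y_a = descend([0.0, 1.6, 1.3], D)   # start with the order y1 < y3 < y2
y_b = descend([0.0, 1.0, 2.1], D)   # start with the order y1 < y2 < y3

stress_a = raw_stress_1d(y_a, D)    # about 4/3: a local minimum
stress_b = raw_stress_1d(y_b, D)    # about 0: the global minimum
```

Both runs converge, but the first one stays in the wrong ordering of the points and stops at a stress of about 4/3, although a configuration with zero stress exists.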
In this paper, the stress function and multidimensional scaling, in general, have been considered from the geometric point of view. The so-called Geometric MDS has been developed. It creates a basis for a new class of algorithms to minimize the MDS stress.
Section snippets
Geometric MDS
A new approach, Geometric MDS, for minimization of the stress function (2) is presented in this section. Let the proximities between the n-dimensional points Xi, i = 1, ..., m, be defined by an m × m matrix of proximities dij. The aim is to solve problem (3) and to find the optimal d-dimensional points Y1, ..., Ym.
Suppose we have some initial configuration of points Y1, ..., Ym. The main idea of Geometric MDS focuses on optimizing the position of one chosen point (let it be Yj) when the
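The idea of improving the position of a single point can be sketched as follows. This is not the analytical step of Geometric MDS, which is not reproduced in the snippet above; it is a plain gradient step under two assumptions: the local stress of Yj is the sum over i ≠ j of (dij - ||Yi - Yj||)^2, and the step size is 1 / (2(m - 1)). The names `local_stress` and `move_point` are hypothetical.

```python
import math


def local_stress(D, Y, j):
    """Terms of the raw stress that involve point Y[j] (assumed form)."""
    return sum((D[i][j] - math.dist(Y[i], Y[j])) ** 2
               for i in range(len(Y)) if i != j)


def move_point(D, Y, j):
    """One gradient step on local_stress with step size 1 / (2 (m - 1))."""
    m, dim = len(Y), len(Y[j])
    grad = [0.0] * dim
    for i in range(m):
        if i == j:
            continue
        r = math.dist(Y[i], Y[j])
        if r == 0.0:
            continue  # direction undefined for coincident points; skip the term
        for k in range(dim):
            grad[k] += 2.0 * (r - D[i][j]) * (Y[j][k] - Y[i][k]) / r
    step = 1.0 / (2.0 * (m - 1))
    return tuple(Y[j][k] - step * grad[k] for k in range(dim))


# Hypothetical data: point 0 should be at distance 1 from points 1 and 2.
D = [[0.0, 1.0, 1.0],
     [1.0, 0.0, math.sqrt(8.0)],
     [1.0, math.sqrt(8.0), 0.0]]
Y = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]

before = local_stress(D, Y, 0)                 # 2.0
Y_new = [move_point(D, Y, 0)] + Y[1:]          # point 0 moves to (0.5, 0.5)
after = local_stress(D, Y_new, 0)
```

In this example the chosen point is too far from both of its neighbours, so the step pulls it towards them and the local stress drops from 2 to about 0.68 while the other points stay fixed.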
Multimodality of the local stress function of Geometric MDS
Proposition 8. The local stress function S*( · ) defined by (4) could be multimodal for dimensionality 1 ≤ d < ∞.
Proof. Consider such a set of d-dimensional points where and coordinates of are all possible 2d permutations of values and . These points are projections of some n-dimensional points . Let the local stress function depend on the point where . Fig. 4 illustrates a symmetric case of function S*(Y6) where . Here we have
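Multimodality of a local stress function can already be observed numerically in the simplest case d = 1. The example below is not the construction used in the proof; it is a hypothetical configuration with two fixed points at -1 and 1 and a moving point that should be at distance 2 from each of them. The local stress is assumed to be the sum of squared differences between the required and the actual distances to the fixed points.

```python
def local_stress_1d(y, fixed, dissim):
    """Sum of (d_i - |y - y_i|)^2 over the fixed points y_i (assumed form)."""
    return sum((d - abs(y - yi)) ** 2 for yi, d in zip(fixed, dissim))


fixed = [-1.0, 1.0]    # hypothetical fixed points on the line
dissim = [2.0, 2.0]    # required distances from the moving point to each of them

# Evaluate the local stress on a grid with step 0.01 over [-4, 4].
grid = [k / 100.0 for k in range(-400, 401)]
values = [local_stress_1d(y, fixed, dissim) for y in grid]

# Grid points that are strictly lower than both neighbours.
local_minima = [grid[k] for k in range(1, len(grid) - 1)
                if values[k] < values[k - 1] and values[k] < values[k + 1]]
# local_minima is [-2.0, 0.0, 2.0]
```

The function has three separate minima, each with local stress 2, divided by peaks of height 4 at the positions of the fixed points, so a descent started near one minimum cannot reach the others.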
Realizations of Geometric MDS
Two algorithms realizing the idea of Geometric MDS are presented below in pseudocode. We do not optimize the performance of these algorithms, aiming instead for maximal clarity of the presented ideas.
The simplest way to minimize the stress S( · ) by Geometric MDS is to change the positions of the points consecutively (in one step or in several steps) many times. This idea is realized in Algorithm 1, where the stress is minimized, namely by consecutively changing the
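A rough sketch of such a consecutive scheme is given below. It is not Algorithm 1 itself, which is not reproduced in this snippet. It assumes that the stress is the sum over pairs i < j of (dij - ||Yi - Yj||)^2 and that each point is moved by one gradient step of size 1 / (2(m - 1)) on the terms that involve it; the names `raw_stress`, `move_point`, and `sweep_minimize` and the sample data are hypothetical.

```python
import math


def raw_stress(D, Y):
    """Sum over pairs i < j of (d_ij - ||Y_i - Y_j||)^2 (assumed form)."""
    m = len(Y)
    return sum((D[i][j] - math.dist(Y[i], Y[j])) ** 2
               for i in range(m) for j in range(i + 1, m))


def move_point(D, Y, j):
    """Gradient step for point j with the other points fixed (assumed rule)."""
    m, dim = len(Y), len(Y[j])
    grad = [0.0] * dim
    for i in range(m):
        r = math.dist(Y[i], Y[j])
        if i == j or r == 0.0:
            continue  # skip the point itself and coincident points
        for k in range(dim):
            grad[k] += 2.0 * (r - D[i][j]) * (Y[j][k] - Y[i][k]) / r
    step = 1.0 / (2.0 * (m - 1))
    return tuple(Y[j][k] - step * grad[k] for k in range(dim))


def sweep_minimize(D, Y, sweeps=100):
    """Repeatedly move every point in turn; record the stress after each sweep."""
    Y = [tuple(p) for p in Y]
    history = [raw_stress(D, Y)]
    for _ in range(sweeps):
        for j in range(len(Y)):
            Y[j] = move_point(D, Y, j)   # uses the already updated points
        history.append(raw_stress(D, Y))
    return Y, history


# Hypothetical data: four points in 3-D, projected to the plane (d = 2).
X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
D = [[math.dist(a, b) for b in X] for a in X]
Y0 = [(0.0, 0.0), (1.0, 0.2), (0.3, 1.0), (-0.5, -0.4)]
Y, history = sweep_minimize(D, Y0)
```

With this step rule each single move does not increase the stress, so the recorded `history` is non-increasing; the final configuration still depends on the starting points Y0.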
Experiments
The methodology for evaluating the efficiency of Geometric MDS focuses on a comparison of several of its realizations among themselves and with the well-known multidimensional scaling using majorization (SMACOF). Algorithm 1 is a direct realization of the ideas of Geometric MDS. Algorithm 2 is a generalization for the global minimization of the stress. The most detailed investigation is based on Algorithm 1 because it is comparable with SMACOF: the same starting low-dimensional data set may be fixed, both
Conclusions
In this paper, the stress function and multidimensional scaling, in general, have been considered from the geometric point of view. A new strategy (Geometric MDS) for MDS stress minimization has been developed. The new interpretation of the stress allows finding the proper step size and the descent direction towards the minimum of the stress function analytically if we consider a separate point of the projected space. The exceptional property of the new approach is that we do not need the
Acknowledgements
This research has received funding from the Research Council of Lithuania (LMTLT), agreement No. S-MIP-20-19.
References (20)
- et al. A new web-based solution for modelling data mining processes. Simul. Model. Pract. Theory (2017)
- et al. Multidimensional data visualization: methods and applications. Springer Optimization and its Applications (2013)
- et al. Applied Multidimensional Scaling and Unfolding (2018)
- et al. Multidimensional scaling and application in traffic jam prediction. Appl. Mech. Mater. (2013)
- et al. Visual analysis of evolution of EEG coherence networks employing temporal multidimensional scaling. VCBM 18: Eurographics Workshop on Visual Computing for Biology and Medicine (2018)
- et al. Low-resolution face recognition and feature selection based on multidimensional scaling joint l2,1-norm regularisation. IET Biom. (2019)
- et al. Visualization of data: methods, software, and applications
- et al. Graininess characterization by multidimensional scaling. J. Mod. Opt. (2019)
- et al. Efficient data projection for visual analysis of large data sets using neural networks. Informatica (2011)
- et al. Geodesic distances in the intrinsic dimensionality estimation using packing numbers. Nonlinear Anal. (2014)