Independent Component Analysis for the objective classification of globular clusters of the galaxy NGC 5128

doi:10.1016/j.csda.2012.06.008

Computational Statistics & Data Analysis

Volume 57, Issue 1, January 2013, Pages 17-32

https://doi.org/10.1016/j.csda.2012.06.008

Abstract

Independent Component Analysis (ICA) is closely related to Principal Component Analysis (PCA) and factor analysis. Whereas ICA finds a set of source data that are mutually independent, PCA finds a set of data that are mutually uncorrelated. Data from different physical processes may be assumed to be uncorrelated, but the converse does not always hold: uncorrelated data need not come from different physical processes. This is because lack of correlation is a weaker property than independence.

In the present case an objective classification of the globular clusters (GCs) of NGC 5128 has been carried out. Components responsible for significant variation have been obtained through both Principal Component Analysis (PCA) and Independent Component Analysis (ICA) and the classification has been done by $K$-means clustering. The set of observable parameters includes structural parameters, spectroscopically determined Lick indices and radial velocities from the literature.

We propose that GCs of NGC 5128 consist of two groups. One group originated in the original cluster formation event that coincided with the formation of the elliptical galaxy and the other group emerged from an accreted spiral galaxy. This is unlike the previous result (Chattopadhyay et al., 2009) which accounts for a third group originating from the accretion of tidally stripped dwarf galaxies.

Introduction

In many real-life situations, both the number of variables under consideration and the number of observations are very large. To analyze such multivariate data, it is necessary to reduce the dimension properly, since a smaller dimension is needed for further analysis such as classification or clustering. In statistics, Principal Component Analysis (PCA) is the most popular dimension reduction technique. Although PCA is basically an exploratory technique, making inferences requires a normality assumption regarding the underlying multivariate distribution. The eigenvalues and eigenvectors of the covariance or correlation matrix are the main ingredients of a PCA. The eigenvectors determine the directions of maximum variability, whereas the eigenvalues specify the variances along those directions. In practice, decisions regarding the quality of the principal component approximation should be made on the basis of the eigenvalue–eigenvector pairs. Studying the sampling distribution of their estimates requires the multivariate normality assumption, since the problem is otherwise too difficult. Principal components (PCs) are a sequence of projections of the data, constructed in such a way that they are uncorrelated and ordered by variance. The PCs of a $p$-dimensional data set provide a sequence of best linear approximations. As only a few (say, $m \ll p$) of these linear combinations may explain a large percentage of the variation in the data, one can take only those $m$ components instead of the $p$ variables for further analysis.
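The eigenvalue and eigenvector description of PCA given above can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the authors' program: the function name `pca`, the 90% variance target and the synthetic data matrix are assumptions introduced here.

```python
import numpy as np

def pca(X, var_target=0.90):
    """PCA from the eigen-decomposition of the sample covariance matrix.

    Returns eigenvalues (descending), eigenvectors (columns), the cumulative
    proportion of variance explained, the number m of retained components,
    and the n x m matrix of component scores.
    """
    Xc = X - X.mean(axis=0)                  # centre each variable
    cov = np.cov(Xc, rowvar=False)           # p x p sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    # smallest m whose cumulative explained variance reaches the target
    m = min(int(np.searchsorted(explained, var_target)) + 1, X.shape[1])
    scores = Xc @ eigvecs[:, :m]             # projections onto the first m PCs
    return eigvals, eigvecs, explained, m, scores

# Synthetic illustration (assumed data): 6 observed variables driven by 2 latent ones.
rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 2))
loadings = rng.standard_normal((2, 6))
X = latent @ loadings + 0.1 * rng.standard_normal((200, 6))

eigvals, eigvecs, explained, m, scores = pca(X)
print("retained components m =", m)
print("cumulative variance explained:", np.round(explained, 3))
```

The retained scores are uncorrelated by construction, and their variances equal the leading eigenvalues, which is the property the paragraph above relies on.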

More recently, independent component analysis (ICA) has emerged as a strong competitor to PCA and factor analysis. ICA was primarily developed for non-Gaussian data in order to find independent components (rather than uncorrelated ones, as in PCA) responsible for a larger part of the variation. ICA separates the statistically independent components, which are the original source data, from an observed set of data mixtures. Not all the information in a multivariate data set is equally important, so the most useful information has to be extracted; ICA extracts and reveals such information from the whole data set. This technique has been applied in various fields such as speech processing, brain imaging and stock prediction.
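To illustrate how independent sources can be separated from observed mixtures, the following is a minimal sketch of a FastICA-style fixed-point iteration with a tanh nonlinearity, written in Python with NumPy. The text above does not state which ICA algorithm the authors used, so the choice of FastICA, the function names (`whiten`, `sym_decorrelate`, `fastica`), the two synthetic non-Gaussian sources and the mixing matrix are all assumptions made for illustration.

```python
import numpy as np

def whiten(X):
    """Centre X and return columns with zero mean, unit variance, no correlation."""
    Xc = X - X.mean(axis=0)
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    return np.sqrt(X.shape[0]) * U

def sym_decorrelate(W):
    """Symmetric orthogonalisation: W <- (W W')^(-1/2) W."""
    vals, vecs = np.linalg.eigh(W @ W.T)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T @ W

def fastica(X, n_iter=200, tol=1e-8, seed=0):
    """Estimate independent components of X (n x p) by a fixed-point iteration."""
    Z = whiten(X)
    n, p = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((p, p)))   # random orthogonal start
    for _ in range(n_iter):
        Y = Z @ W.T                                    # current component estimates
        G = np.tanh(Y)                                 # nonlinearity g
        # w_i <- E[z g(w_i'z)] - E[g'(w_i'z)] w_i, for all rows at once
        W_new = (G.T @ Z) / n - np.diag((1.0 - G ** 2).mean(axis=0)) @ W
        W_new = sym_decorrelate(W_new)
        change = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if change < tol:                               # rows stopped rotating
            break
    return Z @ W.T

# Assumed synthetic sources: one uniform (sub-Gaussian), one Laplace (super-Gaussian).
rng = np.random.default_rng(1)
n = 5000
S_true = np.column_stack([rng.uniform(-1, 1, n), rng.laplace(0, 1, n)])
A = np.array([[1.0, 0.5], [0.3, 1.0]])                 # hypothetical mixing matrix
X = S_true @ A.T                                       # observed mixtures
S_hat = fastica(X)

# Absolute correlation between estimated and true sources (order and sign are arbitrary).
corr = np.abs(np.corrcoef(S_hat, S_true, rowvar=False)[:2, 2:])
print(np.round(corr, 2))
```

ICA recovers the sources only up to permutation, sign and scale, which is why the comparison uses absolute correlations and not a direct difference.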

Although ICA has already been used for the analysis of astronomical data (e.g., Maino et al. (2002), Funaroa et al. (2003), Capozziello and Funaro (2005)), it has not so far been used for clustering data. In the present study we have used both PCA and ICA to reduce the dimension of a data set related to the globular clusters of the galaxy NGC 5128 in order to make an objective classification. A Globular Cluster (GC) is generally a spherical collection of metal-poor stars that orbits a galactic core as a satellite. Studies of GCs are important for understanding stellar evolution, the formation of galaxies and the underlying cosmology. Although their origin and formation scenario are thought to be connected to those of their host galaxy, detailed investigation is still in progress. The stars in GCs are among the oldest stars in the galaxy. The brightness and distinctive appearance of GCs make them relatively easy to detect at large distances. Classical models of galaxy formation can be divided into five major categories: (1) the monolithic collapse model, (2) the major merger model, (3) the multiphase dissipational collapse model, (4) the dissipationless merger model and (5) accretion and in situ hierarchical merging.

According to the monolithic collapse model, an elliptical galaxy is formed through the collapse of an isolated massive gas cloud at high redshift (Larson, 1975, Carlberg, 1984, Arimato and Yoshii, 1987). In this model, the color distribution of GCs is unimodal, and the rotation of GCs is produced by the tidal force from satellite galaxies (Peebles, 1969). In the major merger model, elliptical galaxies are formed by the merger of two or more disk galaxies (Toomre, 1977, Ashman and Zepf, 1992, Zepf et al., 2000). Younger GCs are formed out of the shocked gas in the disk, while blue GCs come from the halos of the merging galaxies (Bekki et al., 2002). As a result, the color distribution is bimodal. In this scenario, the kinematic properties of the GCs depend weakly on the orbital configuration of the merging galaxies, but the metal-rich GCs are generally located in the inner region of the galaxy, and the metal-poor ones in the outer regions.

The multiphase dissipational collapse has been proposed by Forbes et al. (1997). According to this model, the GCs form in distinct star formation episodes through dissipational collapse. In addition, there is tidal stripping of GCs from satellite dwarf galaxies. Blue (metal-poor) GCs form in the initial phase and red (metal-rich) GCs form from the enriched medium at a later epoch, thus producing a bimodal color distribution of the GCs. This model predicts that the system of blue GCs has no rotation and a high-velocity dispersion, while the red GCs show some rotation depending on the degree of dissipation. Côté et al. (1998) proposed a model in which the GC color bimodality is due to the capture of metal-poor GCs through merger or tidal stripping. The metal-rich GCs are the initial population of GCs in the galaxy and are more centrally concentrated than the captured GCs. The main difference with the previous model is that no age difference is expected between the blue and red GC populations. The very different origins for the two populations imply rather different orbital properties; in particular, the metal-poor GCs should show a larger velocity dispersion than the metal-rich ones, comparable in the outer region to that of the neighboring galaxies.

From the above discussion, it appears that there are kinematic differences among the subpopulations of GCs in different galaxies. These differences can be used as an observational constraint on the galaxy formation model. In the above studies, the GCs are classified as metal-rich or metal-poor on the basis of the value of a single variable ([Fe/H] $> -1$ or $< -1$), which is subjective in nature and also inappropriate in a multivariate setup. Concentrating on a single variable means that one ignores the joint effect of several parameters.

NGC 5128 (Centaurus A; Fig. 1) is a prominent galaxy in the constellation of Centaurus and one of the closest giant elliptical galaxies to Earth. Its active galactic nucleus has been extensively studied by professional astronomers (Beasley et al., 2008). In a previous work (Chattopadhyay et al., 2009), we first used a modified technique of PCA (Salibián-Barrera et al., 2006) to search for the optimum set of parameters which gives maximum variation for the GCs (Fig. 2) in NGC 5128. This technique can be considered as a robust PCA based on a multivariate MM estimator. In that work, we used two methods for cluster analysis (CA): one is based on mixture models (Qiu and Tamhane, 2007) and the other is $K$-means (MacQueen, 1967). To find the optimum number of clusters we used the method developed by Sugar and James (2003). The robust PCA method and cluster analysis based on mixture models are discussed in brief in Appendix A and Appendix B respectively.

Although the above-mentioned modified PCA and mixture-model-based clustering methods are quite robust, they perform better when the sample size is large in comparison with the dimension of the data set. Here, however, the number of GCs having values of all 15 variables (parameters) is only 130. Furthermore, tests for normality show that the samples are from a non-Gaussian distribution.

In the present study we have performed $K$-means clustering on the basis of ICs as well as PCs, using the within-cluster sum of squares to identify which method is appropriate for the present data set.
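The within-cluster sum of squares used as the criterion here can be computed with a short $K$-means routine. The sketch below uses plain Lloyd iterations with random restarts in Python with NumPy. The function name `kmeans`, the restart count and the two synthetic groups are assumptions for illustration only; in practice the input would be the matrix of IC or PC scores, put on a comparable scale before the two sums of squares are compared.

```python
import numpy as np

def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's K-means with random restarts.

    Returns (wcss, labels, centers) for the restart with the smallest
    within-cluster sum of squares (WCSS).
    """
    rng = np.random.default_rng(seed)
    best = (np.inf, None, None)
    for _ in range(n_init):
        # start from k distinct observations chosen at random
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _step in range(n_iter):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)                 # assign to nearest centre
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)                      # keep old centre if cluster empties
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        # evaluate WCSS with the final centres
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        wcss = d2[np.arange(len(X)), labels].sum()
        if wcss < best[0]:
            best = (wcss, labels, centers)
    return best

# Assumed synthetic component scores: two well-separated groups of 50 points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(10.0, 1.0, (50, 2))])

wcss1, _, _ = kmeans(X, 1)
wcss2, labels, centers = kmeans(X, 2)
print("WCSS with K=1:", round(wcss1, 1))
print("WCSS with K=2:", round(wcss2, 1))
```

A smaller WCSS for the same number of clusters indicates more compact groups, which is the sense in which the comparison between the two sets of components is made.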

In this paper, Section 2 describes different features of ICA, while a theoretical comparison between PCA and ICA is given in Section 3. The data analysis, the properties of the three groups and the conclusions are presented in Section 4.

The explanations of all the structural and photometric parameters used in this paper are listed in Table 1.

Section snippets

Independent component analysis

Suppose there are $n$ observations on each of $p$ correlated variables. Let us denote the data matrix by $X^{n \times p}$. By singular value decomposition one can write $X = U D V'$. Writing $S = \sqrt{n}\, U$ and $A' = D V' / \sqrt{n}$, we have $X = S A'$ and hence each of the columns of $X$ is a linear combination of the columns of $S$. Now since $U$ is orthogonal and assuming that the columns of $X$ have mean zero, it is easy to show that the columns of $S$ have zero mean, unit variance and are uncorrelated. In terms of random variables we can
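The decomposition $X = S A'$ and the stated properties of $S$ can be checked numerically. The sketch below, in Python with NumPy, uses an assumed synthetic data matrix and a hypothetical mixing matrix `M`; variances are computed with divisor $n$, matching the scaling $S = \sqrt{n}\, U$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 4

# Hypothetical well-conditioned mixing matrix that makes the columns correlated.
M = np.array([[1.0, 0.5, 0.0, 0.0],
              [0.0, 1.0, 0.5, 0.0],
              [0.0, 0.0, 1.0, 0.5],
              [0.0, 0.0, 0.0, 1.0]])
X = rng.standard_normal((n, p)) @ M
X = X - X.mean(axis=0)                       # columns of X have mean zero

# Thin SVD: X = U diag(d) V'
U, d, Vt = np.linalg.svd(X, full_matrices=False)
S = np.sqrt(n) * U                           # S = sqrt(n) U
A_t = np.diag(d) @ Vt / np.sqrt(n)           # A' = D V' / sqrt(n)

print("X == S A'            :", np.allclose(X, S @ A_t))
print("columns of S mean 0  :", np.allclose(S.mean(axis=0), 0.0))
print("S'S / n == identity  :", np.allclose(S.T @ S / n, np.eye(p)))
```

The last check covers both unit variance (the diagonal) and zero correlation (the off-diagonal entries) in one comparison.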

Independent component analysis versus principal component analysis

Both independent component analysis and principal component analysis are used for analyzing large data sets. Whereas ICA finds a set of source data that are mutually independent, PCA finds a set of data that are mutually uncorrelated. ICA was originally developed for separating mixed audio signals into independent sources. In this paper we make the comparison by analyzing GC data.
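The distinction between uncorrelated and independent data can be seen in a small numerical example. In the Python sketch below (an assumed toy construction, not taken from the paper's data), $y = x^2$ is completely determined by $x$, yet the two are essentially uncorrelated because $x$ is symmetric about zero.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100_000)   # symmetric about zero
y = x ** 2                                 # a deterministic function of x

r_xy = np.corrcoef(x, y)[0, 1]             # linear correlation between x and y
r_x2y = np.corrcoef(x ** 2, y)[0, 1]       # correlation after a nonlinear transform

print(f"corr(x, y)   = {r_xy:.4f}")
print(f"corr(x^2, y) = {r_x2y:.4f}")
```

A method that only removes linear correlation, such as PCA, would treat `x` and `y` as already separated, whereas a method that looks for independence would not.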

The purpose of PCA is to reduce the original data set of two or more sequentially observed variables by identifying

Data analysis

For data analysis we have used the R software, and all the necessary programs are written as R scripts.

Acknowledgments

The authors are grateful to the referees for their comments which significantly improved the quality of the paper.

References (43)

P. Comon. Independent component analysis, a new concept? Signal Processing (1994)

A. Hyvärinen et al. Independent component analysis: algorithms and applications. Neural Networks (2000)

D. Qiu et al. A comparative study of the $K$-means algorithm and the normal mixture model for clustering: univariate case. Journal of Statistical Planning and Inference (2007)

H. Albazzaz et al. Statistical process control charts for batch operations based on independent component analysis. Industrial & Engineering Chemistry Research (2004)

N. Arimato et al. Chemical and photometric properties of a galactic wind model for elliptical galaxies. Astronomy & Astrophysics (1987)

K.M. Ashman et al. The formation of globular clusters in merging and interacting galaxies. The Astrophysical Journal (1992)

P. Barmby et al. M31 globular clusters: colors and metallicities. The Astronomical Journal (2000)

M.A. Beasley et al. A 2dF spectroscopic study of globular clusters in NGC 5128: probing the formation history of the nearest giant elliptical

K. Bekki et al. Globular cluster formation from gravitational tidal effects of merging and interacting galaxies

S. Capozziello et al. Separation of artifacts and events in astrophysical images using independent component analysis. International Journal Of Computational Cognition (2005)

R.G. Carlberg. Dissipative formation of an elliptical galaxy. The Astrophysical Journal (1984)

S.A. Cellone et al. Washington photometry of low surface brightness dwarf galaxies in the Fornax cluster: constraints on their stellar populations. The Astrophysical Journal (1996)

A.K. Chattopadhyay et al. Study of NGC 5128 globular clusters under multivariate statistical paradigm. The Astrophysical Journal (2009)

P. Côté et al. The formation of giant elliptical galaxies and their globular cluster systems. The Astrophysical Journal (1998)

D.A. Forbes et al. On the origin of globular clusters in elliptical and cD galaxies. The Astronomical Journal (1997)

M. Funaroa et al. Independent component analysis for artefact separation in astrophysical images. Neural Networks (2003)

D. Geisler et al. Washington photometry of the globular cluster system of NGC 4472, I, analysis of the metallicities. The Astronomical Journal (1996)

J.A. Graham. The structure and evolution of NGC 5128. The Astrophysical Journal (1979)

T. Hastie et al.

A. Hyvärinen et al. Independent Component Analysis (2001)

F.P. Israel. Centaurus A - NGC 5128. Astronomy & Astrophysics Review (1998)

