Independent Component Analysis for the objective classification of globular clusters of the galaxy NGC 5128
Introduction
In many real-life situations, both the number of variables under consideration and the number of observations are very large. To analyze such multivariate data, it is necessary to reduce the dimension properly, since a smaller dimension is needed for further analysis such as classification or clustering. In statistics, Principal Component Analysis (PCA) is the most popular dimension reduction technique. Although PCA is basically an exploratory technique, making inferences requires a normality assumption about the underlying multivariate distribution. The eigenvalues and eigenvectors of the covariance or correlation matrix are the main ingredients of a PCA: the eigenvectors determine the directions of maximum variability, whereas the eigenvalues specify the variances along those directions. In practice, decisions regarding the quality of the principal component approximation should be made on the basis of the eigenvalue–eigenvector pairs. To study the sampling distribution of their estimates, the multivariate normality assumption becomes necessary, as the problem is otherwise too difficult. Principal components (PCs) are a sequence of projections of the data, constructed in such a way that they are uncorrelated and ordered by variance. The PCs of a multivariate data set provide a sequence of best linear approximations to it. As only a few of these linear combinations may explain a large percentage of the variation in the data, one can retain only those components, instead of all the original variables, for further analysis.
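The role of the eigenvalue–eigenvector pairs can be illustrated with a minimal sketch. It assumes NumPy and uses synthetic data, not the GC sample; the function name `pca_eig` and all variable names are hypothetical choices made for this illustration.

```python
import numpy as np

def pca_eig(X):
    """PCA through the eigendecomposition of the sample covariance matrix.

    Returns eigenvalues (descending), matching eigenvectors (columns),
    and the principal component scores of the centered data.
    """
    Xc = X - X.mean(axis=0)                 # center each variable
    cov = np.cov(Xc, rowvar=False)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                   # projections onto the eigenvectors
    return eigvals, eigvecs, scores

rng = np.random.default_rng(0)
# Synthetic example (an assumption): 200 observations of 5 variables
# driven by 2 latent factors plus a small amount of noise.
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))

eigvals, eigvecs, scores = pca_eig(X)
explained = np.cumsum(eigvals) / eigvals.sum()  # cumulative share of variance
```

Here `explained` gives the cumulative proportion of variance captured by the leading components, which is one common basis for deciding how many PCs to retain.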
More recently, independent component analysis (ICA) has emerged as a strong competitor to PCA and factor analysis. ICA was primarily developed for non-Gaussian data in order to find independent components (rather than merely uncorrelated ones, as in PCA) responsible for a large part of the variation. ICA separates the statistically independent components, which are the original source data, from an observed set of data mixtures. Not all information in a multivariate data set is equally important; we need to extract the most useful part, and ICA is designed to extract and reveal it from the whole data set. This technique has been applied in various fields such as speech processing, brain imaging and stock prediction.
Although ICA has already been used for the analysis of astronomical data (e.g., Maino et al. (2002), Funaro et al. (2003), Capozziello and Funaro (2005)), it has not so far been used for the clustering of data. In the present study we have used both PCA and ICA to reduce the dimension of a data set related to the globular clusters of the galaxy NGC 5128 in order to make an objective classification. A Globular Cluster (GC) is generally a spherical collection of metal-poor stars that orbits a galactic core as a satellite. Studies of GCs are important for understanding stellar evolution, the formation of galaxies and the underlying cosmology. Although their origin and formation scenario are speculated to be connected to those of their host galaxy, an elaborate investigation is still in progress. The stars in GCs are among the oldest stars in the galaxy. The brightness and distinctive appearance of GCs make them relatively easy to detect at large distances. Classical galaxy formation models can be divided into five major categories: (1) the monolithic collapse model, (2) the major merger model, (3) the multiphase dissipational collapse model, (4) the dissipationless merger model and (5) accretion and in situ hierarchical merging.
According to the monolithic collapse model, an elliptical galaxy is formed through the collapse of an isolated massive gas cloud at high redshift (Larson, 1975, Carlberg, 1984, Arimoto and Yoshii, 1987). In this model, the color distribution of GCs is unimodal, and the rotation of GCs is produced by the tidal force from satellite galaxies (Peebles, 1969). In the major merger model, elliptical galaxies are formed by the merger of two or more disk galaxies (Toomre, 1977, Ashman and Zepf, 1992, Zepf et al., 2000). Younger GCs are formed out of the shocked gas in the disk, while blue GCs come from the halos of the merging galaxies (Bekki et al., 2002). As a result, the color distribution is bimodal. In this scenario, the kinematic properties of the GCs depend weakly on the orbital configuration of the merging galaxies, but the metal-rich GCs are generally located in the inner region of the galaxy, and the metal-poor ones in the outer regions.
The multiphase dissipational collapse has been proposed by Forbes et al. (1997). According to this model, the GCs form in distinct star formation episodes through dissipational collapse. In addition, there is tidal stripping of GCs from satellite dwarf galaxies. Blue (metal-poor) GCs form in the initial phase and red (metal-rich) GCs form from the enriched medium at a later epoch, thus producing a bimodal color distribution of the GCs. This model predicts that the system of blue GCs has no rotation and a high-velocity dispersion, while the red GCs show some rotation depending on the degree of dissipation. Côté et al. (1998) proposed a model in which the GC color bimodality is due to the capture of metal-poor GCs through merger or tidal stripping. The metal-rich GCs are the initial population of GCs in the galaxy and are more centrally concentrated than the captured GCs. The main difference with the previous model is that no age difference is expected between the blue and red GC populations. The very different origins for the two populations imply rather different orbital properties; in particular, the metal-poor GCs should show a larger velocity dispersion than the metal-rich ones, comparable in the outer region to that of the neighboring galaxies.
From the above discussion, it appears that there are kinematic differences among the subpopulations of GCs in different galaxies. These differences can be used as an observational constraint on the galaxy formation model. In the above studies, the GCs are classified as metal-rich or metal-poor on the basis of the value of a single variable such as [Fe/H], which is subjective in nature and also inappropriate in a multivariate setup. Concentrating on a single variable means that one ignores the joint effect of several parameters.
NGC 5128 (Fig. 1) (Centaurus A) is a prominent galaxy in the constellation of Centaurus and one of the closest giant elliptical galaxies to Earth. Its active galactic nucleus has been extensively studied by professional astronomers (Beasley et al., 2008). In a previous work (Chattopadhyay et al., 2009), we first used a modified PCA technique (Salibián-Barrera et al., 2006) to search for the optimum set of parameters giving the maximum variation for the GCs (Fig. 2) in NGC 5128. This can be considered a robust PCA based on a multivariate MM estimator. In that work, we used two methods for cluster analysis (CA): one based on mixture models (Qiu and Tamhane, 2007) and the other being k-means (MacQueen, 1967). To find the optimum number of clusters we used the method developed by Sugar and James (2003). The robust PCA method and the cluster analysis based on mixture models are discussed briefly in Appendix A and Appendix B respectively.
The above-mentioned modified PCA and mixture-model-based clustering methods are quite robust, but they perform better when the sample size is large in comparison with the dimension of the data set. Here, however, the number of GCs having values of all 15 variables (parameters) is only 130. Further, tests for normality show that the samples are from a non-Gaussian distribution.
In the present study we have carried out k-means clustering on the basis of ICs as well as PCs, in order to identify the method better suited to the present data set on the basis of the within-cluster sum of squares.
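The within-cluster sum of squares used as the comparison criterion can be sketched as follows. This is a minimal Lloyd-style k-means written with NumPy, offered only as an illustration under stated assumptions: it is not the software used in the study, the data are synthetic blobs, and the function name `kmeans` and its parameters are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd-style k-means; returns labels, centers and the
    within-cluster sum of squares (WCSS)."""
    rng = np.random.default_rng(seed)
    # start from k distinct observations chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # squared Euclidean distance of every point to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # recompute each center as the mean of its members
        # (an empty cluster keeps its previous center)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    wcss = d2[np.arange(len(X)), labels].sum()
    return labels, centers, wcss

rng = np.random.default_rng(42)
# Synthetic example (an assumption): three well-separated 2-D groups
blobs = np.vstack([rng.normal(loc=c, scale=1.0, size=(50, 2))
                   for c in [(0, 0), (10, 0), (0, 10)]])

labels, centers, wcss3 = kmeans(blobs, k=3)
_, _, wcss1 = kmeans(blobs, k=1)
```

When two reduced representations (for example PCs and ICs) are compared through their WCSS, the comparison is only meaningful if both are on a comparable scale, so some standardization of the components is presumably needed beforehand.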
Section 2 describes the different features of ICA, while a theoretical comparison between PCA and ICA is given in Section 3. The data analysis, the properties of the three groups and the conclusions are presented in Section 4.
The explanations of all the structural and photometric parameters used in this paper are listed in Table 1.
Section snippets
Independent component analysis
Suppose there are $n$ observations on each of $p$ correlated variables. Let us denote the $n \times p$ data matrix by $X$. By singular value decomposition one can write $X = UDV^{T}$. Writing $S = \sqrt{n}\,U$ and $A^{T} = DV^{T}/\sqrt{n}$, we have $X = SA^{T}$, and hence each of the columns of $X$ is a linear combination of the columns of $S$. Now since $U$ is orthogonal and assuming that the columns of $X$ have mean zero, it is easy to show that the columns of $S$ have zero mean, unit variance and they are uncorrelated. In terms of random variables we can
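The singular value decomposition argument above can be checked numerically with a short sketch. It assumes NumPy and synthetic data; the names `X`, `U`, `d`, `Vt`, `S` and `At`, and the scaling by the square root of the number of observations, are choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 4
# Synthetic correlated data (an assumption), centered column by column
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
X = X - X.mean(axis=0)

# Thin SVD: X = U @ diag(d) @ Vt, with orthonormal columns in U
U, d, Vt = np.linalg.svd(X, full_matrices=False)

S = np.sqrt(n) * U                    # rescaled left singular vectors
At = (np.diag(d) @ Vt) / np.sqrt(n)   # matching mixing matrix (transposed)

reconstruction_ok = np.allclose(S @ At, X)  # X is recovered as S @ At
col_means = S.mean(axis=0)                  # close to zero for centered X
second_moment = S.T @ S / n                 # close to the identity matrix
```

Because the column means of `S` vanish, `second_moment` is also the covariance matrix of its columns (with divisor n), so an identity matrix here means unit variances and zero correlations.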
Independent component analysis versus principal component analysis
Both independent component analysis and principal component analysis are used for analyzing large data sets. Whereas ICA finds a set of source data that are mutually independent, PCA finds a set of data that are mutually uncorrelated. ICA was originally developed for separating mixed audio signals into independent sources. In this paper we make the comparison by analyzing GC data.
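The difference between uncorrelated and independent components can be illustrated with a small sketch of a FastICA-type fixed-point iteration (symmetric version with a tanh nonlinearity). This is an assumption about one common ICA algorithm, not necessarily the one used in this paper; the data are synthetic mixtures of two non-Gaussian sources, and the function names `whiten`, `sym_decorrelate` and `fast_ica` are hypothetical.

```python
import numpy as np

def whiten(X):
    """Center X and return uncorrelated, unit-variance columns (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    return np.sqrt(len(X)) * U

def sym_decorrelate(W):
    """Symmetric orthogonalization: W <- (W W^T)^(-1/2) W."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W

def fast_ica(X, n_iter=200, tol=1e-6, seed=0):
    """Estimate independent components of X (rows are observations)."""
    Z = whiten(X)
    n, p = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.normal(size=(p, p)))  # random orthogonal start
    for _ in range(n_iter):
        Y = Z @ W.T                     # current component estimates
        g = np.tanh(Y)                  # nonlinearity
        g_prime = 1.0 - g ** 2          # its derivative
        # fixed-point update for every row of W, then re-orthogonalize
        W_new = (g.T @ Z) / n - np.diag(g_prime.mean(axis=0)) @ W
        W_new = sym_decorrelate(W_new)
        # rows stop rotating when their dot product with the old rows is +-1
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return Z @ W.T

rng = np.random.default_rng(3)
n = 2000
# Two independent non-Gaussian sources (an assumption): uniform and Laplace
sources = np.column_stack([rng.uniform(-1, 1, size=n), rng.laplace(size=n)])
mixing = np.array([[1.0, 0.5], [0.3, 1.0]])   # hypothetical mixing matrix
X = sources @ mixing.T                        # observed mixtures

est = fast_ica(X)
# cross-correlations between true sources (rows) and estimates (columns)
corr = np.corrcoef(sources.T, est.T)[:2, 2:]
```

The whitening step alone already yields uncorrelated components, as PCA does; the additional rotation `W` is what targets independence. The recovered components are determined only up to order and sign, which is why the check uses absolute cross-correlations.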
The purpose of PCA is to reduce the original data set of two or more sequentially observed variables by identifying
Data analysis
For data analysis we have used software and all the necessary programs are written using script.
Acknowledgments
The authors are grateful to the referees for their comments which significantly improved the quality of the paper.
References (43)
Independent component analysis, a new concept? Signal Processing (1994)
Independent component analysis: algorithms and applications. Neural Networks (2000)
A comparative study of the k-means algorithm and the normal mixture model for clustering: univariate case. Journal of Statistical Planning and Inference (2007)
Statistical process control charts for batch operations based on independent component analysis. Industrial & Engineering Chemistry Research (2004)
Chemical and photometric properties of a galactic wind model for elliptical galaxies. Astronomy & Astrophysics (1987)
The formation of globular clusters in merging and interacting galaxies. The Astrophysical Journal (1992)
M31 globular clusters: colors and metallicities. The Astronomical Journal (2000)
A 2dF spectroscopic study of globular clusters in NGC 5128: probing the formation history of the nearest giant elliptical.
Globular cluster formation from gravitational tidal effects of merging and interacting galaxies.
Separation of artifacts and events in astrophysical images using independent component analysis. International Journal of Computational Cognition (2005)
Dissipative formation of an elliptical galaxy. The Astrophysical Journal
Washington photometry of low surface brightness dwarf galaxies in the Fornax cluster: constraints on their stellar populations. The Astrophysical Journal
Study of NGC 5128 globular clusters under multivariate statistical paradigm. The Astrophysical Journal
The formation of giant elliptical galaxies and their globular cluster systems. The Astrophysical Journal
On the origin of globular clusters in elliptical and cD galaxies. The Astronomical Journal
Independent component analysis for artefact separation in astrophysical images. Neural Networks
Washington photometry of the globular cluster system of NGC 4472, I, analysis of the metallicities. The Astronomical Journal
The structure and evolution of NGC 5128. The Astrophysical Journal
Independent Component Analysis.
Centaurus A - NGC 5128. Astronomy & Astrophysics Review