Elsevier

Applied Soft Computing

Volume 29, April 2015, Pages 95-109

A repartition method improving visual quality for PCA image coding

https://doi.org/10.1016/j.asoc.2014.12.029

Highlights

  • This study formulated a clustering method embedded in a GA framework.

  • Individuals of the same group are homogeneous, while those of different groups are heterogeneous.

  • The homogeneity property is in favor of the PCA subspace projection mechanism.

  • The repartition method effectively increases the image quality and visual effect.

Abstract

Image coding using principal component analysis (PCA), a type of image compression technique, projects image blocks onto a subspace that preserves most of the original information. However, the blocks in an image exhibit inhomogeneous properties, such as smooth regions, texture, and edges, which complicate PCA image coding. This paper proposes a repartition clustering method that partitions the data into groups such that individuals of the same group are homogeneous, while those of different groups are heterogeneous. PCA is then applied separately to each group. In the clustering method, the genetic algorithm acts as a framework consisting of three phases, including the proposed repartition clustering. Based on this mechanism, the proposed method can effectively increase image quality and provide an enhanced visual effect.

Introduction

Numerous data analysis techniques, such as regression and principal component analysis (PCA), have high time or space complexity and are thus impractical for large datasets [6], [25]. Therefore, instead of applying such techniques directly to the entire dataset, researchers adopt cluster analysis and apply these techniques to each cluster, which consists of only a portion of the original data. Depending on the type of cluster analysis, the number of clusters, and the accuracy with which the clusters represent the data, the results can be comparable with those that would have been obtained using all the data. Cluster analysis techniques have recently been applied to microarray data, image analysis, and marketing science [13], [26].

Cluster analysis [11] is a core issue in data mining, with innumerable applications spanning many fields. To identify clusters in a dataset mathematically, it is usually necessary to first define a measure of similarity or proximity, which establishes a rule for assigning patterns to a particular cluster. The measure of similarity is usually data dependent. Clustering aims to optimize a cost function defined over all possible groupings; moreover, the cost function depends on the manner in which the data are decomposed and has limited meaning for any single item [20]. In this technique, the collected information is divided into clusters so as to reveal the system's behavior patterns effectively. In other words, patterns in the same group are similar in some sense, and patterns in different groups are dissimilar in the same sense [4], [5]. In terms of analysis of variance (ANOVA), the within-group variance is low and the between-group variance is high. Here, "variance" means the sample variance among all possible linear combinations of observations [8]. We apply this property in the proposed method, in which PCA is employed as the data analysis technique for image coding. In this study, we adopt the K-means algorithm [10], [24], [30], proposed by MacQueen (1967), to minimize the sum of the distances from each data point to its cluster center. The K-means algorithm is a popular clustering method because of its capability to group huge datasets efficiently.
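The K-means step just described can be sketched in a few lines. The following is a generic NumPy illustration of Lloyd's iteration, not the authors' code; the initialization scheme and iteration count are arbitrary choices.

```python
import numpy as np

def kmeans(X, K, iters=50, seed=0):
    """Minimize the sum of squared distances from each point to its cluster center."""
    rng = np.random.default_rng(seed)
    # initialize centers at K distinct data points
    centers = X[rng.choice(len(X), K, replace=False)]
    for _ in range(iters):
        # assignment step: label each point with its nearest center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # update step: each center moves to the mean of its members
        for k in range(K):
            if (labels == k).any():
                centers[k] = X[labels == k].mean(axis=0)
    return labels, centers
```

On a dataset with well-separated groups, the returned labels recover the groups; this is the partitioning used before per-cluster PCA.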

Data reduction techniques aim to represent data efficiently [14], [15], [28]. One example is the Karhunen–Loeve Transform (KLT), in which a higher-dimensional input space is mapped to a lower-dimensional feature space through a linear transformation [19]. As an approach to feature extraction in the n-dimensional space, PCA finds the m (m < n) basis components such that the projection onto the corresponding subspace possesses the largest variations [27]. To this end, PCA computes the covariance matrix of the zero-mean input data. After solving for the eigenvalues of the covariance matrix, PCA extracts the eigenvectors corresponding to the largest eigenvalues [7], [16]. Dimension reduction is achieved by using the eigenvectors with the most significant eigenvalues, which form an orthogonal basis for a low-dimensional subspace. Every vector in the original space can then be approximated by a corresponding vector in the subspace [9], [22]. Dimensionality reduction is frequently used as a pre-processing step in data mining. Selecting a smaller number of features plays a significant role in applications involving hundreds or thousands of features. Besides relevant features, there may be derogatory, indifferent, and redundant (dependent) ones. Removing such features not only makes the learning task easier by reducing the computational burden, but also often improves classifier performance [4], [5]. This form of data reduction is applied to images to achieve image compression. In this work, we apply PCA separately to each cluster, which consists of a specified set of image blocks, to reconstruct the original (or input) image [29].
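The eigendecomposition route to PCA described above can be written compactly. This is a generic sketch (the choice of m and the data are illustrative; it is not the paper's implementation): center the data, take the covariance matrix's top-m eigenvectors, project, and reconstruct.

```python
import numpy as np

def pca_reduce(X, m):
    """Project zero-mean data onto the m eigenvectors of the covariance
    matrix with the largest eigenvalues, then reconstruct an approximation."""
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    W = vecs[:, -m:]                   # top-m eigenvectors, shape (n, m)
    scores = Xc @ W                    # coordinates in the m-dim subspace
    X_hat = scores @ W.T + mu          # approximation in the original space
    return X_hat, W
```

If the data actually lie in an m-dimensional affine subspace, the reconstruction is exact; otherwise the residual equals the energy of the discarded components.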

The genetic algorithm (GA) [17], [18], [21], originally developed by Holland over the course of the 1960s and 1970s, is a biological analogy. In the selective breeding of plants or animals, for example, offspring are produced as a combination of the parent chromosomes according to certain characteristics that are determined at the genetic level. When the fitness landscape (or cost surface) of the problem is unclear or riddled with many local optima, the GA usually has good search capability because the candidate solutions do not become stuck at local optima [23]. The GA has been successfully applied in many fields of science and engineering [12]. In the proposed algorithm, we partition the dataset into numerous clusters, in which the number of principal components used by PCA can vary from cluster to cluster. In this work, we use the GA as a framework with three phases, namely, GA operation, repartition clustering, and clustering PCA for image coding. In repartition clustering, the cluster memberships and the number of principal components for each cluster are determined progressively.
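The GA loop itself follows a standard pattern of selection, crossover, and mutation. The sketch below is a minimal binary GA, not the paper's encoding of cluster assignments and component counts; the operator rates and population size are arbitrary.

```python
import random

def genetic_algorithm(fitness, n_bits, pop_size=30, gens=60,
                      cx_rate=0.9, mut_rate=0.02, seed=1):
    """Minimal binary GA: tournament selection, one-point crossover,
    bit-flip mutation, with the best individual retained across generations."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(gens):
        nxt = []
        while len(nxt) < pop_size:
            # tournament selection of two parents
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            # one-point crossover
            if rng.random() < cx_rate:
                cut = rng.randrange(1, n_bits)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            # bit-flip mutation
            for c in (c1, c2):
                for i in range(n_bits):
                    if rng.random() < mut_rate:
                        c[i] ^= 1
                nxt.append(c)
        pop = nxt[:pop_size]
        best = max(pop + [best], key=fitness)
    return best
```

On a simple objective such as maximizing the number of ones in the chromosome, this loop converges quickly; in the paper's setting, the fitness would instead score a candidate clustering and component allocation.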

Several GA-based clustering algorithms, such as stochastic clustering algorithms based on the GA, the Simple GA (SGA), the Hybrid Niching GA (HNGA), and multi-objective GAs, are reviewed in [1]. According to that study, these methods can find only compact, hyperspherical, equisized, convex clusters, like those detected by the K-means algorithm [2]. If clusters of different geometric shapes are present in the same dataset, these methods cannot find all of them perfectly [3]. This paper provides a preliminary study in this direction. First, we apply PCA to the whole dataset obtained from an image to achieve image compression. To improve the reconstructed image quality, we then use K-means to partition the dataset and apply PCA to each cluster separately. In this method, different numbers of principal components are allowed, and the GA is used to identify the optimal number of principal components for each cluster. Finally, we propose the repartition clustering method to further improve the image quality and visual effect.

The proposed method improves the homogeneity within each cluster by increasing the within-group correlation, which benefits PCA image coding. Under the constraint that the total number of variables to store remains roughly the same, the proposed algorithm removes redundant variables from clusters with simple structures and increases the number of principal components in clusters with complex structures to improve their reconstructed quality. Experimental results show that the proposed method effectively increases image quality and improves the visual effect.

Section snippets

PCA image coding

PCA is a variable reduction procedure that is useful when the data, obtained on a number of variables (possibly a large number), contain some redundancy. In this case, redundancy indicates that there may be features whose presence in the dataset does not affect the performance of a classifier at all. There may even be correlated sets of features for which selecting just a few members is sufficient for the classifier. This redundancy facilitates the reduction of

PCA image coding with clustering

In this section, we partition the dataset scanned from the original image in Section 2 into K clusters and apply PCA to each cluster separately. The block diagram of the clustering method is shown in Fig. 2. To obtain the optimal number of principal components, the GA is introduced. After decoding each cluster, we can reconstruct the image by merging the clusters.
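The per-cluster coding step can be sketched as follows. This is an illustrative NumPy version, not the paper's implementation: given block vectors, a cluster label for each block, and a (possibly different) number of components per cluster, each cluster is encoded and decoded with its own PCA basis.

```python
import numpy as np

def cluster_pca_code(X, labels, ms):
    """Apply PCA separately to each cluster of block vectors.
    labels[i] is the cluster of block X[i]; ms[k] is the number of
    principal components kept for cluster k (may differ per cluster)."""
    X_hat = np.empty_like(X, dtype=float)
    for k, m in enumerate(ms):
        idx = np.where(labels == k)[0]
        Xk = X[idx].astype(float)
        mu = Xk.mean(axis=0)
        C = np.cov(Xk - mu, rowvar=False)
        _, vecs = np.linalg.eigh(C)
        W = vecs[:, -m:]                       # top-m eigenvectors for this cluster
        X_hat[idx] = (Xk - mu) @ W @ W.T + mu  # encode, then decode
    return X_hat
```

Merging simply means writing each reconstructed block back to its original position, which here corresponds to indexing with `idx`.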

Proposed PCA method with repartition clustering

The proposed method imposes a repartition mechanism on the PCA clustering method. For a given dataset S = {x_i}, i = 1, …, l, with x_i ∈ R^n, the approach is to partition S into K groups by minimizing the within-group MSE in Eq. (3.4) under a pre-specified total number of variables to record.

Our goal is to approximate each data point using a representation involving a restricted number m of principal components, with m < n, corresponding to a projection onto a lower-dimensional subspace.
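The within-group MSE being minimized is the PCA reconstruction error for the chosen m. Eq. (3.4) itself is not reproduced in this excerpt, so the sketch below assumes the standard definition, the mean squared difference between the data and their m-component reconstruction:

```python
import numpy as np

def pca_mse(X, m):
    """Mean squared reconstruction error when X is approximated
    with its top-m principal components."""
    mu = X.mean(axis=0)
    Xc = X - mu
    _, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    W = vecs[:, -m:]                   # top-m eigenvectors
    X_hat = Xc @ W @ W.T + mu
    return float(np.mean((X - X_hat) ** 2))
```

The error is non-increasing in m and reaches zero at m = n, which is the trade-off the repartition step exploits when it shifts components from simple clusters to complex ones.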

Experimental results

In clustering methods, the question of how many clusters to use always arises. In most cases, the answer depends on the dataset itself, and the choice is usually heuristic. In our case, the types of visual information in image blocks that are important to humans are smooth regions, horizontal/vertical edges, diagonal/subdiagonal edges, and texture. We therefore adopt K = 4 clusters in the experiments.

We partition the training set into K clusters and apply PCA to each cluster using the proposed method.

Conclusions

This study formulated a clustering method embedded in a GA framework to improve the performance of clustering PCA image coding. For cluster analysis, we proposed a repartition clustering algorithm that partitions the image blocks into groups such that individuals of the same group are homogeneous, while those of different groups are heterogeneous. Furthermore, the homogeneity within a group favors the PCA subspace projection mechanism in terms of preserving most of the information. Thus, the proposed method can effectively increase image quality and improve the visual effect.

Acknowledgment

This work was supported by the National Science Council of Taiwan under grant NSC 102-2221-E-214-048.

References (30)

  • R. Chakraborty et al., Feature selection using a neural framework with controlled redundancy, IEEE Trans. Neural Netw. Learn. Syst. (2014)

  • S. Chiu, Fuzzy model identification based on cluster estimation, Intell. Fuzzy Syst. (1994)

  • R.O. Duda et al., Pattern Classification (2001)

  • K.I. Diamantaras et al., Principal Component Neural Networks: Theory and Applications (1996)

  • B.S. Everitt et al., Cluster Analysis (2001)