
Knowledge-Based Systems

Volume 242, 22 April 2022, 108364

Group non-convex sparsity regularized partially shared dictionary learning for multi-view learning

https://doi.org/10.1016/j.knosys.2022.108364

Abstract

Multi-view learning aims to obtain a more comprehensive understanding than single-view learning by observing objects from different views. However, most existing multi-view learning algorithms still struggle to obtain enough discriminative information from multi-view data: (1) most models cannot fully exploit consistent and complementary information simultaneously; (2) existing group-sparsity-based multi-view learning methods cannot extract the most relevant and sparsest features. This paper proposes an efficient group non-convex sparsity regularized partially shared dictionary learning method for multi-view learning, which employs a partially shared dictionary learning model to excavate both consistency and complementarity simultaneously from the multi-view data, and utilizes generalized group non-convex sparsity to obtain more discriminative and sparser representations than the convex ℓ2,1 norm. To solve the non-convex optimization problem, we derive a generalized optimization framework for different group non-convex sparsity regularizers based on the proximal splitting method. The corresponding proximal operators for structured sparse coding in the framework are derived to form algorithms for different group non-convex sparsity regularizers, i.e., the ℓ2,p (0<p<1) norm and the ℓ2,log regularizer. In experiments, we conduct multi-view clustering on seven real-world multi-view datasets, and the results validate the effectiveness of both group information and non-convexity. Furthermore, the results show that appropriate coefficient sharing ratios help to exploit consistent information while keeping complementary information from multi-view data, thus improving clustering performance. In addition, the convergence experiments show that the proposed algorithm obtains the best clustering performance among the compared algorithms and converges efficiently and stably with reasonable running-time costs.

Introduction

Different from the conventional machine learning paradigm, which learns from data in a single view, humans usually look at real-life problems from different views to form a holistic and comprehensive understanding. Multi-view learning learns representations from multi-view data to help extract more useful and accurate information than can be obtained from single-view data [1], [2]. In general, multi-view data consist of complementary information in different views and consistent information between different views. Complementary information relates to the disagreement and diversity of concepts, indicating that each view may hold some information that other views do not, while consistent information is based on concept consensus and aims to maximize the agreement among multiple views [3]. Therefore, multi-view learning targets exploiting more complementary information while keeping consistent information, so as to achieve more comprehensive representations of data than single-view methods [4].

Matrix-decomposition-based multi-view learning is one of the most popular approaches [5]; it represents the original multi-view data with bases and corresponding subspace representations [6], [7]. Dictionary Learning (DL) is one family of matrix-decomposition-based methods [4], [8], which learns adaptive dictionaries from multi-view data and then obtains the sparsest possible representations by selecting the most relevant atoms. Sparsity is useful for exploiting features from multi-view data because it reduces the influence of noise and the coherence between different data classes, leading to low correlations between representations and the multi-view data. Existing DL-based multi-view clustering methods are promising [9], [10], [11], but they still cannot exploit enough discriminative information from multi-view data. The discriminative information in representations corresponds directly to clustering performance, since more discriminative representations can be clustered accurately more easily [12]. There are two practical routes for tackling this issue.
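The dictionary-learning view of matrix decomposition can be made concrete with a toy example. The tiny dictionary, sparse code, and atom choices below are illustrative assumptions, not from the paper; the sketch only shows how a signal is represented as a sparse combination of the most relevant atoms.

```python
import numpy as np

# Toy dictionary D: 2 features, 3 atoms (columns). The values are
# arbitrary illustrative assumptions.
D = np.array([[1.0, 0.0, 0.7],
              [0.0, 1.0, 0.7]])

# Sparse code a: only atoms 0 and 2 are selected; atom 1 is unused,
# so the representation uses the fewest relevant atoms.
a = np.array([2.0, 0.0, 1.0])

# The signal is reconstructed as a sparse combination of atoms.
x = D @ a
print(x)  # reconstruction from two of the three atoms
```

A full DL method would learn D from data and infer a per signal under a sparsity penalty; this sketch fixes both just to show the decomposition x ≈ Da.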

The first is to exploit more complementary information while keeping consistent information to obtain more discriminative representations from data. When processing multi-view data, the simplest solution is to concatenate the signals of the different views into a single-view signal based on the view-consistency assumption. However, this solution neglects complementary information and the correlations between views. Other methods concentrate on learning representations separately from each view, neglecting consistency among views [13], [14]. Recently, a partially shared latent factor learning approach based on Non-negative Matrix Factorization (NMF) was developed [15], [16], which utilizes the consistency and complementarity of multi-view data to learn partially shared coefficients for semi-supervised learning, and achieves good performance. Nevertheless, NMF-based approaches [15], [16] force the representations to be non-negative and learn from the whole atom base; thus, they may not take full advantage of information from negative signals and may be weakly robust to noise. Therefore, developing an efficient method that can utilize the whole signal and exploit both complementary and consistent information to obtain more comprehensive representations is imperative for multi-view clustering.

The other is to exploit more discriminative information with suitably structured sparsity constraints. DL-based multi-view clustering methods do not force signals and representations to be non-negative but utilize the sparsity that widely exists in signals to exploit features. Conventional sparsity, such as the ℓ0 norm [10] and the ℓ1 norm [11], induces sparsity atom-wise, neglecting the similarity of signals within the same group by treating them separately. Group sparsity is a technique for capturing more discriminative and relevant information from data [17]. By using group information beyond conventional atom-wise sparsity, group sparsity can exploit features directly for each group and reduce correlations between groups, so group sparse representations can be clustered accurately more easily [12], [18], [19]. An intuitive diagram is shown in Fig. 1, which illustrates how group information functions in dictionary learning to obtain discriminative representations with low correlations between groups. However, existing group sparsity learning in multi-view learning usually uses the ℓ2,1 norm [12], [19], [20], in which the ℓ2 norm aggregates the representations within each group, and the ℓ1 norm promotes sparsity between groups. The ℓ1 norm is convex and can be solved easily. However, the ℓ1 regularizer cannot obtain strong enough sparsity and introduces bias in various situations, so the ℓ2,1 norm obtains less discriminative and less accurate group sparse representations.
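The ℓ2,1 norm described above can be sketched in a few lines. This minimal implementation assumes the common convention that each row of the coefficient matrix forms one group; the paper's actual grouping of coefficients may differ.

```python
import numpy as np

def l21_norm(A):
    """Group l2,1 norm of matrix A.

    The l2 norm is taken within each group (here: each row, an assumed
    convention), then the l1 norm (a plain sum) is taken across groups,
    which promotes sparsity at the group level.
    """
    return float(np.sum(np.linalg.norm(A, axis=1)))

# A matrix with one active group (row) and one zero group contributes
# only the active row's l2 norm.
A = np.array([[3.0, 4.0],
              [0.0, 0.0]])
print(l21_norm(A))  # -> 5.0 (the l2 norm of the first row)
```

Because whole rows enter the sum through their ℓ2 norm, penalizing this quantity drives entire groups to zero rather than individual entries, which is exactly the structured sparsity the text describes.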

This paper proposes a novel dictionary learning algorithm for multi-view learning. To exploit more discriminative information from multi-view data, we first employ the Partially Shared Dictionary Learning (PSDL) model, which exploits complementarity with view-specific coefficients for each view and consistency with coefficients shared across all views. The PSDL model can fit different datasets by tuning a flexible parameter that changes the ratio between view-specific and shared coefficients. To obtain sparser and more discriminative coefficients, we propose the generalized group non-convex sparsity regularizer, going beyond the popular group sparsity ℓ2,1 norm. The convex ℓ1 norm is the key part of the ℓ2,1 norm for obtaining sparsity between groups. By replacing the convex ℓ1 norm with different non-convex penalties, the generalized group non-convex sparsity regularizers are formed. As shown in Fig. 2, non-convexity can induce higher sparsity than the convex ℓ1 norm by shrinking smaller values at higher rates [21]. In group non-convex sparsity, the non-convexity helps to enhance sparsity between groups, obtain more accurate and sparser features from multi-view data, and thus exploit the group information sufficiently for more discriminative information.
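The claim that non-convex penalties act more aggressively on small values can be checked numerically. In the sketch below, each penalty is applied to a group's ℓ2 norm t; the exponent p and the smoothing constant eps are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# Penalties applied to a group's l2 norm t >= 0. In the l2,p and l2,log
# regularizers, these replace the outer l1 sum of the l2,1 norm.
def l1_pen(t):
    return t                      # convex l1: linear everywhere

def lp_pen(t, p=0.5):
    return t ** p                 # non-convex l_p, 0 < p < 1

def log_pen(t, eps=0.1):
    return np.log(1.0 + t / eps)  # non-convex log penalty (eps assumed)

# Near zero, the non-convex penalties rise much faster than l1,
# so small group norms are penalized relatively more and are pushed
# to exactly zero more aggressively.
t = 0.01
print(l1_pen(t), lp_pen(t), log_pen(t))  # l1 ~ 0.01; lp = 0.1; log ~ 0.095
```

This matches Fig. 2's intuition: for small t, both non-convex curves sit well above the ℓ1 line, while for large t they grow more slowly, reducing the bias the ℓ1 penalty imposes on large coefficients.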

To solve the group non-convex sparsity regularized PSDL problem, the proximal splitting method [22] is employed. We further derive a generalized optimization framework for different group non-convex sparsity regularizers. Specifically, we propose two novel group non-convex sparsity regularizers in this paper, i.e., the group ℓp norm ℓ2,p (0<p<1) and the group log regularizer ℓ2,log, and derive the corresponding proximal operators to form algorithms for each. To further speed up the algorithm, we employ the extrapolation technique [23]. Consequently, the Group Non-Convex sparsity regularized PSDL algorithms (GNCPSDL) are formed. Finally, we validate that the proposed framework obtains good performance in multi-view clustering experiments with different group non-convex sparsity regularizers.
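For orientation, the proximal operator of the classical convex ℓ2,1 norm is the well-known group soft-thresholding rule, sketched below with rows as groups (an assumed convention). The paper's operators for the non-convex ℓ2,p and ℓ2,log regularizers are different and are derived in Section 3; this convex case is shown only as the baseline they generalize.

```python
import numpy as np

def prox_group_l21(A, lam):
    """Proximal operator of lam * ||A||_{2,1} with rows as groups.

    Each row is shrunk toward zero by lam in l2 norm; rows whose norm
    is below lam are set exactly to zero, which is how whole groups
    get eliminated in structured sparse coding.
    """
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return scale * A

A = np.array([[3.0, 4.0],    # row norm 5: survives, shrunk by 1/5
              [0.5, 0.0]])   # row norm 0.5 < lam: zeroed out
print(prox_group_l21(A, 1.0))
```

Non-convex penalties yield thresholding rules with the same group structure but sharper behavior: small-norm groups are still zeroed, while large-norm groups are shrunk less, reducing the bias of the convex rule.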

The main contributions of the paper are summarized as follows:

  • 1.

    To obtain more discriminative information from multi-view data, we propose the PSDL model for multi-view learning beyond NMF-based multi-view learning, with a flexible parameter to adjust the ratio between view-specific coefficients for complementarity and shared coefficients for consistency between views.

  • 2.

    To achieve sparser and more discriminative representations, we propose the group non-convex sparsity regularizer for the PSDL problem, which can enhance sparsity between groups and reduce correlations between groups beyond the conventional ℓ2,1 norm, thus exploiting the group information more fully.

  • 3.

    To solve the PSDL problem with the group non-convex sparsity regularizer quickly and efficiently, we derive the proximal operator for the generalized group non-convex sparsity regularizer and use the extrapolated proximal splitting method to develop a novel DL algorithm for multi-view learning.

In Section 2, we review multi-view learning and dictionary-learning-based multi-view learning methods. In Section 3, we detail the partially shared dictionary learning model and the steps for solving it with the group non-convex sparsity regularizer. Experiments on seven real-world datasets are presented in Section 4. Finally, conclusions are given in Section 5.


Notation

In this paper, a boldface uppercase letter, e.g., A, denotes a matrix, and ai denotes the ith column of that matrix; a boldface lowercase letter a denotes a vector, and ai denotes the ith entry of a. The superscript in A(v) is the view index v. The subscript k in (A)k denotes the iteration number. AT denotes the transpose of the matrix A. Specific notations are listed in Table 1.

Multi-view learning

Multi-view learning is effective in various areas, e.g., video processing [24], medical signals [25],

Group non-convex sparsity regularized partially shared dictionary learning

In this section, we first present the formulation of the GNCPSDL problem, which uses a flexible structure for the partially shared coefficients and employs the generalized group non-convex sparsity as the regularizer. Then, we detail the derivation of the proposed algorithm for the generalized group non-convex sparsity. The optimization process is divided into three parts: sparse coding for the view-specific coefficients, sparse coding for the shared coefficients, and dictionary learning.
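The three-part alternating scheme just described can be sketched at a high level. Everything below is a schematic assumption rather than the paper's exact derivation: X[v] holds view v's data (features x samples), D[v] its dictionary, Z[v] the view-specific coefficients, S the coefficients shared across views, and prox_group is a placeholder for whichever group-sparsity proximal operator is chosen; the step sizes and updates are illustrative proximal-gradient steps only.

```python
import numpy as np

def gncpsdl_sketch(X, D, Z, S, prox_group, lam=0.1, step=0.01, iters=10):
    """Schematic alternating optimization for partially shared DL.

    Assumes every view has the same number of view-specific rows.
    """
    n_spec = Z[0].shape[0]                  # view-specific coefficient rows
    for _ in range(iters):
        for v in range(len(X)):
            C = np.vstack([Z[v], S])        # partially shared coefficients
            R = D[v] @ C - X[v]             # reconstruction residual
            G = D[v].T @ R                  # gradient w.r.t. coefficients
            # (1) sparse coding of view-specific coefficients: proximal step
            Z[v] = prox_group(Z[v] - step * G[:n_spec], step * lam)
            # (2) shared coefficients accumulate gradients from every view
            S = S - step * G[n_spec:] / len(X)
            # (3) dictionary learning: gradient step + column normalization
            D[v] = D[v] - step * R @ C.T
            D[v] /= np.maximum(np.linalg.norm(D[v], axis=0, keepdims=True),
                               1e-12)
        S = prox_group(S, step * lam)       # proximal step on shared part
    return D, Z, S
```

The actual algorithm additionally uses extrapolation to accelerate these updates and the specific proximal operators derived for the ℓ2,p and ℓ2,log regularizers; this sketch only fixes the order of the three update blocks.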

Experiment

To validate the performance of our proposed algorithms, we present a series of clustering experiments on seven real-world multi-view datasets. For the unsupervised clustering experiments, we obtain group information from k-means clustering during initialization instead of using supervised label information, as detailed in the initialization subsection. All experiments were performed in Matlab R2021a, and programs were run on a PC with a 3.4 GHz Intel Core CPU and 16 GB of RAM.

Conclusion

In this paper, we develop an efficient multi-view learning algorithm based on partially shared dictionary learning via generalized group non-convex sparsity. This work targets enhancing the ability to exploit discriminative information from multi-view data. First, the flexible partially shared dictionary learning structure is employed to exploit more consistent and complementary information and obtain more discriminative representations for multi-view learning. Second, we propose to use

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The work described in this paper was supported by the National Key Research and Development Plan (2021YFB2700302), the National Natural Science Foundation of China (62172453), the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (2017ZT07X355), the Pearl River Talent Recruitment Program (No. 2019QN01X130), Guangzhou Science and Technology Program Project (202002030289), and (6142006200403).

References (56)

  • Yang, Y., et al., Multi-view clustering: A survey, Big Data Min. Anal. (2018)
  • Wang, H., et al., GMC: Graph-based multi-view clustering, IEEE Trans. Knowl. Data Eng. (2019)
  • Li, Y., et al., A survey of multi-view representation learning, IEEE Trans. Knowl. Data Eng. (2018)
  • Xie, K., et al., Eliminating the permutation ambiguity of convolutive blind source separation by using coupled frequency bins, IEEE Trans. Neural Netw. Learn. Syst. (2019)
  • Li, X., et al., Multi-view clustering: A scalable and parameter-free bipartite graph fusion method, IEEE Trans. Pattern Anal. Mach. Intell. (2020)
  • Zhu, X., et al., One-step multi-view spectral clustering, IEEE Trans. Knowl. Data Eng. (2018)
  • Li, Z., et al., Direct-optimization-based DC dictionary learning with the MCP regularizer, IEEE Trans. Neural Netw. Learn. Syst. (2021)
  • Li, B., et al., Multi-view multi-instance learning based on joint sparse representation and multi-view dictionary learning, IEEE Trans. Pattern Anal. Mach. Intell. (2017)
  • Wang, Q., et al., Multi-view analysis dictionary learning for image classification, IEEE Access (2018)
  • Zhu, X., et al., Multi-view image clustering based on sparse coding and manifold consensus, Neurocomputing (2020)
  • Sun, L., Nguyen, C.H., Mamitsuka, H., Fast and robust multi-view multi-task learning via group sparsity, in: IJCAI...
  • Liu, J., et al., Multi-view clustering via joint nonnegative matrix factorization
  • Wang, J., et al., Diverse non-negative matrix factorization for multiview data representation, IEEE Trans. Cybern. (2017)
  • Liu, J., et al., Partially shared latent factor learning with multiview data, IEEE Trans. Neural Netw. Learn. Syst. (2014)
  • Liu, Z., et al., Weighted discriminative sparse representation for image classification, Neural Process. Lett. (2021)
  • Sun, Y., et al., Learning discriminative dictionary for group sparse representation, IEEE Trans. Image Process. (2014)
  • Zeng, Z., et al., Robust discriminative multi-view K-means clustering with feature selection and group sparsity learning, Multimedia Tools Appl. (2018)
  • Li, Z., et al., Accelerated log-regularized convolutional transform learning and its convergence guarantee, IEEE Trans. Cybern. (2021)