
Information Sciences

Volume 570, September 2021, Pages 795-814

Deep matrix factorization with knowledge transfer for lifelong clustering and semi-supervised clustering

https://doi.org/10.1016/j.ins.2021.04.067

Abstract

Clustering analysis aims to group unlabeled data in an unsupervised manner. However, most existing methods are tailored to single-task data and do not work on a sequence of tasks. In this paper, we propose a Deep Matrix factorization method with Knowledge transfer (DMK) to address the clustering problem in a lifelong setting: DMK processes a sequence of tasks, and after each task is learned, its knowledge is retained and later used to help future clustering tasks. To this end, we delve into deep matrix factorization and graph co-clustering, where (1) the former learns a basis feature library across all arrived tasks and a specific representation for each target task to handle lifelong clustering, and (2) the latter builds a consistent feature embedding library to transfer knowledge between each pair of tasks. An iterative optimization algorithm is then proposed to alternately update the two libraries. In addition, we extend DMK into a semi-supervised version and propose a Semi-supervised Deep Matrix factorization method with Knowledge transfer (SDMK), which exploits a small amount of prior label information for lifelong semi-supervised clustering. Experimental results on four datasets with sequential tasks demonstrate that the proposed methods markedly outperform state-of-the-art baselines.

Introduction

Clustering analysis partitions a set of data objects in such a way that objects in the same group (or cluster) are more similar to each other than to those in other groups. In recent years, data clustering has been studied extensively from several novel perspectives, such as clustering with noisy data [13], incomplete data clustering [17], multi-view clustering [36], and multi-task clustering [48], [49]. Among these, multi-task clustering transfers the knowledge learned from different tasks to help cluster a target task, grouping the data of multiple tasks at the same time. For example, Zhang et al. [47] proposed a multi-task clustering method that transfers instance knowledge across tasks to help cluster each target task. Zhang et al. [48] also proposed a model that jointly mines the relationships among all tasks to help the data clustering of every task. However, multi-task clustering methods mainly consider a fixed task set and do not learn incrementally. Given a sequence of tasks, whenever a new task is fed into a multi-task clustering model, the model must be rebuilt on all arrived tasks, which is costly and time-consuming as the tasks are possibly endless. In this paper, we focus on clustering a sequence of tasks. This problem setting is similar to the current lifelong learning setting [5], which addresses a set of clustering tasks consecutively. Lifelong learning is a continual learning process in which the learner learns a sequence of tasks; after learning each task, the learner retains the acquired knowledge and later uses it to help learn future tasks [2]. Most existing works focus on lifelong classification problems; in this work, we address a lifelong clustering problem.

Rao et al. [25] proposed an unsupervised continual learning model that learns a task-specific representation on top of a larger set of shared parameters, aiming to solve catastrophic forgetting in neural networks. Note that the data in [25] actually come from the same task, as one dataset is partitioned into multiple subsets and each subset is treated as a task; in contrast, our tasks come from different datasets (see the experiment section). Sun et al. [29] proposed a lifelong spectral clustering method to investigate spectral clustering in a lifelong learning setting. However, this method only extracts shallow information from the data using spectral graph theory and cannot mine the deep intrinsic information among data samples. Our lifelong clustering method is a deep model based on deep matrix factorization, which effectively extracts deep information [18], [3]. Meanwhile, these methods work in an unsupervised manner without any label information. In practice, a small amount of prior label information can make clustering more robust and boost its performance remarkably [11], [10], [16]. Learning with partially supervised label information is known as semi-supervised learning [34]. To our knowledge, there is no prior work on lifelong semi-supervised clustering. In this work, we bring semi-supervised learning into our lifelong clustering framework and further propose a lifelong semi-supervised clustering method. The framework of our lifelong clustering is shown in Fig. 1.
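The multi-layer idea behind deep matrix factorization can be sketched as follows. This is a generic illustration using greedy truncated SVDs, not the authors' model, which learns all layers jointly with task-specific constraints; all function and variable names here are our own.

```python
import numpy as np

def deep_factorize(X, layer_sizes):
    """Greedy layer-wise factorization X ≈ Z1 Z2 ... Zr H.

    A sketch of the multi-layer idea only: each layer re-factorizes the
    previous representation with a truncated SVD.
    """
    Zs, H = [], X
    for k in layer_sizes:
        U, s, Vt = np.linalg.svd(H, full_matrices=False)
        Zs.append(U[:, :k] * s[:k])   # layer basis, shape (d_prev, k)
        H = Vt[:k, :]                 # deeper representation, shape (k, n)
    return Zs, H

# Toy data: 50 features, 200 samples (columns are samples).
X = np.random.default_rng(0).random((50, 200))
Zs, H = deep_factorize(X, layer_sizes=[32, 16, 8])
X_hat = Zs[0] @ Zs[1] @ Zs[2] @ H     # rank-8 reconstruction
print(X_hat.shape)  # (50, 200)
```

Each successive layer compresses the representation further, so the final H is a low-dimensional "deep" embedding of the samples while the Z_i chain plays the role of the shared basis.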

Regarding clustering in a lifelong learning setting, there are at least two key challenges to address. (1) How to mine shared knowledge among multiple tasks? As in multi-task learning, we need to extract effective and sufficient information across tasks and learn the correlations between them. (2) How to learn and transfer knowledge from previous tasks to the target/new task? The number of tasks in lifelong learning is possibly never-ending, and the task set is not fixed as in multi-task learning. It is therefore essential to store the knowledge learned from previous tasks and transfer it to each new task to boost clustering performance on that task.

In this paper, we design a deep multi-layer architecture to learn deep-layer representations across a sequence of tasks while utilizing graph co-clustering to transfer knowledge from one task to another. We then propose a Deep Matrix factorization method with Knowledge transfer (denoted DMK) for lifelong clustering, based on deep matrix factorization and graph co-clustering. Given a set of prior label information in each task, we further propose a Semi-supervised DMK (denoted SDMK) for lifelong semi-supervised clustering. The proposed DMK and SDMK will be detailed shortly. In summary, this paper makes the following contributions:

  • It studies a new clustering paradigm called lifelong clustering, which aims to retain and accumulate knowledge learned from each previous task and to use the knowledge to help future clustering tasks.

  • It proposes a deep multi-layer model (i.e., Deep Matrix factorization method with Knowledge transfer) for lifelong clustering. The proposed model learns a basis feature library shared by all previous tasks and meanwhile learns a task-specific representation for each new/target task. It also exploits graph co-clustering to learn a common feature embedding library to exchange and transfer latent information between sequential tasks.

  • It further proposes a lifelong semi-supervised clustering method that exploits a small amount of prior label information, using a label knowledge library to transfer semi-supervised knowledge among tasks.

  • An alternating iterative optimization algorithm is presented to optimize the proposed lifelong clustering and semi-supervised clustering methods respectively in the lifelong setting. Experimental results on real-world datasets demonstrate the effectiveness and efficiency of the proposed methods.


Related works

In this section, we discuss the works related to our proposed method, most notably lifelong learning, multi-task clustering, and matrix factorization.

Deep matrix factorization method with knowledge transfer for lifelong clustering

This section presents the proposed Deep Matrix factorization method with Knowledge transfer for lifelong clustering (i.e., DMK). We first introduce some preliminaries and then detail the proposed DMK model.

Semi-supervised lifelong clustering based on DMK

This section presents our semi-supervised DMK (i.e., SDMK), a lifelong semi-supervised clustering method that utilizes a small amount of prior label information from the data. The key novelty of SDMK over DMK is that it learns a label knowledge library to retain and transfer semi-supervised information among sequential tasks. As with DMK, an optimization algorithm is proposed to solve SDMK in a lifelong manner.

Let X = {X_1, X_2, …, X_t, …, X_m} and X_t = [X_t^l, X_t^u], where the superscripts l and u indicate the labeled and unlabeled samples, respectively.
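The [X_t^l, X_t^u] layout above can be sketched as follows. This is a hedged illustration of the data arrangement only; the function name and the labeled_ratio/seed parameters are our own, not the paper's.

```python
import numpy as np

def split_task(X, y, labeled_ratio=0.1, seed=0):
    """Arrange one task's samples as X_t = [X_t^l, X_t^u].

    A sketch of the semi-supervised data layout: a fraction of the samples
    (columns) keep their labels; the rest are treated as unlabeled.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]                         # samples are columns
    idx = rng.permutation(n)
    n_l = max(1, int(labeled_ratio * n))   # number of labeled samples
    l_idx, u_idx = idx[:n_l], idx[n_l:]
    X_t = np.hstack([X[:, l_idx], X[:, u_idx]])   # labeled block first
    return X_t, y[l_idx], n_l

X = np.random.default_rng(1).random((30, 100))
y = np.random.default_rng(1).integers(0, 3, size=100)
X_t, y_l, n_l = split_task(X, y, labeled_ratio=0.1)
print(X_t.shape, n_l)  # (30, 100) 10
```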

Optimization of DMK

This section introduces how to optimize the proposed DMK in a lifelong clustering setting. When the m-th future/new clustering task T_m arrives and flows into DMK, we update each variable alternately while fixing the others. Note that we can update Z_i, H_i^m, H_r^m, and B without accessing the raw data [X_t, X̂_t] (t < m) of previous tasks again, which reduces the time complexity compared with traditional algorithms. Our update rules are as follows:

Update Z_i (i ≤ r). We minimize the objective
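The alternating pattern described above (update one factor while fixing the others) can be sketched with a plain alternating least-squares loop. This is a generic illustration only: the paper's actual rules also involve the libraries and graph co-clustering terms, which are omitted here, and all names are our own.

```python
import numpy as np
from functools import reduce

def chain(mats):
    """Matrix product of a non-empty list of matrices."""
    return reduce(np.matmul, mats)

def alternate_optimize(X, r=2, k=8, n_iter=30, seed=0):
    """Alternating least-squares sketch for X ≈ Z1 ... Zr H.

    Each Z_i is updated in closed form with the other factors fixed,
    then H is updated; this mimics the optimization pattern only.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    dims = [d] + [k] * r
    Zs = [rng.standard_normal((dims[i], dims[i + 1])) for i in range(r)]
    H = rng.standard_normal((k, n))
    for _ in range(n_iter):
        for i in range(r):                        # update Z_i, others fixed
            left = chain(Zs[:i]) if i > 0 else np.eye(d)
            right = chain(Zs[i + 1:] + [H])
            Zs[i] = np.linalg.pinv(left) @ X @ np.linalg.pinv(right)
        H = np.linalg.pinv(chain(Zs)) @ X         # then update H
    err = np.linalg.norm(X - chain(Zs) @ H) / np.linalg.norm(X)
    return Zs, H, err

X = np.random.default_rng(2).random((40, 120))
Zs, H, err = alternate_optimize(X, r=2, k=8)
print(H.shape)  # (8, 120)
```

Because each sub-problem is solved exactly with the other blocks fixed, the overall objective is non-increasing across sweeps, which is the same convergence argument typically made for such alternating schemes.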

Experiments

This section evaluates the lifelong clustering performance of the proposed DMK and SDMK. We compared DMK (and SDMK) with several unsupervised (and semi-supervised) clustering baseline models on four real-world datasets.

Conclusions

This paper presented a Deep Matrix factorization method with Knowledge transfer (DMK) for lifelong clustering. The proposed DMK is a deep multi-layer framework that extracts deep features, in contrast to shallow matrix-factorization-based models. DMK learns a unified representation across tasks and then exploits graph co-clustering to embed task-specific features and transfer knowledge across tasks. In addition, it accumulates the previously learned knowledge without accessing the previous tasks’ raw data.

CRediT authorship contribution statement

Yiling Zhang: Conceptualization, Methodology, Software, Investigation, Visualization, Writing - original draft. Hao Wang: Supervision, Writing - review & editing. Yan Yang: Supervision, Writing - review & editing, Funding acquisition. Wei Zhou: Data curation, Supervision, Writing - review & editing. Tianrui Li: Writing - review & editing. Xiaocao Ouyang: Writing - review & editing. Hongyang Chen: Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Science Foundation of China (No. 61976247). Hao Wang would like to thank a grant from the China Postdoctoral Science Foundation (No. 2020M681960) and a grant sponsored by Zhejiang Lab (No. 2020KB0AA02).

References (50)

  • H.B. Ammar et al., Autonomous cross-domain knowledge transfer in lifelong policy gradient reinforcement learning.
  • G. Anthes, Lifelong learning in artificial neural networks, Communications of the ACM (2019).
  • S. Arora et al., Implicit regularization in deep matrix factorization, Advances in Neural Information Processing Systems (2019).
  • H. Cai et al., Semi-supervised multi-view clustering based on orthonormality-constrained nonnegative matrix factorization, Information Sciences (2020).
  • Z. Chen et al., Lifelong machine learning, Synthesis Lectures on Artificial Intelligence and Machine Learning (2018).
  • C. Ding et al., Convex and semi-nonnegative matrix factorizations, IEEE Transactions on Pattern Analysis and Machine Intelligence (2010).
  • A. Argyriou, T. Evgeniou et al., Convex multi-task feature learning, Machine Learning (2008).
  • Q. Gu et al., Learning the shared subspace for multi-task clustering and transductive transfer classification.
  • C. Hong et al., Multimodal face-pose estimation with multitask manifold deep learning, IEEE Transactions on Industrial Informatics (2019).
  • Y. Jia et al., Semi-supervised non-negative matrix factorization with dissimilarity and similarity regularization, IEEE Transactions on Neural Networks and Learning Systems (2020).
  • Y. Jia et al., Semi-supervised adaptive symmetric non-negative matrix factorization, IEEE Transactions on Cybernetics (2020).
  • W. Jiang, F.L. Chung, Transfer spectral clustering, in: Joint European Conference on Machine Learning and Knowledge...
  • Z. Kang et al., Robust graph learning from noisy data, IEEE Transactions on Cybernetics (2019).
  • D. Kuang et al., Symmetric nonnegative matrix factorization for graph clustering.
  • X. Li et al., Lifelong multi-task multi-view learning using latent spaces.
  • Z. Li et al., Robust structured nonnegative matrix factorization for image representation, IEEE Transactions on Neural Networks and Learning Systems (2018).
  • X. Liu, M. Li, C. Tang, J. Xia, J. Xiong, L. Liu, M. Kloft, E. Zhu, Efficient and effective regularized incomplete...
  • M. Ma et al., Co-regularized nonnegative matrix factorization for evolving community detection in dynamic networks, Information Sciences (2020).
  • C.D. Manning et al. (2008).
  • J.H. Manton, Optimization algorithms exploiting unitary constraints, IEEE Transactions on Signal Processing (2002).
  • T. Mitchell et al., Never-ending learning, Communications of the ACM (2018).
  • F. Nie et al., Efficient and robust feature selection via joint l2,1-norms minimization.
  • G.I. Parisi et al., Continual lifelong learning with neural networks: A review, Neural Networks (2019).
  • X. Peng et al., Robust distribution-based nonnegative matrix factorizations for dimensionality reduction, Information Sciences (2021).
  • D. Rao et al., Continual unsupervised representation learning, Advances in Neural Information Processing Systems (2019).