
Information Sciences

Volume 570, September 2021, Pages 795-814

Deep matrix factorization with knowledge transfer for lifelong clustering and semi-supervised clustering

https://doi.org/10.1016/j.ins.2021.04.067

Abstract

Clustering analysis aims to group unlabeled data in an unsupervised manner. However, most existing methods are tailored to single-task data and do not work on a sequence of tasks. In this paper, we propose a Deep Matrix factorization method with Knowledge transfer (DMK) to address the clustering problem in a lifelong setting: DMK processes a sequence of tasks, and after each task is learned, its knowledge is retained and later used to help future clustering tasks. To this end, we delve into deep matrix factorization and graph co-clustering, where (1) the former learns a basis feature library across all arrived tasks and a specific representation for each target task to handle lifelong clustering, and (2) the latter builds a consistent feature embedding library to transfer knowledge between each pair of tasks. An iterative optimization algorithm is then proposed to alternately update the two libraries. In addition, we extend DMK into a semi-supervised version and propose a Semi-supervised Deep Matrix factorization method with Knowledge transfer (SDMK), which exploits a small amount of prior label information for lifelong semi-supervised clustering. Experimental results on four datasets with sequential tasks demonstrate that the proposed methods markedly outperform state-of-the-art baselines.

Introduction

Clustering analysis partitions a set of data objects in such a way that objects in the same group (or cluster) are more similar to each other than to those in other groups. In recent years, data clustering has been studied extensively from several novel perspectives, such as clustering with noisy data [13], incomplete data clustering [17], multi-view clustering [36], and multi-task clustering [48], [49]. Among these, multi-task clustering transfers the knowledge learned from different tasks to help cluster a target task, grouping the data of multiple tasks at the same time. For example, Zhang et al. [47] proposed a multi-task clustering method that transfers instance knowledge across tasks to help cluster each target task. Zhang et al. [48] also proposed a model that jointly mines the relationships among all tasks to help the data clustering of every task. However, multi-task clustering methods mainly consider a fixed task set and do not learn incrementally. Given a sequence of tasks, whenever a new task is fed into a multi-task clustering model, the model must be rebuilt on all arrived tasks, which is costly and time-consuming as the tasks are possibly endless. In this paper, we focus on clustering a sequence of tasks. This problem setting is similar to the current lifelong learning setting [5], which addresses a set of clustering tasks consecutively. Lifelong learning is a continual learning process in which the learner learns a sequence of tasks; after learning each task, the learner retains the acquired knowledge and later uses it to help learn future tasks [2]. Most existing works focus on lifelong classification problems; in this work, we address a lifelong clustering problem.

Rao et al. [25] proposed an unsupervised continual learning model that learns a task-specific representation on top of a larger set of shared parameters, aiming to solve catastrophic forgetting in neural networks. Note that the data in [25] actually come from the same task, as one dataset is partitioned into multiple subsets and each subset is treated as a task; in contrast, our tasks come from different datasets (see the experiment section). Sun et al. [29] proposed a lifelong spectral clustering method to investigate spectral clustering in a lifelong learning setting. However, this method only extracts shallow information from the data using spectral graph theory and cannot mine the deep intrinsic information among data samples. Our lifelong clustering method is a deep model based on deep matrix factorization, which effectively extracts deep information [18], [3]. Meanwhile, these methods work in an unsupervised manner without any label information. In practice, a small amount of prior label information can make clustering more robust and boost its performance remarkably [11], [10], [16]. Learning with partially supervised label information is known as semi-supervised learning [34]. To our knowledge, there is no prior work on lifelong semi-supervised clustering. In this work, we bring semi-supervised learning into our lifelong clustering framework and further propose a lifelong semi-supervised clustering method. The framework of our lifelong clustering is shown in Fig. 1.
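The multi-layer idea behind deep matrix factorization can be sketched as follows. This is a generic illustration using greedy truncated SVDs, not the authors' model, which learns all layers jointly with task-specific constraints; all function and variable names here are our own.

```python
import numpy as np

def deep_factorize(X, layer_sizes):
    """Greedy layer-wise factorization X ≈ Z1 Z2 ... Zr H.

    A sketch of the multi-layer idea only: each layer re-factorizes the
    previous representation with a truncated SVD.
    """
    Zs, H = [], X
    for k in layer_sizes:
        U, s, Vt = np.linalg.svd(H, full_matrices=False)
        Zs.append(U[:, :k] * s[:k])   # layer basis, shape (d_prev, k)
        H = Vt[:k, :]                 # deeper representation, shape (k, n)
    return Zs, H

# Toy data: 50 features, 200 samples (columns are samples).
X = np.random.default_rng(0).random((50, 200))
Zs, H = deep_factorize(X, layer_sizes=[32, 16, 8])
X_hat = Zs[0] @ Zs[1] @ Zs[2] @ H     # rank-8 reconstruction
print(X_hat.shape)  # (50, 200)
```

Each successive layer compresses the representation further, so the final H is a low-dimensional "deep" embedding of the samples while the Z_i chain plays the role of the shared basis.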

Regarding clustering in a lifelong learning setting, there are at least two key challenges to address. (1) How to mine shared knowledge among multiple tasks? As in multi-task learning, we need to extract effective and sufficient information across tasks and learn the correlations between them. (2) How to learn and transfer knowledge from previous tasks to the target/new task? The number of tasks in lifelong learning is possibly never-ending, and the task set is not fixed as in multi-task learning. It is therefore essential to store the knowledge learned from previous tasks and transfer it to each new task to boost clustering performance on that task.

In this paper, we design a deep multi-layer architecture to learn deep-layer representations across a sequence of tasks while utilizing graph co-clustering to transfer knowledge from one task to another. We then propose a Deep Matrix factorization method with Knowledge transfer (denoted DMK) for lifelong clustering, based on deep matrix factorization and graph co-clustering. Given a set of prior label information in each task, we further propose a Semi-supervised DMK (denoted SDMK) for lifelong semi-supervised clustering. The proposed DMK and SDMK will be detailed shortly. In summary, this paper makes the following contributions:

  • It studies a new clustering paradigm called lifelong clustering, which aims to retain and accumulate knowledge learned from each previous task and to use the knowledge to help future clustering tasks.

  • It proposes a deep multi-layer model (i.e., Deep Matrix factorization method with Knowledge transfer) for lifelong clustering. The proposed model learns a basis feature library shared by all previous tasks and meanwhile learns a task-specific representation for each new/target task. It also exploits graph co-clustering to learn a common feature embedding library to exchange and transfer latent information between sequential tasks.

  • It further proposes a lifelong semi-supervised clustering method that exploits a small amount of prior label information, using a label knowledge library to transfer semi-supervised knowledge among tasks.

  • An alternating iterative optimization algorithm is presented to optimize the proposed lifelong clustering and semi-supervised clustering methods respectively in the lifelong setting. Experimental results on real-world datasets demonstrate the effectiveness and efficiency of the proposed methods.


Related works

In this section, we discuss the works related to our proposed method, most notably lifelong learning, multi-task clustering, and matrix factorization.

Deep matrix factorization method with knowledge transfer for lifelong clustering

This section presents the proposed Deep Matrix factorization method with Knowledge transfer for lifelong clustering (i.e., DMK). We first introduce some preliminaries and then detail the proposed DMK model.

Semi-supervised lifelong clustering based on DMK

This section presents our semi-supervised DMK (i.e., SDMK), a lifelong semi-supervised clustering method that utilizes a small amount of prior label information from the data. The key novelty of SDMK over DMK is that it learns a label knowledge library to retain and transfer semi-supervised information among sequential tasks. As with DMK, an optimization algorithm is proposed to solve SDMK in a lifelong manner.

Let X = {X_1, X_2, …, X_t, …, X_m} and X_t = [X_t^l, X_t^u], where the superscripts l and u indicate the labeled and unlabeled samples, respectively.
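The [X_t^l, X_t^u] layout above can be sketched as follows. This is a hedged illustration of the data arrangement only; the function name and the labeled_ratio/seed parameters are our own, not the paper's.

```python
import numpy as np

def split_task(X, y, labeled_ratio=0.1, seed=0):
    """Arrange one task's samples as X_t = [X_t^l, X_t^u].

    A sketch of the semi-supervised data layout: a fraction of the samples
    (columns) keep their labels; the rest are treated as unlabeled.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]                         # samples are columns
    idx = rng.permutation(n)
    n_l = max(1, int(labeled_ratio * n))   # number of labeled samples
    l_idx, u_idx = idx[:n_l], idx[n_l:]
    X_t = np.hstack([X[:, l_idx], X[:, u_idx]])   # labeled block first
    return X_t, y[l_idx], n_l

X = np.random.default_rng(1).random((30, 100))
y = np.random.default_rng(1).integers(0, 3, size=100)
X_t, y_l, n_l = split_task(X, y, labeled_ratio=0.1)
print(X_t.shape, n_l)  # (30, 100) 10
```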

Optimization of DMK

This section introduces how to optimize the proposed DMK in a lifelong clustering setting. When the m-th future/new clustering task T_m arrives and flows into DMK, we update each variable alternately while fixing the others. Note that we can update Z_i, H_i^m, H_r^m, and B without accessing the raw data [X_t, X̂_t] (t < m) of previous tasks again, which reduces the time complexity compared with traditional algorithms. Our update rules are as follows:

Update Z_i (i ≤ r). We minimize the objective
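The alternating pattern described above (update one factor while fixing the others) can be sketched with a plain alternating least-squares loop. This is a generic illustration only: the paper's actual rules also involve the libraries and graph co-clustering terms, which are omitted here, and all names are our own.

```python
import numpy as np
from functools import reduce

def chain(mats):
    """Matrix product of a non-empty list of matrices."""
    return reduce(np.matmul, mats)

def alternate_optimize(X, r=2, k=8, n_iter=30, seed=0):
    """Alternating least-squares sketch for X ≈ Z1 ... Zr H.

    Each Z_i is updated in closed form with the other factors fixed,
    then H is updated; this mimics the optimization pattern only.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    dims = [d] + [k] * r
    Zs = [rng.standard_normal((dims[i], dims[i + 1])) for i in range(r)]
    H = rng.standard_normal((k, n))
    for _ in range(n_iter):
        for i in range(r):                        # update Z_i, others fixed
            left = chain(Zs[:i]) if i > 0 else np.eye(d)
            right = chain(Zs[i + 1:] + [H])
            Zs[i] = np.linalg.pinv(left) @ X @ np.linalg.pinv(right)
        H = np.linalg.pinv(chain(Zs)) @ X         # then update H
    err = np.linalg.norm(X - chain(Zs) @ H) / np.linalg.norm(X)
    return Zs, H, err

X = np.random.default_rng(2).random((40, 120))
Zs, H, err = alternate_optimize(X, r=2, k=8)
print(H.shape)  # (8, 120)
```

Because each sub-problem is solved exactly with the other blocks fixed, the overall objective is non-increasing across sweeps, which is the same convergence argument typically made for such alternating schemes.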

Experiments

This section evaluates the lifelong clustering performance of the proposed DMK and SDMK. We compared DMK (and SDMK) with several unsupervised (and semi-supervised) clustering baseline models on four real-world datasets.

Conclusions

This paper presented a Deep Matrix factorization method with Knowledge transfer (DMK) for lifelong clustering. The proposed DMK is a deep multi-layer framework that extracts deep features, in contrast to shallow matrix-factorization-based models. DMK learns a unified representation across tasks and then exploits graph co-clustering to embed task-specific features and transfer knowledge across tasks. In addition, it accumulates the previously learned knowledge without accessing the previous tasks’ raw data.

CRediT authorship contribution statement

Yiling Zhang: Conceptualization, Methodology, Software, Investigation, Visualization, Writing - original draft. Hao Wang: Supervision, Writing - review & editing. Yan Yang: Supervision, Writing - review & editing, Funding acquisition. Wei Zhou: Data curation, Supervision, Writing - review & editing. Tianrui Li: Writing - review & editing. Xiaocao Ouyang: Writing - review & editing. Hongyang Chen: Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Science Foundation of China (No. 61976247). Hao Wang would like to thank a grant from the China Postdoctoral Science Foundation (No. 2020M681960) and a grant sponsored by Zhejiang Lab (No. 2020KB0AA02).

References (50)

  • H.B. Ammar et al., Autonomous cross-domain knowledge transfer in lifelong policy gradient reinforcement learning.
  • G. Anthes, Lifelong learning in artificial neural networks, Communications of the ACM (2019).
  • S. Arora et al., Implicit regularization in deep matrix factorization, Advances in Neural Information Processing Systems (2019).
  • H. Cai et al., Semi-supervised multi-view clustering based on orthonormality-constrained nonnegative matrix factorization, Information Sciences (2020).
  • Z. Chen et al., Lifelong machine learning, Synthesis Lectures on Artificial Intelligence and Machine Learning (2018).
  • C. Ding et al., Convex and semi-nonnegative matrix factorizations, IEEE Transactions on Pattern Analysis and Machine Intelligence (2010).
  • A. Argyriou, T. Evgeniou et al., Convex multi-task feature learning, Machine Learning (2008).
  • Q. Gu et al., Learning the shared subspace for multi-task clustering and transductive transfer classification.
  • C. Hong et al., Multimodal face-pose estimation with multitask manifold deep learning, IEEE Transactions on Industrial Informatics (2019).
  • Y. Jia et al., Semi-supervised non-negative matrix factorization with dissimilarity and similarity regularization, IEEE Transactions on Neural Networks and Learning Systems (2020).
  • Y. Jia et al., Semi-supervised adaptive symmetric non-negative matrix factorization, IEEE Transactions on Cybernetics (2020).
  • W. Jiang, F.L. Chung, Transfer spectral clustering, in: Joint European Conference on Machine Learning and Knowledge...
  • Z. Kang et al., Robust graph learning from noisy data, IEEE Transactions on Cybernetics (2019).
  • D. Kuang et al., Symmetric nonnegative matrix factorization for graph clustering.
  • X. Li et al., Lifelong multi-task multi-view learning using latent spaces.
  • Z. Li et al., Robust structured nonnegative matrix factorization for image representation, IEEE Transactions on Neural Networks and Learning Systems (2018).
  • X. Liu, M. Li, C. Tang, J. Xia, J. Xiong, L. Liu, M. Kloft, E. Zhu, Efficient and effective regularized incomplete...
  • M. Ma et al., Co-regularized nonnegative matrix factorization for evolving community detection in dynamic networks, Information Sciences (2020).
  • C.D. Manning et al. (2008).
  • J.H. Manton, Optimization algorithms exploiting unitary constraints, IEEE Transactions on Signal Processing (2002).
  • T. Mitchell et al., Never-ending learning, Communications of the ACM (2018).
  • F. Nie et al., Efficient and robust feature selection via joint l2,1-norms minimization.
  • G.I. Parisi et al., Continual lifelong learning with neural networks: A review, Neural Networks (2019).
  • X. Peng et al., Robust distribution-based nonnegative matrix factorizations for dimensionality reduction, Information Sciences (2021).
  • D. Rao et al., Continual unsupervised representation learning, Advances in Neural Information Processing Systems (2019).