Elsevier

Information Sciences

Volume 585, March 2022, Pages 41-57

Three-way multi-granularity learning towards open topic classification

https://doi.org/10.1016/j.ins.2021.11.035

Abstract

Traditional topic classification usually adopts the closed-world assumption that all test topics have been seen in training. However, in open dynamic environments, new topics may appear at test time as text data evolves. Given the uncertainty and multi-granularity of dynamic text data, open topic classification needs to detect unseen topics by continually mining the boundary region and to incrementally update previous models through knowledge accumulation. To address these challenges, this paper introduces a unified framework of three-way multi-granularity learning for open topic classification, based on the fusion of three-way decision and granular computing. First, we propose a multilevel granular structure of tasks from the temporal-spatial multi-granularity perspective. Then, we construct an adaptive decision boundary and use the centroids and their corresponding radii to discover unknowns via the reject option. Subsequently, we further explore the unknown topics by three-way enhanced clustering, and the uncertain instances are re-investigated in the next stage. In addition, we design a built-in knowledge base that stores topic knowledge as the centroid of each topic. Finally, experiments compare the performance of the proposed models and the efficiency of knowledge accumulation against classic models.

Introduction

Traditional supervised learning focuses on deterministic conditions and the closed-world assumption [32], [1], which means that the classes appearing in the test set must be known from the training set [12], [3]. However, this assumption is often violated in real-world applications. For instance, in open-world object recognition, new objects may appear constantly, and a classifier built from old objects may incorrectly classify a new object as one of the old objects [33]. This situation calls for models that are more robust and can adapt to the open dynamic environment, including changes in values or attributes and even the appearance of new categories. This presents significant challenges and broad application prospects for current machine learning research and development. Open dynamic situations call for on-the-job learning, as opposed to learning in the traditional closed-world environment. On-the-job learning, proposed by Liu [26], refers to learning after the model has been deployed in an application or during model application. Liu [26] states that an open dynamic system should (1) discover unknowns and create new learning tasks from the unknowns, (2) collect training or ground-truth data through interactions with users and the environment by imitation of humans or other agents, and (3) incrementally learn the new tasks. The whole process also needs to be carried out on the fly in a self-motivated and self-supervised manner.

Classifying text by topic is helpful for search, data mining, and text analysis. However, topic classification is time-consuming and error-prone, especially for open dynamic tasks such as dialog systems and real-time news reports. Most existing state-of-the-art methods rely on supervised algorithms with fixed training data and view tasks in isolation rather than as a whole. The data changes constantly in open topic classification tasks, which creates uncertainty in both the open classes and the learned knowledge. As shown in Fig. 1, we have four available tags (known classes) for specific topics: exchange charge, cancel a transfer, pending top-up, and verify identity. However, there are also texts with open/unknown topics. From the perspective of multi-granularity learning, open topic classification tasks can be divided into multiple granularity levels. At the coarser granularity level, it is necessary to distinguish these texts from the known topics as much as possible, and all the unknown topics are considered as a whole. At the finer granularity level, known topics and unknown topics are further processed according to the current feature space. The processing procedure of open topic classification tasks can thus be viewed as a granularity construction process that discovers knowledge from coarser to finer granularity. This paper tries to connect three-way multi-granularity learning with open dynamic learning and to deal with the uncertainty so as to enable continual learning in open topic classification.

For open topic classification tasks, three underlying challenges remain to be addressed. First, under the open-world assumption, the uncertainty in an open dynamic environment needs to be studied further. Second, most advanced topic classification systems rely on complex structures to capture information, which take a long time to converge during training. Third, a desirable open dynamic model should capture knowledge at different granularity levels to reflect the granularity changes of dynamic data. To address these issues, we propose a three-way multi-granularity learning model for open topic classification (TWMG-Open). In this work, we follow the open-world assumption and adopt the framework of three-way multi-granularity learning, which has been demonstrated to be an effective method for open problems. In addition, searching for an appropriate granularity level for decision or classification is a crucial problem [19]. This paper examines open topic classification within the framework of three-way multi-granularity learning and constructs decision-making processes according to the different granularity levels of dynamic data.

This work discusses the open topic classification task in light of three-way multi-granularity learning. It tackles the entire process from open detection/discovery to open classification and considers different granularities for different kinds of problems. By constructing a built-in knowledge base, the model gains the ability to learn continually, and three-way multi-granularity learning is used to enhance the accuracy and validity of the learned knowledge through knowledge accumulation.
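As a rough illustration of knowledge accumulation with a centroid-based knowledge base, the following minimal sketch keeps one running-mean centroid per topic and updates it as new embedded documents arrive. The class name `CentroidKnowledgeBase`, its method `update`, and the two-dimensional toy embeddings are assumptions made for this example; the paper's actual update rule is not given in this excerpt.

```python
import numpy as np


class CentroidKnowledgeBase:
    """Hypothetical knowledge base: one running-mean centroid per topic."""

    def __init__(self):
        self.centroids = {}  # topic name -> centroid vector
        self.counts = {}     # topic name -> number of documents seen so far

    def update(self, topic, embeddings):
        """Fold a batch of document embeddings into the topic's centroid."""
        batch = np.asarray(embeddings, dtype=float)
        n_new = batch.shape[0]
        if topic not in self.centroids:
            # First time this topic is seen: the centroid is the batch mean.
            self.centroids[topic] = batch.mean(axis=0)
            self.counts[topic] = n_new
            return
        n_old = self.counts[topic]
        total = n_old + n_new
        # Weighted mean of the stored centroid and the new batch, so earlier
        # documents do not need to be kept in memory.
        self.centroids[topic] = (
            n_old * self.centroids[topic] + batch.sum(axis=0)
        ) / total
        self.counts[topic] = total


kb = CentroidKnowledgeBase()
kb.update("verify identity", [[0.0, 0.0], [2.0, 2.0]])
kb.update("verify identity", [[4.0, 4.0]])
print(kb.centroids["verify identity"], kb.counts["verify identity"])
```

Storing only a centroid and a count per topic keeps the update cost independent of how many documents have already been seen, which is one plausible reason to represent topic knowledge this way.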

The remainder of this paper is organized as follows. Section 2 reviews three-way multi-granularity learning and open topic classification. Section 3 constructs a framework of three-way multi-granularity learning for open topic classification and introduces each part of the proposed model. Section 4 designs a series of experiments and provides the experimental analysis. Finally, Section 5 concludes the paper.

Section snippets

Three-way decision and three-way multi-granularity learning

Three-way decision (3WD), proposed by Yao [38], was initially used to describe the three regions of decision-theoretic rough sets and has since been widely investigated as a philosophy of thinking in three, a methodology of working with three, and a mechanism of processing through three [42], [40]. In traditional two-way classification models, an object is assigned to one of only two regions: the positive region for positive instances, and the negative region for negative instances. However, either region
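To make the contrast with two-way classification concrete, here is a minimal sketch of a threshold-based three-way decision rule: an object is accepted, rejected, or deferred to a boundary region depending on its estimated probability of belonging to the positive class. The function name `three_way_decide` and the threshold values 0.7 and 0.3 are illustrative assumptions; in decision-theoretic rough sets the thresholds are derived from loss functions rather than fixed by hand.

```python
def three_way_decide(prob_positive, alpha=0.7, beta=0.3):
    """Return 'accept', 'reject', or 'defer' for one object.

    alpha and beta are hypothetical thresholds with beta < alpha.
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")
    if prob_positive >= alpha:
        return "accept"   # positive region
    if prob_positive <= beta:
        return "reject"   # negative region
    return "defer"        # boundary region: wait for more information


for p in (0.9, 0.5, 0.1):
    print(p, three_way_decide(p))
```

Objects that fall in the boundary region are not forced into either class, which matches the abstract's description of uncertain instances being re-investigated at a later stage.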

Proposed model

Inspired by the open-world learning paradigm, we combine multi-granularity learning with open topic classification to construct a dynamic three-way multi-granularity enhanced open topic classification model (TWMG-Open). Given that unknown topic classes arise constantly, TWMG-Open detects unknown topics from known topics at the coarser granularity and then figures out the space feature of unknowns and how new document collections with unknown topics affect the knowledge base in
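The abstract describes discovering unknowns from topic centroids and their radii with a reject option. A minimal sketch of that idea, under stated assumptions, assigns a document embedding to its nearest centroid only if it falls within that topic's radius and otherwise rejects it as unknown. The function name `detect_open_topic`, the Euclidean distance, the fixed radii, and the toy two-dimensional embeddings are all assumptions for illustration; the paper's adaptive decision boundary is not specified in this excerpt.

```python
import numpy as np


def detect_open_topic(embedding, centroids, radii):
    """Return the nearest known topic, or 'unknown' if outside its radius.

    centroids: dict mapping topic name -> centroid vector (hypothetical)
    radii:     dict mapping topic name -> acceptance radius (hypothetical)
    """
    x = np.asarray(embedding, dtype=float)
    names = list(centroids)
    # Euclidean distance from the document to every known topic centroid.
    dists = np.array(
        [np.linalg.norm(x - np.asarray(centroids[n], dtype=float)) for n in names]
    )
    nearest = names[int(np.argmin(dists))]
    # Reject option: outside the nearest topic's radius means "unknown".
    if dists.min() <= radii[nearest]:
        return nearest
    return "unknown"


centroids = {"exchange charge": [0.0, 0.0], "cancel a transfer": [5.0, 5.0]}
radii = {"exchange charge": 1.0, "cancel a transfer": 1.5}

print(detect_open_topic([0.5, 0.0], centroids, radii))
print(detect_open_topic([2.5, 2.5], centroids, radii))
```

Documents rejected here would then be candidates for the clustering stage that explores unknown topics at a finer granularity.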

Experiments

A series of experiments was conducted to demonstrate the effectiveness of the proposed cost-sensitive three-way multi-granular open topic classification method. All experiments were performed on a computer with an Intel Xeon E5-2678 v3 CPU and an NVIDIA GeForce RTX 3090 GPU, using Python 3.7 on 64-bit Windows.

Conclusions

For an open dynamic task, we are interested in detecting and managing uncertainty and in implementing the three-way multi-granularity structure for knowledge accumulation. On the basis of three-way multi-granularity learning, open topic classification was investigated at different levels of granularity to obtain a more efficient approach to learning knowledge continually. Compared with traditional static open systems, we used three datasets to conduct a series of experiments, which showed

CRediT authorship contribution statement

Xin Yang: Conceptualization, Methodology, Writing – original draft. Yujie Li: Software, Writing – original draft. Dan Meng: Writing – review & editing. Yuxuan Yang: Software, Writing – review & editing. Dun Liu: Writing – review & editing. Tianrui Li: Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Nos. 61773324, 61876157), the Humanity and Social Science Youth Foundation of Ministry of Education of China (No. 20YJC630191), the Fintech Innovation Center of Southwestern University of Finance and Economics, and the Financial Intelligence & Financial Engineering Key Laboratory of Sichuan Province.

References (50)

  • Y.Y. Yao, Three-way decision and granular computing, Int. J. Approximate Reasoning (2018)
  • Y.Y. Yao, Three-way granular computing, rough sets, and formal concept analysis, Int. J. Approximate Reasoning (2020)
  • H. Yu et al., A three-way clustering method based on an improved DBSCAN algorithm, Phys. A (2019)
  • H. Yu et al., A tree-based incremental overlapping clustering method using the three-way decision theory, Knowl.-Based Syst. (2016)
  • C. Zhang et al., Multi-granularity three-way decisions with adjustable hesitant fuzzy linguistic multigranulation decision-theoretic rough sets over two universes, Inf. Sci. (2020)
  • L.B. Zhang et al., Sequential three-way decision based on multi-granular autoencoder features, Inf. Sci. (2020)
  • Y.B. Zhang et al., A cost-sensitive three-way combination technique for ensemble learning in sentiment classification, Int. J. Approximate Reasoning (2019)
  • Y.B. Zhang et al., Three-way enhanced convolutional neural networks for sentence-level sentiment classification, Inf. Sci. (2019)
  • A. Bendale et al., Towards open world recognition
  • I. Casanueva et al., Efficient intent detection with dual sentence encoders
  • Z.Y. Chen et al., Lifelong machine learning, Synthesis Lectures Artif. Intell. Mach. Learn. (2018)
  • M.J. Cheok et al., A review of hand gesture and sign language recognition techniques, Int. J. Mach. Learn. Cybern. (2019)
  • C. Chow, On optimum recognition error and reject tradeoff, IEEE Trans. Inf. Theory (1970)
  • A. Cohan et al., Structural scaffolds for citation intent classification in scientific publications, in
  • D. Comaniciu et al., Mean shift: A robust approach toward feature space analysis, IEEE Trans. Pattern Anal. Mach. Intell. (2002)