Knowledge-Based Systems

Volume 218, 22 April 2021, 106876

Cognitive structure learning model for hierarchical multi-label text classification

https://doi.org/10.1016/j.knosys.2021.106876

Abstract

The human mind grows by learning new knowledge, which is eventually organized into a basic mental pattern called a cognitive structure. Hierarchical multi-label text classification (HMLTC), a fundamental but challenging task in many real-world applications, aims to classify documents with hierarchical labels, a process that resembles cognitive structure learning. From a cognitive view, existing approaches for HMLTC mainly focus either on partial new knowledge learning or on utilizing the global, cognitive-structure-like label structure. However, complete cognitive structure learning is a unity indispensably constructed from both global label structure utilization and partial knowledge learning, which those HMLTC approaches ignore. To address this problem, we imitate the cognitive structure learning process in HMLTC and propose a unified framework called the Hierarchical Cognitive Structure Learning Model (HCSM). HCSM is composed of the Attentional Ordered Recurrent Neural Network (AORNN) submodule and the Hierarchical Bi-Directional Capsule (HBiCaps) submodule. Both submodules comprehensively utilize partial new knowledge and the global hierarchical label structure for the HMLTC task. On the one hand, AORNN extracts semantic vectors as partial new knowledge from the original text at the word-level and hierarchy-level embedding granularities. On the other hand, AORNN builds a hierarchical text representation corresponding to the global label structure through document-level neuron ordering. HBiCaps employs an iterative process to form a unified label categorization similar to cognitive structure learning: first, it computes probabilities over local hierarchical relationships to maintain partial knowledge learning; second, it modifies the global hierarchical label structure through the dynamic routing mechanism between capsules. Moreover, experimental results on four benchmark datasets demonstrate that HCSM outperforms or matches state-of-the-art text classification methods.

Introduction

In the field of psychology, the cognitive structure is a basic mental pattern for organizing a person’s full knowledge. The cognitive structure provides meaning and guidance to practice, and supervises the processing of new knowledge and the retrieval of stored knowledge. The earliest studies of cognitive structure theory date back to the 1960s and were provided by educational psychologists [1]. Since then, subsequent research [2], [3], [4] has extensively studied how the cognitive structure works in the human mind. With the rapid development of cognitive computing, existing works [5], [6] have verified the significant effect of the cognitive structure on machine learning.

Multiple labels for documents mostly come from human annotators, and the semantics embedded in the labels reflect each annotator’s cognitive structure. Compared with traditional flat multi-label text classification [7], [8], HMLTC more closely resembles the process of cognitive structure learning, and the hierarchical label structure more closely resembles the cognitive structure of the human mind. The task of HMLTC is to assign a document to multiple hierarchical categories, in which semantic labels are typically organized in a tree-like or Directed Acyclic Graph (DAG) structured hierarchy [9]. Fig. 1 illustrates such an example. Nowadays, the growth in web text volume, such as social messengers, microblogs, and web forum threads, has made it urgent to develop HMLTC methods that facilitate understanding and organizing such text information. Facing this demand, both industry and academia have successfully utilized applied frameworks [10], [11] to promote the growth of HMLTC in areas such as question answering [12], online advertising [13], and scientific literature organization [14].
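
To make the label setting concrete, the following minimal Python sketch encodes a tree-structured label hierarchy as a parent-to-children map and recovers the ancestors that a hierarchy-consistent HMLTC prediction must include; the label names are illustrative placeholders, not taken from the paper’s datasets.

    # A minimal sketch of a tree-structured label hierarchy for HMLTC.
    # Label names are illustrative placeholders, not the paper's datasets.
    hierarchy = {
        "ROOT": ["Science", "Sports"],
        "Science": ["Physics", "Biology"],
        "Sports": ["Football", "Tennis"],
    }

    # Invert the child lists so each label knows its parent.
    parent_of = {c: p for p, children in hierarchy.items() for c in children}

    def ancestors(label):
        """Walk up the hierarchy and collect all ancestors of a label."""
        chain = []
        while label in parent_of:
            label = parent_of[label]
            chain.append(label)
        return chain

    # A hierarchy-consistent prediction of "Physics" implies its ancestors:
    print(ancestors("Physics"))  # ['Science', 'ROOT']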

According to Jean Piaget and William Perry [15], a human’s existing cognitive structure serves as a frame of reference that guarantees and facilitates the process of learning new knowledge. In turn, learning new knowledge provides practice for and develops the existing cognitive structure. Based on the cognitive structure learning process, we group existing HMLTC approaches into two major types from a human-mind view, i.e., the partial knowledge learning approach and the global label structure utilization approach. The partial knowledge learning approach focuses on local semantic relationships within the limits of the innate hierarchical structure or a predefined setting. It resembles how people, when facing a new field, make sense of new knowledge by assimilating basic concepts into their existing cognitive structure, e.g., relationships between nodes for ensemble learning [16], parent–child categories for transfer learning [17], and subgraphs for recurrent neural networks [18]. These partial knowledge frameworks generalize and develop the hierarchical label structure, preserving the independence of each knowledge framework but suffering from the error-propagation problem [19]. In contrast, the global label structure utilization approach focuses on the existing holistic hierarchical label structure and has the advantage of requiring considerably fewer parameters than the partial knowledge learning approach. It resembles how people retrieve their old cognitive structure and modify it to accommodate new knowledge. Several strategies for HMLTC can be regarded as global label structure utilization approaches, such as reinforcement learning [20], meta-learning [21], and graph convolutional networks [22]. However, these global methods mainly capture partial knowledge from the entire structure, eventually causing an underfitting problem.

In fact, partial knowledge learning and global label structure utilization are integral mental processes in cognitive structure learning. So far, no HMLTC approach has devised a unified model that imitates cognitive structure learning as developed by the interrelated influence between partial knowledge and the global label structure. To capture the relationship between the text and the label structure, we divide our unified model into two submodules for text representation and hierarchical multi-label prediction. Both submodules merge partial knowledge learning into the global hierarchical label structure so that the two reinforce each other for the HMLTC task. For text representation, each document has an innate hierarchical structure (words form sentences, sentences form paragraphs, and paragraphs form documents). Some hierarchical attention approaches [23], [24] construct a document representation at the word level, sentence level, or sentiment level of the innate document structure. Those attention mechanisms act as partial knowledge extractors but ignore the corresponding global hierarchical label structure. It is challenging to embed the global hierarchical label structure into those attention mechanisms to form a complete document representation, i.e., to integrate new knowledge into the global label structure. For hierarchical multi-label prediction, deep learning approaches [25], [26] have achieved significant improvements in HMLTC. Among them, the capsule neural network approach [27] with the dynamic routing mechanism [28] shows superiority over traditional HMLTC methods in automatically learning part-whole relationships. Extending the dynamic routing process between capsules into a form of cognitive structure learning that develops the cognitive-structure-like label structure is also challenging.

To address the above challenges, we propose a comprehensive hierarchical cognitive structure learning model called HCSM, which interrelates partial knowledge learning and the global label structure as a unified cognitive structure process for the HMLTC task. HCSM is composed of two submodules, i.e., the AORNN submodule for text representation and the HBiCaps submodule for hierarchical multi-label prediction. AORNN employs two basic embedding levels (the word-embedding and hierarchy-embedding levels) for partial knowledge learning, applying an attention mechanism over the relevant contexts. After that, the AORNN submodule integrates those partial embedding vectors into the document-embedding level using hierarchy-ordered neurons, thus building the relationship between the text and the global hierarchical label structure. The hierarchy-ordered neurons are a set of neurons in a long short-term memory (LSTM) network whose architecture is modified to track the global hierarchical label structure. High-ranking neurons store the high-level hierarchy representation as long-term information preserved over many steps, while low-ranking neurons store the low-level hierarchy representation as short-term information forgotten within a few steps. For these hierarchy-ordered neurons, erasing (or updating) a high-ranking neuron requires erasing (or updating) all lower-ranking neurons first. HBiCaps employs an iteration composed of hierarchical top-down and bottom-up traversal fashions. The iteration develops the global hierarchical label structure using partial knowledge from the text and local hierarchical relationships. The hierarchical top-down traversal exploits the local hierarchical relationships between labels as probabilities, thereby learning the partial knowledge embedded in the local modularity of labels. The hierarchical bottom-up traversal extends the dynamic routing mechanism between capsules into a form of cognitive structure learning, thereby merging the hierarchy-embedding representation with the local modularity of labels to form a cognitive structure learning classifier. Together, the two submodules transform the complete cognitive structure learning process into HMLTC within the unified proposed model.
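
The ordering behavior described above (lower-ranking neurons must be erased before higher-ranking ones) matches the cumulative-softmax master gates of ordered-neuron LSTMs. Since this text does not spell out AORNN’s exact gating equations, the NumPy sketch below shows only that standard mechanism, under the assumption that the hierarchy-ordered neurons follow the same formulation.

    import numpy as np

    def cumax(x):
        """Cumulative softmax: a monotonically non-decreasing gate in (0, 1).

        Monotonicity enforces the ordering property described above: a
        high-ranking (high-index) neuron can only be erased (gate near 0)
        if every lower-ranking neuron is erased as well.
        """
        e = np.exp(x - x.max())
        return np.cumsum(e / e.sum())

    # Toy master gates over 8 ordered neurons; the random logits stand in
    # for the learned projections of the input and previous hidden state.
    rng = np.random.default_rng(0)
    logits = rng.normal(size=8)
    master_forget = cumax(logits)        # increasing: low ranks forgotten first
    master_input = 1.0 - cumax(logits)   # decreasing: low ranks updated first
    print(np.round(master_forget, 2), np.round(master_input, 2))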

Comparative studies on four text datasets have been conducted to demonstrate the effectiveness of the proposed approach. The main contributions of our paper are as follows:

(1) We propose to deploy cognitive structure learning in HMLTC as a unified model (HCSM). The imitation of cognitive structure advances HMLTC by integrating partial new knowledge learning with global label structure modeling, capturing semantic text representations, and providing meaningful multi-label text categories.

(2) AORNN, the text representation submodule of HCSM, exploits the hierarchy-ordered neurons of a modified LSTM to represent and organize the document-level text representation based on partial knowledge learning of word-level and hierarchy-level semantics. We use the AORNN submodule to build the relationship between partial new knowledge and the global hierarchical label structure, forming a comprehensive text representation.

(3) The HBiCaps submodule of HCSM employs an iteration that utilizes the local hierarchical relationships between labels in a top-down traversal fashion. In the iteration’s bottom-up traversal fashion, HBiCaps merges the relevant hierarchy-embedding text representation and local hierarchical relationships as partial new knowledge into the dynamic routing mechanism between capsules (see the routing sketch after this list). Across the two traversal fashions, the global hierarchical label structure develops toward better multi-label prediction for HMLTC.
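
HBiCaps builds on dynamic routing between capsules. As a reference point, here is a minimal NumPy sketch of the standard routing-by-agreement step [28] that the bottom-up traversal extends; the hierarchical traversals themselves are not shown, and the shapes are toy values.

    import numpy as np

    def squash(s, eps=1e-8):
        """Shrink a vector so its norm lies in [0, 1) while keeping its direction."""
        n2 = np.sum(s * s, axis=-1, keepdims=True)
        return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

    def dynamic_routing(u_hat, iters=3):
        """Standard routing-by-agreement over prediction vectors.

        u_hat: (num_in, num_out, dim) votes from lower- to higher-level capsules.
        Returns the (num_out, dim) output capsules.
        """
        b = np.zeros(u_hat.shape[:2])                             # routing logits
        for _ in range(iters):
            c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # coupling coefficients
            v = squash(np.einsum("ij,ijd->jd", c, u_hat))         # candidate output capsules
            b = b + np.einsum("ijd,jd->ij", u_hat, v)             # agreement update
        return v

    rng = np.random.default_rng(1)
    votes = rng.normal(size=(6, 3, 4))   # 6 child-label capsules voting for 3 parents
    print(dynamic_routing(votes).shape)  # (3, 4)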

Section snippets

Hierarchical multi-label classification

Silla and Freitas [19] grouped existing hierarchical multi-label classification approaches into three major categories, i.e., the flat, local, and global approaches. The flat approach [29] is the simplest, handling hierarchical multi-label classification as traditional flat multi-label classification and ignoring hierarchical relationships. Compared with the traditional flat multi-label framework, the hierarchical structure preserves a rich source of relationships in a tree or DAG structure…

Problem formalization

In this section, we introduce two basic definitions and then formulate the problem of HMLTC.

Definition 1 (Hierarchical Structure)

Suppose there is H={V,E} in a tree or DAG structure, where V is a set of nodes vi representing multi-level labels, and E is a set of edges representing the direct links between labels. The hierarchical set of nodes V is distributed over the label set Y={Y1,Y2,…,Ym}, where Yi is the set of possible categories in the ith hierarchy and m is the total number of hierarchical levels. We define the partial…
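
A minimal Python sketch of Definition 1: the hierarchy H = {V, E} is stored as node and edge sets, and the per-level label sets Y1, …, Ym are recovered by a breadth-first traversal from the root; the node names are illustrative placeholders.

    # Definition 1 as data: H = {V, E} with illustrative placeholder labels.
    V = {"root", "A", "B", "A1", "A2", "B1"}
    E = {("root", "A"), ("root", "B"), ("A", "A1"), ("A", "A2"), ("B", "B1")}

    def level_sets(E, root="root"):
        """Group labels by hierarchy level via breadth-first traversal."""
        children = {}
        for parent, child in E:
            children.setdefault(parent, []).append(child)
        Y, frontier = [], children.get(root, [])
        while frontier:
            Y.append(set(frontier))
            frontier = [c for p in frontier for c in children.get(p, [])]
        return Y

    Y = level_sets(E)   # [{'A', 'B'}, {'A1', 'A2', 'B1'}]
    m = len(Y)          # total number of hierarchical levels (here m = 2)
    print(m, Y)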

Proposed approach

Based on our problem formalization, the proposed HCSM has two major submodules: (1) the AORNN submodule G(·), which aims to extract a hierarchical text representation from the contexts, and (2) the HBiCaps submodule F(·), which predicts the hierarchical multi-label structure over the label space. The AORNN submodule modifies and applies various LSTM models at different embedding granularity levels, using partial embedding vectors and globally ranked neurons corresponding to the cognitive structure…
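
The composition of the two submodules can be pictured as a two-stage pipeline, roughly F(G(document)). In the sketch below the bodies of G and F are toy stand-ins for AORNN and HBiCaps, not the paper’s implementation.

    import numpy as np

    def G(token_ids, embedding):
        """Toy stand-in for AORNN: mean-pool token embeddings into one vector."""
        return embedding[token_ids].mean(axis=0)

    def F(representation, label_weights):
        """Toy stand-in for HBiCaps: independent per-label sigmoid scores."""
        return 1.0 / (1.0 + np.exp(-label_weights @ representation))

    rng = np.random.default_rng(2)
    embedding = rng.normal(size=(100, 16))    # toy vocabulary of 100 words
    label_weights = rng.normal(size=(5, 16))  # toy label space of 5 labels
    scores = F(G([3, 17, 42], embedding), label_weights)
    print(np.round(scores, 2))                # per-label probabilities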

Evaluation measures and experimental datasets

To fairly compare our algorithm with the baselines, we evaluate the experimental results with metrics [54], [55] widely used in the field of multi-label text classification, i.e., the macro-precision, macro-recall, and macro-F1 (MaP, MaR, MaF1), and the micro-precision, micro-recall, and micro-F1 (MiP, MiR, MiF1). MaP, MaR, and MaF1 are defined in Formula (13), and MiP, MiR, and MiF1 are defined in Formula (14) as follows:

$\mathrm{MaP}=\frac{1}{\gamma}\sum_i \frac{N_i^c}{N_i^p},\quad \mathrm{MaR}=\frac{1}{\gamma}\sum_i \frac{N_i^c}{N_i^g},\quad \mathrm{MaF1}=\frac{2\times \mathrm{MaP}\times \mathrm{MaR}}{\mathrm{MaP}+\mathrm{MaR}} \tag{13}$

$\mathrm{MiP}=\frac{\sum_i N_i^c}{\sum_i N_i^p},\quad \mathrm{MiR}=\frac{\sum_i N_i^c}{\sum_i N_i^g},\quad \mathrm{MiF1}=\frac{2\times \mathrm{MiP}\times \mathrm{MiR}}{\mathrm{MiP}+\mathrm{MiR}} \tag{14}$
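
A worked sketch of Formulas (13) and (14) follows, reading γ as the number of labels and N_i^c, N_i^p, N_i^g as the counts of correctly predicted, predicted, and ground-truth documents for label i (a standard reading of these symbols, since the snippet cuts off before defining them); the counts are toy values.

    def macro_micro(Nc, Np, Ng):
        """Macro- and micro-averaged precision, recall, and F1 from per-label counts."""
        gamma = len(Nc)
        MaP = sum(c / p for c, p in zip(Nc, Np)) / gamma
        MaR = sum(c / g for c, g in zip(Nc, Ng)) / gamma
        MaF1 = 2 * MaP * MaR / (MaP + MaR)
        MiP = sum(Nc) / sum(Np)
        MiR = sum(Nc) / sum(Ng)
        MiF1 = 2 * MiP * MiR / (MiP + MiR)
        return MaP, MaR, MaF1, MiP, MiR, MiF1

    # Toy per-label counts: correct, predicted, ground-truth for three labels.
    print(macro_micro(Nc=[8, 3, 5], Np=[10, 6, 5], Ng=[9, 4, 8]))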

Conclusions

In this paper, we deploy a unified method, HCSM, composed of the AORNN and HBiCaps submodules, to import cognitive structure learning into HMLTC. The AORNN submodule constructs the word-embedding, hierarchy-embedding, and document-embedding levels to integrate the partial word-to-hierarchy representation into a global hierarchical text representation based on hierarchy-ordered neurons. The HBiCaps submodule applies the partial text-hierarchy representation, local hierarchical relationships, and the…

CRediT authorship contribution statement

Boyan Wang: Conceptualization, Methodology, Software, Investigation, Writing - original draft, Visualization, Funding acquisition. Xuegang Hu: Conceptualization, Validation, Formal analysis, Writing - review & editing, Funding acquisition. Peipei Li: Methodology, Resources, Writing - review & editing, Funding acquisition. Philip S. Yu: Conceptualization, Methodology, Resources, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported in part by the National Key Research and Development Program of China under grant 2016YFB1000901, the National Natural Science Foundation of China under grants 61976077, 62076085, and 91746209, the Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT) of the Ministry of Education, China under grant IRT17R32, and the China Scholarship Council.

References (61)

  • Gargiulo, F., et al., Deep neural network for hierarchical extreme multi-label text classification, Appl. Soft Comput. (2019)
  • Borges, H.B., et al., An evaluation of global-model hierarchical classification algorithms for hierarchical classification problems with single path of labels, Comput. Math. Appl. (2013)
  • Ausubel, D.P., et al., Educational Psychology: A Cognitive View (1968)
  • Ortony, A., et al., The Cognitive Structure of Emotions (1990)
  • Dunlosky, J., et al., Metacognition (2008)
  • Cushman, F., et al., Finding faults: How moral dilemmas illuminate cognitive structure, Soc. Neurosci. (2012)
  • Liu, Q., et al., Exploiting cognitive structure for adaptive learning, SIGKDD (2019)
  • Aggarwal, C.C., et al., A survey of text classification algorithms
  • Ren, Z., et al., Hierarchical multi-label classification of social text streams
  • Liu, L., et al., NeuralClassifier: An open-source neural hierarchical multi-label text classification toolkit, ACL (2019)
  • Qu, B., et al., An evaluation of classification models for question topic categorization, J. Am. Soc. Inf. Sci. Technol. (2012)
  • Agrawal, R., et al., Multi-label learning with millions of labels: Recommending advertiser bid phrases for web pages, WWW (2013)
  • Navaneedhan, C.G., et al., What is meant by cognitive structures? How does it influence teaching–learning of psychology, IRA Int. J. Edu. Multidiscip. Stud. (2017)
  • Banerjee, S., et al., Hierarchical transfer learning for multi-label text classification, ACL (2019)
  • Peng, H., et al., Hierarchical taxonomy-aware and attentional graph capsule RCNNs for large-scale multi-label text classification, IEEE Trans. Knowl. Data Eng. (2019)
  • Silla, C.N., et al., A survey of hierarchical classification across different application domains, Data Min. Knowl. Discov. (2011)
  • Mao, Y., et al., Hierarchical text classification with reinforced label assignment (2019)
  • Wu, J., et al., Learning to learn and predict: A meta-learning approach for multi-label classification (2019)
  • Zhou, J., et al., Hierarchy-aware global model for hierarchical text classification, ACL (2020)
  • Yang, Z., et al., Hierarchical attention networks for document classification, NAACL (2016)