Deep neural network for hierarchical extreme multi-label text classification

doi:10.1016/j.asoc.2019.03.041

Applied Soft Computing

Volume 79, June 2019, Pages 125-138

https://doi.org/10.1016/j.asoc.2019.03.041

Highlights

• Deep Neural Network architecture for extreme multi-label text classification.
• Multi-label classification problem with a huge, hierarchically organized label space.
• Comparison among different word-embedding methods for text representation.
• Definition of a method for label set expansion exploiting the label hierarchy.
• Experimental assessment based on flat and hierarchical measures.

Abstract

The classification of natural language texts has gained growing importance in many real-world applications because of its significant implications for crucial tasks such as Information Retrieval, Question Answering, Text Summarization, and Natural Language Understanding. In this paper we present an analysis of a Deep Learning architecture devoted to text classification, considering the extreme multi-class and multi-label text classification problem in which a hierarchical label set is defined. The paper presents a methodology named Hierarchical Label Set Expansion (HLSE), used to regularize the data labels, and an analysis of the impact of different Word Embedding (WE) models that explicitly incorporate grammatical and syntactic features. We evaluate these methodologies on the PubMed scientific articles collection, where a multi-class and multi-label text classification problem is defined with the Medical Subject Headings (MeSH) label set, a hierarchical set of 27,775 classes. The experimental assessment demonstrates the usefulness of the proposed HLSE methodology and also provides some interesting results on the impact of different uses and combinations of WE models as input to the neural network in this kind of application.

Introduction

The classification of natural language texts is a key aspect of many tasks in different domains. This kind of classification problem consists of applying one or more labels to each document of a text collection. In the literature this task has been approached by means of several different techniques, ranging from ontology-based methods to Machine Learning (ML) systems, or through hybrid approaches that integrate ontological knowledge and ML [1], [2]. The increase in computational power and the availability of huge amounts of data, along with active research and development in the field of Deep Neural Networks (DNN), have recently led to the definition of DNN models able to outperform the previous state-of-the-art systems.

In accordance with the literature, it is possible to identify a taxonomy of Natural Language text classification problems, composed of the following four classes:

• Binary Classification, where the labels belong to a binary set (Positive and Negative, True and False, etc.);
• Multi-Class Classification, where the single classification label belongs to a set with more than two elements;
• Multi-Label Classification, where the labels belong to a multi-class domain but, unlike the previous case, each document can be tagged with a variable number of labels, ranging from one to the total number of classes; and
• Extreme Multi-Label Text Classification (XMTC) [3], which refers to the automatic assignment of the most relevant subset of labels to a text document. Unlike the classic multi-label problem, where the label set size is usually in the order of ten, here the labels belong to an extremely large set, in the order of thousands or tens of thousands of elements. If the label set is hierarchically organized, a hierarchical XMTC problem is defined.
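The difference between the multi-class and multi-label settings in the taxonomy above can be illustrated by how a document's target is encoded. The following is a minimal sketch, not taken from the paper: the function name `multi_hot` and the four-element toy label space are assumptions made for illustration, and a real XMTC label space would contain thousands of entries.

```python
from typing import Dict, List, Sequence


def multi_hot(labels: Sequence[str], label_index: Dict[str, int]) -> List[int]:
    """Encode a document's label set as a 0/1 vector over the whole label space."""
    vector = [0] * len(label_index)
    for label in labels:
        vector[label_index[label]] = 1
    return vector


# Toy label space (hypothetical); XMTC label spaces are orders of magnitude larger.
label_space = ["Neoplasms", "Humans", "Mice", "Lung Neoplasms"]
label_index = {name: i for i, name in enumerate(label_space)}

# Multi-class: exactly one position is active.
multi_class_target = multi_hot(["Mice"], label_index)
# Multi-label: a variable number of positions is active.
multi_label_target = multi_hot(["Humans", "Lung Neoplasms"], label_index)
```

In the extreme case the vector has one position per label in the full set, so almost all entries are zero for any given document, which is one way to see the data sparsity issue.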

The huge XMTC label space raises many research challenges, such as data sparsity and scalability. The availability of Big Data and the application of XMTC to real-world problems have attracted growing attention from researchers in the ML and Deep Learning (DL) fields. Significant advances in multi-label classification methodologies have been made in recent years, thanks to the development of specific ML methods, although DL methods have not yet been widely explored for this particular problem.

In this paper we analyze a DL approach based on a Convolutional Neural Network (CNN), devoted to the hierarchical XMTC problem. We define a methodology that expands the label set of each document by integrating all the missing labels along the label hierarchy. This operation is necessary because human experts who manually label the documents usually consider only the leaves of the tree and a few labels along the hierarchy for indexing purposes. Missing classes along the hierarchy can lead to incorrect training of the DNN because of label inconsistencies.
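The idea of expanding a label set along the hierarchy can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' implementation: the function `expand_label_set`, the `parents` mapping (child label to its direct parents), and the toy hierarchy are all hypothetical, and the toy hierarchy is not an exact copy of the MeSH tree.

```python
from typing import Dict, Iterable, List, Set


def expand_label_set(labels: Iterable[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Return the labels together with every ancestor reachable in the hierarchy."""
    expanded: Set[str] = set()
    stack = list(labels)
    while stack:
        label = stack.pop()
        if label in expanded:
            continue  # already visited, avoids repeated work on shared ancestors
        expanded.add(label)
        # Root labels have no entry in the mapping, so the walk stops there.
        stack.extend(parents.get(label, []))
    return expanded


# Hypothetical toy hierarchy: child label -> direct parent labels.
toy_parents = {
    "Lung Neoplasms": ["Respiratory Tract Neoplasms", "Lung Diseases"],
    "Respiratory Tract Neoplasms": ["Neoplasms by Site"],
    "Neoplasms by Site": ["Neoplasms"],
    "Lung Diseases": ["Respiratory Tract Diseases"],
}

annotated = {"Lung Neoplasms"}  # label assigned by a human indexer
expanded = expand_label_set(annotated, toy_parents)
```

The mapping allows several parents per label so that the sketch also covers hierarchies where a label appears in more than one branch; with a strict tree, each list would hold a single parent.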

We also analyze the impact of the use and combination of different types of embeddings for the representation of the input training text. In more detail, we evaluate the impact of semi-supervised embedding models. These models explicitly incorporate grammatical and syntactic information into the resulting word vectors and can provide a performance boost in other tasks, such as Word Analogy/Similarity Querying, Named Entity Recognition (NER), Relation Extraction and Sentence Classification [4], [5], [6], [7].

All the results have been evaluated using the PubMed scientific articles collection as a test case. PubMed is a search engine maintained by the US National Library of Medicine (NLM), specifically devoted to medical and biological scientific papers. We have considered only the text of the title and the abstract of each paper, along with the corresponding labels, because they are freely available in PubMed. Each paper has been manually tagged by domain experts with a variable number of classes from the Medical Subject Headings (MeSH) set, a hierarchical label set with a total of 27,775 labels. For these reasons, the automatic classification of PubMed papers with MeSH is an instance of the hierarchical XMTC problem.

The automatic classification of PubMed papers is also a task required by the NLM in order to help the domain experts in their tedious and time-consuming work. To achieve this objective, the NLM supports BioASQ, a distributed challenge for the research community; one of the aims of BioASQ is to advance the state-of-the-art systems devoted to the automatic assignment of MeSH labels to PubMed indexed articles [8].

The XMTC problem arises in many real-world applications, such as the one described above, which confirms the utility of efforts to find new solutions. The results of our experiments demonstrate the usefulness of the proposed HLSE method and provide many interesting findings from the analysis of how the performance of the neural network varies with the embedding models used. This analysis could also be considered a starting point for an emerging problem, which may be addressed by what is called explainable AI [9], [10], namely that of correlating the input data representation and the label structure with their impact on the DL model performance.

The paper is organized as follows: in the next section an overview of the current state of the art is presented; then, all the details of the DNN used and the proposed methodologies are explained, followed by the experimental results, where the dataset details, a description of the evaluation measures and the instruments used to implement the whole architecture are also included; and finally, the obtained results are discussed and analyzed, in comparison with the state of the art.

Section snippets

Related works

The text classification problem has been addressed in the literature with many different approaches [1]. Some of them are based on ML methods with manual feature engineering, such as Latent Dirichlet Allocation (LDA) or K-Nearest Neighbors (K-NN) [11], [12]. More recently, various DNN approaches have been proposed, obtaining very promising results. In [13] a simple Neural Network (NN) approach for large-scale multi-label text classification has been presented, evidencing the usefulness of

Methodology

In this Section we first provide a brief overview of the DNN architecture that, for the sake of clarity, we have divided into three main modules: an Embeddings module for text encoding, a Feature Extraction module implemented through CNNs and a Classification module composed of fully connected neural networks. We also highlight the details of the loss function and the hyper-parameters used to train the network. Next, we describe the details of the proposed Hierarchical Label Set Expansion

Experimental results

In this Section we first describe all the characteristics of the datasets used for the experimental assessment. Next, we report the details of the systems used for the implementation of the proposed architecture, listing and explaining all the corresponding parameter settings. We also describe the methods used to obtain the unsupervised and semi-supervised embedding models described in the previous Section 3. Next, we provide a complete overview of the evaluation metrics for the

Conclusions

In this paper we have presented an analysis of a Deep Learning architecture devoted to text classification, considering the extreme multi-class and multi-label text classification problem, when a hierarchical label set is defined. We have described a methodology named Hierarchical Label Set Expansion (HLSE) used to regularize the data labels and have reported an analysis of the impact of the use and combination of different Word Embedding (WE) models that explicitly incorporate grammatical and

References (67)

Mirończuk M.M. et al. A recent overview of the state-of-the-art elements of text classification. Expert Syst. Appl. (2018)
Wang J. et al. Biomedical event trigger detection by dependency-based word embedding. BMC Med. Genomics (2016)
Pavlinek M. et al. Text classification method based on self-training and lda topic models. Expert Syst. Appl. (2017)
Lin J. et al. Pubmed related articles: a probabilistic topic-based model for content similarity. BMC Bioinformatics (2007)
Polyak B.T. Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. (1964)
Gargiulo F. et al. A clustering based methodology to support the translation of medical specifications to software models. Appl. Soft Comput. (2018)
Sokolova M. et al. A systematic analysis of performance measures for classification tasks. Inf. Process. Manage. (2009)
Moskovitch R. et al. Multiple hierarchical classification of free-text clinical guidelines. Artif. Intell. Med. (2006)
Alicante A. et al. A distributed architecture to integrate ontological knowledge into information extraction. Int. J. Grid Utility Comput. (2016)
Liu J. et al. Deep learning for extreme multi-label text classification
Komninos A. et al. Dependency based embeddings for sentence classification tasks
Levy O. et al. Dependency-based word embeddings
A. Trask, P. Michalak, J. Liu, sense2vec - A fast and accurate method for word sense disambiguation in neural word...
Nentidis A. et al. Results of the fifth edition of the BioASQ Challenge
Holzinger A. et al. Current advances, trends and challenges of machine learning and knowledge extraction: from machine learning to explainable AI
Goebel R. et al. Explainable AI: the new 42?
Zhang Y. et al. LF-LDA: a topic model for multi-label classification
Nam J. et al. Large-scale multi-label text classification - revisiting neural networks
Hughes M. et al. Medical text classification using convolutional neural networks. CoRR (2017)
D. Yogatama, C. Dyer, W. Ling, P. Blunsom, Generative and Discriminative Text Classification with Recurrent Neural...
Schwenk H. et al. Very deep convolutional networks for text classification
Yan Y. et al. LSTM²: multi-label ranking for document classification. Neural Process. Lett. (2018)
Wang Y. et al. Recurrent residual learning for sequence classification
Chen G. et al. Ensemble application of convolutional and recurrent neural networks for multi-label text categorization
Baumel T. et al. Multi-label classification of patient notes: case study on ICD code assignment
Nigam P. Applying deep learning to ICD-9 multi-label classification from medical records (2017)
Peng H. et al. Large-scale hierarchical text classification with recursively regularized deep graph-cnn
Mork J.G. et al. Recent enhancements to the NLM medical text indexer
Zavorin I. et al. Using learning-to-rank to enhance NLM medical text indexer results
Aronson A.R. Effective mapping of biomedical text to the UMLS metathesaurus: the metamap program
Peng S. et al. Deepmesh: deep semantic representation for improving large-scale mesh indexing. Bioinformatics (2016)
Peng S. et al. Meshlabeler and DeepMeSH: recent progress in large-scale mesh indexing
Le Q.V. et al. Distributed representations of sentences and documents

☆ This paper is an extended and improved version of the paper: Francesco Gargiulo, Stefano Silvestri and Mario Ciampi, Deep Convolution Neural Network for Extreme Multi-label Text Classification, presented at the AI4Health 2018 workshop and published in: BIOSTEC 2018, Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies, Volume 5: HEALTHINF, Funchal, Madeira, Portugal, 19-21 January, 2018, pp. 641-650, ISBN: 978-989-758-281-3, INSTICC, 2018.
