Deep model with neighborhood-awareness for text tagging

https://doi.org/10.1016/j.knosys.2020.105750

Abstract

In recent years, many efforts based on deep learning have been made to address the issue of text tagging. However, these works generally neglect the neighborhood effect, which may help improve the accuracy of predictions. To this end, we present a neighborhood-aware deep model for text tagging (NATT). First, a neural component that combines a bi-directional recurrent neural network with a self-attention mechanism is selected as the text encoder to encode the target document into a feature vector. Then, the k-nearest-neighbor documents of the target document are identified and encoded into feature vectors one by one with the same text encoder. Meanwhile, an independent attention module aggregates these neighboring documents into a special feature vector that represents the features of the neighborhood. Finally, the two feature vectors are fused to match the embedding vectors of tags. To optimize the NATT model, we build the objective function with a pairwise hinge loss and develop a neighborhood-aware negative sampling strategy to form the training data. Experimental results on four datasets demonstrate that NATT outperforms several state-of-the-art neural models. Additionally, NATT is economical, achieving its best results with fewer training epochs and a smaller number of nearest neighbors.

Introduction

With the advent of the Web 2.0 era, people can use a variety of application interfaces to generate a large amount of textual information. Textual documents are among the most prominent carriers for facilitating information sharing and propagation. The purpose of text tagging is to suggest tags for these documents through human effort or intelligent algorithms, so as to facilitate their utilization. To let document semantics emerge automatically, social tagging has become popular with the help of social media [1], such as Delicious, CiteULike, Folksonomy and so on. However, tags/labels generated by humans may be casual and diverse, which is not conducive to information organization. Other approaches concentrate on automatic text tagging, which can be divided into two major sub-categories: multilabel classification and collaborative representation learning. Multilabel classification transforms sparse, high-dimensional documents into dense vector representations and then classifies them to the given labels through an activation function [2]. Generally, the labels are ranked by their assigned probability values. As a result, classification methods cannot distinguish the polarity of tags with respect to the target document and thus cannot effectively optimize the ranking of tags, which is a key issue in top-n recommendation tasks [3]. In contrast, collaborative representation learning maps both documents and labels into a latent feature space and then performs matching using a distance or similarity metric [4], [5]; a minimal sketch of this matching step follows. Such a model can use different loss functions in combination with negative sampling to fulfill ranking optimization [4], [6].
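For concreteness, here is a minimal sketch of the matching step just described, with illustrative names and dimensions (not the paper's implementation): documents and tags live in one latent space, and candidate tags are ranked by a similarity score.

```python
# Minimal sketch of similarity-based tag matching; all shapes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
doc_vec = rng.normal(size=64)                 # learned document representation
tag_embeddings = rng.normal(size=(500, 64))   # one row per candidate tag

# Cosine similarity between the document and every tag embedding.
scores = tag_embeddings @ doc_vec
scores /= np.linalg.norm(tag_embeddings, axis=1) * np.linalg.norm(doc_vec)

top_n = np.argsort(-scores)[:5]               # indices of the 5 best-matching tags
```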

Document modeling techniques are usually shared by multilabel classification and collaborative representation learning. Traditional modeling approaches, such as the vector space model and topic modeling, are built on the bag-of-words model (BoW). BoW ignores the context of a word, such as its surrounding words and word order, which are important features for improving word or document representation learning [7]. To overcome the shortcomings of BoW, a major trend is to explore document modeling with neural models, such as convolutional neural networks [8], [9], recurrent neural networks [10], hierarchical attention networks [11], and direct document embedding [12]. These efforts follow the same research paradigm, namely, constantly improving neural models to strengthen document representation. The neighborhood effect (that is, similar documents tend to use similar or even the same tags), however, is always neglected when designing neural models.

The principle of the neighborhood effect has been well studied in item-based collaborative filtering [13], and has also been adapted in constructing the widely used k-nearest-neighbors classifier for document classification [14]. It has shown the advantages of robustness and simplicity in both domains [13]. Obviously, potential benefits can be expected from introducing the neighborhood effect into existing neural classification models. However, the traditional k-nearest-neighbors model does not involve a learning phase, so its performance depends heavily on the weighting strategy for the nearest neighbors, where a pre-defined similarity measure is always required to quantify their closeness [15], as the sketch below illustrates.
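The following sketch shows the classic weighting scheme referred to above, where neighbor influence comes from a fixed, pre-defined cosine similarity rather than from learning; all names and shapes are illustrative assumptions.

```python
# Classic k-NN tagging with a pre-defined (not learned) similarity weighting.
import numpy as np

def knn_tag_scores(query, docs, tag_matrix, k=5):
    """query: (dim,); docs: (n_docs, dim); tag_matrix: (n_docs, n_tags) 0/1 usage."""
    sims = docs @ query / (
        np.linalg.norm(docs, axis=1) * np.linalg.norm(query) + 1e-9)
    nearest = np.argsort(-sims)[:k]
    # Each neighbor votes for its tags, weighted by its fixed cosine similarity.
    return sims[nearest] @ tag_matrix[nearest]
```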

To explore the neighborhood effect in neural classification models, and to solve the nearest-neighbor weighting issue along the way, we propose a neighborhood-aware deep model to enhance the effectiveness of text tagging. To focus on the core of the work, we choose a popular neural network component that combines a bi-directional recurrent neural network with a self-attention mechanism [16] to encode a target document into one feature vector. To inject neighborhood information into the model, the k nearest neighbors of the target document are identified and encoded one by one with the same text encoder. Simultaneously, the feature vectors of the neighboring documents are synthesized into another feature vector that represents features from the neighborhood. In particular, an independent attention module is employed to assign weights automatically, overcoming the difficulty of weighting nearest neighbors. Finally, the two feature vectors are fused to match the embedding vectors of tags and generate recommendations. We construct the objective function with a pairwise hinge loss for model training. In particular, a simple yet effective neighborhood-aware strategy is proposed for sampling negative tags to optimize the ranking of tags (see the sketch after this paragraph). Intensive experiments have been conducted and reveal the merits of the proposed method.
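The PyTorch sketch below illustrates one plausible reading of this pipeline under our own assumptions: attention-weighted aggregation of the neighbor vectors, fusion by summation, a pairwise hinge loss, and a neighborhood-aware negative sampler that prefers tags seen in the neighborhood but absent from the target document. The paper's exact equations and sampling rule may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeighborhoodFusion(nn.Module):
    """Learned attention over neighbor vectors, then fusion with the target."""
    def __init__(self, dim):
        super().__init__()
        self.att = nn.Linear(dim, 1)          # learned weighting of neighbors

    def forward(self, d, neighbors):
        # d: (dim,); neighbors: (k, dim), all produced by the same encoder.
        w = torch.softmax(self.att(neighbors).squeeze(-1), dim=0)
        n = w @ neighbors                     # neighborhood feature vector
        return d + n                          # fusion by summation (an assumption)

def pairwise_hinge(fused, tag_emb, pos, neg, margin=1.0):
    """Observed tags should outscore sampled negative tags by a margin."""
    s_pos = tag_emb[pos] @ fused              # (|pos|,) scores of observed tags
    s_neg = tag_emb[neg] @ fused              # (|neg|,) scores of negatives
    return F.relu(margin - s_pos[:, None] + s_neg[None, :]).mean()

def sample_negatives(target_tags, neighbor_tags, n_tags, num=5):
    # One plausible neighborhood-aware rule (our assumption): prefer tags that
    # occur in the neighborhood but not on the target, as harder negatives.
    hard = sorted(neighbor_tags - target_tags)
    easy = [t for t in range(n_tags)
            if t not in target_tags and t not in neighbor_tags]
    chosen = hard[:num]
    extra = torch.randperm(len(easy))[:num - len(chosen)]
    return torch.tensor(chosen + [easy[i] for i in extra])
```

In this sketch, `fused = NeighborhoodFusion(dim)(d, neighbors)` is scored against every tag embedding, and training pushes the observed tags above the sampled negatives by the margin.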

The main contributions of this paper are summarized as below:

  • We propose NATT, which exploits the neighborhood effect in both text encoding and negative sampling for text tagging.

  • We show, through intensive experiments, that the neighborhood-aware strategy can significantly improve the accuracy of tag predictions.

  • NATT is economical, achieving its best results with fewer training epochs and a smaller number of nearest neighbors.

The remainder of this paper is arranged as follows: we present related works in Section 2. In Section 3, we detail the neighborhood-aware neural model and discuss how to learn its parameters with negative sampling. Experimental results and discussion are presented in Section 4. We draw conclusions and point out future works in Section 5.

Related works

As for social tagging, users apply public tags to online items, typically to make those items easier for themselves or others to find later. However, social tagging is expensive, and subjectivity also leads to incorrect and inconsistent tags, which makes it difficult to organize documents. Moreover, without users' participation, the cold-start problem cannot be solved. Instead, we focus on related work in automatic text tagging. In our opinion, such work basically

Neural component for text encoding

Many models can be used to extract the semantic representation of text (a.k.a. text encoding), given the boom of deep learning. To focus on developing a neighborhood-aware prediction model for text tagging, we simply choose a popular text encoder that combines a bi-directional recurrent neural network (RNN) and a self-attention mechanism [16]; a minimal sketch follows. On the one hand, the RNN can ignore or down-weight the semantics of unimportant words, leaving the more important semantics. Further,
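As a concrete illustration, here is a minimal PyTorch sketch of such a BiLSTM-plus-self-attention encoder; the hyper-parameters and single-head attention form are our assumptions, not the exact configuration of [16].

```python
import torch
import torch.nn as nn

class AttentiveBiLSTMEncoder(nn.Module):
    """Bi-directional LSTM whose hidden states are pooled by self-attention
    into a single document vector. Hyper-parameters are illustrative."""
    def __init__(self, vocab_size, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.att = nn.Linear(2 * hidden, 1)        # scores each time step

    def forward(self, token_ids):                  # token_ids: (batch, seq_len)
        h, _ = self.rnn(self.emb(token_ids))       # h: (batch, seq_len, 2*hidden)
        w = torch.softmax(self.att(h).squeeze(-1), dim=1)  # attention weights
        return (w.unsqueeze(-1) * h).sum(dim=1)    # (batch, 2*hidden) doc vector
```

Given token ids of shape (batch, seq_len), the encoder returns one fixed-size vector per document; NATT applies the same encoder to the target document and to each of its nearest neighbors.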

Experiments

In this section, we conducted extensive experiments aiming to answer the following research questions:

  • RQ1: How does our NATT model perform compared with the state-of-the-art methods?

  • RQ2: How do the training epochs impact the prediction performance of the NATT model?

  • RQ3: How does the number of nearest neighbors k impact the prediction performance?

Conclusions and future works

We have proposed NATT, which exploits the neighborhood effect of documents in both text encoding and negative sampling for text tagging. Experimental results show that NATT is effective for the top-n tag recommendation task. NATT also shows merits in computational efficiency, as it requires fewer training epochs and a smaller number of nearest neighbors to achieve its best results.

Unlike existing works, which focus on developing deep neural networks for text

CRediT authorship contribution statement

Shaowei Qin: Conceptualization, Methodology, Software, Data curation, Investigation, Formal analysis, Writing - original draft. Hao Wu: Supervision, Resources, Conceptualization, Methodology, Investigation, Project administration, Funding acquisition, Writing - original draft. Rencan Nie: Methodology, Writing - review & editing. Jun He: Resources, Writing - review & editing.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (61962061, 61562090, U1802271), and partially supported by the Yunnan Provincial Foundation for Leaders of Disciplines in Science and Technology, the Top Young Talents of the "Ten Thousand Plan" in Yunnan Province, China, the Program for Excellent Young Talents of Yunnan University, China, and the Project of Innovative Research Team of Yunnan Province, China (2018HC019).

References (33)

  • Y. Kim, Convolutional neural networks for sentence classification, in: Proceedings of the 2014 Conference on Empirical...
  • J. Liu et al., Deep learning for extreme multi-label text classification
  • P. Zhou, Z. Qi, S. Zheng, J. Xu, H. Bao, B. Xu, Text classification improved by integrating bidirectional LSTM with...
  • Z. Yang, D. Yang, C. Dyer, X. He, A.J. Smola, E.H. Hovy, Hierarchical attention networks for document classification,...
  • E. Grave, T. Mikolov, A. Joulin, P. Bojanowski, Bag of tricks for efficient text classification, in: Proceedings of the...
  • M.F. Dacrema et al., Are we really making much progress? A worrying analysis of recent neural recommendation approaches