Deep model with neighborhood-awareness for text tagging☆
Introduction
With the advent of the Web 2.0 era, people use a variety of application interfaces to generate large amounts of textual information, and textual documents are among the most prominent carriers for information sharing and propagation. The purpose of text tagging is to suggest tags for these documents, by humans or by intelligent algorithms, so as to facilitate their utilization. To let document semantics emerge automatically, social tagging has become popular with the help of social media platforms [1] such as Delicious and CiteULike, whose user-generated tags form folksonomies. However, tags/labels generated by humans may be casual and inconsistent, which hinders information organization. Other approaches concentrate on automatic text tagging, which can be divided into two major sub-categories: multilabel classification and collaborative representation learning. Multilabel classification transforms sparse, high-dimensional documents into dense vector representations and then maps them to the given labels through an activation function [2]; the labels are typically ranked by their assigned probability values. As a result, classification methods cannot distinguish the polarity of tags with respect to the target document, and thus cannot effectively optimize the ranking of tags, which is a key issue in top-n recommendation tasks [3]. In contrast, collaborative representation learning maps both documents and labels into a latent feature space and then performs matching with a distance or similarity metric [4], [5]. Such a model can combine different loss functions with negative sampling to optimize the ranking directly [4], [6].
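As a minimal illustration of the matching step in collaborative representation learning, the sketch below ranks candidate tags by similarity between a document vector and tag embeddings. All names and the choice of cosine similarity are assumptions for illustration, not a specific method from the cited works:

```python
import numpy as np

def rank_tags(doc_vec, tag_embeddings, top_n=5):
    """Rank tags for one document by cosine similarity between the
    document vector and each tag embedding (illustrative sketch)."""
    doc = doc_vec / np.linalg.norm(doc_vec)
    tags = tag_embeddings / np.linalg.norm(tag_embeddings, axis=1, keepdims=True)
    scores = tags @ doc              # one similarity score per tag
    return np.argsort(-scores)[:top_n]  # indices of the top-n tags
```

Because the model scores every tag against the same document representation, ranking-oriented losses can directly push positive tags above negative ones in this score list.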
Document modeling techniques are usually shared by multilabel classification and collaborative representation learning. Traditional modeling approaches, such as the vector space model and topic modeling, are built on the bag-of-words (BoW) model. BoW ignores the contexts of words, such as a word's surrounding words and word order, which are important features for improving word or document representation learning [7]. To overcome the shortcomings of BoW, a major trend is to explore document modeling with neural models, such as convolutional neural networks [8], [9], recurrent neural networks [10], hierarchical attention networks [11], and direct document embedding [12]. These efforts follow the same research paradigm: constantly improving neural models to strengthen document representation. The neighborhood effect (that is, similar documents tend to use similar or even the same tags), however, is always neglected when designing neural models.
The principle of the neighborhood effect has been well studied in item-based collaborative filtering [13], and it also underlies the widely-used k-nearest-neighbors classifier for the task of document classification [14]. It has shown the advantages of robustness and simplicity in both domains [13]. Obviously, potential benefits can be expected from introducing the neighborhood effect into existing neural classification models. However, the traditional k-nearest-neighbors model does not involve a learning phase, so its performance depends heavily on the strategy used to weight the nearest neighbors, where a pre-defined similarity measurement is always required to measure their closeness [15].
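The classic weighted k-nearest-neighbors scheme discussed above can be sketched as follows. The similarity-weighted voting and the cosine metric are illustrative assumptions; the point is that the weighting hinges on a fixed, pre-defined similarity measurement rather than anything learned:

```python
import numpy as np

def knn_tag_scores(query, doc_vecs, doc_tags, num_tags, k=3):
    """Score tags for `query` by similarity-weighted voting over its
    k nearest training documents (no learning phase involved)."""
    sims = doc_vecs @ query / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query))
    neighbors = np.argsort(-sims)[:k]   # indices of the k closest docs
    scores = np.zeros(num_tags)
    for i in neighbors:
        for t in doc_tags[i]:           # each neighbor votes for its tags,
            scores[t] += sims[i]        # weighted by its similarity
    return scores
```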
To explore the neighborhood effect in neural classification models, and to address the neighbor-weighting issue along the way, we propose a neighborhood-aware deep model (NATT) to enhance the effectiveness of text tagging. To keep the focus on the core idea, we choose a popular neural component that combines a bi-directional recurrent neural network with a self-attention mechanism [16] to encode a target document into a single feature vector. To inject neighborhood information into the model, the k nearest neighbors of the target document are identified and encoded, one by one, with the same text encoder. The feature vectors of the neighboring documents are then synthesized into another feature vector representing the neighborhood. In particular, an independent attention module automatically assigns the weights, overcoming the difficulty of weighting nearest neighbors by hand. Finally, the two feature vectors are fused to match the embedding vectors of tags and generate recommendations. We construct the objective function with a pairwise hinge loss for model training. Notably, a simple yet effective neighborhood-aware strategy is proposed for sampling negative tags to optimize the ranking of tags. Intensive experiments have been conducted and reveal the merits of the proposed method.
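A rough sketch of the attention-based neighborhood fusion and the pairwise hinge loss described above is given below. The additive fusion, the softmax attention form, and all variable names are assumptions made for illustration; the paper's exact formulation may differ:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_neighborhood(doc_vec, neighbor_vecs, attn_w):
    """Weight the neighbor encodings with a learned attention vector
    (attn_w, assumed here) and fuse them with the target document."""
    weights = softmax(neighbor_vecs @ attn_w)  # one weight per neighbor
    neigh_vec = weights @ neighbor_vecs        # weighted sum of neighbors
    return doc_vec + neigh_vec                 # additive fusion (assumed)

def pairwise_hinge_loss(fused, pos_tag, neg_tag, margin=1.0):
    """Pairwise hinge loss: a positive tag's score should exceed a
    sampled negative tag's score by at least `margin`."""
    return max(0.0, margin - fused @ pos_tag + fused @ neg_tag)
```

Because the attention weights are learned jointly with the rest of the model, the neighbor-weighting problem of classic k-NN is absorbed into training rather than fixed by a hand-picked similarity measure.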
The main contributions of this paper are summarized as below:
- We propose NATT, which exploits the neighborhood effect in both text encoding and negative sampling for text tagging.
- We show through intensive experiments that the neighborhood-awareness strategy can significantly improve the accuracy of tag predictions.
- NATT is economical, achieving its best results with fewer training epochs and a smaller number of nearest neighbors.
The remainder of this paper is arranged as follows: we present related works in Section 2. In Section 3, we detail the neighborhood-aware neural model and discuss how to learn its parameters with negative sampling. Experimental results and discussion are presented in Section 4. We draw conclusions and point out future work in Section 5.
Related works
In social tagging, users apply public tags to online items, typically to make those items easier for themselves or others to find later. However, social tagging is expensive, and subjectivity also produces incorrect and inconsistent tags, which makes it difficult to organize the documents. Moreover, without users' participation, the cold-start problem cannot be solved. We therefore focus on related work in automatic text tagging. In our opinion, such work basically
Neural component for text encoding
With the boom of deep learning, many models can be used to extract semantic representations of text (i.e., to encode text). To keep the focus on developing a neighborhood-aware prediction model for text tagging, we simply choose a popular text encoder that combines a bi-directional recurrent neural network (RNN) with a self-attention mechanism [16]. On one hand, the RNN can ignore or down-weight the semantics of unimportant words, leaving the more important semantics. Further,
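The self-attention step of such an encoder can be sketched as below: given the hidden states produced by the (bi-directional) RNN, each timestep is scored, the scores are normalized with a softmax, and the states are pooled into one document vector. The single scoring vector `w` and this minimal pooling form are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def attention_pool(hidden_states, w):
    """Collapse a sequence of RNN hidden states (T x D) into one
    document vector via self-attention: score each timestep with a
    learned vector w, softmax the scores, then take a weighted sum."""
    scores = hidden_states @ w          # one relevance score per timestep
    e = np.exp(scores - scores.max())
    alpha = e / e.sum()                 # attention weights over timesteps
    return alpha @ hidden_states        # weighted sum -> document vector
```

In this way, timesteps carrying little semantic weight receive near-zero attention and contribute almost nothing to the final document vector.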
Experiments
In this section, we conducted extensive experiments aiming to answer the following research questions:
- 1. RQ1: How does our NATT model perform in comparison with the state-of-the-art methods?
- 2. RQ2: How does the number of training epochs impact the prediction performance of the NATT model?
- 3. RQ3: How does the number of nearest neighbors impact the prediction performance?
Conclusions and future works
We have proposed NATT, which exploits the neighborhood effect of documents in both text encoding and negative sampling for text tagging. Experimental results show that NATT is effective in the task of top-n tag recommendation. NATT also shows merits in computational efficiency, as it requires fewer training epochs and a smaller number of nearest neighbors to achieve its best results.
Unlike the existing works which focus on developing deep neural networks for text
CRediT authorship contribution statement
Shaowei Qin: Conceptualization, Methodology, Software, Data curation, Investigation, Formal analysis, Writing - original draft. Hao Wu: Supervision, Resources, Conceptualization, Methodology, Investigation, Project administration, Funding acquisition, Writing - original draft. Rencan Nie: Methodology, Writing - review & editing. Jun He: Resources, Writing - review & editing.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (61962061, 61562090, U1802271), partially supported by the Yunnan Provincial Foundation for Leaders of Disciplines in Science and Technology, Top Young Talents of “Ten Thousand Plan” in Yunnan Province, China, the Program for Excellent Young Talents of Yunnan University, China, the Project of Innovative Research Team of Yunnan Province, China (2018HC019).
References (33)
- Neighbor-weighted k-nearest neighbor for unbalanced text corpus, Expert Syst. Appl. (2005)
- Dual-regularized matrix factorization with deep neural networks for recommender systems, Knowl.-Based Syst. (2018)
- Effective metric learning with co-occurrence embedding for collaborative recommendations, Neural Netw. (2020)
- Survey on social tagging techniques, ACM SIGKDD Explor. Newsl. (2010)
- Multilabel classification
- Performance of recommender algorithms on top-n recommendation tasks
- J. Weston, S. Chopra, K. Adams, #TagSpace: Semantic embeddings from hashtags, in: Proceedings of the 2014 Conference...
- Deep learning based recommender system: A survey and new perspectives, ACM Comput. Surv. (2019)
- Collaborative deep learning for recommender systems
- Effective use of word order for text categorization with convolutional neural networks
- Deep learning for extreme multi-label text classification
- Are we really making much progress? A worrying analysis of recent neural recommendation approaches
☆ No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2020.105750.