Knowledge-Based Systems

Volume 189, 15 February 2020, 105151

Hybrid neural conditional random fields for multi-view sequence labeling

https://doi.org/10.1016/j.knosys.2019.105151

Highlights

  • We propose a hybrid neural CRF for multi-view sequence labeling, called MVCRF.

  • Our model incorporates multi-view learning by utilizing the consensus and complementary principles.

  • We systematically compare the performance of MVCRF with other models.

  • The experimental results show MVCRF achieves state-of-the-art performance.

Abstract

In traditional machine learning, the conditional random field (CRF) is the mainstream probabilistic model for sequence labeling problems. CRF considers the relations between adjacent labels rather than decoding each label independently, so better performance can be expected. However, few multi-view learning methods involving CRF can be directly used for sequence labeling tasks. In this paper, we propose a novel multi-view CRF model for labeling sequential data, called MVCRF, which exploits the two principles of multi-view learning: consensus and complementarity. We first use different neural networks to extract features from multiple views. Then, considering the consistency among the different views, we introduce a joint representation space for the extracted features and minimize the distance between the two views for regularization. Meanwhile, following the complementary principle, the features of the multiple views are integrated into the CRF framework. We train MVCRF in an end-to-end fashion and evaluate it on two benchmark data sets. The experimental results illustrate that MVCRF obtains state-of-the-art performance: an F1 score of 95.44% for chunking on CoNLL-2000, and 95.06% for chunking and 96.99% for named entity recognition (NER) on CoNLL-2003.

Introduction

Sequence labeling problems such as named entity recognition (NER) and syntactic chunking are classical tasks in the field of natural language processing (NLP). The sequential data used in these tasks mostly contain different features, such as word and part-of-speech (POS) features, which may be obtained by diverse measuring modes or come from different feature extractors. Such data are usually called multi-view data. For multi-view data, a naive method is to concatenate the multiple views directly into a single view and then use single-view algorithms for subsequent processing. However, this approach may lead to overfitting, and the unique statistical characteristics of each view cannot be fully exploited. The alternative is to use only one of the multiple views, which generally does not achieve the best performance either.

Multi-view learning (MVL) is an emerging direction that has developed rapidly in machine learning in recent years and can address the above problems well. It aims to improve generalization performance by making full use of the information from multiple views. An increasing number of MVL methods have been proposed, which can be divided into three major categories [1]: co-training style algorithms [2], [3], [4], co-regularization style algorithms [5], [6], and margin consistency style algorithms [7], [8]. Without loss of generality, we can usually obtain better performance by adopting these MVL methods. Even if only a single natural view is available, it is possible to further improve model performance by manually generating multiple views, which reflects the great advantage of MVL [2]. Recently, MVL has been a topic of intense interest in machine learning [9], [10], [11], [12], [13]. However, to the best of our knowledge, few of the existing MVL methods can directly handle sequence labeling problems. In this paper, we propose a new kind of multi-view method for labeling sequential data.

The existing sequence labeling models fall into two main categories. One is linear statistical models, such as hidden Markov models [14], maximum entropy Markov models [15], and conditional random fields (CRF) [16], [17], [18], [19]. The other is non-linear models based on neural networks. Among the linear statistical models, CRF [16] is a popular model for fitting time sequences and excels at sequential data labeling tasks. CRF makes no independence assumptions on the observations; it focuses on information at the sentence level rather than at individual positions. Therefore, CRF can more accurately capture the relationships within a sequence, and higher tagging accuracy can be expected. With these good properties, CRF has been widely used in sequence labeling tasks and achieves respectable performance [16], [17], [18], [19]. Among the non-linear neural network based models, a model based on the convolutional neural network (CNN) [20] was first presented for sequence labeling. Later, sequence labeling models based on long short-term memory (LSTM) were proposed and achieved great success [21], [22], [23].
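
For reference, a standard formulation of the linear-chain CRF that this family of models builds on (the notation here is generic and is not taken verbatim from Section 3) is

\[
p(\mathbf{y}\mid\mathbf{x}) \;=\; \frac{1}{Z(\mathbf{x})}\,\exp\!\Bigl(\sum_{t=1}^{T}\psi(y_t,\mathbf{x},t) \;+\; \sum_{t=2}^{T} A_{y_{t-1},\,y_t}\Bigr),
\qquad
Z(\mathbf{x}) \;=\; \sum_{\mathbf{y}'}\exp\!\Bigl(\sum_{t=1}^{T}\psi(y'_t,\mathbf{x},t) \;+\; \sum_{t=2}^{T} A_{y'_{t-1},\,y'_t}\Bigr),
\]

where \(\psi(y_t,\mathbf{x},t)\) is an emission score for assigning label \(y_t\) at position \(t\) and \(A\) is a matrix of label-transition scores. Because the normalizer \(Z(\mathbf{x})\) sums over whole label sequences, the model captures dependencies between adjacent labels rather than decoding each position independently.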

In this paper, we develop a hybrid neural CRF for multi-view sequence labeling, named MVCRF. The model is based on the traditional CRF and adopts diverse neural networks to extract features from the multiple views. Different from existing sequence labeling models, the proposed model not only considers the correlation between neighboring labels and jointly decodes the best label sequence, but also incorporates MVL by utilizing the consensus and complementary principles [24]. MVCRF first takes each view of the sequential data as input. Since neural networks can automatically extract features from data [25], we adopt them to extract features from the respective views. The extracted features are projected into a joint representation space. Inspired by the idea of co-regularization [26], we regularize the log-likelihood by minimizing the distance between the two views. In other words, we enforce the features from different views to be as close as possible, which reflects the consensus principle. Moreover, considering that each view may contain specific information not present in the other views, the features from the different views are taken as input to the CRF layer. Finally, the CRF layer makes a structured prediction and outputs the best label sequence.
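
To make this pipeline concrete, the following is a minimal PyTorch sketch of the idea: a Bi-LSTM encodes one view (words), a position-wise linear network encodes a second view (here POS tags, chosen only for illustration), both are projected into a joint space where a squared-distance consensus term is computed, and the concatenated projections feed a linear-chain CRF. The layer sizes, the regularization weight, and the omission of masking and Viterbi decoding are illustrative simplifications and do not reproduce the exact configuration used in our experiments.

import torch
import torch.nn as nn


class LinearChainCRF(nn.Module):
    """Negative log-likelihood of a linear-chain CRF (no padding mask, for brevity)."""

    def __init__(self, num_tags):
        super().__init__()
        self.transitions = nn.Parameter(0.01 * torch.randn(num_tags, num_tags))

    def forward(self, emissions, tags):
        # emissions: (batch, seq_len, num_tags); tags: (batch, seq_len)
        batch, seq_len, num_tags = emissions.shape
        # Score of the gold label sequence: emission terms plus transition terms.
        gold = emissions.gather(2, tags.unsqueeze(-1)).squeeze(-1).sum(dim=1)
        gold = gold + self.transitions[tags[:, :-1], tags[:, 1:]].sum(dim=1)
        # Log partition function via the forward algorithm in log space.
        alpha = emissions[:, 0]                                  # (batch, num_tags)
        for t in range(1, seq_len):
            alpha = torch.logsumexp(
                alpha.unsqueeze(2) + self.transitions.unsqueeze(0)
                + emissions[:, t].unsqueeze(1),
                dim=1,
            )
        log_z = torch.logsumexp(alpha, dim=1)
        return (log_z - gold).mean()


class MVCRF(nn.Module):
    """Two view encoders, a joint space for the consensus term, and a CRF over
    the concatenated (complementary) features."""

    def __init__(self, vocab_size, pos_size, emb_dim=100, hidden=100,
                 joint_dim=100, num_tags=23, reg_weight=0.1):
        super().__init__()
        # View 1: word embeddings encoded by a Bi-LSTM.
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden // 2, batch_first=True,
                              bidirectional=True)
        # View 2: POS embeddings encoded by a position-wise linear network.
        self.pos_emb = nn.Embedding(pos_size, emb_dim)
        self.linear_view = nn.Sequential(nn.Linear(emb_dim, hidden), nn.Tanh())
        # Projections into the joint representation space (consensus principle).
        self.proj1 = nn.Linear(hidden, joint_dim)
        self.proj2 = nn.Linear(hidden, joint_dim)
        # Emission scores from both views together (complementary principle).
        self.emission = nn.Linear(2 * joint_dim, num_tags)
        self.crf = LinearChainCRF(num_tags)
        self.reg_weight = reg_weight

    def forward(self, words, pos, tags):
        h1, _ = self.bilstm(self.word_emb(words))      # (batch, seq_len, hidden)
        h2 = self.linear_view(self.pos_emb(pos))       # (batch, seq_len, hidden)
        z1, z2 = self.proj1(h1), self.proj2(h2)        # joint representation space
        consensus = ((z1 - z2) ** 2).sum(dim=-1).mean()    # keep the views close
        emissions = self.emission(torch.cat([z1, z2], dim=-1))
        return self.crf(emissions, tags) + self.reg_weight * consensus


# Illustrative end-to-end training step on random data.
model = MVCRF(vocab_size=5000, pos_size=50)
words = torch.randint(0, 5000, (8, 20))
pos = torch.randint(0, 50, (8, 20))
tags = torch.randint(0, 23, (8, 20))
loss = model(words, pos, tags)
loss.backward()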

The main contributions of this paper can be summarized as follows. We propose a hybrid neural CRF for multi-view sequence labeling. The key idea of our model is to combine MVL with sequence labeling. We first construct a joint representation space for the different features, based on the consensus principle. Then we regularize the conditional probability distribution by the consistency of the diverse views. Meanwhile, we also follow the complementary principle to make full use of the specific information in each view. We systematically compare the performance of MVCRF with that of other models. The experimental results show that MVCRF achieves state-of-the-art performance on the CoNLL-2000 and CoNLL-2003 benchmark data sets.
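
Schematically, and in our own notation rather than the exact formulation of Section 3, the resulting training objective couples the CRF conditional log-likelihood with a consensus term over the joint-space representations of the two views:

\[
\min_{\Theta}\; -\sum_{n} \log p\bigl(\mathbf{y}^{(n)} \mid \mathbf{x}^{(n)}; \Theta\bigr)
\;+\; \lambda \sum_{n} \sum_{t} \bigl\| z^{(n)}_{1,t} - z^{(n)}_{2,t} \bigr\|_2^{2},
\]

where \(z_{1,t}\) and \(z_{2,t}\) are the projected features of the two views at position \(t\) and \(\lambda\) controls the strength of the consensus regularizer.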

The rest of this paper is organized as follows. Section 2 reviews the related research. Section 3 presents our MVCRF model. Section 4 reports experimental results on the benchmark data sets and makes systematic comparisons. Finally, we draw conclusions and point out possible future work in Section 5.

Section snippets

Related work

For sequence labeling tasks, each label is not only related to the current input but also correlated with the previous label. That is, the predicted labels in a sequence have strong dependencies and follow specific pattern rules. For example, in the NER task with the standard IOB2 labeling scheme [27], the label “I-ORG” can follow “B-ORG” or another “I-ORG”, but it cannot follow “I-PER”. In this case, it is not appropriate to make an independence assumption. Instead of independently

Model representation

This section presents the hybrid neural CRF model MVCRF. To illustrate our model more clearly, we first briefly introduce the basic frameworks used in this paper: CRF and the Bi-LSTM for feature extraction. Then we describe the MVCRF model and present the corresponding inference and parameter optimization in detail.

Experiments

In this section, we test our MVCRF model on the CoNLL-2000 and CoNLL-2003 English data sets and report the experimental results on two sequence tagging tasks: chunking and NER.
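
The standard metric for both tasks is span-level F1 over complete chunks or entities. As a minimal illustration of how such scores can be computed, assuming the third-party seqeval package (shown for illustration only, not necessarily the tooling used for the reported results):

# Span-level (CoNLL-style) F1 over complete chunks/entities.
from seqeval.metrics import f1_score

gold = [["B-NP", "I-NP", "O", "B-VP"], ["B-PER", "I-PER", "O"]]
pred = [["B-NP", "I-NP", "O", "B-VP"], ["B-PER", "O", "O"]]

print(f1_score(gold, pred))  # only exactly matching spans count as correct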

Conclusion

In this paper, we propose a novel hybrid neural CRF model for multi-view sequence labeling. The model uses multi-view consistency to regularize the conditional likelihood and fully leverages the information from multiple views. Experimental results show that the proposed model achieves the best performance on two benchmark sequence labeling data sets.

In the proposed model, we exploit a Bi-LSTM and a linear network to extract features from the distinct views. In the future, we will further

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Project 61673179 and the Natural Science Foundation of Shanghai, PR China, under Grant No. 19ZR1415800.


References (44)

  • Sun, S., et al. Robust co-training. Int. J. Pattern Recognit. Artif. Intell. (2011)
  • Chen, N., et al. Predictive subspace learning for multi-view data: a large margin approach
  • Salzmann, M., et al. Factorized orthogonal latent spaces. J. Mach. Learn. Res. (2010)
  • Mao, L., et al. Soft margin consistency based scalable multi-view maximum entropy discrimination
  • Chen, N., et al. Large-margin predictive latent subspace learning for multiview data analysis. IEEE Trans. Pattern Anal. Mach. Intell. (2012)
  • Sun, S. A survey of multi-view machine learning. Neural Comput. Appl. (2013)
  • McCallum, A., et al. Maximum entropy Markov models for information extraction and segmentation
  • Lafferty, J., McCallum, A., Pereira, F.C.N. Conditional random fields: Probabilistic models for segmenting and labeling...
  • Ratinov, L., Roth, D. Design challenges and misconceptions in named entity recognition, in: Proceedings of the...
  • Passos, A., et al. Lexicon infused phrase embeddings for named entity resolution (2014)
  • Luo, G., Huang, X., Lin, C.Y., Nie, Z. Joint Entity Recognition and Disambiguation, in: Proceedings of the 2015 Conference...
  • Collobert, R., et al. Natural language processing (almost) from scratch. J. Mach. Learn. Res. (2011)

    Xuli Sun received the B.S. degree from Shanxi University, Shanxi, China. She is currently pursuing the M.S. degree with the School of Computer Science and Technology, East China Normal University, Shanghai, China. Her current research interests include pattern recognition and machine learning.

    Shiliang Sun is a professor at the School of Computer Science and Technology and the head of the Pattern Recognition and Machine Learning Research Group, East China Normal University. He received the B.E. degree in automatic control from the Department of Automatic Control, Beijing University of Aeronautics and Astronautics in 2002, and the Ph.D. degree in pattern recognition and intelligent systems from the Department of Automation and the State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing, China, in 2007. In 2004, he was named a Microsoft Fellow. From 2009 to 2010, he was a visiting researcher at the Department of Computer Science, University College London, working within the Centre for Computational Statistics and Machine Learning. From March to April 2012, he was a visiting researcher at the Department of Statistics, Rutgers University. He is a member of the PASCAL (Pattern Analysis, Statistical Modelling, and Computational Learning) network of excellence, and serves on the editorial boards of multiple international journals. His research interests include multi-view learning, approximate inference, Gaussian processes, sequential modeling, kernel methods, and their applications.

    Minzhi Yin is a professor and the Director of the Department of Pathology, Shanghai Children’s Medical Center, affiliated with Shanghai Jiao Tong University School of Medicine. She graduated from Shanghai Second Medical University and received the MD degree in 1993, and received the master’s degree from Shanghai Jiao Tong University School of Medicine in 2011. She spent one year as a clinical visiting scholar at the Royal Children’s Hospital, Melbourne, Australia, from 2004 to 2005, and completed three-month observer training periods at St. Jude Children’s Research Hospital in 2000 and at Los Angeles Children’s Hospital in 2016. Her research interests include artificial intelligence and its applications, including pathology diagnosis.

    Hao Yang is a senior researcher at the 2012 Lab of Huawei Company Limited. He became a Member (M) of IEEE in 2005 and a Senior Member (SM) in 2009. He received the Ph.D. degree from Beijing University of Posts and Telecommunications in 2009. His major research fields include natural language processing, neural machine translation, and deep learning for text.

    No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2019.105151.
