Abstract
Feature representation and feature extraction are two crucial procedures in text mining. Convolutional neural networks (CNNs) have shown remarkable success on text-mining tasks because they can efficiently extract n-gram features from source data. However, the vanilla CNN has weaknesses in both feature representation and feature extraction. A certain number of its filters are inevitably duplicates, which hinders it from representing a given text discriminatively. In addition, most existing CNN models extract features in a fixed way (i.e., max pooling) that either traps the network in a local optimum or ignores the relations among all features, so they cannot learn contextual n-gram features adaptively. In this article, we propose a discriminative CNN with context-aware attention to address these weaknesses of the vanilla CNN. Specifically, our model encourages discrimination across different filters by maximizing their earth mover's distances, and it estimates the salience of feature candidates by considering the relations among context features. We carefully validate our model against baselines on five benchmark classification datasets and two summarization datasets. The experimental results verify the competitive performance of the proposed model.
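To make the abstract's two components concrete, here is a minimal PyTorch sketch; it is an illustration under stated assumptions, not the authors' implementation. Plain L2 distance between flattened filters stands in for the earth mover's distance (which would require an optimal-transport solver such as Sinkhorn iteration), a simple learned salience score over positions stands in for the paper's context-aware attention, and all names (`DiscriminativeAttentiveCNN`, `filter_diversity_penalty`, the hyperparameters) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscriminativeAttentiveCNN(nn.Module):
    """Sketch of the abstract's two ideas: (a) a penalty that pushes
    convolution filters apart so fewer of them are duplicates, and
    (b) attention pooling over all n-gram features instead of max pooling."""

    def __init__(self, vocab_size, emb_dim=300, num_filters=100,
                 kernel_size=3, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, num_filters, kernel_size,
                              padding=kernel_size // 2)
        self.attn = nn.Linear(num_filters, 1)  # salience score per n-gram position
        self.fc = nn.Linear(num_filters, num_classes)

    def filter_diversity_penalty(self):
        # Stand-in for the earth mover's distance term: the mean pairwise
        # L2 distance between flattened filters, negated so that minimizing
        # the loss maximizes the distance (i.e., encourages filter diversity).
        w = self.conv.weight.flatten(1)            # (F, emb_dim * kernel_size)
        dists = torch.cdist(w, w, p=2)             # (F, F) pairwise distances
        mean_dist = dists.sum() / (w.size(0) * (w.size(0) - 1))
        return -mean_dist

    def forward(self, token_ids):
        x = self.embed(token_ids).transpose(1, 2)          # (B, emb_dim, T)
        feats = torch.relu(self.conv(x)).transpose(1, 2)   # (B, T, F) n-gram features
        scores = F.softmax(self.attn(feats), dim=1)        # soft salience over all positions
        pooled = (scores * feats).sum(dim=1)               # replaces hard max pooling
        return self.fc(pooled)

# Training would add the penalty to the task loss, e.g.:
#   loss = F.cross_entropy(model(batch), labels) \
#          + diversity_weight * model.filter_diversity_penalty()
```

Unlike max pooling, the attention pooling above distributes gradient across every position, which is the mechanism the abstract appeals to when it argues that a fixed extractor cannot learn contextual n-gram features adaptively.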