Abstract
Feature representation and feature extraction are two crucial procedures in text mining. Convolutional neural networks (CNNs) have shown remarkable success on text-mining tasks because they can efficiently extract n-gram features from source data. However, the vanilla CNN has weaknesses in both feature representation and feature extraction. A certain number of its filters are inevitably duplicates, which hinders it from representing a given text discriminatively. In addition, most existing CNN models extract features in a fixed way (i.e., max pooling) that either traps the network in a local optimum or ignores the relations among all features, so they cannot learn contextual n-gram features adaptively. In this article, we propose a discriminative CNN with context-aware attention to address these weaknesses of the vanilla CNN. Specifically, our model encourages discrimination across different filters by maximizing their earth mover's distances, and it estimates the salience of feature candidates by considering the relations among context features. We carefully validate our model against baselines on five benchmark classification datasets and two summarization datasets. The experimental results verify the competitive performance of the proposed model.
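To make the abstract's two components concrete, here is a minimal PyTorch sketch; it is an illustration under stated assumptions, not the authors' implementation. Plain L2 distance between flattened filters stands in for the earth mover's distance (which would require an optimal-transport solver such as Sinkhorn iteration), a simple learned salience score over positions stands in for the paper's context-aware attention, and all names (`DiscriminativeAttentiveCNN`, `filter_diversity_penalty`, the hyperparameters) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscriminativeAttentiveCNN(nn.Module):
    """Sketch of the abstract's two ideas: (a) a penalty that pushes
    convolution filters apart so fewer of them are duplicates, and
    (b) attention pooling over all n-gram features instead of max pooling."""

    def __init__(self, vocab_size, emb_dim=300, num_filters=100,
                 kernel_size=3, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, num_filters, kernel_size,
                              padding=kernel_size // 2)
        self.attn = nn.Linear(num_filters, 1)  # salience score per n-gram position
        self.fc = nn.Linear(num_filters, num_classes)

    def filter_diversity_penalty(self):
        # Stand-in for the earth mover's distance term: the mean pairwise
        # L2 distance between flattened filters, negated so that minimizing
        # the loss maximizes the distance (i.e., encourages filter diversity).
        w = self.conv.weight.flatten(1)            # (F, emb_dim * kernel_size)
        dists = torch.cdist(w, w, p=2)             # (F, F) pairwise distances
        mean_dist = dists.sum() / (w.size(0) * (w.size(0) - 1))
        return -mean_dist

    def forward(self, token_ids):
        x = self.embed(token_ids).transpose(1, 2)          # (B, emb_dim, T)
        feats = torch.relu(self.conv(x)).transpose(1, 2)   # (B, T, F) n-gram features
        scores = F.softmax(self.attn(feats), dim=1)        # soft salience over all positions
        pooled = (scores * feats).sum(dim=1)               # replaces hard max pooling
        return self.fc(pooled)

# Training would add the penalty to the task loss, e.g.:
#   loss = F.cross_entropy(model(batch), labels) \
#          + diversity_weight * model.filter_diversity_penalty()
```

Unlike max pooling, the attention pooling above distributes gradient across every position, which is the mechanism the abstract appeals to when it argues that a fixed extractor cannot learn contextual n-gram features adaptively.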