
A Discriminative Convolutional Neural Network with Context-aware Attention

Published: 26 July 2020

Abstract

Feature representation and feature extraction are two crucial procedures in text mining. Convolutional Neural Networks (CNNs) have shown overwhelming success on text-mining tasks, since they can efficiently extract n-gram features from source data. However, the vanilla CNN has weaknesses in both feature representation and feature extraction. A certain fraction of the filters in a CNN are inevitably duplicates of one another, which hinders the network from representing a given text discriminatively. In addition, most existing CNN models extract features in a fixed way (i.e., max pooling) that either limits the CNN to a local optimum or ignores the relations among features, so the network cannot learn contextual n-gram features adaptively. In this article, we propose a discriminative CNN with context-aware attention to address these weaknesses of the vanilla CNN. Specifically, our model encourages discrimination across different filters by maximizing their earth mover's distances, and estimates the salience of feature candidates by considering the relations between context features. We carefully validate our model against baselines on five benchmark classification datasets and two summarization datasets. The experimental results verify the competitive performance of the proposed model.
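The two ingredients described in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' implementation: it treats each filter's absolute weights as a 1-D histogram and uses the closed-form 1-D earth mover's distance (the L1 norm of the difference of cumulative sums) as a pairwise diversity penalty, and it replaces hard max pooling with an attention-weighted sum scored against a context vector. The function names (`emd_1d`, `filter_diversity_penalty`, `attention_pool`) and the histogram treatment of filters are assumptions made for illustration.

```python
import numpy as np

def emd_1d(p, q):
    """1-D earth mover's distance between two normalized histograms.

    For histograms on a shared 1-D support, the EMD has a closed form:
    the L1 norm of the difference of the cumulative distributions.
    """
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    return float(np.sum(np.abs(np.cumsum(p - q))))

def filter_diversity_penalty(filters):
    """Negated sum of pairwise EMDs over filters (assumed loss term).

    Minimizing this value maximizes the pairwise earth mover's
    distances, pushing filters away from duplicating one another.
    """
    total = 0.0
    for i in range(len(filters)):
        for j in range(i + 1, len(filters)):
            total -= emd_1d(np.abs(filters[i]).ravel(),
                            np.abs(filters[j]).ravel())
    return total

def attention_pool(features, context):
    """Context-aware pooling over n-gram feature vectors.

    Instead of a hard max over positions, score each position's
    feature vector against a context vector and return the
    softmax-weighted sum, so all features contribute adaptively.
    """
    scores = features @ context                  # (positions,)
    weights = np.exp(scores - scores.max())      # stable softmax
    weights /= weights.sum()
    return weights @ features                    # (dim,)
```

In a full model, `filter_diversity_penalty` would be added to the task loss as a regularizer, and `attention_pool` would replace the max-pooling layer after convolution; the context vector could itself be learned.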



      • Published in

        ACM Transactions on Intelligent Systems and Technology, Volume 11, Issue 5 (Survey Paper and Regular Paper)
        October 2020, 325 pages
        ISSN: 2157-6904
        EISSN: 2157-6912
        DOI: 10.1145/3409643

        Copyright © 2020 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 26 July 2020
        • Online AM: 7 May 2020
        • Accepted: 1 April 2020
        • Revised: 1 March 2020
        • Received: 1 August 2019
        Published in TIST Volume 11, Issue 5


        Qualifiers

        • research-article
        • Research
        • Refereed
