DOI: 10.1145/3308558.3313563
research-article

From Small-scale to Large-scale Text Classification

Published: 13 May 2019

Abstract

Neural network models have achieved impressive results in text classification. However, existing approaches often suffer from insufficient training data in large-scale text classification, which involves a large number of categories (e.g., several thousand). Several neural network models have used multi-task learning to compensate for limited training data, but these approaches have been applied only to small-scale text classification. In this paper, we propose a novel neural network-based multi-task learning framework for large-scale text classification. To this end, we first treat the different scales of text classification (i.e., large and small numbers of categories) as multiple related tasks. We then train the proposed neural network, which learns the small- and large-scale text classification tasks simultaneously. We further enhance this multi-task learning architecture with a gate mechanism that controls the flow of features between the small- and large-scale text classification tasks. Experimental results show that the proposed model improves performance on the large-scale text classification task with the help of the small-scale text classification task. The proposed scheme improves micro-averaged and macro-averaged F1-scores by as much as 14% and 5%, respectively, over state-of-the-art techniques.
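The gating idea described above can be illustrated with a minimal forward-pass sketch. It is not the architecture from the paper, whose details are not given in this abstract: the layer sizes, the tanh encoders, the additive fusion rule, and every name (`gated_forward`, `W_gate`, `joint_loss`, and so on) are assumptions chosen for illustration. A sigmoid gate decides, per feature, how much of the small-scale task's representation flows into the large-scale classifier, and the two cross-entropy losses are summed so that both tasks would be trained together.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def gated_forward(x, params):
    """Forward pass of a hypothetical two-task network with a feature gate."""
    # Task-specific hidden representations (assumed: one tanh layer each).
    h_small = [math.tanh(v) for v in matvec(params["W_small"], x)]
    h_large = [math.tanh(v) for v in matvec(params["W_large"], x)]
    # Gate in (0, 1) per feature, computed from both representations.
    gate = [sigmoid(v) for v in matvec(params["W_gate"], h_small + h_large)]
    # Assumed fusion rule: add the gated small-scale features to the
    # large-scale features before the large-scale classifier.
    h_fused = [hl + g * hs for hl, g, hs in zip(h_large, gate, h_small)]
    p_small = softmax(matvec(params["U_small"], h_small))
    p_large = softmax(matvec(params["U_large"], h_fused))
    return p_small, p_large, gate


def joint_loss(p_small, p_large, y_small, y_large):
    """Sum of the two cross-entropy losses (equal task weights assumed)."""
    return -math.log(p_small[y_small]) - math.log(p_large[y_large])


# Toy sizes, all illustrative: 8 input features, 4 hidden units,
# 3 small-scale categories, 12 large-scale categories.
rng = random.Random(0)
n_in, n_hidden, n_small, n_large = 8, 4, 3, 12
params = {
    "W_small": rand_matrix(n_hidden, n_in, rng),
    "W_large": rand_matrix(n_hidden, n_in, rng),
    "W_gate": rand_matrix(n_hidden, 2 * n_hidden, rng),
    "U_small": rand_matrix(n_small, n_hidden, rng),
    "U_large": rand_matrix(n_large, n_hidden, rng),
}
x = [rng.uniform(-1.0, 1.0) for _ in range(n_in)]
p_small, p_large, gate = gated_forward(x, params)
loss = joint_loss(p_small, p_large, y_small=1, y_large=7)
print("gate values:", [round(g, 3) for g in gate])
print("joint loss:", round(loss, 3))
```

A gate value near 0 shuts a small-scale feature out of the large-scale classifier, and a value near 1 passes it through unchanged. In a trained model the gate weights would be learned jointly with the rest of the network rather than drawn at random as they are here.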

Published In

WWW '19: The World Wide Web Conference
May 2019
3620 pages
ISBN: 9781450366748
DOI: 10.1145/3308558
In-Cooperation

  • IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Deep Neural Networks
  2. Large-scale Text Classification
  3. Multi-task Learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '19
WWW '19: The Web Conference
May 13 - 17, 2019
San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Cited By

  • (2023) Evolutionary learning of selection hyper-heuristics for text classification. Applied Soft Computing. DOI: 10.1016/j.asoc.2023.110721 (110721). Online publication date: Aug-2023
  • (2022) Modeling and Analysis of Blockchain Trading Network Based on Directed Time Weighted Random Walk. Blockchain and Trustworthy Systems. DOI: 10.1007/978-981-16-7993-3_21 (275-286). Online publication date: 1-Jan-2022
  • (2021) A Literature Review on Text Classification and Sentiment Analysis Approaches. Computational Science and Technology. DOI: 10.1007/978-981-33-4069-5_26 (305-323). Online publication date: 16-Mar-2021
  • (2020) Mis-shapes, Mistakes, Misfits. Proceedings of the ACM Internet Measurement Conference. DOI: 10.1145/3419394.3423660 (598-618). Online publication date: 27-Oct-2020
  • (2020) Enhancing Text Classification via Discovering Additional Semantic Clues from Logograms. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. DOI: 10.1145/3397271.3401107 (1201-1210). Online publication date: 25-Jul-2020
  • (2020) A Neural-based Architecture For Small Datasets Classification. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020. DOI: 10.1145/3383583.3398535 (319-327). Online publication date: 1-Aug-2020
  • (2020) An Approach for Process Model Extraction by Multi-grained Text Classification. Advanced Information Systems Engineering. DOI: 10.1007/978-3-030-49435-3_17 (268-282). Online publication date: 3-Jun-2020
