research-article

From Small-scale to Large-scale Text Classification

Authors:

SangKeun Lee

WWW '19: The World Wide Web Conference

Pages 853-862

https://doi.org/10.1145/3308558.3313563

Published: 13 May 2019

Abstract

Neural network models have achieved impressive results in the field of text classification. However, existing approaches often suffer from insufficient training data in large-scale text classification involving a large number of categories (e.g., several thousand categories). Several neural network models have utilized multi-task learning to overcome the limited amount of training data. However, these approaches are also limited to small-scale text classification. In this paper, we propose a novel neural network-based multi-task learning framework for large-scale text classification. To this end, we first treat the different scales of text classification (i.e., large and small numbers of categories) as multiple, related tasks. Then, we train the proposed neural network, which learns small- and large-scale text classification tasks simultaneously. In particular, we further enhance this multi-task learning architecture by using a gate mechanism, which controls the flow of features between the small- and large-scale text classification tasks. Experimental results clearly show that our proposed model improves the performance of the large-scale text classification task with the help of the small-scale text classification task. The proposed scheme exhibits significant improvements of as much as 14% and 5% in micro-averaged and macro-averaged F1-score, respectively, over state-of-the-art techniques.
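The abstract does not specify the exact form of the gate that controls how features flow between the small- and large-scale tasks. The following is a minimal sketch of one common kind of gate, not the authors' published formulation. It assumes a per-dimension sigmoid gate computed from the concatenated task representations, and it assumes features flow from the small-scale task into the large-scale one. The function names (`gated_transfer`, `affine`), the equal feature sizes, and the convex mix `g * h_small + (1 - g) * h_large` are all illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def affine(w: Matrix, x: Vector, b: Vector) -> Vector:
    """Compute W x + b for a dense matrix W given as a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def gated_transfer(h_large: Vector, h_small: Vector, w_gate: Matrix, b_gate: Vector) -> Vector:
    """Hypothetical gate letting small-scale features flow into the large-scale task.

    g = sigmoid(W [h_large; h_small] + b) holds one value in (0, 1) per dimension.
    The output takes a fraction g of the small-scale feature and keeps a
    fraction (1 - g) of the large-scale feature.
    """
    if len(h_large) != len(h_small):
        raise ValueError("both task representations must have the same size")
    # List concatenation builds the input [h_large; h_small] of size 2d.
    g = [sigmoid(z) for z in affine(w_gate, h_large + h_small, b_gate)]
    return [gi * hs + (1.0 - gi) * hl for gi, hs, hl in zip(g, h_small, h_large)]


if __name__ == "__main__":
    # Hypothetical 2-dimensional features; zero weights and bias give g = 0.5 everywhere.
    h_large, h_small = [1.0, -2.0], [3.0, 4.0]
    w_gate = [[0.0] * 4 for _ in range(2)]
    print(gated_transfer(h_large, h_small, w_gate, [0.0, 0.0]))  # [2.0, 1.0]
```

With zero weights and zero bias, the gate sits at 0.5 and the output is the element-wise average of the two representations. A large positive bias opens the gate, so the output approaches the small-scale features. In a trained model of this kind, `w_gate` and `b_gate` would be learned jointly with both classifiers.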
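The reported gains are stated separately for micro-averaged and macro-averaged F1, and these two scores can differ substantially. The sketch below computes both from per-category true-positive, false-positive, and false-negative counts using the standard definitions. Micro-averaging pools the counts over all categories, so frequent categories dominate it. Macro-averaging is taken here as the unweighted mean of per-category F1 scores, which is one of two variants in common use. The abstract does not state which variant the paper uses. The function names and toy counts are hypothetical.

```python
from typing import Dict, Tuple

# Per-category counts: category -> (true positives, false positives, false negatives)
Counts = Dict[str, Tuple[int, int, int]]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), taken as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(counts: Counts) -> float:
    """Pool TP/FP/FN over all categories, then compute a single F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


def macro_f1(counts: Counts) -> float:
    """Compute F1 per category, then take the unweighted mean."""
    scores = [f1(*c) for c in counts.values()]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical toy counts: one frequent category and two rare ones.
    toy = {
        "frequent": (90, 10, 10),
        "rare_a": (1, 3, 3),
        "rare_b": (0, 2, 2),
    }
    print(f"micro-F1 = {micro_f1(toy):.3f}")  # micro-F1 = 0.858
    print(f"macro-F1 = {macro_f1(toy):.3f}")  # macro-F1 = 0.383
```

On these toy counts, the frequent category keeps the pooled micro-F1 high, while the two rare categories pull the per-category mean down. When there are thousands of categories, many of them rare, this is why the two averages need to be reported separately.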





Published In


WWW '19: The World Wide Web Conference

May 2019

3620 pages

ISBN: 9781450366748

DOI: 10.1145/3308558

Editors:

Ling Liu (Georgia Tech, USA)

Ryen White (Microsoft Research, USA)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States




Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '19: The Web Conference

May 13-17, 2019

San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%


Bibliometrics & Citations

Article Metrics

Total Citations: 7

Total Downloads: 842

Downloads (last 12 months): 17

Downloads (last 6 weeks): 1

Reflects downloads up to 15 Feb 2025

Cited By

de Jesús Estrella Ramírez J, Gomez J (2023). Evolutionary learning of selection hyper-heuristics for text classification. Applied Soft Computing, 110721. Online publication date: Aug-2023.
https://doi.org/10.1016/j.asoc.2023.110721

Wang M, Sun R, Mu H (2022). Modeling and Analysis of Blockchain Trading Network Based on Directed Time Weighted Random Walk. Blockchain and Trustworthy Systems, 275-286. Online publication date: 1-Jan-2022.
https://doi.org/10.1007/978-981-16-7993-3_21

Dawei W, Alfred R, Obit J, On C (2021). A Literature Review on Text Classification and Sentiment Analysis Approaches. Computational Science and Technology, 305-323. Online publication date: 16-Mar-2021.
https://doi.org/10.1007/978-981-33-4069-5_26

Vallina P, Le Pochat V, Feal Á, Paraschiv M, Gamba J, Burke T, Hohlfeld O, Tapiador J, Vallina-Rodriguez N (2020). Mis-shapes, Mistakes, Misfits. Proceedings of the ACM Internet Measurement Conference, 598-618. Online publication date: 27-Oct-2020.
https://dl.acm.org/doi/10.1145/3419394.3423660

Qian C, Feng F, Wen L, Lin L, Chua T, Huang J, Chang Y, Cheng X, Kamps J, Murdock V, Wen J, Liu Y (2020). Enhancing Text Classification via Discovering Additional Semantic Clues from Logograms. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 1201-1210. Online publication date: 25-Jul-2020.
https://dl.acm.org/doi/10.1145/3397271.3401107

Rexha A, Dragoni M, Kern R, Huang R, Wu D, Marchionini G, He D, Cunningham S, Hansen P (2020). A Neural-based Architecture For Small Datasets Classification. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, 319-327. Online publication date: 1-Aug-2020.
https://dl.acm.org/doi/10.1145/3383583.3398535

Qian C, Wen L, Kumar A, Lin L, Lin L, Zong Z, Li S, Wang J (2020). An Approach for Process Model Extraction by Multi-grained Text Classification. Advanced Information Systems Engineering, 268-282. Online publication date: 3-Jun-2020.
https://doi.org/10.1007/978-3-030-49435-3_17
