Abstract
Hierarchical text classification, which aims to classify texts into a given category hierarchy, has broad applications on the Internet. Supervised methods require a large amount of labeled data and are thus costly. For this reason, dataless hierarchical text classification, which requires only a few relevant seed words for each category, has attracted growing research attention in recent years. However, existing approaches mainly focus on long texts and do not consider the characteristics of short texts, so they are unsuitable in many scenarios. In this paper, we tackle dataless hierarchical short text classification for the first time and propose a model named Hierarchical Seeded Biterm Topic Model (HierSeedBTM), which leverages seed words in the Biterm Topic Model (BTM) to guide hierarchical topic labeling. Specifically, our model introduces an iterative distribution propagation mechanism among topic models at different levels to incorporate information about the hierarchical structure. Experiments on two public datasets show that the proposed model is more effective than state-of-the-art dataless hierarchical text classification methods designed for long texts.
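To make the biterm idea concrete, the following is a minimal Python sketch, not the HierSeedBTM implementation. It assumes the standard BTM notion of a biterm as an unordered pair of words co-occurring in the same short text. The function names `extract_biterms` and `seeded_biterms`, the example tokens, and the seed word are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) from one tokenized short text."""
    # Sorting each pair makes ("a", "b") and ("b", "a") the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def seeded_biterms(biterms, seed_words):
    """Keep only the biterms that contain at least one seed word (illustrative filter)."""
    seeds = set(seed_words)
    return [b for b in biterms if b[0] in seeds or b[1] in seeds]


# Hypothetical short text and seed word for an assumed "finance" category.
tokens = ["stock", "market", "falls"]
biterms = extract_biterms(tokens)
finance_biterms = seeded_biterms(biterms, ["stock"])
```

Because a short text yields only a handful of word occurrences, modeling word pairs across the whole corpus gives a topic model more co-occurrence evidence than modeling each document separately. How seed words actually bias the topic assignment, and how distributions propagate between levels, is specified in the paper itself and is not reproduced here.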
This work is supported by the National Key Research and Development Program of China (2018YFC0116703), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDC02060500), and Zhejiang Lab (2020NF0AC02).
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Yang, Y., Wang, H., Zhu, J., Shi, W., Guo, W., Zhang, J. (2021). Effective Seed-Guided Topic Labeling for Dataless Hierarchical Short Text Classification. In: Brambilla, M., Chbeir, R., Frasincar, F., Manolescu, I. (eds) Web Engineering. ICWE 2021. Lecture Notes in Computer Science, vol 12706. Springer, Cham. https://doi.org/10.1007/978-3-030-74296-6_21
DOI: https://doi.org/10.1007/978-3-030-74296-6_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74295-9
Online ISBN: 978-3-030-74296-6
eBook Packages: Computer Science (R0)