Abstract
Every day, an enormous amount of text data is produced. Sources include news, social media, emails, text messages, medical reports, scientific publications, and fiction. To keep track of this data, categories, keywords, tags, or labels are assigned to each text. Automatically predicting such labels is the task of multi-label text classification. Often, however, we are interested in more than the classification alone: we would like to understand which parts of a text belong to a label, which words are important for a label, or which labels occur together. For this reason, topic models may be used for multi-label classification as interpretable models that are flexible and easily extensible. This survey demonstrates the manifold possibilities and the flexibility of the topic model framework for the complex setting of multi-label text classification by categorizing the different model variants.
SIGKDD Explorations Volume 21, Issue 1
Index Terms
- A Survey of Multi-Label Topic Models