
Enhance AdaBoost Algorithm by Integrating LDA Topic Model

  • Conference paper
Data Mining and Big Data (DMBD 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9714)

Abstract

AdaBoost is an ensemble method that is considered one of the most influential algorithms for multi-label classification. It has been applied successfully in diverse domains because of its simplicity and accurate predictions. To choose the weak hypotheses, however, AdaBoost has to examine every feature individually, which dramatically increases the computational time of classification, especially on large-scale datasets. To tackle this problem, we introduce a Latent Dirichlet Allocation (LDA) model that improves the efficiency and effectiveness of AdaBoost by mapping the word matrix into a topic matrix. In this paper, we propose a framework integrating LDA and AdaBoost and test it on two Chinese-language corpora. Experiments show that our method outperforms traditional AdaBoost using the bag-of-words (BOW) model.
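The idea in the abstract can be illustrated with a minimal sketch. It is not the implementation used in the paper: it assumes a binary AdaBoost with threshold stumps as weak hypotheses and a small collapsed Gibbs sampler for LDA, and the toy corpus, the function names, and all parameter values (`n_topics`, `alpha`, `beta`, `n_rounds`) are hypothetical. What it shows is that the weak-hypothesis search scans one column per feature, so replacing the word matrix (one column per vocabulary word) with a topic matrix (one column per topic) shrinks the search in every boosting round.

```python
import math
import random

def bow_matrix(docs, vocab_size):
    """Word matrix: one row per document, one count column per vocabulary word."""
    rows = []
    for doc in docs:
        row = [0] * vocab_size
        for w in doc:
            row[w] += 1
        rows.append(row)
    return rows

def lda_doc_topics(docs, vocab_size, n_topics, n_iter=50, alpha=0.1, beta=0.01, seed=0):
    """Topic matrix via collapsed Gibbs sampling: one topic-proportion row per document."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # topic counts per document
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                              # total words per topic
    assign = []
    for d, doc in enumerate(docs):                            # random initial assignment
        zs = [rng.randrange(n_topics) for _ in doc]
        for w, k in zip(doc, zs):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]                              # remove the current assignment
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [(doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                           / (topic_total[t] + vocab_size * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k                              # record the resampled topic
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return [[(row[t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
            for row, doc in zip(doc_topic, docs)]

def best_stump(X, y, w):
    """Scan every feature column for the stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):                 # one pass per feature column
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (sign if row[j] > thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost(X, y, n_rounds=10):
    """Binary AdaBoost with labels in {-1, +1}; returns weighted stumps."""
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(n_rounds):
        err, j, thr, sign = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        a = 0.5 * math.log((1 - err) / err)
        model.append((a, j, thr, sign))
        w = [wi * math.exp(-a * yi * (sign if row[j] > thr else -sign))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]           # renormalise the example weights
    return model

def predict(model, row):
    score = sum(a * (sign if row[j] > thr else -sign) for a, j, thr, sign in model)
    return 1 if score >= 0 else -1

# Hypothetical toy corpus: documents are lists of word ids drawn from a 6-word vocabulary.
docs = [[0, 1, 2, 0, 1], [1, 2, 0, 2], [0, 0, 1, 2, 1],
        [3, 4, 5, 3], [4, 5, 3, 5, 4], [5, 3, 4, 4]]
y = [1, 1, 1, -1, -1, -1]

bow_X = bow_matrix(docs, vocab_size=6)                    # 6 columns to scan per round
topic_X = lda_doc_topics(docs, vocab_size=6, n_topics=2)  # 2 columns to scan per round
bow_model = adaboost(bow_X, y)
topic_model = adaboost(topic_X, y)
```

In this sketch `best_stump` visits 6 columns for the word matrix and 2 for the topic matrix; on a realistic corpus the gap is between tens of thousands of words and, typically, a few hundred topics. Whether accuracy is preserved depends on how well the topics separate the classes, which is what the experiments in the paper evaluate.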

Notes

  1. http://www.searchforum.org.cn/tansongbo/corpus.htm
  2. http://web.ist.utl.pt/acardoso/datasets/
  3. https://garnize.latin.dcc.ufmg.br/savannah/projects/cadecol
  4. http://sewm.pku.edu.cn/QA/reference/ICTCLAS/FreeICTCLAS/
  5. http://www.esuli.it/software/mpboost/
  6. http://jgibblda.sourceforge.net/

Acknowledgments

This work is supported by the National Science Foundation of China under Grant 61272010.

Author information

Correspondence to Fangyu Gai.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Gai, F., Li, Z., Jiang, X., Guo, H. (2016). Enhance AdaBoost Algorithm by Integrating LDA Topic Model. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2016. Lecture Notes in Computer Science, vol 9714. Springer, Cham. https://doi.org/10.1007/978-3-319-40973-3_3

  • DOI: https://doi.org/10.1007/978-3-319-40973-3_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-40972-6

  • Online ISBN: 978-3-319-40973-3

  • eBook Packages: Computer Science, Computer Science (R0)
