
A Fast Subspace Text Categorization Method Using Parallel Classifiers

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2012)

Abstract

The number of electronic documents available to us increases day by day, so methods that speed up document search and reduce classifier training times are important. The available data is frequently divided into several broad domains with many sub-category levels. Each of these domains constitutes a subspace which can be processed separately. In this paper, separate classifiers of the same type are trained on different subspaces, and a test vector is assigned to a subspace using a fast novel method of subspace detection. This parallel classifier architecture was tested with a wide variety of basic classifiers, and its performance was compared with that of a single basic classifier on the full data space. It was observed that the improvement in subspace learning was accompanied by a very significant reduction in training times for all types of classifiers used.
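The architecture summarised in the abstract can be illustrated with a minimal sketch: one classifier of the same type per subspace, plus a routing step that picks the subspace for a test vector. The abstract does not describe the paper's subspace detection method, so the centroid-based router below is only a stand-in, and every name here (`ParallelSubspaceClassifier`, `detect_subspace`, `NearestCentroid`, the toy data) is hypothetical rather than taken from the paper.

```python
from collections import defaultdict
from math import sqrt


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class NearestCentroid:
    """Toy base classifier: predicts the label whose centroid is most similar."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for vec, label in zip(X, y):
            groups[label].append(vec)
        self.centroids_ = {lab: centroid(vs) for lab, vs in groups.items()}
        return self

    def predict(self, vec):
        return max(self.centroids_, key=lambda lab: cosine(vec, self.centroids_[lab]))


class ParallelSubspaceClassifier:
    """One base classifier per domain (subspace), plus a subspace router."""

    def __init__(self, make_classifier=NearestCentroid):
        self.make_classifier = make_classifier

    def fit(self, X, domains, labels):
        by_domain = defaultdict(lambda: ([], []))
        for vec, dom, lab in zip(X, domains, labels):
            by_domain[dom][0].append(vec)
            by_domain[dom][1].append(lab)
        # ASSUMPTION: domain centroids stand in for the paper's detection method.
        self.domain_centroids_ = {d: centroid(vs) for d, (vs, _) in by_domain.items()}
        # Each classifier sees only its own subspace; these fits are independent
        # of one another, so they could run in parallel.
        self.classifiers_ = {
            d: self.make_classifier().fit(vs, ls) for d, (vs, ls) in by_domain.items()
        }
        return self

    def detect_subspace(self, vec):
        return max(
            self.domain_centroids_,
            key=lambda d: cosine(vec, self.domain_centroids_[d]),
        )

    def predict(self, vec):
        dom = self.detect_subspace(vec)
        return dom, self.classifiers_[dom].predict(vec)


# Toy data (invented for illustration): two domains, two sub-categories each.
X = [
    [1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0],  # sport / football
    [0.0, 1.0, 0.0, 0.0], [0.1, 0.9, 0.0, 0.0],  # sport / tennis
    [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.9, 0.1],  # finance / stocks
    [0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.1, 0.9],  # finance / bonds
]
domains = ["sport"] * 4 + ["finance"] * 4
labels = ["football"] * 2 + ["tennis"] * 2 + ["stocks"] * 2 + ["bonds"] * 2

model = ParallelSubspaceClassifier().fit(X, domains, labels)
print(model.predict([0.8, 0.2, 0.0, 0.0]))
```

Because each per-domain classifier is fitted on a fraction of the training set, the training cost per classifier shrinks, which is the kind of saving the abstract reports. Any base learner exposing `fit` and `predict` could be passed as `make_classifier` in this sketch.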




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tripathi, N., Oakes, M., Wermter, S. (2012). A Fast Subspace Text Categorization Method Using Parallel Classifiers. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science, vol 7182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28601-8_12


  • DOI: https://doi.org/10.1007/978-3-642-28601-8_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28600-1

  • Online ISBN: 978-3-642-28601-8

  • eBook Packages: Computer Science, Computer Science (R0)
