Abstract
The number of electronic documents available to us grows daily, so methods that speed up document search and reduce classifier training times are increasingly important. Such data is frequently divided into several broad domains, each with many sub-category levels, and each domain constitutes a subspace that can be processed separately. In this paper, separate classifiers of the same type are trained on different subspaces, and a test vector is assigned to a subspace using a fast, novel subspace detection method. This parallel classifier architecture was tested with a wide variety of basic classifiers, and its performance was compared with that of a single basic classifier of the same type trained on the full data space. The improvement in subspace learning was accompanied by a very significant reduction in training times for all classifier types used.
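As a rough illustration of the two-stage architecture the abstract describes, the sketch below routes a document to a broad domain (the subspace) and then classifies it with an expert trained only on that subspace. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name ParallelSubspaceClassifier, the scikit-learn components, and the use of Naive Bayes as both router and expert are illustrative choices, and the paper's actual subspace detection method and document representation are not reproduced here.

```python
# Hypothetical sketch of a parallel subspace classifier: a "router" assigns a
# document to a top-level domain, and a per-domain expert makes the final
# prediction. All component choices are assumptions for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

class ParallelSubspaceClassifier:
    def __init__(self):
        self.vectorizer = TfidfVectorizer()
        self.router = MultinomialNB()   # stand-in for the paper's fast subspace detector
        self.experts = {}               # one basic classifier per subspace

    def fit(self, texts, domains, labels):
        X = self.vectorizer.fit_transform(texts)
        self.router.fit(X, domains)     # learn to assign a vector to a subspace
        for d in set(domains):
            idx = [i for i, dom in enumerate(domains) if dom == d]
            expert = MultinomialNB()    # any basic classifier could be used here
            expert.fit(X[idx], [labels[i] for i in idx])
            self.experts[d] = expert
        return self

    def predict(self, texts):
        X = self.vectorizer.transform(texts)
        routed = self.router.predict(X)            # step 1: subspace detection
        return [self.experts[d].predict(X[i])[0]   # step 2: per-subspace classification
                for i, d in enumerate(routed)]
```

Because each expert is trained on only its own domain's documents rather than the full collection, per-classifier training time drops, which is consistent with the reduced training times the abstract reports.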
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Tripathi, N., Oakes, M., Wermter, S. (2012). A Fast Subspace Text Categorization Method Using Parallel Classifiers. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science, vol. 7182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28601-8_12
DOI: https://doi.org/10.1007/978-3-642-28601-8_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28600-1
Online ISBN: 978-3-642-28601-8