Abstract
The sheer volume of information available on the Internet means that an ever-increasing number of documents must be classified. Text categorization is carried out not only by domain experts but also by automatic text categorization systems, and because a document may belong to more than one category, a system with a multi-label classifier is needed to process large document collections. In this study, a multi-label text categorization system is developed to classify multi-label documents. Data mapping transforms the data from a high-dimensional space to a lower-dimensional space of paired SVM output values, thus lowering the computational complexity. A pairwise comparison approach is applied to set a membership function for each predicted class, so that all candidate classes can be judged. Finally, the overlapping area of two classes is obtained from the decision function to determine the class or classes to which a document is assigned. A comparative study of multi-label approaches is performed on Reuters data sets. The empirical results indicate that the proposed system performs better than the other methods on the overall performance indices. The results also suggest that a membership-function value (probability) of 0.5 is a good criterion for separating correctly classified documents from incorrectly classified ones.
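The pipeline outlined in the abstract (pairwise SVM outputs, a per-class membership value, and a 0.5 cut-off) can be illustrated with a minimal sketch. The abstract does not give the chapter's actual membership function, so the logistic squashing and the averaging over pairs used here are assumptions made only for illustration. The function names, class names, and decision values are likewise hypothetical.

```python
import math


def sigmoid(x):
    """Squash a signed decision value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def class_memberships(pairwise_scores, classes):
    """Turn pairwise decision values into one membership value per class.

    pairwise_scores[(a, b)] > 0 means the (a vs. b) classifier favours a.
    ASSUMPTION: membership is the mean sigmoid-squashed score over all
    pairs involving the class; the chapter's own function may differ.
    """
    membership = {}
    for c in classes:
        votes = []
        for (a, b), score in pairwise_scores.items():
            if a == c:
                votes.append(sigmoid(score))
            elif b == c:
                votes.append(sigmoid(-score))
        membership[c] = sum(votes) / len(votes)
    return membership


def predict_labels(pairwise_scores, classes, threshold=0.5):
    """Assign every class whose membership reaches the threshold."""
    membership = class_memberships(pairwise_scores, classes)
    labels = [c for c in classes if membership[c] >= threshold]
    if not labels:
        # Fall back to the single best class so no document is left unlabeled.
        labels = [max(classes, key=membership.get)]
    return labels, membership


# Hypothetical decision values for one document and three topics.
classes = ["earn", "acq", "grain"]
scores = {("earn", "acq"): 0.1, ("earn", "grain"): 3.0, ("acq", "grain"): 2.5}
labels, membership = predict_labels(scores, classes)
```

In this toy input the document lies close to the earn/acq boundary (decision value 0.1) and far from grain, so both earn and acq clear the 0.5 threshold while grain does not. This mirrors the idea of assigning a document to two classes when it falls in their overlapping area.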
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Chiang, HM., Wang, TY., Chiang, YM. (2011). Multi-Label Text Categorization Forecasting Probability Problem Using Support Vector Machine Techniques. In: Golinska, P., Fertsch, M., Marx-Gómez, J. (eds) Information Technologies in Environmental Engineering. Environmental Science and Engineering, vol 3. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19536-5_3
DOI: https://doi.org/10.1007/978-3-642-19536-5_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19535-8
Online ISBN: 978-3-642-19536-5
eBook Packages: Earth and Environmental Science (R0)