An empirical comparison of min–max-modular k-NN with different voting methods to large-scale text categorization

Wu, Ke; Lu, Bao-Liang; Utiyama, Masao; Isahara, Hitoshi

doi:10.1007/s00500-007-0242-3

Focus
Published: 16 October 2007

Volume 12, pages 647–655, (2008)
Soft Computing

Ke Wu¹,
Bao-Liang Lu¹,
Masao Utiyama² &
Hitoshi Isahara²

Abstract

Text categorization is the task of assigning predefined classes to text documents based on their content. The k-NN algorithm is one of the top-performing classifiers on text data. However, little research has examined the use of different voting methods for k-NN on text data. Moreover, when a large amount of training data is available online, response time increases, because the distance between a test document and every training example must be computed. Min–max-modular k-NN (M³-k-NN), on the other hand, has been applied to large-scale text categorization; it achieves good performance and responds faster in a parallel computing environment. In this paper, we investigate five different voting methods for k-NN and M³-k-NN. The experimental results and analysis show that the Gaussian voting method achieves the best performance among all voting methods for both k-NN and M³-k-NN. In addition, M³-k-NN achieves better performance than k-NN with a smaller value of k, and is therefore faster than k-NN in a parallel computing environment.
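To make the idea of a voting method concrete, the following is a minimal sketch of k-NN classification with a pluggable neighbor weight. It is not the authors' implementation: the function name `knn_vote`, the use of Euclidean distance, the bandwidth parameter `sigma`, and the particular weight formulas (uniform, inverse-distance, and a Gaussian kernel of the distance) are all assumptions chosen for illustration, and the abstract does not specify which five voting methods the paper compares or how they are defined.

```python
import math
from collections import defaultdict


def knn_vote(train, query, k, weight="majority", sigma=1.0):
    """Classify `query` by a weighted vote of its k nearest training points.

    train  : list of (vector, label) pairs (hypothetical toy format)
    weight : "majority", "inverse", or "gaussian" (illustrative schemes only)
    sigma  : assumed bandwidth of the Gaussian weight
    """
    # Linear scan over all training examples: this is the per-query cost
    # that grows with the size of the training set.
    neighbors = sorted(
        (math.dist(vec, query), label) for vec, label in train
    )[:k]

    scores = defaultdict(float)
    for d, label in neighbors:
        if weight == "majority":
            w = 1.0                      # every neighbor counts equally
        elif weight == "inverse":
            w = 1.0 / (d + 1e-9)         # closer neighbors count more
        elif weight == "gaussian":
            w = math.exp(-(d * d) / (2.0 * sigma * sigma))
        else:
            raise ValueError(f"unknown weight scheme: {weight}")
        scores[label] += w

    # The label with the largest accumulated vote wins.
    return max(scores, key=scores.get)


# Toy data: one close "A" point and two farther "B" points.
train = [((0.0, 0.0), "A"), ((3.0, 0.0), "B"), ((3.1, 0.0), "B")]
query = (1.0, 0.0)

majority_label = knn_vote(train, query, k=3, weight="majority")
gaussian_label = knn_vote(train, query, k=3, weight="gaussian", sigma=1.0)
```

In this toy case the unweighted vote follows the two farther neighbors, while the Gaussian weight lets the single close neighbor dominate. This only illustrates why the choice of voting method can change a prediction; it says nothing about the performance figures reported in the paper.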

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Shanghai Jiao Tong University, 800 Dong Chuan Road, Shanghai, 200240, China
Ke Wu & Bao-Liang Lu
Knowledge Creating Communication Research Center, National Institute of Information and Communications Technology, 3-5 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0289, Japan
Masao Utiyama & Hitoshi Isahara

Corresponding author

Correspondence to Bao-Liang Lu.

Additional information

The work of K. Wu and B. L. Lu was supported in part by the National Natural Science Foundation of China under the grants NSFC 60375022 and NSFC 60473040, and the Microsoft Laboratory for Intelligent Computing and Intelligent Systems of Shanghai Jiao Tong University.

About this article

Cite this article

Wu, K., Lu, BL., Utiyama, M. et al. An empirical comparison of min–max-modular k-NN with different voting methods to large-scale text categorization. Soft Comput 12, 647–655 (2008). https://doi.org/10.1007/s00500-007-0242-3

Published: 16 October 2007
Issue Date: May 2008
DOI: https://doi.org/10.1007/s00500-007-0242-3
