
An empirical comparison of min–max-modular k-NN with different voting methods to large-scale text categorization

  • Focus
Soft Computing

Abstract

Text categorization is the task of assigning predefined classes to text documents based on their content. The k-NN algorithm is one of the top-performing classifiers on text data. However, little research has examined the use of different voting methods on text data. Moreover, when a huge amount of training data is available online, response speed drops, since the distance between a test document and every training document has to be computed. Min–max-modular k-NN (M3-k-NN), on the other hand, has been applied to large-scale text categorization, where it achieves good performance and responds faster in a parallel computing environment. In this paper, we investigate five different voting methods for k-NN and M3-k-NN. The experimental results and analysis show that the Gaussian voting method achieves the best performance among all voting methods for both k-NN and M3-k-NN. In addition, M3-k-NN needs a smaller value of k than k-NN to achieve better performance, and is therefore faster than k-NN in a parallel computing environment.
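To make the notion of a voting method concrete, the following is a minimal sketch of k-NN classification with two voting schemes: plain majority voting and a Gaussian-weighted vote. It is an illustration only, not the formulation used in the paper. The function name `knn_vote`, the Euclidean distance, the weight exp(-d^2 / (2 sigma^2)), and the parameter `sigma` are all assumptions made for this example; the abstract does not specify the distance measure or the exact form of the Gaussian vote.

```python
import math
from collections import defaultdict


def knn_vote(query, training, k=3, voting="majority", sigma=1.0):
    """Classify `query` from its k nearest neighbours (hypothetical helper).

    training: list of (vector, label) pairs with equal-length vectors.
    voting:   "majority" counts each neighbour once;
              "gaussian" weights a neighbour by exp(-d^2 / (2 * sigma^2)),
              an assumed form of Gaussian voting.
    """
    # Distance from the query to every training vector (Euclidean, assumed).
    scored = sorted((math.dist(query, vec), label) for vec, label in training)
    neighbours = scored[:k]

    # Accumulate one score per class over the k nearest neighbours.
    scores = defaultdict(float)
    for dist, label in neighbours:
        if voting == "majority":
            weight = 1.0
        elif voting == "gaussian":
            weight = math.exp(-(dist ** 2) / (2 * sigma ** 2))
        else:
            raise ValueError(f"unknown voting method: {voting}")
        scores[label] += weight

    # The class with the largest accumulated score wins.
    return max(scores, key=scores.get)


# Toy data: one very close neighbour of class "a", two distant ones of class "b".
training = [
    ((0.1, 0.0), "a"),
    ((2.0, 0.0), "b"),
    ((0.0, 2.0), "b"),
]
print(knn_vote((0.0, 0.0), training, k=3, voting="majority"))
print(knn_vote((0.0, 0.0), training, k=3, voting="gaussian"))
```

On this toy data the two schemes disagree: majority voting follows the two distant "b" neighbours, while the Gaussian weights let the single close "a" neighbour dominate. This shows how the choice of voting method can change the predicted class for the same k.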

Corresponding author

Correspondence to Bao-Liang Lu.

Additional information

The work of K. Wu and B. L. Lu was supported in part by the National Natural Science Foundation of China under the grants NSFC 60375022 and NSFC 60473040, and the Microsoft Laboratory for Intelligent Computing and Intelligent Systems of Shanghai Jiao Tong University.

Cite this article

Wu, K., Lu, BL., Utiyama, M. et al. An empirical comparison of min–max-modular k-NN with different voting methods to large-scale text categorization. Soft Comput 12, 647–655 (2008). https://doi.org/10.1007/s00500-007-0242-3

  • DOI: https://doi.org/10.1007/s00500-007-0242-3
