Automatic folder allocation system using Bayesian-support vector machines hybrid classification approach

Applied Intelligence

Abstract

This paper proposes an automatic folder allocation system for text documents, implemented through a hybrid classification method that combines the Bayesian (Bayes) approach with Support Vector Machines (SVMs). Folder allocation for text documents on a computer is typically performed manually by the user. Each time the user creates a text document in a text editor or downloads one from the internet and wishes to store it on the computer, the user must decide which folder it belongs in. Repeating this decision for every stored document becomes tedious, especially when the folders are numerous and deeply nested and the folder structure is complex and continuously growing. This problem can be overcome by applying machine learning methods to classify new text documents and allocate the most appropriate folder in which to store them. In this paper we propose the Bayes-SVMs hybrid classification framework to automate the task of allocating the right folder for text documents on a computer.
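The abstract does not say how the Bayes and SVM stages are combined, so the following is only a minimal sketch of one plausible arrangement: a naive Bayes model turns each document into a vector of per-folder posterior probabilities, and a linear SVM then picks the folder from that vector. The folder names, training documents, class names, and the function `allocate_folder` are all hypothetical, and the SVM here is a small one-vs-rest linear model trained by subgradient descent on the hinge loss, not necessarily the formulation used in the paper.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class NaiveBayesVectorizer:
    """Multinomial naive Bayes with Laplace smoothing (illustrative only).

    Turns a document into a vector of per-folder posterior probabilities.
    """

    def fit(self, docs, folders):
        self.folders = sorted(set(folders))
        self.doc_counts = Counter(folders)
        self.word_counts = {f: Counter() for f in self.folders}
        for text, folder in zip(docs, folders):
            self.word_counts[folder].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def posteriors(self, text):
        total_docs = sum(self.doc_counts.values())
        # Words never seen in training are ignored.
        words = [w for w in tokenize(text) if w in self.vocab]
        log_scores = []
        for folder in self.folders:
            counts = self.word_counts[folder]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[folder] / total_docs)
            score += sum(math.log((counts[w] + 1) / denom) for w in words)
            log_scores.append(score)
        # Normalize in log space to avoid underflow.
        peak = max(log_scores)
        exps = [math.exp(s - peak) for s in log_scores]
        total = sum(exps)
        return [e / total for e in exps]


class LinearSVM:
    """One-vs-rest linear SVM trained by hinge-loss subgradient descent."""

    def fit(self, X, y, epochs=200, lr=0.1, lam=0.01):
        self.labels = sorted(set(y))
        self.models = {}
        for label in self.labels:
            w, b = [0.0] * len(X[0]), 0.0
            for _ in range(epochs):
                for x, target in zip(X, y):
                    t = 1.0 if target == label else -1.0
                    violated = t * (dot(w, x) + b) < 1
                    w = [wi * (1 - lr * lam) for wi in w]  # L2 shrinkage
                    if violated:  # margin violated: step toward the sample
                        w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                        b += lr * t
            self.models[label] = (w, b)
        return self

    def predict(self, x):
        return max(self.labels,
                   key=lambda lb: dot(self.models[lb][0], x) + self.models[lb][1])


# Hypothetical training set: documents the user has already filed.
docs = [
    "invoice payment bank tax",
    "bank statement tax invoice",
    "flight hotel booking itinerary",
    "hotel itinerary flight ticket",
]
folders = ["finance", "finance", "travel", "travel"]

nb = NaiveBayesVectorizer().fit(docs, folders)
svm = LinearSVM().fit([nb.posteriors(d) for d in docs], folders)


def allocate_folder(text):
    """Suggest a folder for a new document (Bayes features, SVM decision)."""
    return svm.predict(nb.posteriors(text))


print(allocate_folder("tax invoice from the bank"))
```

In this arrangement the SVM works in a space with one dimension per folder instead of one per vocabulary word, which keeps the second stage small as the vocabulary grows. Whether the paper's framework takes the same approach cannot be determined from the abstract alone.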

Author information

Corresponding author

Correspondence to Lam Hong Lee.

About this article

Cite this article

Lee, L.H., Rajkumar, R. & Isa, D. Automatic folder allocation system using Bayesian-support vector machines hybrid classification approach. Appl Intell 36, 295–307 (2012). https://doi.org/10.1007/s10489-010-0261-0

  • DOI: https://doi.org/10.1007/s10489-010-0261-0
