A Term Weighting Scheme Approach for Vietnamese Text Classification

  • Conference paper
Future Data and Security Engineering (FDSE 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9446)

Abstract

The term weighting scheme, which converts documents into vectors in the term space, is a vital step in automatic text categorization. Previous studies have shown that the choice of term weighting scheme strongly affects classification performance. Term weighting has been studied extensively for English text classification, but few studies have addressed Vietnamese text classification. In this paper, we propose a term weighting scheme called normalize(tf.rf_max), which is based on tf.rf, one of the most effective term weighting schemes to date. We conducted experiments comparing the proposed normalize(tf.rf_max) scheme with tf.rf and tf.idf on a Vietnamese text classification benchmark. The results show that the proposed scheme achieves about 3%–5% higher accuracy than the other term weighting schemes.
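As a rough illustration of the two baseline schemes named in the abstract, the sketch below computes tf.idf and tf.rf weights for a toy corpus. It is an assumption-laden example, not the authors' code: the function names `tf_idf` and `tf_rf`, the toy documents, and the labels are hypothetical, and the relevance frequency formula rf = log2(2 + a / max(1, c)) is the commonly cited definition of tf.rf rather than something stated on this page. The proposed normalize(tf.rf_max) scheme is not reproduced here because the abstract does not define it.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Weight each term by raw term frequency times log(N / document frequency).

    docs: list of token lists. Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in Counter(doc).items()}
        for doc in docs
    ]


def tf_rf(docs, labels, positive):
    """Weight each term by raw term frequency times relevance frequency.

    Assumed definition: rf = log2(2 + a / max(1, c)), where a is the number of
    documents in the positive category containing the term and c is the number
    of documents in all other categories containing it.
    """
    a = Counter()
    c = Counter()
    for doc, label in zip(docs, labels):
        (a if label == positive else c).update(set(doc))

    def rf(term):
        return math.log2(2 + a[term] / max(1, c[term]))

    return [
        {term: tf * rf(term) for term, tf in Counter(doc).items()}
        for doc in docs
    ]


# Hypothetical word-segmented tokens (diacritics dropped for brevity).
docs = [
    ["bong_da", "doi", "thang"],
    ["bong_da", "tran"],
    ["co_phieu", "thang"],
    ["co_phieu", "thi_truong"],
]
labels = ["sport", "sport", "business", "business"]

idf_weights = tf_idf(docs)
rf_weights = tf_rf(docs, labels, positive="sport")
print(idf_weights[0])
print(rf_weights[0])
```

Unlike tf.idf, the rf factor uses the category labels, so a term concentrated in the positive category ("bong_da" here) receives a larger weight than one spread evenly across categories ("thang").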

Acknowledgment

This research is funded by Vietnam National University, Ho Chi Minh City (VNU-HCM) under grant number C2014-26-04.

Author information

Corresponding author

Correspondence to Vu Thanh Nguyen.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Nguyen, V.T., Hai, N.T., Nghia, N.H., Le, T.D. (2015). A Term Weighting Scheme Approach for Vietnamese Text Classification. In: Dang, T., Wagner, R., Küng, J., Thoai, N., Takizawa, M., Neuhold, E. (eds) Future Data and Security Engineering. FDSE 2015. Lecture Notes in Computer Science, vol 9446. Springer, Cham. https://doi.org/10.1007/978-3-319-26135-5_4

  • DOI: https://doi.org/10.1007/978-3-319-26135-5_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26134-8

  • Online ISBN: 978-3-319-26135-5

  • eBook Packages: Computer Science (R0)
