Abstract
This paper proposes the use of non-extensive entropy for text classification. The non-extensive entropy technique classifies text by estimating the conditional distribution of the class variable given the document. The underlying principle, inherited from maximum entropy modeling, is that in the absence of external knowledge one should prefer distributions that are as uniform as possible. The paper proposes two models based on the maximum entropy principle: the first extends Shannon entropy to non-extensive entropy, which simplifies the form of the classifier; the second introduces high-level constraints into the non-extensive model, imposing constraints on pairs of entities. The model with high-level constraints builds relations between word pairs, yielding semantic constraints that improve classification accuracy. Experiments on the 20 Newsgroups dataset demonstrate the advantage of both the non-extensive model and the non-extensive model with high-level constraints.
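For readers with only the abstract at hand, the following standard definitions (background only, not the paper's exact derivation, which appears in the full text) show what "extending Shannon entropy to non-extensive entropy" means. Tsallis's non-extensive entropy of order q generalizes Shannon entropy and recovers it in the limit q → 1:

\[
S_q(p) \;=\; \frac{1 - \sum_i p_i^{\,q}}{q - 1},
\qquad
\lim_{q \to 1} S_q(p) \;=\; -\sum_i p_i \ln p_i .
\]

In the classical case, maximizing Shannon entropy subject to feature-expectation constraints yields the familiar exponential-form text classifier

\[
p_\Lambda(c \mid d) \;=\; \frac{1}{Z_\Lambda(d)} \exp\!\Big( \sum_i \lambda_i f_i(d, c) \Big),
\qquad
Z_\Lambda(d) \;=\; \sum_{c'} \exp\!\Big( \sum_i \lambda_i f_i(d, c') \Big),
\]

where the f_i are word-class indicator features and the λ_i are learned weights. The models proposed here presumably replace the Shannon objective with S_q (and, in the second model, add constraints over word pairs); the exact form of the resulting classifier is derived in the full paper.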
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Fu, L., Hou, Y. (2009). Using Non-extensive Entropy for Text Classification. In: Huang, DS., Jo, KH., Lee, HH., Kang, HJ., Bevilacqua, V. (eds) Emerging Intelligent Computing Technology and Applications. ICIC 2009. Lecture Notes in Computer Science, vol 5754. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04070-2_96
Print ISBN: 978-3-642-04069-6
Online ISBN: 978-3-642-04070-2