An Estimate Method of the Minimum Entropy of Natural Languages

Ren, Fuji; Mitsuyoshi, Shunji; Yen, Kang; Zong, Chengqing; Zhu, Hongbing

doi:10.1007/3-540-36456-0_39

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2588)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

The study of the minimum entropy of English has a long history and has made great progress, but only a few studies of other languages have been reported in the literature so far. In this paper, we present a new method for estimating the minimum entropy per character of natural languages, based on two hypotheses about the conservation of information quantity. We also verify the hypotheses empirically through experiments on two natural languages, Japanese and Chinese.
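For orientation only, the quantity being estimated can be illustrated with a much simpler baseline than the one the paper proposes. The sketch below is not the authors' method (the abstract does not describe it); it computes the empirical per-character entropy of a string under a unigram model, in which each character is treated as independent of its context. The function name `unigram_char_entropy` is hypothetical. Because a unigram model ignores dependencies between characters, its value generally sits above the minimum entropy that context-aware estimates aim for.

```python
import math
from collections import Counter


def unigram_char_entropy(text: str) -> float:
    """Empirical entropy in bits per character under a unigram model.

    Illustrative baseline only; this is NOT the estimation method
    of the paper, which the abstract does not spell out.
    """
    counts = Counter(text)          # frequency of each distinct character
    total = sum(counts.values())    # number of characters in the sample
    if total == 0:
        return 0.0                  # empty input carries no information
    # H = -sum_c p(c) * log2 p(c), with p(c) the relative frequency of c
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # Two equally frequent characters give exactly one bit per character.
    print(unigram_char_entropy("abab"))
```

The same function works unchanged on Japanese or Chinese text, since Python strings are sequences of Unicode code points; only the size of the character inventory differs.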

Author information

Authors and Affiliations

Faculty of Engineering, The University of Tokushima, 770-8506, Tokushima, Japan
Fuji Ren
Department of Research, A.G.I. Inc., 106-0043, Tokyo, Japan
Shunji Mitsuyoshi
Dept. of Electrical and Computer Eng., Florida Int’l University, 33174, Miami, Florida, USA
Kang Yen
Institute of Automation, Chinese Academy of Sciences, 100080, Beijing, China
Chengqing Zong
Faculty of Engineering, Hiroshima Kokusai Gakuin University, 739-0321, Hiroshima, Japan
Hongbing Zhu

Editor information

Editors and Affiliations

Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional (IPN), Col. Zacatenco, CP 07738, Mexico D.F., Mexico
Alexander Gelbukh

Cite this paper

Ren, F., Mitsuyoshi, S., Yen, K., Zong, C., Zhu, H. (2003). An Estimate Method of the Minimum Entropy of Natural Languages. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2003. Lecture Notes in Computer Science, vol 2588. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36456-0_39

DOI: https://doi.org/10.1007/3-540-36456-0_39
Published: 30 April 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00532-2
Online ISBN: 978-3-540-36456-6
eBook Packages: Springer Book Archive
