Automatic Language Identification Using Multivariate Analysis

Vinosh Babu, J.; Baskaran, S.

doi:10.1007/978-3-540-30586-6_89

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3406)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Identifying the language of an e-text is complicated by the existence of several character sets for a single language. We present a language identification system that uses Multivariate Analysis (MVA) for dimensionality reduction and classification. We compare its performance with two existing schemes: N-gram-based and compression-based identification.
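
As an illustration of the general idea only, the following sketch reduces byte-frequency vectors with principal component analysis (one common multivariate technique) and classifies by nearest class centroid in the reduced space. It is not the authors' system: the abstract does not state which features, which MVA variant, or which classifier they use, so the byte histogram features, the SVD-based PCA, the nearest-centroid rule, and all function names here (`byte_histogram`, `fit_pca`, `train`, `identify`) are assumptions made for this example. Working on raw bytes rather than decoded characters is one plausible way to stay agnostic about the character set.

```python
import numpy as np


def byte_histogram(text: str) -> np.ndarray:
    """Relative frequency of each byte value (0..255) in the UTF-8 encoding."""
    data = np.frombuffer(text.encode("utf-8"), dtype=np.uint8)
    counts = np.bincount(data, minlength=256).astype(float)
    return counts / max(counts.sum(), 1.0)


def fit_pca(X: np.ndarray, k: int):
    """Return the feature mean and the top-k principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def project(X: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Map feature vectors into the reduced k-dimensional space."""
    return (X - mean) @ components.T


def train(samples, k: int = 2):
    """samples: list of (language_label, text) pairs (toy data, hypothetical)."""
    y = np.array([lang for lang, _ in samples])
    X = np.array([byte_histogram(text) for _, text in samples])
    mean, components = fit_pca(X, k)
    Z = project(X, mean, components)
    # One centroid per language in the reduced space.
    centroids = {lang: Z[y == lang].mean(axis=0) for lang in sorted(set(y))}
    return mean, components, centroids


def identify(text: str, model) -> str:
    """Label of the nearest language centroid in the reduced space."""
    mean, components, centroids = model
    z = project(byte_histogram(text)[None, :], mean, components)[0]
    return min(centroids, key=lambda lang: np.linalg.norm(z - centroids[lang]))


# Toy training data, chosen only to make the example self-contained.
SAMPLES = [
    ("en", "the quick brown fox jumps over the lazy dog"),
    ("en", "language identification is a classic text processing task"),
    ("ru", "быстрая коричневая лиса прыгает через ленивую собаку"),
    ("ru", "определение языка текста является классической задачей"),
]

model = train(SAMPLES, k=2)
print(identify("this is a short sentence written in english", model))
print(identify("это короткое предложение на русском языке", model))
```

With only four training texts in two different scripts this is a toy; a realistic evaluation would need many documents per language and per character set, and would compare against N-gram and compression baselines as the abstract describes.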

References

Benedetto, D., Caglioti, E., Loreto, V.: Language trees and zipping. Physical Review Letters 88(4) (January 2002)
Ziv, J., Lempel, A.: A Universal Algorithm for Sequential Data Compression. IEEE Transactions on Information Theory 23(3), 337–343 (1977)
Manly, B.F.J.: Multivariate Statistical Methods. A Primer. Chapman & Hall, Boca Raton
Singular value decomposition and principal component analysis. In: Berrar, D.P., Dubitzky, W., Granzow, M. (eds.) A Practical Approach to Microarray Data Analysis, pp. 91–109. Kluwer, Norwell (2003); LANL LA-UR-02-4001
Dunning, T.: Statistical identification of language. Computing Research Laboratory Technical Memo MCCS 94-273, New Mexico State University, Las Cruces, NM (1994)
Cavnar, W., Trenkle, J.: N-gram based text categorization. In: Proceedings of 3rd Annual Symposium on Document Analysis and Information Retrieval, pp. 161–176 (1994)

Author information

Authors and Affiliations

AU-KBC Research Centre,
J. Vinosh Babu & S. Baskaran

Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, México
Alexander Gelbukh

About this paper

Cite this paper

Vinosh Babu, J., Baskaran, S. (2005). Automatic Language Identification Using Multivariate Analysis. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30586-6_89

DOI: https://doi.org/10.1007/978-3-540-30586-6_89
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24523-0
Online ISBN: 978-3-540-30586-6
eBook Packages: Computer Science, Computer Science (R0)
