Abstract
The growth of the Internet has promoted the use of Internet chat lingo. This type of language is diverse and irregular, which makes it difficult to handle in natural language processing. In this paper, based on the characteristics of Chinese and of Internet chat lingo, we propose a method for lingo detection and correction based on statistics and pinyin similarity. The method applies a bigram model to detect the boundaries of lingo terms and then corrects them using pinyin similarity. Experimental results and analysis indicate that the method can effectively detect and correct chat lingo.
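The two ingredients named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the boundary rule, the smoothing, or the similarity measure, so this sketch assumes add-one smoothing, a fixed probability threshold, and a pinyin similarity derived from Levenshtein edit distance. The corpus, lexicon, pinyin table, and all function names are hypothetical toy values chosen for illustration.

```python
from collections import Counter

# Hypothetical toy data; a real system would use a large corpus and a full pinyin table.
CORPUS = ["我喜欢你", "我喜欢这个", "你喜欢什么", "这是什么"]
LEXICON = ["喜欢", "什么"]
PINYIN = {"喜欢": "xihuan", "稀饭": "xifan", "什么": "shenme", "神马": "shenma"}


def train_bigrams(sentences):
    """Count single characters and adjacent character pairs."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        unigrams.update(s)
        bigrams.update(zip(s, s[1:]))
    return unigrams, bigrams


def bigram_prob(a, b, unigrams, bigrams):
    """P(b | a) with add-one smoothing (an assumption, not from the paper)."""
    vocab_size = len(unigrams)
    return (bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size)


def low_transitions(sentence, unigrams, bigrams, threshold=0.12):
    """Return adjacent character pairs whose smoothed probability is below threshold.

    Runs of such pairs hint at where a non-standard term may sit.
    """
    return [
        (a, b)
        for a, b in zip(sentence, sentence[1:])
        if bigram_prob(a, b, unigrams, bigrams) < threshold
    ]


def edit_distance(x, y):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]
        for j, cy in enumerate(y, start=1):
            cost = 0 if cx == cy else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def pinyin_similarity(p, q):
    """Similarity in [0, 1]: 1 minus the length-normalised edit distance."""
    longest = max(len(p), len(q))
    return 1.0 if longest == 0 else 1.0 - edit_distance(p, q) / longest


def correct(term):
    """Pick the lexicon word whose pinyin is most similar to the term's pinyin."""
    target = PINYIN[term]
    return max(LEXICON, key=lambda w: pinyin_similarity(target, PINYIN[w]))


if __name__ == "__main__":
    uni, bi = train_bigrams(CORPUS)
    print(low_transitions("我稀饭你", uni, bi))
    print(correct("稀饭"), correct("神马"))
```

With this toy data, every transition in 我稀饭你 falls below the threshold while none in 我喜欢你 does, and 稀饭 (xifan) is mapped to 喜欢 (xihuan) because their pinyin strings differ by only two edits. The threshold value and the choice of normalised edit distance are design assumptions; a production system would tune both on held-out data.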
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Han, B., Li, Z. (2012). Detection and Correction Scheme of Internet Chat Lingo Based on Statistic and Pinyin Similarity. In: Liu, C., Wang, L., Yang, A. (eds) Information Computing and Applications. ICICA 2012. Communications in Computer and Information Science, vol 307. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34038-3_24
DOI: https://doi.org/10.1007/978-3-642-34038-3_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34037-6
Online ISBN: 978-3-642-34038-3
eBook Packages: Computer Science, Computer Science (R0)