Correcting Misspelled Words in Twitter Text

Kim, Jeongin; Lee, Eunji; Hong, Taekeun; Kim, Pankoo

doi:10.1007/978-3-319-58967-1_10

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 194)

Included in the following conference series:

International Conference on Big Data Technologies and Applications

Abstract

Social networking services (SNS) have become popular because they are accessible from Internet-connected computers, mobile devices, and tablets. Among SNS, Twitter lets users post short texts and share information. Twitter texts are well suited to extracting new information, but because each post can hold only a limited number of words, they have various limitations. To improve the accuracy of extracting information from Twitter texts, misspelled words should be corrected beforehand. Conventional studies on correcting misspelled words in Twitter texts matched misspelled words to their corrections by considering morphophonemic similarity and the dependency between a misspelled word and the words that co-occur with it in the same sentence. However, because they did not take the frequency of those co-occurring words into account, they could not fully correct misspelled words. In this paper, to correct misspelled words in Twitter texts, we propose using a character n-gram method that takes spelling information into account together with a word n-gram method that takes the frequency of co-occurring words into account.
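The abstract does not give the paper's actual scoring formula, so the following is only a minimal sketch of how the two ideas could be combined, under stated assumptions: character bigrams compared with a Dice coefficient stand in for the character n-gram (spelling) step, counts of adjacent word pairs in a reference corpus stand in for the word n-gram (co-occurrence frequency) step, and the two scores are mixed with a hypothetical weight `alpha`. The function names, the `min_sim` threshold, and the toy corpus are illustrative, not taken from the paper.

```python
from collections import Counter

# ASSUMPTION: Dice over character bigrams and adjacent-word counts are
# illustrative stand-ins for the paper's character/word n-gram methods.


def char_ngrams(word, n=2):
    """Set of character n-grams, with '#' marking the word boundaries."""
    padded = f"#{word.lower()}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice(a, b, n=2):
    """Dice coefficient (0..1) of the two words' character n-gram sets."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def build_bigram_counts(corpus):
    """Count adjacent word pairs (word 2-grams) in a tokenized corpus."""
    counts = Counter()
    for tokens in corpus:
        words = [t.lower() for t in tokens]
        counts.update(zip(words, words[1:]))
    return counts


def correct(tokens, vocabulary, bigrams, min_sim=0.4, alpha=0.5):
    """Replace out-of-vocabulary tokens with the best-scoring vocabulary word.

    Hypothetical score: alpha * spelling similarity
    + (1 - alpha) * normalized co-occurrence frequency with the neighbors.
    """
    out = []
    for i, token in enumerate(tokens):
        word = token.lower()
        if word in vocabulary:
            out.append(word)
            continue
        # Spelling step: keep vocabulary words with enough character n-gram overlap.
        sims = {c: dice(word, c) for c in vocabulary}
        candidates = {c: s for c, s in sims.items() if s >= min_sim}
        if not candidates:
            out.append(word)  # nothing close enough: leave the token unchanged
            continue
        # Context step: how often each candidate follows the (already corrected)
        # previous word and precedes the next word in the corpus.
        prev = out[-1] if out else None
        nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else None
        freq = {c: bigrams[(prev, c)] + bigrams[(c, nxt)] for c in candidates}
        top = max(freq.values())
        scores = {
            c: alpha * s + (1 - alpha) * (freq[c] / top if top else 0.0)
            for c, s in candidates.items()
        }
        out.append(max(scores, key=scores.get))
    return out


# Toy reference corpus (illustrative only).
corpus = [
    ["the", "weather", "is", "nice", "today"],
    ["the", "weather", "is", "cold"],
    ["i", "wonder", "whether", "it", "will", "rain"],
]
vocabulary = {w for sent in corpus for w in sent}
bigrams = build_bigram_counts(corpus)

print(correct(["the", "wether", "is", "nice"], vocabulary, bigrams))
# ['the', 'weather', 'is', 'nice']
print(correct(["the", "wether", "is", "nice"], vocabulary, bigrams, alpha=1.0))
# ['the', 'whether', 'is', 'nice']
```

The second call illustrates the motivation stated in the abstract: on spelling alone, "wether" is slightly closer to "whether" (Dice 12/14) than to "weather" (Dice 12/15), and only the co-occurrence counts for "the weather" and "weather is" tip the choice toward the contextually correct word.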

Acknowledgments

This research was supported by the Human Resource Training Program for Regional Innovation and Creativity through the Ministry of Education and National Research Foundation of Korea (NRF-2014H1C1A1073115), and by the SW Master's course of hiring contract Program grant funded by the Ministry of Science, ICT and Future Planning (H0116-16-1013).

Author information

Authors and Affiliations

Department of Computer Engineering, Chosun University, Gwangju, Republic of Korea
Jeongin Kim, Eunji Lee & Pankoo Kim
Department of Software Convergence Engineering, Chosun University, Gwangju, Republic of Korea
Taekeun Hong

Corresponding author

Correspondence to Pankoo Kim.

Editor information

Editors and Affiliations

Department of Computer Engineering, Chung-Ang University, Seoul, Korea (Republic of)
Jason J. Jung
Chosun University, Gwangju, Korea (Republic of)
Pankoo Kim

About this paper

Cite this paper

Kim, J., Lee, E., Hong, T., Kim, P. (2017). Correcting Misspelled Words in Twitter Text. In: Jung, J., Kim, P. (eds) Big Data Technologies and Applications. BDTA 2016. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 194. Springer, Cham. https://doi.org/10.1007/978-3-319-58967-1_10

DOI: https://doi.org/10.1007/978-3-319-58967-1_10
Published: 07 June 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-58966-4
Online ISBN: 978-3-319-58967-1
eBook Packages: Computer Science, Computer Science (R0)
