DOI: 10.1145/3297280.3297478
poster

Word2Vec based spelling correction method of Twitter message

Published: 08 April 2019

Abstract

Twitter became popular thanks to devices such as smartphones and tablets, on which short messages can be composed easily. As its popularity has grown, the volume of Twitter messages has increased rapidly, and studies have accordingly been carried out to extract various kinds of data by analyzing those messages. However, mining accurate data from Twitter messages, which are composed of short sentences, is limited by the persistent problem of misspelling. Although Word2Vec has repeatedly been applied to spelling correction, existing approaches arguably replace a word with one extracted by Word2Vec rather than correct it. Furthermore, because they do not take the characters of the misspelled word into account, the result does not fit the meaning of correction. This paper proposes a method for correcting misspelled words in Twitter messages using an improved Word2Vec. Misspelled words in a Twitter message are selected during pre-processing. For each selected misspelled word, candidate correction words are extracted with the improved Word2Vec. Among the extracted candidates, the word with the highest similarity to the misspelled word replaces it, thereby correcting the spelling error.
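
The abstract does not say how the improved Word2Vec works, so the following is only a minimal sketch of the general idea it describes: extract candidate words from embedding neighbours, then pick the candidate most similar to the misspelled word while also taking its characters into account. Everything here is an assumption made for illustration and not the authors' method: the toy vectors, the vocabulary, the function names, the use of edit distance as the character-level measure, and the weighting parameter `alpha`.

```python
from math import sqrt

# Hypothetical toy embeddings; a real system would load trained Word2Vec vectors.
TOY_VECTORS = {
    "tomorow": [0.9, 0.1, 0.0],     # misspelled word seen in tweets
    "tomorrow": [0.8, 0.2, 0.1],
    "today": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.1, 0.9],
}
# Hypothetical dictionary of correctly spelled words.
VOCABULARY = {"tomorrow", "today", "banana"}


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_similarity(a, b):
    """Map edit distance to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def correct(word, vectors, vocabulary, top_n=3, alpha=0.5):
    """Return the best correction for `word`, or `word` itself if none applies.

    alpha weights embedding similarity against character similarity
    (an assumed scoring rule, chosen only for this sketch).
    """
    if word in vocabulary or word not in vectors:
        return word
    # Step 1: candidate extraction from the nearest in-vocabulary neighbours.
    neighbours = sorted(
        ((cosine(vectors[word], vectors[c]), c) for c in vocabulary if c in vectors),
        reverse=True,
    )[:top_n]
    if not neighbours:
        return word
    # Step 2: choose the candidate with the highest combined similarity.
    best = max(
        neighbours,
        key=lambda sc: alpha * sc[0] + (1 - alpha) * char_similarity(word, sc[1]),
    )
    return best[1]
```

With these toy vectors, `correct("tomorow", TOY_VECTORS, VOCABULARY)` returns `"tomorrow"`, whereas setting `alpha=1.0` (embedding similarity only) returns `"today"`. The second result illustrates the concern raised in the abstract: ranking by embeddings alone can replace a word with a related one instead of correcting its spelling.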

    Published In

    SAC '19: Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing
    April 2019
    2682 pages
    ISBN:9781450359337
    DOI:10.1145/3297280
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Twitter
    2. Word2Vec
    3. misspelling
    4. spelling correction

    Qualifiers

    • Poster

    Conference

    SAC '19
    Acceptance Rates

    Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

    Cited By

    • (2024) Shielding against online harm. Engineering Applications of Artificial Intelligence 133:PD. DOI: 10.1016/j.engappai.2024.108241. Online publication date: 24-Jul-2024
    • (2021) Latent Twitter Image Information for Social Analytics. Information 12:2 (49). DOI: 10.3390/info12020049. Online publication date: 21-Jan-2021
    • (2021) Platform-Oblivious Anti-Spam Gateway. Proceedings of the 37th Annual Computer Security Applications Conference (1064-1077). DOI: 10.1145/3485832.3488024. Online publication date: 6-Dec-2021
    • (2021) Neural spelling correction: translating incorrect sentences to correct sentences for multimedia. Multimedia Tools and Applications 80:26-27 (34591-34608). DOI: 10.1007/s11042-020-09148-2. Online publication date: 1-Nov-2021
    • (2020) A Polarity Capturing Sphere for Word to Vector Representation. Applied Sciences 10:12 (4386). DOI: 10.3390/app10124386. Online publication date: 26-Jun-2020
    • (2019) Judicial Knowledge Reasoning Based on Representation Learning. 2019 IEEE 19th International Conference on Software Quality, Reliability and Security Companion (QRS-C) (84-88). DOI: 10.1109/QRS-C.2019.00029. Online publication date: Jul-2019
