Twitter Normalization via 1-to-N Recovering

Ren, Yafeng; Deng, Jiayuan; Ji, Donghong

doi:10.1007/978-3-319-48740-3_2

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10041)

Included in the following conference series:

International Conference on Web Information Systems Engineering

Abstract

Twitter messages are written in an informal style, which hinders many information retrieval and natural language processing applications. Existing normalization systems have two major drawbacks. First, most of them require large-scale annotated training data. Second, they assume that each nonstandard token is recovered to exactly one standard word. However, many nonstandard tokens should be recovered to two or more standard words, so the problem remains highly challenging. To address these issues, we propose an unsupervised normalization system based on context similarity. The proposed system requires no annotated data, and it can recover a nonstandard token to one or more standard words. Results show that the proposed approach achieves state-of-the-art performance.
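The abstract does not say how context similarity is computed, so the following is only a minimal sketch of the general idea, not the authors' system. It assumes bag-of-words context counts and cosine similarity, and it allows a candidate to span several words so that a token can be recovered 1-to-N. The function names, the toy corpus, and the candidate list are all hypothetical.

```python
from collections import Counter
from math import sqrt


def context_vector(target, corpus, window=2):
    """Count the words seen within `window` positions of `target`.

    `target` is a tuple of one or more tokens, so multi-word
    candidates are handled the same way as single words.
    """
    n = len(target)
    counts = Counter()
    for sentence in corpus:
        for i in range(len(sentence) - n + 1):
            if tuple(sentence[i:i + n]) == target:
                left = sentence[max(0, i - window):i]
                right = sentence[i + n:i + n + window]
                counts.update(left + right)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def normalize(token, candidates, corpus, window=2):
    """Return the candidate whose contexts best match those of `token`."""
    token_vec = context_vector((token,), corpus, window)
    return max(
        candidates,
        key=lambda cand: cosine(token_vec, context_vector(cand, corpus, window)),
    )


# Toy, hand-written corpus (an assumption for illustration only).
corpus = [
    "i am gonna see you tomorrow".split(),
    "i am going to see you tomorrow".split(),
    "we are going to see a movie".split(),
    "they are gone already".split(),
]
# Hypothetical candidates: one single word and one two-word recovery.
candidates = [("gone",), ("going", "to")]

print(normalize("gonna", candidates, corpus))
```

In this toy setting the two-word candidate shares its surrounding words with the nonstandard token, while the single-word candidate does not, which is why a 1-to-1 assumption would miss the right recovery. A real system would also need a way to generate candidates and far more unlabeled text.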

Acknowledgments

This work is supported by the State Key Program of National Natural Science Foundation of China (Grant No. 61133012), the National Natural Science Foundation of China (Grant Nos. 61173062, 61373108) and the National Philosophy Social Science Major Bidding Project of China (Grant No. 11&ZD189).

Author information

Authors and Affiliations

Computer School, Wuhan University, Wuhan, China
Yafeng Ren, Jiayuan Deng & Donghong Ji

Corresponding author

Correspondence to Yafeng Ren.

Editor information

Editors and Affiliations

Poznań University of Economics, Poznan, Poland
Wojciech Cellary
University of Minnesota, Minneapolis, Minnesota, USA
Mohamed F. Mokbel
Tsinghua University, Beijing, China
Jianmin Wang
Victoria University, Melbourne, Victoria, Australia
Hua Wang
Victoria University, Melbourne, Victoria, Australia
Rui Zhou
Victoria University, Melbourne, Victoria, Australia
Yanchun Zhang

About this paper

Cite this paper

Ren, Y., Deng, J., Ji, D. (2016). Twitter Normalization via 1-to-N Recovering. In: Cellary, W., Mokbel, M., Wang, J., Wang, H., Zhou, R., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2016. Lecture Notes in Computer Science, vol 10041. Springer, Cham. https://doi.org/10.1007/978-3-319-48740-3_2

DOI: https://doi.org/10.1007/978-3-319-48740-3_2
Published: 02 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48739-7
Online ISBN: 978-3-319-48740-3
eBook Packages: Computer Science, Computer Science (R0)