Microtext Processing

Khoury, Richard; Khoury, Raphaël; Hamou-Lhadj, Abdelwahab

doi:10.1007/978-1-4614-6170-8_353

Synonyms

Comment; Instant message; Microblog; Microtext; Post; SMS; Status update; Tweet

Glossary

NLP: Natural Language Processing

Definition

The term “microtext” was proposed by US Navy researchers (Dela Rosa and Ellen 2009) to describe a type of written text document that has three characteristics: (a) it is very short, typically one or two sentences, and possibly as little as a single word; (b) it is written in an informal manner and unedited for quality and thus may use loose grammar, a conversational tone, vocabulary errors, and uncommon abbreviations and acronyms; and (c) it is semi-structured in the NLP sense, in that it includes some metadata such as a time stamp, an author, or the name of a field it was entered into. Microtexts have become omnipresent in today’s world: they are notably found in online chat discussions; online forum posts; user comments posted on online material such as videos, pictures, and news stories; Facebook newsfeeds and Twitter updates; Internet...
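The definition above combines free text with metadata, which is what makes a microtext semi-structured. The following Python sketch shows one way such a record might be represented. It is illustrative only: the class name `Microtext`, its field names, and the 40-word cutoff used as a rough proxy for "one or two sentences" are assumptions, not part of Dela Rosa and Ellen's definition. Only shortness is checked; informality is not tested automatically here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import ClassVar


@dataclass
class Microtext:
    """A short, informal message together with its metadata (semi-structured)."""

    text: str            # free-form, unedited message body
    author: str          # metadata: who wrote it
    timestamp: datetime  # metadata: when it was posted
    source_field: str    # metadata: name of the field or channel it was entered into

    # Hypothetical cutoff: a word count is used as a rough proxy for
    # "one or two sentences"; 40 is an illustrative assumption.
    MAX_WORDS: ClassVar[int] = 40

    def word_count(self) -> int:
        """Count whitespace-separated tokens in the message body."""
        return len(self.text.split())

    def is_short(self) -> bool:
        """True if the body is non-empty and within the assumed length cutoff."""
        return 0 < self.word_count() <= self.MAX_WORDS


if __name__ == "__main__":
    msg = Microtext(
        text="brb gotta go lol",
        author="user42",
        timestamp=datetime(2014, 10, 5, 12, 0),
        source_field="chat",
    )
    print(msg.word_count(), msg.is_short())  # prints: 4 True
```

Keeping the metadata as typed fields rather than folding it into the text lets later processing steps, such as filtering by author or time window, work without parsing the message body.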

References

Baldwin T, Chai JY (2011) Beyond normalization: pragmatics of word form in text messages. In: 5th international joint conference on natural language processing, Chiang Mai, 8–13 Nov 2011
Barbosa L, Feng J (2010) Robust sentiment detection on Twitter from biased and noisy data. In: Proceedings of the 23rd international conference on computational linguistics, Beijing, pp 36–44
Baron N, Ling R (2007) Text messaging and IM: linguistic comparison of American college data. J Lang Psychol Stud 26:291–298
Chen L, Wang W, Sheth AP (2012) Are Twitter users equal in predicting elections? A study of user groups in predicting 2012 U.S. Republican presidential primaries. In: SocInfo 2012, Lausanne. LNCS 7710, Springer, pp 379–392
Cormack GV, Gómez Hidalgo JM, Puertas Sánz E (2007) Spam filtering for short messages. In: Proceedings of the 16th ACM conference on information and knowledge management (ACM CIKM’07), Lisbon, pp 313–320
Cvijikj IP, Michahelles F (2011) Monitoring trends on Facebook. In: Ninth IEEE international conference on dependable, autonomic and secure computing, Zurich, 12–14 Dec 2011, pp 895–902
Dela Rosa K, Ellen J (2009) Text classification methodologies applied to micro-text in military chat. In: Proceedings of the international conference on machine learning and applications, Miami Beach, pp 710–714
Dong H, Hui SC, He Y (2006) Structural analysis of chat messages for topic detection. Online Inf Rev 30(5):496–516
Ellen J (2011) All about microtext: a working definition and a survey of current microtext research within artificial intelligence and natural language processing. In: ICAART (1), Rome, pp 329–336
Ferrara K, Brunner H, Whittemore G (1991) Interactive written discourse as an emergent register. Writ Commun 8:8–34
Go A, Bhayani R, Huang L (2009) Twitter sentiment classification using distant supervision. Technical report, Stanford
Healy M, Delany S, Zamolotskikh A (2005) An assessment of case-based reasoning for short text messages. In: Creaney N (ed) Proceedings of the 16th Irish conference on artificial intelligence and cognitive science, pp 257–266
Kolenda T, Hansen LK, Larsen J (2001) Signal detection using ICA: application to chat room topic spotting. In: Proceedings of the third international conference on independent component analysis and blind source separation, San Diego, pp 540–545
Liu F, Weng F, Wang B, Liu Y (2011) Insertion, deletion, or substitution?: normalizing text messages without pre-categorization nor supervision. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, Portland, vol 2, pp 71–76
Pak A, Paroubek P (2010) Twitter as a corpus for sentiment analysis and opinion mining. In: Proceedings of the seventh conference on international language resources and evaluation, Valletta. European Language
Paolillo JC (1999) The virtual speech community: social network and language variation on IRC. In: Proceedings of the 32nd annual Hawaii international conference on system sciences, Maui
Petrovic S, Osborne M, Lavrenko V (2010) The Edinburgh Twitter corpus. In: Proceedings of the NAACL HLT workshop on computational linguistics in a world of social media, Los Angeles, pp 25–26
Ritterman J, Osborne M, Klein E (2009) Using prediction markets and Twitter to predict a swine flu pandemic. In: 1st international workshop on mining social media – 13th conference of the Spanish association for artificial intelligence
Takahashi T, Tomioka R, Yamanishi K (2011) Discovering emerging topics in social streams via link anomaly detection. In: 11th IEEE international conference on data mining, Tokyo, 11–14 Dec 2011, pp 1230–1235
Wang AH (2010) Don’t follow me – spam detection in Twitter. In: Proceedings of the international conference on security and cryptography (SECRYPT 2010), Athens, pp 142–151
Wu T, Khan FM, Fisher TA, Shuler LA, Pottenger WM (2002) Posting act tagging using transformation-based learning. In: Proceedings of the workshop on foundations of data mining and discovery, IEEE international conference on data mining (ICDM’02), Dec 2002

Author information

Authors and Affiliations

Department of Software Engineering, Lakehead University, Thunder Bay, ON, Canada
Richard Khoury
Department of Computer Science and Software Engineering, Laval University, Quebec City, QC, Canada
Raphaël Khoury
Department of Electrical and Computer Engineering, Concordia University, Montreal, QC, Canada
Abdelwahab Hamou-Lhadj

Editor information

Editors and Affiliations

Department of Computer Science, University of Calgary, Calgary, AB, Canada
Reda Alhajj
Department of Computer Science, University of Calgary, Calgary, AB, Canada
Jon Rokne

About this entry

Cite this entry

Khoury, R., Khoury, R., Hamou-Lhadj, A. (2014). Microtext Processing. In: Alhajj, R., Rokne, J. (eds) Encyclopedia of Social Network Analysis and Mining. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6170-8_353

DOI: https://doi.org/10.1007/978-1-4614-6170-8_353
Published: 05 October 2014
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-6169-2
Online ISBN: 978-1-4614-6170-8
eBook Packages: Computer Science, Reference Module Computer Science and Engineering