A Review of Shorthand Systems: From Brachygraphy to Microtext and Beyond

Satapathy, Ranjan; Cambria, Erik; Nanetti, Andrea; Hussain, Amir

doi:10.1007/s12559-020-09723-7

Review
Published: 22 June 2020

Volume 12, pages 778–792 (2020)

Cognitive Computation

Ranjan Satapathy¹, Erik Cambria¹, Andrea Nanetti² & Amir Hussain³

Abstract

Human civilizations have practised the art of writing across continents and over different time periods. To speed up the writing process, the art of shorthand (brachygraphy) came into existence. Writing on social media platforms today is no exception. Brachygraphy began to re-emerge in the early 2000s in the form of microtext, which enables faster typing without compromising semantic clarity. This paper focuses on microtext approaches predominantly found in social media and explains the relevance of microtext normalization for natural language processing tasks in English. The review introduces brachygraphy and traces how it has evolved into microtext in today's social-media-dominant society. The study provides a comprehensive classification of microtext normalization by technique, i.e. syntax-based (syntactic), probability-based (probabilistic) and phonetic-based approaches, and reviews the application areas, strategies and challenges of microtext normalization. The review shows that there is a compelling similarity between brachygraphy and microtext even though they emerged centuries apart. This paper represents the first attempt to connect brachygraphy to current texting language and to show its impact in social media. It also discusses how, in the future, microtext will likely comprise both words and images together, which will expand the horizon of human creative power. We conclude the review with some considerations on future directions.

Notes

https://en.wiktionary.org/wiki/vegetal
https://en.wikipedia.org/wiki/Romance_languages
It includes English, before and after the advent of print.
www.en.wikipedia.org/wiki/List_of_Latin_abbreviations (accessed on 15 July 2019)
http://americanhistory.si.edu/collections/search/object/nmah_849951 (accessed on 15 July 2019)
http://en.wikipedia.org
The Project Gutenberg website http://www.gutenberg.org/
http://giellatekno.uit.no
https://noisy-text.github.io/norm-shared-task.html
The US Conference of Catholic Bishops website: http://www.usccb.org
The Project Gutenberg website: http://www.gutenberg.org/
A Chinese version of Twitter at www.weibo.com
Available at www.comp.nus.edu.sg/~nlp/corpora.html
http://catalog.ldc.upenn.edu/docs/LDC93S1/PHONCODE.TXT
http://catalog.ldc.upenn.edu/docs/LDC93S1/PHONCODE.TXT
http://www.speech.cs.cmu.edu/cgi-bin/cmudict
http://www.cstr.ed.ac.uk/projects/festival/manual/festival_13.html

Author information

Authors and Affiliations

School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
Ranjan Satapathy & Erik Cambria
School of Art, Design and Media, School of Humanities, Nanyang Technological University, Singapore, Singapore
Andrea Nanetti
School of Computing, Edinburgh Napier University, Edinburgh, UK
Amir Hussain

Corresponding author

Correspondence to Erik Cambria.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Informed Consent

Informed consent was not required as no humans or animals were involved.

Human and Animal Rights

This article does not contain any studies with human or animal subjects performed by any of the authors.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Satapathy, R., Cambria, E., Nanetti, A. et al. A Review of Shorthand Systems: From Brachygraphy to Microtext and Beyond. Cogn Comput 12, 778–792 (2020). https://doi.org/10.1007/s12559-020-09723-7

Received: 10 September 2019
Accepted: 05 March 2020
Published: 22 June 2020
Issue Date: July 2020
DOI: https://doi.org/10.1007/s12559-020-09723-7
