Capitalization and punctuation restoration: a survey

Artificial Intelligence Review

Abstract

Ensuring proper punctuation and letter casing is a key pre-processing step before applying complex natural language processing algorithms. This is especially significant for textual sources where punctuation and casing are missing, such as the raw output of automatic speech recognition systems. Additionally, short text messages and posts on micro-blogging platforms often contain unreliable or incorrect punctuation and casing. This survey offers an overview of both historical and state-of-the-art techniques for restoring punctuation and correcting word casing. Furthermore, current challenges and research directions are highlighted.

Notes

  1. http://mi.eng.cam.ac.uk/research/projects/EARS/ears_summary.html.

  2. https://www.nist.gov/itl/iad/mig/rich-transcription-evaluation.

  3. https://github.com/GateNLP/broad_twitter_corpus.

  4. http://storage.googleapis.com/books/ngrams/books/datasetsv2.html.

  5. http://www.intelligencesquaredus.org/.

  6. https://www.nist.gov/itl/iad/mig/rich-transcription-evaluation.

  7. http://www.ted.com.

  8. http://creativecommons.org/licenses/by-nc-nd/3.0/.

References

  • Agbago A, Kuhn R, Foster G (2005) Truecasing for the portage system. In: Proceedings of recent advances in natural language processing (RANLP)

  • Appelt DE, Hobbs JR, Bear J, Israel D, Kameyama M, Kehler A, Martin D, Myers K, Tyson M (1995) SRI international FASTUS system MUC-6 test results and analysis. In: Proceedings of the 6th message understanding conference

  • Augustyniak Ł, Szymanski P, Morzy M, Zelasko P, Szymczak A, Mizgajski J, Carmiel Y, Dehak N (2020) Punctuation prediction in spontaneous conversations: can we mitigate ASR errors with retrofitted word embeddings? arXiv:2004.05985

  • Baevski A, Edunov S, Liu Y, Zettlemoyer L, Auli M (2019) Cloze-driven pretraining of self-attention networks. arXiv:1903.07785

  • Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. arXiv:1409.0473

  • Baldwin T, Cook P, Lui M, MacKinlay A, Wang L (2013) How noisy social media text, how different social media sources? In: Proceedings of the 6th international joint conference on natural language processing, pp 356–364

  • Ballesteros M, Wanner L (2016) A neural network architecture for multilingual punctuation generation. In: Proceedings of the 2016 conference on empirical methods in natural language processing, association for computational linguistics, pp 1048–1053

  • Barr C, Jones R, Regelson M (2008) The linguistic structure of English web-search queries. In: Proceedings of the 2008 conference on empirical methods in natural language processing, association for computational linguistics, Honolulu, Hawaii, pp 1021–1030. https://www.aclweb.org/anthology/D08-1107. Accessed Jan 2021

  • Batista F, Namede N, Trancoso I (2008a) Language dynamics and capitalization using maximum entropy. In: Proceedings of ACL-08: HLT, short papers, association for computational linguistics, pp 1–4

  • Batista F, Caseiro D, Namede N, Trancoso I (2008b) Recovering capitalization and punctuation marks for automatic speech recognition: case study for Portuguese broadcast news. Speech Commun 50(10):847–862

    Article  Google Scholar 

  • Batista F, Namede N, Trancoso I (2008c) The impact of language dynamics on the capitalization of broadcast news. In: Proceedings of the 9th annual conference of the international speech communication association INTERSPEECH 2008

  • Batista F, Trancoso I, Mamede N (2009) Automatic recovery of punctuation marks and capitalization information for Iberian languages. In: Proceedings of the joint SIG-IL/microsoft workshop on speech and language technologies for Iberian languages, pp 99–102

  • Batista F, Moniz H, Trancoso I, Mamede N (2012) Bilingual experiments on automatic recovery of capitalization and punctuation of automatic speech transcripts. IEEE Trans Audio Speech Lang Process 20(2):474–485

    Article  Google Scholar 

  • Beeferman D, Berger A, Lafferty J (1998) Cyberpunc: a lightweight punctuation annotation system for speech. In: Proceedings of the 1998 IEEE international conference on acoustics, speech and signal processing, ICASSP ’98 (Cat. No.98CH36181), vol 2, pp 689–692. https://doi.org/10.1109/ICASSP.1998.675358

  • Bell P, Gales M, Hain T, Kilgour J, Lanchantin P, Liu X, McParland A, Renals S, Saz O, Wester M, Woodland PC (2015) The MGB challenge: evaluating multigenre broadcast media recognition. In: Proceedings of the 2015 IEEE workshop on automatic speech recognition and understanding (ASRU), pp 687–693

  • Boháč M, Rott M, Kovář V (2017) Text punctuation: an inter-annotator agreement study. In: Proceedings of the international conference on text, speech, and dialogue, pp 120–128

  • Bradbury J, Merity S, Xiong C, Socher R (2016) Quasi-recurrent neural networks. arXiv:1611.01576 [cs.NE]

  • Brants T, Franz A (2006) Web 1t 5-gram corpus version 1.1. Technical Report, Google Research

  • Brants T, Franz A (2009) Web 1t 5-gram, 10 european languages version 1. LDC2009T25, Linguistic Data Consortium

  • Breiman L, Friedman JH, Olshen RA, Stone CJ (1983) Classification and regression trees. Wadsworth and Brooks, Pacific Grove

    MATH  Google Scholar 

  • Brill E (1993) A corpus-based approach to language learning. Ph.D. thesis, University of Pennsylvania

  • Britz D, Goldie A, Luong MT, Le Q (2017) Massive exploration of neural machine translation architectures. arXiv:1703.03906 [cs.CL]

  • Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler DM, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D (2020) Language models are few-shot learners. arXiv:2005.14165 [cs.CL]

  • Caranica A, Cucu H, Buzo A, Burileanu C (2015) Capitalization and punctuation restoration for Romanian language. UPB Sci Bull Ser C 77:95–106

    Google Scholar 

  • Caruana R (1997) Multitask learning. Mach Learn 28(1):41–75

    Article  MathSciNet  Google Scholar 

  • Chan W, Ke NR, Lane I (2015) Transferring knowledge from a rnn to a dnn. In: Proceedings of the annual conference of the international speech communication association (INTERSPEECH), pp 3264–3268

  • Che X, Luo S, Yang H, Meinel C (2016) Sentence boundary detection based on parallel lexical and acoustic models. In: Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), pp 2528–2532

  • Chebotar Y, Waters A (2016) Distilling knowledge from ensembles of neural networks for speech recognition. In: Proceedings of the annual conference of the international speech communication association (INTERSPEECH), pp 3439–3443

  • Chelba C, Acero A (2004) Adaptation of maximum entropy capitalizer: Little data can help a lot. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 285–292

  • Chen J (1999) Speech recognition with automatic punctuation. In: Proceedings of Eurospeech ’99, pp 447–450

  • Chen Q, Chen M, Li B, Wang W (2020) Controllable time-delay transformer for real-time punctuation prediction and disfluency detection. In: Proceedings of the 2020 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 8069–8073

  • Cho E, Niehues J, Waibel A (2012) Segmentation and punctuation prediction in speech language translation using a monolingual translation system. In: In proceedings of the international workshop for spoken language translation (IWSLT 2012), pp 252–259

  • Cho E, Niehues J, Waibel A (2017) NMT-based segmentation and punctuation insertion for real-time spoken language translation. Proc Interspeech 2017:2645–2649

    Article  Google Scholar 

  • Cho K, van Merrienboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder–decoder for statistical machine translation. In: Proceedings of EMNLP, pp 1724–1734

  • Christensen H, Gotoh Y, Renals S (2001) Punctuation annotation using statistical prosody models. In: Proceedings of ISCA workshop on prosody in speech recognition and understanding

  • Chung YA, Glass J (2018) Speech2vec: a sequence-to-sequence framework for learning word embeddings from speech. In: Proceedings of the annual conference of the international speech communication association (INTERSPEECH), pp 811–815

  • Coniam D (2008) Evaluating the language resources of chatbots for their potential in English as a second language. ReCALL: J EUROCALL 20(1):98

    Article  Google Scholar 

  • Coniam D (2014) The linguistic accuracy of chatbots: usability from an ESL perspective. Text Talk 34(5):545–567

    Article  Google Scholar 

  • Coster W, Kauchak D (2011) Simple English Wikipedia: a new text simplification task. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies: short papers, association for computational linguistics, vol 2, pp 665–669

  • Courtland M, Faulkner A, McElvain G (2020) Efficient automatic punctuation restoration using bidirectional transformers with robust inference. In: Proceedings of the 17th international conference on spoken language translation (IWSLT), pp 272–279. https://doi.org/10.18653/v1/2020.iwslt-1.33

  • Datta P, Jakubowicz P, Vogler C, Kushalnagar R (2020) Readability of punctuation in automatic subtitles. In: Proceedings of the international conference on computers helping people with special needs, pp 195–201

  • Deng L, Platt JC (2014) Ensemble deep learning for speech recognition. In: Proceedings of the annual conference of the international speech communication association (INTERSPEECH), pp 1915–1919

  • Derczynski L, Bontcheva K, Roberts I (2016) Broad Twitter corpus: a diverse named entity recognition resource. In: Proceedings of COLING 2016, the 26th international conference on computational linguistics: technical papers, the COLING 2016 organizing committee, Osaka, Japan, pp 1169–1179. https://www.aclweb.org/anthology/C16-1111

  • Devlin J, Chang MW, Lee K, Toutanova K (2019) Bert: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), Association for Computational Linguistics, pp 4171–4186

  • Dingwall N, Potts C (2018) Mittens: an extension of glove for learning domain-specialized representations. arXiv:1803.09901

  • Driesen J, Birch A, Grimsey S, Safarfashandi S, Gauthier J, Simpson M, Renals S (2014) Automated production of true-cased punctuated subtitles for weather and news broadcasts. In: Proceedings of the 15th annual conference of the international speech communication association INTERSPEECH 2014, international speech communication association, pp 2146–2147

  • Duchi J, Hazan E, Singer Y (2011) Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 12:2121–2159

    MathSciNet  MATH  Google Scholar 

  • Dyer C, Ballesteros M, Ling W, Matthews A, Smith NA (2015) Transition-based dependency parsing with stack long short-term memory. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing, volume 1: long papers, Association for Computational Linguistics, pp 334–343

  • Ehara Y, Sato I, Oiwa H, Nakagawa H (2013) Understanding seed selection in bootstrapping. In: Proceedings of TextGraphs-8 graph-based methods for natural language processing, pp 44–52

  • Elsayed H, Elghazaly T (2015) A named entities recognition system for modern standard Arabic using rule-based approach. In: Proceedings of the first international conference on Arabic computational linguistics (ACLing), pp 51–54. https://doi.org/10.1109/ACLing.2015.14

  • Etchegoyhen T, Gete H (2020) To case or not to case: evaluating casing methods for neural machine translation. In: Proceedings of the 12th language resources and evaluation conference, European language resources association, Marseille, France, pp 3752–3760. https://www.aclweb.org/anthology/2020.lrec-1.463

  • Federico M, Bentivogli L, Paul M, Stueker S (2011) Overview of the iwslt 2011 evaluation campaign. In: Proceedings of the international workshop on spoken language translation (IWSLT), pp 11–27

  • Finkel JR, Grenager T, Manning C (2005) Incorporating non-local information into information extraction systems by gibbs sampling. In: Proceedings of the 43nd annual meeting of the association for computational linguistics (ACL 2005), pp 363–370

  • Freund Y, Schapire R (1995) A decision-theoretic generalization of on-line learning and an application to boosting. In: Computational learning theory, pp 23–37

  • Friedman JH (2000) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232

    MathSciNet  MATH  Google Scholar 

  • Gage P (1994) A new algorithm for data compression. C Users J 12(2):23–38

    Google Scholar 

  • Gale W, Parthasarathy S (2017) Experiments in character-level neural network models for punctuation. In: Proceedings of the annual conference of the international speech communication association (INTERSPEECH), pp 2794–2798

  • Gale W, Church K, Yarowsky D (1992) A method for disambiguating word senses in a large corpus. Comput Humanit 26:415–439

    Article  Google Scholar 

  • Gale W, Church KW, Yarowsky D (1994) Discrimination decisions for 100,000-dimensional spaces. In: Current issues in computational linguistics, pp 429–450

  • Ganin Y, Ustinova E, Ajakan H, Germain P, Larochelle H, Laviolette F, Marchand M, Lempitsky V (2016) Domain-adversarial training of neural networks. J Mach Learn Res 17:1–35

    MathSciNet  MATH  Google Scholar 

  • Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial networks. Adv Neural Inf Process Syst 3:2672–2680

    Google Scholar 

  • Gravano A, Jansche M, Bacchiani M (2009) Restoring punctuation and capitalization in transcribed speech. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing (ICASSP), pp 4741–4744

  • Gupta NK, Bangalore S (2002) Extracting clauses for spoken language understanding in conversational systems. Proc Conf Empir Methods Natural Lang Process 10:273–280

    Google Scholar 

  • Hahnloser RH, Sarpeshkar R, Mahowald MA, Douglas RJ, Seung HS (2000) Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 405:947–951. https://doi.org/10.1038/35016072

    Article  Google Scholar 

  • Hakkani-Tur D, Tur G, Stolcke A, Shriberg E (1999) Combining words and prosody for information extraction from speech. In: Proceedings of the European Conference on Speech Communication and Technology, (EUROSPEECH)

  • Han X, Eisenstein J (2019) Unsupervised domain adaptation of contextualized embeddings for sequence labeling. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), pp 4229–4239

  • Harris ZS (1954) Distributional structure. WORD 10(2–3):146–162. https://doi.org/10.1080/00437956.1954.11659520

    Article  Google Scholar 

  • Hasan M, Doddipatla R, Hain T (2014) Multi-pass sentence-end detection of lecture speech. In: Proceedings of the annual conference of the international speech communication association, INTERSPEECH, pp 2902–2906

  • Hasan M, Doddipatla R, Hain T (2015) Noise-matched training of crf based sentence end detection models. In: Proceedings of the 16th annual conference of the international speech communication association (INTERSPEECH), pp 349–353

  • Heinzerling B, Strube M (2018) Bpemb: Tokenization-free pretrained subword embeddings in 275 languages. In: Proceedings of the eleventh international conference on language resources and evaluation (LREC 2018), European language resources association (ELRA), pp 2989–2993

  • Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

    Article  Google Scholar 

  • Huang J, Zweig G (2002) Maximum entropy model for punctuation annotation from speech. In: Proceedings of the annual conference of the international speech communication association (INTERSPEECH), pp 917–920

  • Huang Z, Xu W, Yu K (2015) Bidirectional lstm-crf models for sequence tagging. arXiv:1508.01991 [cs.CL]

  • Ide N, Macleod C (2001) The American national corpus: a standardized resource for American English. Proc Corpus Linguist 2001:831–836

    Google Scholar 

  • Jarrett K, Kavukcuoglu K, Ranzato M, LeCun Y (2009) What is the best multi-stage architecture for object recognition? In: Proceedings of the 12th IEEE international conference on computer vision, pp 2146–2153. https://doi.org/10.1109/ICCV.2009.5459469

  • Jones BEM (1994) Exploring the role of punctuation in parsing natural text. In: Proceedings of the 15th conference on computational linguistics—volume 1 (COLING ’94), Association for Computational Linguistics, pp 421–425

  • Jones DA, Wolf F, Gibson E, Williams E, Fedorenko E, Reynolds DA, Zissman M (2003) Measuring the readability of automatic speech-to-text transcripts. In: Proceedings of the 8th European conference on speech communication and technology (EUROSPEECH), pp 1585–1588

  • Juin CC, Wei RXJ, D’Haro LF, Banchs RE (2017) Punctuation prediction using a bidirectional recurrent neural network with part-of-speech tagging. In: Proceedings of the IEEE region 10 conference TENCON 2017, pp 1806–1811. https://doi.org/10.1109/TENCON.2017.8228151

  • Jurafsky D, Martin J (2008) Speech and language processing, 2nd edn. Prentice Hall, New York

    Google Scholar 

  • Kaplan A (1950) An experimental study of ambiguity in context. Mech Transl 1:1–3

    Google Scholar 

  • Kaufmann M, Kalita J (2010) Syntactic normalization of twitter messages. In: Proceedings of the international conference on natural language processing

  • Kim JH, Woodland PC (2000) A rule-based named entity recognition system for speech input. In: Proceedings lCSLP, pp S21–524

  • Kim JH, Woodland PC (2002) Implementation of automatic capitalisation generation systems for speech input. In: Proceedings of IEEE international conference on acoustics, speech, and signal processing, pp I–857–I–860. https://doi.org/10.1109/ICASSP.2002.5743874

  • Kim JH, Woodland PC (2004) Automatic capitalisation generation for speech input. Comput Speech Lang 18(1):67–90

    Article  Google Scholar 

  • Kim S (2019) Deep recurrent neural networks with layer-wise multi-head attentions for punctuation restoration. In: Proceedings of the 2019 IEEE international conference on acoustics, speech and signal processing (ICASSP 2019), pp 7280–7284

  • Klein G, Kim Y, Deng Y, Senellart J, Rush AM (2017) Opennmt: open-source toolkit for neural machine translation. arXiv:1701.02810

  • Klejch O, Bell P, Renals S (2016) Punctuated transcription of multi-genre broadcasts using acoustic and lexical approaches. In: Spoken language technology workshop, pp 433–440

  • Klejch O, Bell P, Renals S (2017) Sequence-to-sequence models for punctuated transcription combining lexical and acoustic features. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 5700–5704

  • Koehn P (2005) Europarl: a parallel corpus for statistical machine translation. MT Summit 5:79–86

    Google Scholar 

  • Kolář J, Lamel L (2012) Development and evaluation of automatic punctuation for French and English speech-totext. In: Proceedings of the annual conference of the international speech communication association (INTERSPEECH), pp 1376–1379

  • Kolář J, Švec J, Psutka J (2004) Automatic punctuation annotation in Czech broadcast news speech. In: Proceedings SPECOM, pp 319–325

  • Kompe R (1996) Prosody in speech understanding systems. Springer, Berlin

    Google Scholar 

  • Kondratyuk D, Straka M (2019) 75 languages, 1 model: parsing universal dependencies universally. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), Association for Computational Linguistics, pp 2779–2795

  • Lafferty J, McCallum A, Pereira F (2001) Conditional random fields: Probabilistic models for segmentation and labeling sequence data. In: Proceedings of the eighteenth international conference on machine learning (ICML ’01), Morgan Kaufmann Publishers Inc., pp 282–289

  • Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C (2016) Neural architectures for named entity recognition. In: Proceedings of the conference of the North American chapter of the association for computational linguistics: human language technologies, pp 260–270

  • Li X, Lin E (2020) A 43 language multilingual punctuation prediction neural network model. Proc INTERSPEECH 2020:1067–1071

    Google Scholar 

  • Lin TY, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings ICCV, pp 2980–2988

  • Ling W, Dyer C, Black AW, Trancoso I, Fermandez R, Amir S, Marujo L, Luìs T (2015) Finding function in form: compositional character models for open vocabulary word representation. In: Proceedings of the 2015 conference on empirical methods in natural language processing (EMNLP), Association for Computational Linguistics, pp 1520–1530

  • Lita LV, Ittycheriah A, Roukos S, Kambhatla N (2003) Truecasing. In: Proceedings of the 41st annual meeting on association for computational linguistics, pp 152–159

  • Liu Y, Shriberg E, Stolcke A, Peskin B, Ang J, Hillard D, Ostendorf M, Tomalin M, Woodland P, Harper M (2005a) Structural metadata research in the ears program. In: Proceedings of IEEE international conference on acoustics, speech, and signal processing (ICASSP 2005), pp 957–960

  • Liu Y, Stolcke A, Shriberg E, Harper M (2005b) Using conditional random fields for sentence boundary detection in speech. In: Proceedings of ACL’05. https://doi.org/10.3115/1219840.1219896

  • Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) Roberta: a robustly optimized Bert pretraining approach. arXiv:1907.11692 [cs.CL]

  • Lu W, Ng HT (2010) Better punctuation prediction with dynamic conditional random fields. In: Proceedings of the 2010 conference on empirical methods in natural language processing, pp 177–186

  • MacIntyre R (1998) 1996 CSR hub4 language model. LDC98T31, Linguistic Data Consortium

  • Makhija K, Ho TN, Siong CE (2019) Transfer learning for punctuation prediction. In: Proceedings of the 2019 Asia-Pacific signal and information processing association annual summit and conference (APSIPA ASC), pp 268–273

  • Makhoul J, Kubala F, Schwartz R, Weischede R (1999) Performance measures for information extraction. In: Proceedings of DARPA broadcast news workshop, pp 249–252

  • Makhoul J, Baron A, Bulyko I, Nguyen L, Ramshaw LA, Stallard D, Schwartz RM, Xiang B (2005) The effects of speech recognition and punctuation on information extraction performance. In: INTERSPEECH 2005—Eurospeech, 9th European conference on speech communication and technology, Lisbon, Portugal, September 4–8, 2005, ISCA, pp 57–60. http://www.isca-speech.org/archive/interspeech_2005/i05_0057.html

  • Manning C, Surdeanu M, Bauer J, Finkel J, Bethard S, McClosky D (2014) The Stanford corenlp natural language processing toolkit. In: Proceedings of 52nd annual meeting of the association for computational linguistics: system demonstrations, pp 55–60

  • Marcus MP, Santorini B, Marcinkiewicz MA, Taylor A (1999) Treebank-3. LDC99T42, linguistic data consortium

  • Markwardt AH (1942) Introduction to the English language. Oxford University Press, New York

    Google Scholar 

  • Masterson M (1967) Mechanical pidgin translation. Wiley, Hoboken

    Google Scholar 

  • Matusov E, Mauser A, Ney H (2006) Automatic sentence segmentation and punctuation prediction for spoken language translation. In: Proceedings of IWSLT, pp 158–165

  • Mayhew S, Tsygankova T, Roth D (2019) ner and pos when nothing is capitalized. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, pp 6256–6261. https://doi.org/10.18653/v1/D19-1650, https://www.aclweb.org/anthology/D19-1650

  • Michel JB, Shen YK, Aiden AP, Veres A, Gray MK, Brockman W, Team TGB, Pickett JP, Hoiberg D, Clancy D, Norvig P, Orwant J, Pinker S, Nowak MA, Aiden EL (2011) Quantitative analysis of culture using millions of digitized books. Science 331:176–182. https://doi.org/10.1126/science.1199644

    Article  Google Scholar 

  • Mikheev A (1999) A knowledge-free method for capitalized word disambiguation. In: Proceedings of the annual meeting of the association for computational linguistics, pp 159–166

  • Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv:1301.3781 [cs.CL]

  • Miller D, Boisen S, Schwartz R, Stone R, Weischedel R (2000) Named entity extraction from noisy input: speech and OCR. In: Proceedings of the sixth conference on applied natural language processing, pp 316–324

  • Moniz H, Batista F, Meinedo H, Abad A, Trancoso I, Mata AI, Mamede N (2010) Prosodically-based automatic segmentation and punctuation. In: Proceedings of speech prosody 2010, p 910

  • Mota C (2008) How to keep up with language dynamics? A case study on named entity recognition. Ph.D. thesis, IST/UTL

  • Mota C, Grishman R (2008) Is this ne tagger getting old? In: Proceedings of the sixth international conference on language resources and evaluation (LREC’08), European Language Resources Association (ELRA), pp 28–30

  • Nanchen A, Garner PN (2019) Empirical evaluation and combination of punctuation prediction models applied to broadcast news. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 7275–7279

  • Navigli R (2009) Word sense disambiguation: a survey. ACM Comput Surv 41(2):1–69. https://doi.org/10.1145/1459352.1459355

    Article  Google Scholar 

  • Nebhi K, Bontcheva K, Gorrell G (2015) Restoring capitalization in #tweets. In: Proceedings of WWW companion, pp 1111–1115

  • Nguyen B, Nguyen VBH, Nguyen H, Phuong PN, Nguyen TL, Do QT, Mai LC (2019) Fast and accurate capitalization and punctuation for automatic speech recognition using transformer and chunk merging. In: Proceedings of the 2nd conference of the oriental COCOSDA international committee for the co-ordination and standardisation of speech databases and assessment techniques (O-COCOSDA), pp 1–5. https://doi.org/10.1109/O-COCOSDA46868.2019.9041202

  • Niesler T, Woodland P (1996) A variable-length category-based ngram language model. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing ICASSP-96, vol 1, pp 164–167

  • Nivre J (2004) Incrementality in deterministic dependency parsing. In: Proceedings of the workshop on incremental parsing: bringing engineering and cognition together, Association for Computational Linguistics, pp 50–57

  • Nunberg G (1990) The linguistics of punctuation. In: CSLI lecture notes, p 18

  • Nöth E, Batliner A, Kießling A, Kompe R, Niemann H (1999) Suprasegmental modelling. In: Computational models of speech pattern processing, NATO ASI series (Series F: computer and systems sciences), vol 169, pp 181–198

  • Ostendorf M, Favre B, Grishman R, Hakkani-Tür D, Harper M, Hillard D, Hirschberg J, Ji H, Kahn JG, Liu Y, Maskey S, Matusov E, Ney H, Rosenberg A, Shriberg E, Wang W, Wooters C (2008) Speech segmentation and spoken document processing. IEEE Signal Process Mag 25:59–69

    Article  Google Scholar 

  • Pahuja V, Laha A, Mirkin S, Raykar V, Kotlerman L, Lev G (2017) Joint learning of correlated sequence labeling tasks using bidirectional recurrent neural networks. Proc Interspeech 2017:548–552

    Article  Google Scholar 

  • Pallett D, Fiscus J, Garofolo J, Martin A, Przybocki M (2000) 1998 broadcast news benchmark test results: English and non-English word error rate performance measures. In: DARPA broadcast news transcription and understanding workshop

  • Papineni K, Roukos S, Ward T, Zhu WJ (2002) Bleu: A method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting on association for computational linguistics (ACL 2002), Association for Computational Linguistics, pp 311–318

  • Pauls A, Klein D (2011) Faster and smaller n-gram language models. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies—volume 1 (HLT ’11), Association for Computational Linguistics, pp 258–267

  • Peitz S, Freitag M, Mauser A, Ney H (2011) Modeling punctuation prediction as machine translation. In: Proceedings of the international workshop on spoken language translation (IWSLT), pp 238–245

  • Pennington J, Socher R, Manning CD (2014) Glove: Global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532–1543

  • Petasis G, Vichot F, Wolinski F, Paliouras G, Karkaletsis V, Spyropoulos CD (2001) Using machine learning to maintain rule-based named-entity recognition and classification systems. In: Proceedings of the 39th annual meeting on association for computational linguistics, association for computational linguistics, pp 426–433

  • Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L (2018) Deep contextualized word representations,. In: Proceedings of NAACL 2018

  • Povey D, Ghoshal A, Boulianne G, Burget L, Glembek O, Goel N, Hannemann M, Motlıcek P, Qian Y, Schwarz P, Silovsky J, Stemmer G, Vesely K (2011) The kaldi speech recognition toolkit. In: Proceedings of the IEEE workshop on automatic speech recognition and understanding (ASRU), IEEE Signal Processing Society, pp 1–4

  • Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I (2019) Language models are unsupervised multitask learners. OpenAI blog

  • Ramena G, Nagaraju D, Moharana S, Mohanty DP, Purre N (2020) An efficient architecture for predicting the case of characters using sequence models. arXiv:abs/2002.00738

  • Ratnaparkhi A (1996) A maximum entropy model for part-of-speech tagging. In: Brill E, Church K (eds) Proceedings of the conference on empirical methods in natural language processing, pp 133–142

  • Rayson SJ, Hachamovitch DJ, Kwatinetz AL, Hirsch SM (1998) Autocorrecting text typed into a word processing document. U.S. patent 5761689

  • Rei R, Guerreiro NM, Batista F (2020) Automatic truecasing of video subtitles using Bert: a multilingual adaptable approach. Inf Process Manag Uncertain Knowl-Based Syst 1237:708–721

    Google Scholar 

  • Romero V, Sánchez JA (2013) Category-based language models for handwriting recognition of marriage license books. In: Proceedings of the 12th international conference on document analysis and recognition, pp 788–792. https://doi.org/10.1109/ICDAR.2013.161

  • Rosenfeld R (2000) Two decades of statistical language modeling: where do we go from here? Proc IEEE 88:1270–1278

  • Ruhlen H, Pressey SL (1923) A statistical study of current usage in punctuation. Educ Res Bull 2(12):179–182

  • Sadat F, Johnson H, Agbago A, Foster G, Kuhn R, Martin J, Tikuisis A (2005) Portage: A phrase-based machine translation system. In: Proceedings of the ACL workshop on building and using parallel texts, pp 129–132

  • Salimbajevs A (2016) Bidirectional LSTM for automatic punctuation restoration. In: Human language technologies-the Baltic perspective: proceedings of the seventh international conference Baltic HLT 2016, vol 289, pp 59–65

  • Salloum W, Finley G, Edwards E, Miller M, Suendermann-Oeft D (2017) Deep learning for punctuation restoration in medical reports. Proc BioNLP 2017:159–164

  • Sanchez G (2019) Sentence boundary detection in legal text. In: Proceedings of the natural legal language processing workshop, Association for Computational Linguistics, pp 31–38

  • Savelka J, Walker VR, Grabmair M, Ashley KD (2017) Sentence boundary detection in adjudicatory decisions in the United States. Traitement Automatique des Langues 58(2):21–45

  • Schapire RE, Singer Y (1999) Improved boosting algorithms using confidence-rated predictions. Mach Learn 37(3):297–336

  • Schapire RE, Singer Y (2000) BoosTexter: A boosting-based system for text categorization. Mach Learn 39:135–168

  • Schukat-Talamazzini EG (1995) Stochastic language models. In: Electrotechnical and computer science conference

  • Schukat-Talamazzini EG, Gallwitz F, Harbeck S, Warnke V (1997) Rational interpolation of maximum likelihood predictors in stochastic language modeling. In: Proceedings of the fifth European conference on speech communication and technology (EUROSPEECH), pp 2731–2734

  • Seide F, Li G, Chen X, Yu D (2011) Feature engineering in context-dependent deep neural networks for conversational speech transcription. In: Proceedings of IEEE workshop on automatic speech recognition and understanding (ASRU), pp 24–29

  • Sennrich R, Haddow B, Birch A (2016) Neural machine translation of rare words with subword units. In: Proceedings of the 54th annual meeting of the association for computational linguistics (volume 1: long papers), Association for Computational Linguistics, pp 1715–1725

  • Shawar BA, Atwell E (2007) Chatbots: are they really useful? LDV Forum 22(1):29–49

  • Shriberg E, Bates R, Stolcke A (1997) A prosody-only decision-tree model for disfluency detection. In: Proceedings of the European conference on speech communication and technology, (EUROSPEECH), vol 5, pp 2383–2386

  • Siivola V, Pellom BL (2005) Growing an n-gram language model. In: Proceedings of INTERSPEECH-2005, pp 1309–1312

  • Soboleva D, Skopek O, Šajgalík M, Cărbune V, Weissenberger F, Proskurnia J, Prisacari B, Valcarce D, Lu J, Prabhavalkar R, Miklos B (2020) Replacing human audio with synthetic audio for on-device unspoken punctuation prediction. arXiv:2010.10203 [cs.LG]

  • Song HJ, Kim HK, Kim JD, Park CY, Kim YS (2019) Inter-sentence segmentation of Youtube subtitles using long-short term memory (LSTM). Appl Sci 9(7):1504. https://doi.org/10.3390/app9071504

  • Spitkovsky VI, Alshawi H, Jurafsky D (2011) Punctuation: making a point in unsupervised dependency parsing. In: Proceedings of the fifteenth conference on computational natural language learning (CoNLL), pp 19–28

  • Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958

  • Stolcke A, Shriberg E (1996) Statistical language modeling for speech disfluencies. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing (ICASSP), vol 1, pp 405–408

  • Stolcke A, Shriberg E (1997) Automatic linguistic segmentation of conversational speech. In: Proceedings of EuroSpeech 97, vol 2, pp 1005–1008

  • Stolcke A, Shriberg E, Bates R, Ostendorf M, Hakkani D, Plauche M, Tur G, Lu Y (1998) Automatic detection of sentence boundaries and disfluencies based on recognized words. In: Proceedings of the international conference on spoken language processing, pp 2247–2250

  • Sunkara M, Ronanki S, Dixit K, Bodapati S, Kirchhoff K (2020) Robust prediction of punctuation and truecasing for medical ASR. In: Proceedings of the first workshop on natural language processing for medical conversations, Association for Computational Linguistics, pp 53–62

  • Susanto RH, Chieu HL, Lu W (2016) Learning to capitalize with character-level recurrent neural networks: an empirical study. In: Proceedings of the 2016 conference on empirical methods in natural language processing, pp 2090–2095

  • Sutskever I, Vinyals O, Le Q (2014) Sequence to sequence learning with neural networks. In: Proceedings of the 27th international conference on neural information processing systems (NIPS), MIT Press, vol 2, pp 3104–3112

  • Sutton C, McCallum A, Rohanimanesh K (2007) Dynamic conditional random fields: factorized probabilistic models for labeling and segmenting sequence data. J Mach Learn Res 8:693–723

  • Szaszak G, Tündik M (2019) Leveraging a character, word and prosody triplet for an ASR error robust and agglutination friendly punctuation approach. In: Proceedings of the annual conference of the international speech communication association (INTERSPEECH), pp 2988–2992

  • Tilk O, Alumäe T (2015) LSTM for punctuation restoration in speech transcripts. In: Proceedings of the sixteenth annual conference of the international speech communication association (INTERSPEECH), pp 683–687

  • Tilk O, Alumäe T (2016) Bidirectional recurrent neural network with attention mechanism for punctuation restoration. In: Proceedings of the annual conference of the international speech communication association (INTERSPEECH), pp 3047–3051

  • Todorovic BT, Rancic SR, Markovic IM, Mulalic EH, Ilic VM (2008) Named entity recognition and classification using context hidden Markov model. In: Proceedings of the 9th symposium on neural network applications in electrical engineering, pp 43–46. https://doi.org/10.1109/NEUREL.2008.4685557

  • Tomita T, Okimoto Y, Yamamoto H, Sagisaka Y (2005) Speech recognition of a named entity. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing (ICASSP ’05), vol 1, pp I/1057–I/1060. https://doi.org/10.1109/ICASSP.2005.1415299

  • Tündik M, Szaszak G (2018) Joint word-and character-level embedding CNN-RNN models for punctuation restoration. In: Proceedings of the 9th IEEE international conference on cognitive infocommunications (CogInfoCom). https://doi.org/10.1109/CogInfoCom.2018.8639876

  • Tündik M, Szaszák G, Gosztolya G, Beke A (2018) User-centric evaluation of automatic punctuation in ASR closed captioning. Proc Interspeech 2018:2628–2632

  • Tündik MÁ, Kaszás V, Szaszák G (2019) Assessing the semantic space bias caused by ASR error propagation and its effect on spoken document summarization. In: Proceedings of interspeech 2019, pp 1333–1337. https://doi.org/10.21437/Interspeech.2019-2154

  • Ueffing N, Bisani M, Vozila P (2013) Improved models for automatic punctuation prediction for spoken and written text. In: Proceedings of the 13th annual conference of the international speech communication association (INTERSPEECH), pp 3097–3101

  • Vaissiere J (1983) Language-independent prosodic features. Springer, Berlin, pp 53–66

  • Vandeghinste V, Verwimp L, Pelemans J, Wambacq P (2018) A comparison of different punctuation prediction approaches in a translation context. In: Proceedings of the 21st annual conference of the European association for machine translation, pp 269–278

  • Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Proceedings of the 31st international conference on neural information processing systems (NIPS), pp 5998–6008

  • Vaswani A, Bengio S, Brevdo E, Chollet F, Gomez AN, Gouws S, Jones L, Kaiser Ł, Kalchbrenner N, Parmar N, Sepassi R, Shazeer N, Uszkoreit J (2018) Tensor2tensor for neural machine translation. arXiv:1803.07416 [cs.LG]

  • Vāravs A, Salimbajevs A (2018) Restoring punctuation and capitalization using transformer models. In: Proceedings of the international conference on statistical language and speech processing, pp 91–102

  • Wang F, Chen W, Yang Z, Xu B (2018) Self-attention based network for punctuation restoration. In: Proceedings of the 24th international conference on pattern recognition (ICPR), pp 2803–2808

  • Wang T, Cho K (2015) Larger-context language modelling. arXiv:1511.03729

  • Wang W, Knight K, Marcu D (2006) Capitalizing machine translation. In: Proceedings of the main conference on human language technology conference of the North American chapter of the Association of Computational Linguistics, ACM, pp 1–8

  • Wang X, Ng HT, Sim KC (2012) Dynamic conditional random fields for joint sentence boundary and punctuation prediction. In: Proceedings of the 13th annual conference of the international speech communication association (INTERSPEECH), pp 1382–1385

  • Warnke V, Kompe R, Niemann H, Noth E (1997) Integrated dialog act segmentation and classification using prosodic features and language models. In: Proceedings of the fifth European conference on speech communication and technology (EUROSPEECH), pp 207–210

  • Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, Cistac P, Rault T, Louf R, Funtowicz M, Davison J, Shleifer S, von Platen P, Ma C, Jernite Y, Plu J, Xu C, Scao TL, Gugger S, Drame M, Lhoest Q, Rush AM (2019) Huggingface’s transformers: state-of-the-art natural language processing. arXiv:1910.03771 [cs.CL]

  • Yang J, Zhang Y (2018) NCRF++: an open-source neural sequence labeling toolkit. In: Proceedings of ACL 2018, system demonstrations, pp 74–79

  • Yang Z, Dai Z, Yang Y, Carbonell J, Salakhutdinov RR, Le QV (2019) XLNet: generalized autoregressive pretraining for language understanding. Adv Neural Inf Process Syst 32:5753–5763

  • Yi J, Tao J (2019) Self-attention based model for punctuation prediction using word and speech embeddings. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 7270–7274

  • Yi J, Tao J, Wen Z, Li Y (2017) Distilling knowledge from an ensemble of models for punctuation prediction. In: Proceedings of the annual conference of the international speech communication association (INTERSPEECH), pp 2779–2783

  • Yi J, Tao J, Bai Y, Tian Z, Fan C (2020a) Adversarial transfer learning for punctuation restoration. arXiv:2004.00248 [cs.CL]

  • Yi J, Tao J, Tian Z, Bai Y, Fan C (2020b) Focal loss for punctuation prediction. Proc Interspeech 2020:721–725

  • Zens R, Ney H (2008) Improvements in dynamic programming beam search for phrase-based statistical machine translation. In: Proceedings of the international workshop on spoken language translation, pp 195–205

  • Zhao Y, Xue J, Chen X (2015) Ensemble learning approaches in speech recognition. Springer, New York, pp 113–152

  • Öktem A, Farrús M, Wanner L (2017) Attentional parallel RNNs for generating punctuation in transcribed speech. In: Proceedings of the 5th international conference on statistical language and speech processing, pp 131–142

  • Żelasko P, Szymański P, Mizgajski J, Szymczak A, Carmiel Y, Dehak N (2018) Punctuation prediction model for conversational speech. arXiv:1807.00543

Author information

Corresponding author

Correspondence to Vasile Păiş.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Păiş, V., Tufiş, D. Capitalization and punctuation restoration: a survey. Artif Intell Rev 55, 1681–1722 (2022). https://doi.org/10.1007/s10462-021-10051-x

  • DOI: https://doi.org/10.1007/s10462-021-10051-x