Abstract
Ensuring proper punctuation and letter casing is a key pre-processing step before applying complex natural language processing algorithms. This is especially significant for textual sources where punctuation and casing are missing, such as the raw output of automatic speech recognition systems. In addition, short text messages and posts on micro-blogging platforms often contain unreliable or incorrect punctuation and casing. This survey offers an overview of both historical and state-of-the-art techniques for restoring punctuation and correcting word casing. It also highlights current challenges and research directions.


References
Agbago A, Kuhn R, Foster G (2005) Truecasing for the portage system. In: Proceedings of recent advances in natural language processing (RANLP)
Appelt DE, Hobbs JR, Bear J, Israel D, Kameyama M, Kehler A, Martin D, Myers K, Tyson M (1995) SRI international FASTUS system MUC-6 test results and analysis. In: Proceedings of the 6th message understanding conference
Augustyniak Ł, Szymanski P, Morzy M, Zelasko P, Szymczak A, Mizgajski J, Carmiel Y, Dehak N (2020) Punctuation prediction in spontaneous conversations: can we mitigate ASR errors with retrofitted word embeddings? arXiv:2004.05985
Baevski A, Edunov S, Liu Y, Zettlemoyer L, Auli M (2019) Cloze-driven pretraining of self-attention networks. arXiv:1903.07785
Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. arXiv:1409.0473
Baldwin T, Cook P, Lui M, MacKinlay A, Wang L (2013) How noisy social media text, how different social media sources? In: Proceedings of the 6th international joint conference on natural language processing, pp 356–364
Ballesteros M, Wanner L (2016) A neural network architecture for multilingual punctuation generation. In: Proceedings of the 2016 conference on empirical methods in natural language processing, association for computational linguistics, pp 1048–1053
Barr C, Jones R, Regelson M (2008) The linguistic structure of English web-search queries. In: Proceedings of the 2008 conference on empirical methods in natural language processing, association for computational linguistics, Honolulu, Hawaii, pp 1021–1030. https://www.aclweb.org/anthology/D08-1107. Accessed Jan 2021
Batista F, Namede N, Trancoso I (2008a) Language dynamics and capitalization using maximum entropy. In: Proceedings of ACL-08: HLT, short papers, association for computational linguistics, pp 1–4
Batista F, Caseiro D, Namede N, Trancoso I (2008b) Recovering capitalization and punctuation marks for automatic speech recognition: case study for Portuguese broadcast news. Speech Commun 50(10):847–862
Batista F, Namede N, Trancoso I (2008c) The impact of language dynamics on the capitalization of broadcast news. In: Proceedings of the 9th annual conference of the international speech communication association INTERSPEECH 2008
Batista F, Trancoso I, Mamede N (2009) Automatic recovery of punctuation marks and capitalization information for Iberian languages. In: Proceedings of the joint SIG-IL/microsoft workshop on speech and language technologies for Iberian languages, pp 99–102
Batista F, Moniz H, Trancoso I, Mamede N (2012) Bilingual experiments on automatic recovery of capitalization and punctuation of automatic speech transcripts. IEEE Trans Audio Speech Lang Process 20(2):474–485
Beeferman D, Berger A, Lafferty J (1998) Cyberpunc: a lightweight punctuation annotation system for speech. In: Proceedings of the 1998 IEEE international conference on acoustics, speech and signal processing, ICASSP ’98 (Cat. No.98CH36181), vol 2, pp 689–692. https://doi.org/10.1109/ICASSP.1998.675358
Bell P, Gales M, Hain T, Kilgour J, Lanchantin P, Liu X, McParland A, Renals S, Saz O, Wester M, Woodland PC (2015) The MGB challenge: evaluating multigenre broadcast media recognition. In: Proceedings of the 2015 IEEE workshop on automatic speech recognition and understanding (ASRU), pp 687–693
Boháč M, Rott M, Kovář V (2017) Text punctuation: an inter-annotator agreement study. In: Proceedings of the international conference on text, speech, and dialogue, pp 120–128
Bradbury J, Merity S, Xiong C, Socher R (2016) Quasi-recurrent neural networks. arXiv:1611.01576 [cs.NE]
Brants T, Franz A (2006) Web 1t 5-gram corpus version 1.1. Technical Report, Google Research
Brants T, Franz A (2009) Web 1t 5-gram, 10 european languages version 1. LDC2009T25, Linguistic Data Consortium
Breiman L, Friedman JH, Olshen RA, Stone CJ (1983) Classification and regression trees. Wadsworth and Brooks, Pacific Grove
Brill E (1993) A corpus-based approach to language learning. Ph.D. thesis, University of Pennsylvania
Britz D, Goldie A, Luong MT, Le Q (2017) Massive exploration of neural machine translation architectures. arXiv:1703.03906 [cs.CL]
Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler DM, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D (2020) Language models are few-shot learners. arXiv:2005.14165 [cs.CL]
Caranica A, Cucu H, Buzo A, Burileanu C (2015) Capitalization and punctuation restoration for Romanian language. UPB Sci Bull Ser C 77:95–106
Caruana R (1997) Multitask learning. Mach Learn 28(1):41–75
Chan W, Ke NR, Lane I (2015) Transferring knowledge from a rnn to a dnn. In: Proceedings of the annual conference of the international speech communication association (INTERSPEECH), pp 3264–3268
Che X, Luo S, Yang H, Meinel C (2016) Sentence boundary detection based on parallel lexical and acoustic models. In: Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), pp 2528–2532
Chebotar Y, Waters A (2016) Distilling knowledge from ensembles of neural networks for speech recognition. In: Proceedings of the annual conference of the international speech communication association (INTERSPEECH), pp 3439–3443
Chelba C, Acero A (2004) Adaptation of maximum entropy capitalizer: Little data can help a lot. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 285–292
Chen J (1999) Speech recognition with automatic punctuation. In: Proceedings of Eurospeech ’99, pp 447–450
Chen Q, Chen M, Li B, Wang W (2020) Controllable time-delay transformer for real-time punctuation prediction and disfluency detection. In: Proceedings of the 2020 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 8069–8073
Cho E, Niehues J, Waibel A (2012) Segmentation and punctuation prediction in speech language translation using a monolingual translation system. In: In proceedings of the international workshop for spoken language translation (IWSLT 2012), pp 252–259
Cho E, Niehues J, Waibel A (2017) NMT-based segmentation and punctuation insertion for real-time spoken language translation. Proc Interspeech 2017:2645–2649
Cho K, van Merrienboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder–decoder for statistical machine translation. In: Proceedings of EMNLP, pp 1724–1734
Christensen H, Gotoh Y, Renals S (2001) Punctuation annotation using statistical prosody models. In: Proceedings of ISCA workshop on prosody in speech recognition and understanding
Chung YA, Glass J (2018) Speech2vec: a sequence-to-sequence framework for learning word embeddings from speech. In: Proceedings of the annual conference of the international speech communication association (INTERSPEECH), pp 811–815
Coniam D (2008) Evaluating the language resources of chatbots for their potential in English as a second language. ReCALL: J EUROCALL 20(1):98
Coniam D (2014) The linguistic accuracy of chatbots: usability from an ESL perspective. Text Talk 34(5):545–567
Coster W, Kauchak D (2011) Simple English Wikipedia: a new text simplification task. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies: short papers, association for computational linguistics, vol 2, pp 665–669
Courtland M, Faulkner A, McElvain G (2020) Efficient automatic punctuation restoration using bidirectional transformers with robust inference. In: Proceedings of the 17th international conference on spoken language translation (IWSLT), pp 272–279. https://doi.org/10.18653/v1/2020.iwslt-1.33
Datta P, Jakubowicz P, Vogler C, Kushalnagar R (2020) Readability of punctuation in automatic subtitles. In: Proceedings of the international conference on computers helping people with special needs, pp 195–201
Deng L, Platt JC (2014) Ensemble deep learning for speech recognition. In: Proceedings of the annual conference of the international speech communication association (INTERSPEECH), pp 1915–1919
Derczynski L, Bontcheva K, Roberts I (2016) Broad Twitter corpus: a diverse named entity recognition resource. In: Proceedings of COLING 2016, the 26th international conference on computational linguistics: technical papers, the COLING 2016 organizing committee, Osaka, Japan, pp 1169–1179. https://www.aclweb.org/anthology/C16-1111
Devlin J, Chang MW, Lee K, Toutanova K (2019) Bert: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), Association for Computational Linguistics, pp 4171–4186
Dingwall N, Potts C (2018) Mittens: an extension of glove for learning domain-specialized representations. arXiv:1803.09901
Driesen J, Birch A, Grimsey S, Safarfashandi S, Gauthier J, Simpson M, Renals S (2014) Automated production of true-cased punctuated subtitles for weather and news broadcasts. In: Proceedings of the 15th annual conference of the international speech communication association INTERSPEECH 2014, international speech communication association, pp 2146–2147
Duchi J, Hazan E, Singer Y (2011) Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 12:2121–2159
Dyer C, Ballesteros M, Ling W, Matthews A, Smith NA (2015) Transition-based dependency parsing with stack long short-term memory. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing, volume 1: long papers, Association for Computational Linguistics, pp 334–343
Ehara Y, Sato I, Oiwa H, Nakagawa H (2013) Understanding seed selection in bootstrapping. In: Proceedings of TextGraphs-8 graph-based methods for natural language processing, pp 44–52
Elsayed H, Elghazaly T (2015) A named entities recognition system for modern standard Arabic using rule-based approach. In: Proceedings of the first international conference on Arabic computational linguistics (ACLing), pp 51–54. https://doi.org/10.1109/ACLing.2015.14
Etchegoyhen T, Gete H (2020) To case or not to case: evaluating casing methods for neural machine translation. In: Proceedings of the 12th language resources and evaluation conference, European language resources association, Marseille, France, pp 3752–3760. https://www.aclweb.org/anthology/2020.lrec-1.463
Federico M, Bentivogli L, Paul M, Stueker S (2011) Overview of the iwslt 2011 evaluation campaign. In: Proceedings of the international workshop on spoken language translation (IWSLT), pp 11–27
Finkel JR, Grenager T, Manning C (2005) Incorporating non-local information into information extraction systems by gibbs sampling. In: Proceedings of the 43nd annual meeting of the association for computational linguistics (ACL 2005), pp 363–370
Freund Y, Schapire R (1995) A decision-theoretic generalization of on-line learning and an application to boosting. In: Computational learning theory, pp 23–37
Friedman JH (2000) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232
Gage P (1994) A new algorithm for data compression. C Users J 12(2):23–38
Gale W, Parthasarathy S (2017) Experiments in character-level neural network models for punctuation. In: Proceedings of the annual conference of the international speech communication association (INTERSPEECH), pp 2794–2798
Gale W, Church K, Yarowsky D (1992) A method for disambiguating word senses in a large corpus. Comput Humanit 26:415–439
Gale W, Church KW, Yarowsky D (1994) Discrimination decisions for 100,000-dimensional spaces. In: Current issues in computational linguistics, pp 429–450
Ganin Y, Ustinova E, Ajakan H, Germain P, Larochelle H, Laviolette F, Marchand M, Lempitsky V (2016) Domain-adversarial training of neural networks. J Mach Learn Res 17:1–35
Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial networks. Adv Neural Inf Process Syst 3:2672–2680
Gravano A, Jansche M, Bacchiani M (2009) Restoring punctuation and capitalization in transcribed speech. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing (ICASSP), pp 4741–4744
Gupta NK, Bangalore S (2002) Extracting clauses for spoken language understanding in conversational systems. Proc Conf Empir Methods Natural Lang Process 10:273–280
Hahnloser RH, Sarpeshkar R, Mahowald MA, Douglas RJ, Seung HS (2000) Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 405:947–951. https://doi.org/10.1038/35016072
Hakkani-Tur D, Tur G, Stolcke A, Shriberg E (1999) Combining words and prosody for information extraction from speech. In: Proceedings of the European Conference on Speech Communication and Technology, (EUROSPEECH)
Han X, Eisenstein J (2019) Unsupervised domain adaptation of contextualized embeddings for sequence labeling. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), pp 4229–4239
Harris ZS (1954) Distributional structure. WORD 10(2–3):146–162. https://doi.org/10.1080/00437956.1954.11659520
Hasan M, Doddipatla R, Hain T (2014) Multi-pass sentence-end detection of lecture speech. In: Proceedings of the annual conference of the international speech communication association, INTERSPEECH, pp 2902–2906
Hasan M, Doddipatla R, Hain T (2015) Noise-matched training of crf based sentence end detection models. In: Proceedings of the 16th annual conference of the international speech communication association (INTERSPEECH), pp 349–353
Heinzerling B, Strube M (2018) Bpemb: Tokenization-free pretrained subword embeddings in 275 languages. In: Proceedings of the eleventh international conference on language resources and evaluation (LREC 2018), European language resources association (ELRA), pp 2989–2993
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Huang J, Zweig G (2002) Maximum entropy model for punctuation annotation from speech. In: Proceedings of the annual conference of the international speech communication association (INTERSPEECH), pp 917–920
Huang Z, Xu W, Yu K (2015) Bidirectional lstm-crf models for sequence tagging. arXiv:1508.01991 [cs.CL]
Ide N, Macleod C (2001) The American national corpus: a standardized resource for American English. Proc Corpus Linguist 2001:831–836
Jarrett K, Kavukcuoglu K, Ranzato M, LeCun Y (2009) What is the best multi-stage architecture for object recognition? In: Proceedings of the 12th IEEE international conference on computer vision, pp 2146–2153. https://doi.org/10.1109/ICCV.2009.5459469
Jones BEM (1994) Exploring the role of punctuation in parsing natural text. In: Proceedings of the 15th conference on computational linguistics—volume 1 (COLING ’94), Association for Computational Linguistics, pp 421–425
Jones DA, Wolf F, Gibson E, Williams E, Fedorenko E, Reynolds DA, Zissman M (2003) Measuring the readability of automatic speech-to-text transcripts. In: Proceedings of the 8th European conference on speech communication and technology (EUROSPEECH), pp 1585–1588
Juin CC, Wei RXJ, D’Haro LF, Banchs RE (2017) Punctuation prediction using a bidirectional recurrent neural network with part-of-speech tagging. In: Proceedings of the IEEE region 10 conference TENCON 2017, pp 1806–1811. https://doi.org/10.1109/TENCON.2017.8228151
Jurafsky D, Martin J (2008) Speech and language processing, 2nd edn. Prentice Hall, New York
Kaplan A (1950) An experimental study of ambiguity in context. Mech Transl 1:1–3
Kaufmann M, Kalita J (2010) Syntactic normalization of twitter messages. In: Proceedings of the international conference on natural language processing
Kim JH, Woodland PC (2000) A rule-based named entity recognition system for speech input. In: Proceedings lCSLP, pp S21–524
Kim JH, Woodland PC (2002) Implementation of automatic capitalisation generation systems for speech input. In: Proceedings of IEEE international conference on acoustics, speech, and signal processing, pp I–857–I–860. https://doi.org/10.1109/ICASSP.2002.5743874
Kim JH, Woodland PC (2004) Automatic capitalisation generation for speech input. Comput Speech Lang 18(1):67–90
Kim S (2019) Deep recurrent neural networks with layer-wise multi-head attentions for punctuation restoration. In: Proceedings of the 2019 IEEE international conference on acoustics, speech and signal processing (ICASSP 2019), pp 7280–7284
Klein G, Kim Y, Deng Y, Senellart J, Rush AM (2017) Opennmt: open-source toolkit for neural machine translation. arXiv:1701.02810
Klejch O, Bell P, Renals S (2016) Punctuated transcription of multi-genre broadcasts using acoustic and lexical approaches. In: Spoken language technology workshop, pp 433–440
Klejch O, Bell P, Renals S (2017) Sequence-to-sequence models for punctuated transcription combining lexical and acoustic features. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 5700–5704
Koehn P (2005) Europarl: a parallel corpus for statistical machine translation. MT Summit 5:79–86
Kolář J, Lamel L (2012) Development and evaluation of automatic punctuation for French and English speech-totext. In: Proceedings of the annual conference of the international speech communication association (INTERSPEECH), pp 1376–1379
Kolář J, Švec J, Psutka J (2004) Automatic punctuation annotation in Czech broadcast news speech. In: Proceedings SPECOM, pp 319–325
Kompe R (1996) Prosody in speech understanding systems. Springer, Berlin
Kondratyuk D, Straka M (2019) 75 languages, 1 model: parsing universal dependencies universally. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), Association for Computational Linguistics, pp 2779–2795
Lafferty J, McCallum A, Pereira F (2001) Conditional random fields: Probabilistic models for segmentation and labeling sequence data. In: Proceedings of the eighteenth international conference on machine learning (ICML ’01), Morgan Kaufmann Publishers Inc., pp 282–289
Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C (2016) Neural architectures for named entity recognition. In: Proceedings of the conference of the North American chapter of the association for computational linguistics: human language technologies, pp 260–270
Li X, Lin E (2020) A 43 language multilingual punctuation prediction neural network model. Proc INTERSPEECH 2020:1067–1071
Lin TY, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings ICCV, pp 2980–2988
Ling W, Dyer C, Black AW, Trancoso I, Fermandez R, Amir S, Marujo L, Luìs T (2015) Finding function in form: compositional character models for open vocabulary word representation. In: Proceedings of the 2015 conference on empirical methods in natural language processing (EMNLP), Association for Computational Linguistics, pp 1520–1530
Lita LV, Ittycheriah A, Roukos S, Kambhatla N (2003) Truecasing. In: Proceedings of the 41st annual meeting on association for computational linguistics, pp 152–159
Liu Y, Shriberg E, Stolcke A, Peskin B, Ang J, Hillard D, Ostendorf M, Tomalin M, Woodland P, Harper M (2005a) Structural metadata research in the ears program. In: Proceedings of IEEE international conference on acoustics, speech, and signal processing (ICASSP 2005), pp 957–960
Liu Y, Stolcke A, Shriberg E, Harper M (2005b) Using conditional random fields for sentence boundary detection in speech. In: Proceedings of ACL’05. https://doi.org/10.3115/1219840.1219896
Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) Roberta: a robustly optimized Bert pretraining approach. arXiv:1907.11692 [cs.CL]
Lu W, Ng HT (2010) Better punctuation prediction with dynamic conditional random fields. In: Proceedings of the 2010 conference on empirical methods in natural language processing, pp 177–186
MacIntyre R (1998) 1996 CSR hub4 language model. LDC98T31, Linguistic Data Consortium
Makhija K, Ho TN, Siong CE (2019) Transfer learning for punctuation prediction. In: Proceedings of the 2019 Asia-Pacific signal and information processing association annual summit and conference (APSIPA ASC), pp 268–273
Makhoul J, Kubala F, Schwartz R, Weischede R (1999) Performance measures for information extraction. In: Proceedings of DARPA broadcast news workshop, pp 249–252
Makhoul J, Baron A, Bulyko I, Nguyen L, Ramshaw LA, Stallard D, Schwartz RM, Xiang B (2005) The effects of speech recognition and punctuation on information extraction performance. In: INTERSPEECH 2005—Eurospeech, 9th European conference on speech communication and technology, Lisbon, Portugal, September 4–8, 2005, ISCA, pp 57–60. http://www.isca-speech.org/archive/interspeech_2005/i05_0057.html
Manning C, Surdeanu M, Bauer J, Finkel J, Bethard S, McClosky D (2014) The Stanford corenlp natural language processing toolkit. In: Proceedings of 52nd annual meeting of the association for computational linguistics: system demonstrations, pp 55–60
Marcus MP, Santorini B, Marcinkiewicz MA, Taylor A (1999) Treebank-3. LDC99T42, linguistic data consortium
Markwardt AH (1942) Introduction to the English language. Oxford University Press, New York
Masterson M (1967) Mechanical pidgin translation. Wiley, Hoboken
Matusov E, Mauser A, Ney H (2006) Automatic sentence segmentation and punctuation prediction for spoken language translation. In: Proceedings of IWSLT, pp 158–165
Mayhew S, Tsygankova T, Roth D (2019) ner and pos when nothing is capitalized. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, pp 6256–6261. https://doi.org/10.18653/v1/D19-1650, https://www.aclweb.org/anthology/D19-1650
Michel JB, Shen YK, Aiden AP, Veres A, Gray MK, Brockman W, Team TGB, Pickett JP, Hoiberg D, Clancy D, Norvig P, Orwant J, Pinker S, Nowak MA, Aiden EL (2011) Quantitative analysis of culture using millions of digitized books. Science 331:176–182. https://doi.org/10.1126/science.1199644
Mikheev A (1999) A knowledge-free method for capitalized word disambiguation. In: Proceedings of the annual meeting of the association for computational linguistics, pp 159–166
Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv:1301.3781 [cs.CL]
Miller D, Boisen S, Schwartz R, Stone R, Weischedel R (2000) Named entity extraction from noisy input: speech and OCR. In: Proceedings of the sixth conference on applied natural language processing, pp 316–324
Moniz H, Batista F, Meinedo H, Abad A, Trancoso I, Mata AI, Mamede N (2010) Prosodically-based automatic segmentation and punctuation. In: Proceedings of speech prosody 2010, p 910
Mota C (2008) How to keep up with language dynamics? A case study on named entity recognition. Ph.D. thesis, IST/UTL
Mota C, Grishman R (2008) Is this ne tagger getting old? In: Proceedings of the sixth international conference on language resources and evaluation (LREC’08), European Language Resources Association (ELRA), pp 28–30
Nanchen A, Garner PN (2019) Empirical evaluation and combination of punctuation prediction models applied to broadcast news. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 7275–7279
Navigli R (2009) Word sense disambiguation: a survey. ACM Comput Surv 41(2):1–69. https://doi.org/10.1145/1459352.1459355
Nebhi K, Bontcheva K, Gorrell G (2015) Restoring capitalization in #tweets. In: Proceedings of WWW companion, pp 1111–1115
Nguyen B, Nguyen VBH, Nguyen H, Phuong PN, Nguyen TL, Do QT, Mai LC (2019) Fast and accurate capitalization and punctuation for automatic speech recognition using transformer and chunk merging. In: Proceedings of the 2nd conference of the oriental COCOSDA international committee for the co-ordination and standardisation of speech databases and assessment techniques (O-COCOSDA), pp 1–5. https://doi.org/10.1109/O-COCOSDA46868.2019.9041202
Niesler T, Woodland P (1996) A variable-length category-based ngram language model. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing ICASSP-96, vol 1, pp 164–167
Nivre J (2004) Incrementality in deterministic dependency parsing. In: Proceedings of the workshop on incremental parsing: bringing engineering and cognition together, Association for Computational Linguistics, pp 50–57
Nunberg G (1990) The linguistics of punctuation. In: CSLI lecture notes, p 18
Nöth E, Batliner A, Kießling A, Kompe R, Niemann H (1999) Suprasegmental modelling. In: Computational models of speech pattern processing, NATO ASI series (Series F: computer and systems sciences), vol 169, pp 181–198
Ostendorf M, Favre B, Grishman R, Hakkani-Tür D, Harper M, Hillard D, Hirschberg J, Ji H, Kahn JG, Liu Y, Maskey S, Matusov E, Ney H, Rosenberg A, Shriberg E, Wang W, Wooters C (2008) Speech segmentation and spoken document processing. IEEE Signal Process Mag 25:59–69
Pahuja V, Laha A, Mirkin S, Raykar V, Kotlerman L, Lev G (2017) Joint learning of correlated sequence labeling tasks using bidirectional recurrent neural networks. Proc Interspeech 2017:548–552
Pallett D, Fiscus J, Garofolo J, Martin A, Przybocki M (2000) 1998 broadcast news benchmark test results: English and non-English word error rate performance measures. In: DARPA broadcast news transcription and understanding workshop
Papineni K, Roukos S, Ward T, Zhu WJ (2002) Bleu: A method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting on association for computational linguistics (ACL 2002), Association for Computational Linguistics, pp 311–318
Pauls A, Klein D (2011) Faster and smaller n-gram language models. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies—volume 1 (HLT ’11), Association for Computational Linguistics, pp 258–267
Peitz S, Freitag M, Mauser A, Ney H (2011) Modeling punctuation prediction as machine translation. In: Proceedings of the international workshop on spoken language translation (IWSLT), pp 238–245
Pennington J, Socher R, Manning CD (2014) Glove: Global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532–1543
Petasis G, Vichot F, Wolinski F, Paliouras G, Karkaletsis V, Spyropoulos CD (2001) Using machine learning to maintain rule-based named-entity recognition and classification systems. In: Proceedings of the 39th annual meeting on association for computational linguistics, association for computational linguistics, pp 426–433
Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L (2018) Deep contextualized word representations,. In: Proceedings of NAACL 2018
Povey D, Ghoshal A, Boulianne G, Burget L, Glembek O, Goel N, Hannemann M, Motlıcek P, Qian Y, Schwarz P, Silovsky J, Stemmer G, Vesely K (2011) The kaldi speech recognition toolkit. In: Proceedings of the IEEE workshop on automatic speech recognition and understanding (ASRU), IEEE Signal Processing Society, pp 1–4
Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I (2019) Language models are unsupervised multitask learners. OpenAI blog
Ramena G, Nagaraju D, Moharana S, Mohanty DP, Purre N (2020) An efficient architecture for predicting the case of characters using sequence models. arXiv:abs/2002.00738
Ratnaparkhi A (1996) A maximum entropy model for part-of-speech tagging. In: Brill E, Church K (eds) Proceedings of the conference on empirical methods in natural language processing, pp 133–142
Rayson SJ, Hachamovitch DJ, Kwatinetz AL, Hirsch SM (1998) Autocorrecting text typed into a word processing document. U.S. patent 5761689
Rei R, Guerreiro NM, Batista F (2020) Automatic truecasing of video subtitles using Bert: a multilingual adaptable approach. Inf Process Manag Uncertain Knowl-Based Syst 1237:708–721
Romero V, Sánchez JA (2013) Category-based language models for handwriting recognition of marriage license books. In: Proceedings of the 12th international conference on document analysis and recognition, pp 788–792. https://doi.org/10.1109/ICDAR.2013.161
Rosenfeld R (2000) Two decades of statistical language modeling: where do we go from here? Proc IEEE 88:1270–1278
Ruhlen H, Pressey SL (1923) A statistical study of current usage in punctuation. Educ Res Bull 2(12):179–182
Sadat F, Johnson H, Agbago A, Foster G, Kuhn R, Martin J, Tikuisis A (2005) Portage: A phrase-based machine translation system. In: Proceedings of the ACL workshop on building and using parallel texts, pp 129–132
Salimbajevs A (2016) Bidirectional lstm for automatic punctuation restoration. In: Human language technologies-the baltic perspective: proceedings of the seventh international conference Baltic HLT 2016, vol 289, pp 59–65
Salloum W, Finley G, Edwards E, Miller M, Suendermann-Oeft D (2017) Deep learning for punctuation restoration in medical reports. Proc BioNLP 2017:159–164
Sanchez G (2019) Sentence boundary detection in legal text. In: Proceedings of the natural legal language processing workshop, Association for Computational Linguistics, pp 31–38
Savelka J, Walker VR, Grabmair M, Ashley KD (2017) Sentence boundary detection in adjudicatory decisions in the united states. Traitement Automatique des langues 58(2):21–45
Schapire RE, Singer Y (1999) Improved boosting algorithms using confidence-rated predictions. Mach Learn 37(3):297–336
Schapire RE, Singer Y (2000) Boostexter: A boosting-based system for text categorization. Mach Learn 39:135–168
Schukat-Talamazzini EG (1995) Stochastic language models. In: Electrotechnical and computer science conference
Schukat-Talamazzini EG, Gallwitz F, Harbeck S, Warnke V (1997) Rational interpolation of maximum likelihood predictors in stochastic language modeling. In: Proceedings of the fifth European conference on speech communication and technology (EUROSPEECH), pp 2731–2734
Seide F, Li G, Chen X, Yu D (2011) Feature engineering in context-dependent deep neural networks for conversational speech transcription. In: Proceedings of IEEE workshop on automatic speech recognition and understanding (ASRU), pp 24–29
Sennrich R, Haddow B, Birch A (2016) Neural machine translation of rare words with subword units. In: Proceedings of the 54th annual meeting of the association for computational linguistics (volume 1: long papers), Association for Computational Linguistics, pp 1715–1725
Shawar BA, Atwell E (2007) Chatbots: are they really useful? LDV Forum 22(1):29–49
Shriberg E, Bates R, Stolcke A (1997) A prosody-only decision-tree model for disfluency detection. In: Proceedings of the European conference on speech communication and technology, (EUROSPEECH), vol 5, pp 2383–2386
Siivola V, Pellom BL (2005) Growing an n-gram language model. In: Proceedings of INTERSPEECH-2005, pp 1309–1312
Soboleva D, Skopek O, Šajgalìk M, Cărbune V, Weissenberger F, Proskurnia J, Prisacari B, Valcarce D, Lu J, Prabhavalkar R, Miklos B (2020) Replacing human audio with synthetic audio for on-device unspoken punctuation prediction. arXiv:2010.10203 [cs.LG]
Song HJ, Kim HK, Kim JD, Park CY, Kim YS (2019) Inter-sentence segmentation of YouTube subtitles using long-short term memory (LSTM). Appl Sci 9(7):1504. https://doi.org/10.3390/app9071504
Spitkovsky VI, Alshawi H, Jurafsky D (2011) Punctuation: making a point in unsupervised dependency parsing. In: Proceedings of the fifteenth conference on computational natural language learning (CoNLL), pp 19–28
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
Stolcke A, Shriberg E (1996) Statistical language modeling for speech disfluencies. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing (ICASSP), vol 1, pp 405–408
Stolcke A, Shriberg E (1997) Automatic linguistic segmentation of conversational speech. In: Proceedings of EuroSpeech 97, vol 2, pp 1005–1008
Stolcke A, Shriberg E, Bates R, Ostendorf M, Hakkani D, Plauche M, Tur G, Lu Y (1998) Automatic detection of sentence boundaries and disfluencies based on recognized words. In: Proceedings of the international conference on spoken language processing, pp 2247–2250
Sunkara M, Ronanki S, Dixit K, Bodapati S, Kirchhoff K (2020) Robust prediction of punctuation and truecasing for medical ASR. In: Proceedings of the first workshop on natural language processing for medical conversations, Association for Computational Linguistics, pp 53–62
Susanto RH, Chieu HL, Lu W (2016) Learning to capitalize with character-level recurrent neural networks: an empirical study. In: Proceedings of the 2016 conference on empirical methods in natural language processing, pp 2090–2095
Sutskever I, Vinyals O, Le Q (2014) Sequence to sequence learning with neural networks. In: Proceedings of the 27th international conference on neural information processing systems (NIPS), MIT Press, vol 2, pp 3104–3112
Sutton C, McCallum A, Rohanimanesh K (2007) Dynamic conditional random fields: factorized probabilistic models for labeling and segmenting sequence data. J Mach Learn Res 8:693–723
Szaszak G, Tündik M (2019) Leveraging a character, word and prosody triplet for an ASR error robust and agglutination friendly punctuation approach. In: Proceedings of the annual conference of the international speech communication association (INTERSPEECH), pp 2988–2992
Tilk O, Alumäe T (2015) LSTM for punctuation restoration in speech transcripts. In: Proceedings of the sixteenth annual conference of the international speech communication association (INTERSPEECH), pp 683–687
Tilk O, Alumäe T (2016) Bidirectional recurrent neural network with attention mechanism for punctuation restoration. In: Proceedings of the annual conference of the international speech communication association (INTERSPEECH), pp 3047–3051
Todorovic BT, Rancic SR, Markovic IM, Mulalic EH, Ilic VM (2008) Named entity recognition and classification using context hidden Markov model. In: Proceedings of the 9th symposium on neural network applications in electrical engineering, pp 43–46. https://doi.org/10.1109/NEUREL.2008.4685557
Tomita T, Okimoto Y, Yamamoto H, Sagisaka Y (2005) Speech recognition of a named entity. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing (ICASSP ’05), vol 1, pp I/1057–I/1060. https://doi.org/10.1109/ICASSP.2005.1415299
Tündik M, Szaszak G (2018) Joint word-and character-level embedding CNN-RNN models for punctuation restoration. In: Proceedings of the 9th IEEE international conference on cognitive infocommunications (CogInfoCom). https://doi.org/10.1109/CogInfoCom.2018.8639876
Tündik M, Szaszák G, Gosztolya G, Beke A (2018) User-centric evaluation of automatic punctuation in ASR closed captioning. Proc Interspeech 2018:2628–2632
Tündik MÁ, Kaszás V, Szaszák G (2019) Assessing the semantic space bias caused by ASR error propagation and its effect on spoken document summarization. In: Proceedings of Interspeech 2019, pp 1333–1337. https://doi.org/10.21437/Interspeech.2019-2154
Ueffing N, Bisani M, Vozila P (2013) Improved models for automatic punctuation prediction for spoken and written text. In: Proceedings of the 13th annual conference of the international speech communication association (INTERSPEECH), pp 3097–3101
Vaissiere J (1983) Language-independent prosodic features. Springer, Berlin, pp 53–66
Vandeghinste V, Verwimp L, Pelemans J, Wambacq P (2018) A comparison of different punctuation prediction approaches in a translation context. In: Proceedings of the 21st annual conference of the European association for machine translation, pp 269–278
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Proceedings of the 31st international conference on neural information processing systems (NIPS), pp 5998–6008
Vaswani A, Bengio S, Brevdo E, Chollet F, Gomez AN, Gouws S, Jones L, Kaiser Ł, Kalchbrenner N, Parmar N, Sepassi R, Shazeer N, Uszkoreit J (2018) Tensor2tensor for neural machine translation. arXiv:1803.07416 [cs.LG]
Vāravs A, Salimbajevs A (2018) Restoring punctuation and capitalization using transformer models. In: Proceedings of the international conference on statistical language and speech processing, pp 91–102
Wang F, Chen W, Yang Z, Xu B (2018) Self-attention based network for punctuation restoration. In: Proceedings of the 24th international conference on pattern recognition (ICPR), pp 2803–2808
Wang T, Cho K (2015) Larger-context language modelling. arXiv:1511.03729
Wang W, Knight K, Marcu D (2006) Capitalizing machine translation. In: Proceedings of the main conference on human language technology conference of the North American chapter of the Association of Computational Linguistics, ACM, pp 1–8
Wang X, Ng HT, Sim KC (2012) Dynamic conditional random fields for joint sentence boundary and punctuation prediction. In: Proceedings of the 13th annual conference of the international speech communication association (INTERSPEECH), pp 1382–1385
Warnke V, Kompe R, Niemann H, Noth E (1997) Integrated dialog act segmentation and classification using prosodic features and language models. In: Proceedings of the fifth European conference on speech communication and technology (EUROSPEECH), pp 207–210
Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, Cistac P, Rault T, Louf R, Funtowicz M, Davison J, Shleifer S, von Platen P, Ma C, Jernite Y, Plu J, Xu C, Scao TL, Gugger S, Drame M, Lhoest Q, Rush AM (2019) HuggingFace’s transformers: state-of-the-art natural language processing. arXiv:1910.03771 [cs.CL]
Yang J, Zhang Y (2018) NCRF++: an open-source neural sequence labeling toolkit. In: Proceedings of ACL 2018, system demonstrations, pp 74–79
Yang Z, Dai Z, Yang Y, Carbonell J, Salakhutdinov RR, Le QV (2019) XLNet: generalized autoregressive pretraining for language understanding. Adv Neural Inf Process Syst 32:5753–5763
Yi J, Tao J (2019) Self-attention based model for punctuation prediction using word and speech embeddings. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 7270–7274
Yi J, Tao J, Wen Z, Li Y (2017) Distilling knowledge from an ensemble of models for punctuation prediction. In: Proceedings of the annual conference of the international speech communication association (INTERSPEECH), pp 2779–2783
Yi J, Tao J, Bai Y, Tian Z, Fan C (2020a) Adversarial transfer learning for punctuation restoration. arXiv:2004.00248 [cs.CL]
Yi J, Tao J, Tian Z, Bai Y, Fan C (2020b) Focal loss for punctuation prediction. Proc Interspeech 2020:721–725
Zens R, Ney H (2008) Improvements in dynamic programming beam search for phrase-based statistical machine translation. In: Proceedings of the international workshop on spoken language translation, pp 195–205
Zhao Y, Xue J, Chen X (2015) Ensemble learning approaches in speech recognition. Springer, New York, pp 113–152
Öktem A, Farrús M, Wanner L (2017) Attentional parallel RNNs for generating punctuation in transcribed speech. In: Proceedings of the 5th international conference on statistical language and speech processing, pp 131–142
Żelasko P, Szymański P, Mizgajski J, Szymczak A, Carmiel Y, Dehak N (2018) Punctuation prediction model for conversational speech. arXiv:1807.00543
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Păiş, V., Tufiş, D. Capitalization and punctuation restoration: a survey. Artif Intell Rev 55, 1681–1722 (2022). https://doi.org/10.1007/s10462-021-10051-x
DOI: https://doi.org/10.1007/s10462-021-10051-x