Linguistic Resources Construction: Towards Disfluency Processing in Spontaneous Tunisian Dialect Speech

Boughariou, Emna; Bahou, Younès; Bleguith, Lamia Hadrich

doi:10.1007/978-3-030-27947-9_27

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11697)

Included in the following conference series: International Conference on Text, Speech, and Dialogue

Abstract

The Tunisian Dialect (TD) is an under-resourced language that lacks both corpora and Natural Language Processing (NLP) tools, despite being increasingly used in spoken and written forms. In this paper, we present our effort to build linguistic resources for TD in order to process disfluencies. First, we created the Disfluencies Corpus from Tunisian Arabic Transcriptions (DisCoTAT), a set of manual transcriptions containing several disfluency phenomena. We also constructed the Tunisian Dialect Wordnet (TD-WordNet) from existing TD lexicons to annotate words with morpho-syntactic tags. We then developed the Disfluency Annotation Tool (DisAnT) to annotate DisCoTAT. DisAnT provides two levels of annotation: morpho-syntactic tagging and disfluency annotation.
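
To make the idea of two annotation levels concrete, the following is a minimal Python sketch, not the DisAnT implementation. Every name in it (Token, tag_tokens, mark_repetitions, annotate), the tag set, the "UNK" fallback, and the placeholder word forms are assumptions made for illustration only. The first level is approximated by a plain lexicon lookup, and the second by a toy rule that flags immediate word repetitions, which is only one of the disfluency phenomena a real tool would need to handle.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Token:
    form: str                          # surface form of the transcribed word
    pos: Optional[str] = None          # level 1: morpho-syntactic tag
    disfluency: Optional[str] = None   # level 2: disfluency label


def tag_tokens(words: List[str], lexicon: Dict[str, str]) -> List[Token]:
    """Level 1 (assumed): look each word up in a lexicon; unknown words get 'UNK'."""
    return [Token(form=w, pos=lexicon.get(w, "UNK")) for w in words]


def mark_repetitions(tokens: List[Token]) -> List[Token]:
    """Level 2 (toy rule): label a token 'REPETITION' if the next token has the same form."""
    for current, following in zip(tokens, tokens[1:]):
        if current.form == following.form:
            current.disfluency = "REPETITION"
    return tokens


def annotate(words: List[str], lexicon: Dict[str, str]) -> List[Token]:
    """Apply both hypothetical annotation levels in sequence."""
    return mark_repetitions(tag_tokens(words, lexicon))


if __name__ == "__main__":
    # Placeholder forms and tags, not real Tunisian Dialect data.
    toy_lexicon = {"wordA": "NOUN", "wordB": "VERB"}
    for token in annotate(["wordA", "wordA", "wordB", "wordC"], toy_lexicon):
        print(token.form, token.pos, token.disfluency)
```

Keeping the two levels as separate fields on each token means a disfluency label can be added or revised without touching the morpho-syntactic tag, which is one plausible way to store layered annotations.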

Notes

1. We used the Buckwalter transliteration.
2. https://www.fichier-pdf.fr/2010/08/31/m14401m/dico-karmous.pdf.
3. http://www.arabetunisien.com/.
4. https://files.eric.ed.gov/fulltext/ED183017.pdf.
5. https://fieldsupport.dliflc.edu/productList.aspx?v=lsk.
6. https://www.happyscribe.co/.
7. A city located in the center of Tunisia.

Author information

Authors and Affiliations

Faculty of Economics and Management of Sfax, University of Sfax, Sfax, Tunisia
Emna Boughariou & Lamia Hadrich Bleguith
Hail University, Hail, Kingdom of Saudi Arabia
Younès Bahou

Corresponding author

Correspondence to Emna Boughariou.

Editor information

Editors and Affiliations

University of West Bohemia, Pilsen, Czech Republic
Kamil Ekštein

About this paper

Cite this paper

Boughariou, E., Bahou, Y., Bleguith, L.H. (2019). Linguistic Resources Construction: Towards Disfluency Processing in Spontaneous Tunisian Dialect Speech. In: Ekštein, K. (eds) Text, Speech, and Dialogue. TSD 2019. Lecture Notes in Computer Science, vol 11697. Springer, Cham. https://doi.org/10.1007/978-3-030-27947-9_27

DOI: https://doi.org/10.1007/978-3-030-27947-9_27
Published: 06 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27946-2
Online ISBN: 978-3-030-27947-9
eBook Packages: Computer Science, Computer Science (R0)
