ABSTRACT
Computer processing of written Arabic poses a number of challenges for traditional parsing architectures at many levels of linguistic analysis. In this contribution, we review some of these core issues and the demands they make, and suggest different strategies for tackling them. Finally, we assess these issues in connection with the behaviour of neuro-biologically inspired lexical architectures known as Temporal Self-Organising Maps. We show that, far from being language-specific problems, issues in Arabic processing can shed light on some fundamental characteristics of the human language processor, such as structure-based lexical recoding, concurrent and competitive activation of output candidates, and dynamic selection of optimal solutions.
Index Terms
- Computational Linguistics and Language Physiology: Insights from Arabic NLP and Cooperative Editing