Computational Linguistics and Language Physiology: Insights from Arabic NLP and Cooperative Editing

Authors:
Vito Pirrelli, Ouafae Nahli, Federico Boschetti, Riccardo Del Gratta, Claudia Marzi

Affiliation (all authors): Institute of Computational Linguistics "A. Zampolli", CNR, via Moruzzi 1, Pisa, Italy

AIUCD '14: Proceedings of the Third AIUCD Annual Conference on Humanities and Their Methods in the Digital Ecosystem, September 2014, Article No. 12, Pages 1–8. https://doi.org/10.1145/2802612.2802637

Published: 18 September 2014

ABSTRACT

Computer processing of written Arabic poses a number of challenges for traditional parsing architectures at many levels of linguistic analysis. In this contribution, we review some of these core issues and the demands they make, and suggest different strategies for tackling them. Finally, we assess these issues in connection with the behaviour of neurobiologically inspired lexical architectures known as Temporal Self-Organising Maps. We show that, far from being language-specific problems, issues in Arabic processing can shed light on some fundamental characteristics of the human language processor, such as structure-based lexical recoding, concurrent and competitive activation of output candidates, and dynamic selection of optimal solutions.



Published in
AIUCD '14: Proceedings of the Third AIUCD Annual Conference on Humanities and Their Methods in the Digital Ecosystem
September 2014, 119 pages
ISBN: 9781450332958
DOI: 10.1145/2802612
Editors: Francesca Tomasi, Roberto Rosselli Del Turco, Anna Maria Tammaro
General Chair: Dino Buzzetti, University of Bologna, Italy (AIUCD President)
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
Language neuro-physiology
Mental Lexicon
Non-concatenative morphology
Optical Character Recognition
Temporal Self-organising Maps
WordNet
Qualifiers
- research-article
- Research
- Refereed limited