Abstract
Most Arabs can read text written in Modern Standard Arabic (MSA). To express themselves, however, many find it easier to switch to informal (colloquial) Arabic. The web lets anyone express themselves freely, and people increasingly do so on social media platforms such as blogs and forums in their native colloquial varieties. Search engines handle queries in MSA very well, but perform worse when the query is written in colloquial Arabic. This paper addresses two issues. First, many younger Arabs find it hard to write in MSA, so many results are missed because of improperly posed queries. Second, a query written in MSA will not retrieve documents written in colloquial Arabic. Thus, to make the web universally accessible to all Arabic users, we need a mechanism that translates queries back and forth between MSA and the variety of colloquial dialects spread throughout the Arab countries. As a case study, we investigate one of the local dialects of Saudi Arabia, a leading country in social media usage, much of which is in colloquial language. We present a web information retrieval system for Arabic that addresses this concern. To test the proposed method, we compiled a corpus of over fourteen hundred documents and measured the performance of our system on 50 sample queries, achieving an average recall and precision of 93.4% and 83.6%, respectively.
Notes
In the Latin alphabet, diacritics change the sound value of the letter to which they are added, whereas in Arabic they serve as a vowel-pointing system. Distinct letters serve as long vowels, while short vowels are indicated by diacritical marks. See Sect. 1 for more detail on the diacritical marking.
Two or more words having the same spelling but different meanings and origins, e.g., lie (untrue) and lie (recline).
Acknowledgements
We would like to thank all the anonymous reviewers for their helpful comments. This work was supported by a special fund in the Research Center of the College of Computer and Information Sciences (CCIS) at King Saud University.
About this article
Cite this article
Azmi, A.M., Aljafari, E.A. Universal web accessibility and the challenge to integrate informal Arabic users: a case study. Univ Access Inf Soc 17, 131–145 (2018). https://doi.org/10.1007/s10209-017-0522-3
DOI: https://doi.org/10.1007/s10209-017-0522-3