Universal web accessibility and the challenge to integrate informal Arabic users: a case study

  • Long Paper
  • Universal Access in the Information Society

Abstract

Most Arabs can read text written in Modern Standard Arabic (MSA). To express themselves, however, many find it easier to switch to informal (colloquial) Arabic. The web lets anyone express themselves freely, and people increasingly do so on social media platforms such as blogs and forums in their native colloquial dialects. Search engines handle MSA queries very well, but perform worse when the query is written in colloquial Arabic. This paper addresses two issues. First, many younger Arabs find it hard to write in MSA, so many results are missed because their queries are poorly formed. Second, a query written in MSA will not retrieve documents written in colloquial Arabic. Thus, to make the web universally accessible to all Arabic users, we need a mechanism that translates queries back and forth between MSA and the variety of colloquial dialects spoken across the Arab countries. As a case study, we investigate one of the local dialects in Saudi Arabia, a leading country in social media usage, much of which is in colloquial language. We present a web information retrieval system for Arabic that addresses this concern. To test the proposed method, we compiled a corpus of over fourteen hundred documents and measured the performance of our system using 50 sample queries, achieving an average recall and precision of 93.4% and 83.6%, respectively.
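
To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' system. It assumes a tiny hypothetical colloquial-to-MSA lexicon (the two entries are illustrative only), expands a query with each token's counterpart in the other variety, and computes per-query precision and recall averaged over queries. The function names and the choice of a per-query (macro) average are assumptions; the abstract does not state how the averages were computed.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical two-entry lexicon (colloquial -> MSA); illustrative only,
# not taken from the paper. Both pairs mean "I want" and "what".
COLLOQUIAL_TO_MSA: Dict[str, str] = {
    "أبغى": "أريد",
    "وش": "ماذا",
}
MSA_TO_COLLOQUIAL: Dict[str, str] = {
    msa: col for col, msa in COLLOQUIAL_TO_MSA.items()
}


def expand_query(query: str) -> Set[str]:
    """Return each query token plus its counterpart in the other variety."""
    terms: Set[str] = set()
    for token in query.split():
        terms.add(token)
        if token in COLLOQUIAL_TO_MSA:
            terms.add(COLLOQUIAL_TO_MSA[token])
        if token in MSA_TO_COLLOQUIAL:
            terms.add(MSA_TO_COLLOQUIAL[token])
    return terms


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = hits / retrieved, recall = hits / relevant, for one query."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_scores(runs: List[Tuple[Set[str], Set[str]]]) -> Tuple[float, float]:
    """Average per-query precision and recall over all (retrieved, relevant) pairs."""
    scores = [precision_recall(ret, rel) for ret, rel in runs]
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n
```

With this sketch, a colloquial query token also matches documents containing its MSA counterpart and vice versa, which is the bidirectional behaviour the abstract calls for. A real system would additionally need morphological handling, which is omitted here.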

Notes

  1. In the Latin alphabet the diacritics are used to change the sound value of the letter to which they are added, while in Arabic they serve as a vowel pointing system. Distinct letters serve as long vowels, but for short vowels the diacritical markings are used. See Sect. 1 for more detail on the diacritical marking.

  2. Two or more words having the same spelling but different meanings and origins, e.g., lie (untrue) and lie (recline).

Acknowledgements

We would like to thank all the anonymous reviewers for their helpful comments. This work was supported by a special fund in the Research Center of the College of Computer and Information Sciences (CCIS) at King Saud University.

Author information

Corresponding author

Correspondence to Aqil M. Azmi.

About this article

Cite this article

Azmi, A.M., Aljafari, E.A. Universal web accessibility and the challenge to integrate informal Arabic users: a case study. Univ Access Inf Soc 17, 131–145 (2018). https://doi.org/10.1007/s10209-017-0522-3

  • DOI: https://doi.org/10.1007/s10209-017-0522-3