BLARK for multi-dialect languages: towards the Kurdish BLARK

  • Project Notes

Language Resources and Evaluation

Abstract

In this paper we introduce the Kurdish BLARK (Basic Language Resource Kit). The original BLARK does not consider multi-dialect characteristics and has generally targeted reasonably well-resourced languages. To address these two aspects, we extended BLARK and applied the proposed extension to Kurdish. The Kurdish language not only suffers from a paucity of resources but also comprises several dialects within a complex linguistic context. This paper presents the Kurdish BLARK and shows that, from natural language processing (NLP) and computational linguistics (CL) perspectives, the revised BLARK provides a more applicable view of languages whose characteristics are similar to those of Kurdish.
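To illustrate what a multi-dialect extension of BLARK might look like in practice, the following sketch assumes that the extended kit can be viewed as a matrix indexed by (component, dialect) pairs. Each cell is marked Available, Unavailable, or "?" for cases that cannot yet be categorized, which is the three-way marking described in note 11. The `Status` enum, the `BlarkMatrix` layout, the helper functions and the component labels are all hypothetical. They are not taken from the paper's table. Apart from the Kurmanji machine-translation cell (see note 10), the example statuses are placeholders, not findings.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical three-way availability marking for one BLARK cell."""
    AVAILABLE = "Available"
    UNAVAILABLE = "Unavailable"
    UNCERTAIN = "?"  # attempts observed, but neither category fits yet


# Hypothetical layout: one cell per (component, dialect) pair, so the same
# component can carry a different status in each dialect (requires Python 3.9+).
BlarkMatrix = dict[tuple[str, str], Status]


def needs_investigation(matrix: BlarkMatrix) -> list[tuple[str, str]]:
    """Return the (component, dialect) cells marked '?', sorted for stable output."""
    return sorted(cell for cell, status in matrix.items() if status is Status.UNCERTAIN)


def coverage(matrix: BlarkMatrix, dialect: str) -> float:
    """Fraction of one dialect's listed cells that are marked Available."""
    cells = [status for (_, d), status in matrix.items() if d == dialect]
    if not cells:
        return 0.0  # no cells recorded for this dialect
    return sum(status is Status.AVAILABLE for status in cells) / len(cells)


# Illustrative cells only; apart from the Kurmanji machine-translation entry
# (note 10), these statuses are placeholders, not the paper's findings.
example: BlarkMatrix = {
    ("Machine translation", "Kurmanji"): Status.AVAILABLE,
    ("Machine translation", "Gorani"): Status.UNAVAILABLE,
    ("Lexicon", "Zazaki"): Status.UNCERTAIN,
}

print(needs_investigation(example))   # [('Lexicon', 'Zazaki')]
print(coverage(example, "Kurmanji"))  # 1.0
```

Keying cells by (component, dialect) rather than by component alone is the design choice that lets a single component be available for one dialect and missing for another. This is the situation a single-dialect BLARK cannot express.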

Fig. 1

Notes

  1. Kurmanji is also spelled Kirmanji and Kirmanci.

  2. Gorani is also spelled Gurani. Zazai, Zaza, and Zazaki all refer to the same dialect.

  3. Yekgirtû is a Sorani word (in Kurmanji it is called Yekgirtî or Yekgrig) that literally translates as "unified" (Kurdish Academy of Language 2017a).

  4. Used for dialect adjustment only (see Kurdish Academy of Language 2015).

  5. Kurdish-Cyrillic aspirated cases (see Kurdish Academy of Language 2015).

  6. One may call this a new Kurdish text to differentiate it from other sources, such as the ancient roots of Kurdish literature (Naderi 2011). For example, some scholars suggest that the Avesta, the major religious text of Zoroastrianism, could be considered the ancient root of Kurdish (Paul 2014; Kurdish Academy of Language 2017b; Nebez 2004). Whether or not this proposition is accepted, studying the evolution of Kurdish is not an easy task. As Shakely (2002, 2016) states, Kurdish literature lacks both a long history and the necessary records.

  7. The reasons for the dominance of Sorani in literature are beyond the scope of this article. Interested readers may refer to Hassanpour (1992) and Leezenberg (1993).

  8. The usage, stability, and continuity of grammatical gender vary by region (Haig and Öpengin 2015; Öpengin 2015b; Mahmoudveysi and Bailey 2013; Hassanpour 1992). Kurdish linguists state that Sorani once marked gender but lost this feature over the course of its evolution. Some specific uses nevertheless survive in a few Sorani-speaking regions (Hassani and Medjedovic 2016; Hassanpour 1992).

  9. The debate among linguists about the Kurdish dialects has continued for several decades. Many questions about the evolution of the Kurdish dialects and the relationships among them remain open (Bajalan 2017; Jügel 2016; Malmasi 2016; Mahmoudveysi and Bailey 2013; Hassanpour 1992; McCarus 1960; MacKenzie 1962). It is neither our aim, nor are we in a position, to settle this debate. Rather, we wish to contribute to the development of a framework and the required language technologies (LTs). These could help in studying Kurdish not only from a purely linguistic perspective but also through the application of computational linguistics.

  10. In 2016, Google added Kurmanji to its Google Translate web application, a considerable step toward the global exposure of the Kurdish dialects. However, Kurmanji is only one of several Kurdish dialects, so it would be simplistic to regard this as exposure of Kurdish as a language.

  11. This means that we have observed evidence of various attempts to provide specific tools, but we can categorize the item neither as Available nor as Unavailable. In such cases, we use a question mark as an identifier. It flags items that require further investigation in future work, either by us or by other interested researchers.

  12. The results for this item and the previous one are summarized in Hassani (2017).

  13. The results for this item were summarized in an article titled "A Method for Proper Noun Extraction in Kurdish". It was presented at SLATE'17 (Symposium on Languages, Applications and Technologies), Vila do Conde (ESMAD-IPP), Portugal, June 26–27, 2017, and will appear in OASIcs (OpenAccess Series in Informatics).

  14. This table was compiled from all resources available to us. To the best of our knowledge, this is the first attempt to investigate Kurdish on the basis of BLARK. We would therefore warmly welcome any information or comments that could help make this portrayal as accurate as possible and keep it up to date.

  15. As a first step, we submitted a proposal to our institution to establish a research center dedicated to Kurdish NLP and CL. The proposal has been approved, and we have begun the initial steps toward setting up the center. As its first project, we are currently developing the first Kurdish Optical Character Recognition (OCR) system. We have also made a preliminary announcement of the Kurdish BLARK project through online media (Hassani 2016) and through our professional networks.

References

  • Avin Kurdish Library. (2016). http://evinebook.blogfa.com/. (in Kurdish).

  • Ayishe, M. A. (2008). Kurdish–Kurdish dictionary. Kurdish: Ministry of Culture-KRG.

  • Bajalan, D. R. (2017). On the frontiers of empire: Culture and power in early modern “Iranian” Kurdistan. Kurdish Studies, 5(1), 1–10. http://www.tplondon.com/journal/index.php/ks/article/view/916.

  • Bedirxan, C. A., & Lescot, R. (1986). Kurdische Grammatik: Kurmancî-Dialekt (Vol. 1). Kurdisches Institut.

  • Benesty, J., Sondhi, M. M., & Huang, Y. (2008). Springer handbook of speech processing. Secaucus, NJ: Springer.

  • Berrio-Zapata, C., & Rojas, H. (2014). The digital divide in the university: The appropriation of ICT in higher education students from Bogota, Colombia. Comunicar.

  • Binnenpoorte, D., Cucchiarini, C., D'Halleweyn, E., Sturm, J., & de Vriend, F. (2002a). Towards a roadmap for human language technologies: Dutch-Flemish experience. In Proceedings of LREC 2002.

  • Binnenpoorte, D., de Vriend, F., Sturm, J., Daelemans, W., Strik, H., & Cucchiarini, C. (2002b). A field survey for establishing priorities in the development of HLT resources for Dutch. In Proceedings of LREC 2002.

  • CNN. (2016). Kurdish people fast facts. http://edition.cnn.com/2014/08/18/world/kurdish-people-fast-facts/.

  • Cullen, R. (2001). Addressing the digital divide. Online Information Review, 25(5), 311–320.

  • Erdal, R. (2015). Lîsteya soranî-kurmancî. http://zanin.ir/wp-content/uploads/downloads/2014/09/Pirt%C3%BBk%C3%AAn-malpera-Zan%C3%AEn.pdf. (in Kurdish).

  • Ergül, S. T. (2015). An overview of Kurdish literature in Turkish. Trans. Nilgün Dungan and Saliha Paker. Tradition, Tension and Translation in Turkey.

  • Esmaili, K. S. (2012). Challenges in Kurdish text processing. arXiv preprint arXiv:1212.0074.

  • Esmaili, K. S., Eliassi, D., Salavati, S., Aliabadi, P., Mohammadi, A., Yosefi, S., & Hakimi, S. (2013). Building a test collection for Sorani Kurdish. In 2013 ACS international conference on computer systems and applications (AICCSA) (pp. 1–7). IEEE.

  • Esmaili, K. S., & Salavati, S. (2013). Sorani Kurdish versus Kurmanji Kurdish: An empirical comparison. In Proceedings of the 51st annual meeting of the association for computational linguistics (volume 2: Short papers) (pp. 300–305).

  • Esmaili, K. S., Salavati, S., & Datta, A. (2014). Towards Kurdish information retrieval. ACM Transactions on Asian Language Information Processing (TALIP), 13(2), 7.

  • Ethnologue. (2015a). Kurdish, Central. http://www.ethnologue.com/language/ckb.

  • Ethnologue. (2015b). Kurdish, Northern. http://www.ethnologue.com/language/kmr.

  • Ethnologue. (2015c). Kurdish, Southern. http://www.ethnologue.com/language/sdh.

  • Ferheng azad (Azad Dictionary). (2016). Pêvek:soranî-kurmancî (Amendment: Sorani-Kurmanji). https://ku.wiktionary.org/wiki/P%C3%AAvek:soran%C3%AE-kurmanc%C3%AE. (in Kurdish)

  • Gasser, M. (2006). How language works, Ed3.0. http://www.indiana.edu/hlw/book.html.

  • GeoNames. (2016). GeoNames. http://www.geonames.org/.

  • Haig, G., & Matras, Y. (2002). Kurdish linguistics: A brief overview. STUF-Language Typology and Universals, 55(1), 3–14.

  • Haig, G., & Öpengin, E. (2014a). Introduction to special issue-Kurdish: A critical research overview. Kurdish Studies, 2(2), 99–122.

  • Haig, G., & Öpengin, E. (2014b). Regional variation in Kurmanji: A preliminary classification of dialects. Kurdish Studies, 2(2), 143–176.

  • Haig, G., & Öpengin, E. (2015a). Gender in Kurdish: Structural and socio-cultural dimensions. In M. Hellinger & H. Motschenbacher (Eds.), Gender across languages (Vol. 4, pp. 247–276).

  • Haig, G., & Öpengin, E. (2015b). Kurmanji Kurdish in Turkey: Structure, varieties and status. In The minority languages in Turkey.

  • Hassani, H. (2016). Kurdish BLARK (Basic Language Resource Kit). https://www.researchgate.net/project/Kurdish-BLARK-Basic-Language-Resource-Kit.

  • Hassani, H. (2017). Kurdish interdialect machine translation. In Proceedings of the fourth workshop on NLP for similar languages, varieties and dialects (VarDial) (pp. 63–72). Association for Computational Linguistics.

  • Hassani, H., & Medjedovic, D. (2016). Automatic Kurdish dialects identification. Computer Science and Information Technology, 6(2), 61–78. http://airccj.org/CSCP/vol6/csit65007.pdf.

  • Hassani, H., & Kareem, R. (2011). Kurdish text to speech (KTTS). In Designing for global markets 10: Proceedings of the tenth international workshop on internationalisation of products and systems (IWIPS 2011) (pp. 79–89).

  • Hassanpour, A. (1992). Nationalism and language in Kurdistan, 1918–1985. Edwin Mellen Press.

  • Hennerbichler, F., et al. (2012). The origin of Kurds. Advances in Anthropology, 2(02), 64.

  • Hesami. (2016). Kurdish definition, origin, usage of names. http://www.hesami.com/names/kurdish/.

  • Huning, M. (2014). TextSTAT—simple text analysis tool/ Concordance software Academia. http://neon.niederlandistik.fu-berlin.de/en/textstat/.

  • Izady, M. (1992). The Kurds: A concise handbook. Taylor & Francis.

  • Johanson, L., & Robbeets, M. (2012). Copies versus cognates in bound morphology (Vol. 2). Brill.

  • Jügel, T. (2014). On the linguistic history of Kurdish. Kurdish Studies, 2(2), 123–142.

  • Jügel, T. (2016). Parvin Mahmoudveysi, Denise Bailey. The Gorani language of Zarda, a village of West Iran. Abstracta Iranica,34, 35–36. http://abstractairanica.revues.org/41149.

  • Khalid, H. S. (2015). Kurdish dialect continuum, as a standardization solution. International Journal of Kurdish Studies, 1(1).

  • Khalid, H. S., & Karacan, H. (2016). Adjectives in Kurdish language: Comparison between dialects. International Journal of Kurdish Studies, 2, 15–23. doi:10.21600/ijks.76230.

  • Krauwer, S. (1998). ELSNET and ELRA: A common past and a common future. ELRA Newsletter, 3(2), 4–5.

  • Krauwer, S. (2003). The basic language resource kit (BLARK) as the first milestone for the language resources roadmap. In Proceedings of SPECOM 2003.

  • Kreyenbroek, P. G., & Sperl, S. (1992). The Kurds: A contemporary review. Routledge.

  • Kurdipedia. (2016). Kurdish names. Kurdipedia. http://navenkurdi.com/en/kurdish-names.

  • Kurdish Academy of Language. (2015). Kurdish Language | Kurdish Academy of Language. http://www.kurdishacademy.org/?q=node/41.

  • Kurdish Academy of Language. (2016a). KAL featured articles. http://www.kurdishacademy.org/?q=ku/book/export/html/5.

  • Kurdish Academy of Language. (2016b). Kurdish dialectology. http://www.kurdishacademy.org/?q=node/212.

  • Kurdish Academy of Language. (2016c). The Kurdish population. http://www.kurdishacademy.org/?q=node/199.

  • Kurdish Academy of Language. (2017a). Existing Kurdish alphabets. http://www.kurdishacademy.org/?q=node/145.

  • Kurdish Academy of Language. (2017b). The History of Kurdish Language. http://www.kurdishacademy.org/?q=node/37.

  • Kurdish Daily. (2016). Kurdish names for your baby. Kurdish Daily. http://ekurd.net/mismas/kurdishnames.htm.

  • Kurdish Institute of Paris. (2016). Kurdish names. Kurdish Institute of Paris. http://www.institutkurde.org/en/kurdorama/kurdish_baby_names.php.

  • Kurdish Library. (2016). http://www.pertwk.com/ktebxane/taxonomy/term/1427?page=7. (in Kurdish).

  • Kurdish Names. (2016). Kurdish names. http://navenkurdi.com/en/kurdish-names.

  • Kurdistan Regional Statistics Office. (2014). Iraqi Kurdistan population forecast for 2009–2020.

  • Leezenberg, M. (1993). Gorani influence on central Kurdish: Substratum or prestige borrowing? http://www.kurdishacademy.org/?q=node/10.

  • MacKenzie, D. N. (1961). The origins of Kurdish. Transactions of the Philological Society, 60(1), 68–86.

  • MacKenzie, D. N. (1962). Kurdish dialect studies (Vol. 2). Oxford: Oxford University Press.

  • Maegaard, B., Krauwer, S., Choukri, K., & Jørgensen, L. (2006). The BLARK concept and BLARK for Arabic. In Proceedings of the fifth international conference on language resources and evaluation (LREC'06).

  • Mahmoudveysi, P., & Bailey, D. (2013). The Gorani language of Zarda, a village of West Iran: Texts, grammar, and lexicon. Wiesbaden: Reichert.

  • Malmasi, S. (2016). Subdialectal differences in Sorani Kurdish. In Proceedings of the third workshop on NLP for similar languages, varieties and dialects (VarDial3) (pp. 89–96). Osaka: The COLING 2016 Organizing Committee. http://aclweb.org/anthology/W16-4812.

  • McCarus, E. R. (1960). Kurdish language studies. The Middle East Journal.

  • McDowall, D. (2005). A modern history of the Kurds. I.B. Tauris.

  • Ministry of Higher Education and Scientific Research. (2016). The admitted students in 2010. http://www.mhe-krg.org/ku/node/698.

  • Mohammed, B. O. (2013). Handwritten Kurdish character recognition using geometric discretization feature. IJCSC, 4, 51–55.

  • Naderi, L. (2011). An anthology of modern Kurdish literature. University of Kurdistan. http://www.diva-portal.org/smash/get/diva2:1054832/FULLTEXT01.pdf.

  • National Library of Sweden. (2016). National Library of Sweden. http://www.kb.se/english/collections/.

  • Nebez, J. (1976). Ziman-i yekgirtû-i kurdi [Towards a unified Kurdish language] (in Kurdish). NUKSE.

  • Nebez, J. (2004). The Kurds: History and culture. WKA Publications.

  • Nissinen, M., Parra, C., Rosner, M., Schuurman, I., Skadina, I., Quochi, V., et al. (2010). Description of the BLARK, the situation of individual languages D5C-4. CLARIN, 5(4), 2.

  • Passarotti, M. (2010). Leaving behind the less-resourced status. The case of Latin through the experience of the Index Thomisticus Treebank. In 7th SaLTMiL workshop on creation and use of basic lexical resources for less-resourced languages, LREC 2010, Valletta, Malta, 23 May 2010, workshop programme (p. 27).

  • Paul, L. (2014). Kurdish language. Encyclopædia Iranica, online edition. http://www.iranicaonline.org/articles/kurdish-language-i.

  • Paveh Virtual Library. (2016). Paveh Virtual Library. http://pavehpdf.blogfa.com/post-2.aspx.

  • Prys, D. (2006). The BLARK Matrix and its relation to the language resources situation for the Celtic languages. Strategies for Developing Machine Translation for Minority Languages.

  • Qasemizadeh, B., Rahimi, S., & Mahmoodi Bakhtiari, B. (2014). The first parallel multilingual corpus of Persian: Toward a Persian BLARK. arXiv:1404.4572.

  • Sadat, F., Kazemi, F., & Farzindar, A. (2014). Automatic identification of Arabic language varieties and dialects in social media. In Proceedings of SocialNLP.

  • Seraji, M., Megyesi, B., & Nivre, J. (2012). A basic language resource kit for Persian. In Proceedings of the eighth international conference on language resources and evaluation (LREC 2012), 23–25 May 2012, Istanbul, Turkey (pp. 2245–2252). European Language Resources Association.

  • Shakely, F. (2002). Classic and Modern Kurdish Poetry. Kerkuk Kurdistane 14. http://www.morsmal.no/morsmkds/images/sampledata/CLASSIC__MODERN.pdf.

  • Shakely, F. (2016). The modern Kurdish short story. Acta Universitatis Upsaliensis. http://www.diva-portal.org/smash/get/diva2:1054832/FULLTEXT01.pdf.

  • Sheyholislami, J. (2009). Language and nation-building in Kurdistan-Iraq. In Middle East Studies Association 43rd annual meeting, Boston, MA.

  • Swedish Language Bank. (2016). Språkbanken. http://spraakbanken.gu.se/eng/about-us/about-spr%C3%A5kbanken.

  • Umîd, D. (2007). Ferhenga Destî -kurdî bi kurdî - çapa duyem a berfireh. weşanên SEWADê. (in Kurdish).

  • Van den Bosch, A. (2014). Peter Spyns and Jan Odijk (eds): Essential speech and language technology for Dutch: Results by the STEVIN programme. Machine Translation, 28(1), 57–60.

  • Walther, G., & Sagot, B. (2010). Developing a large-scale lexicon for a less-resourced language: General methodology and preliminary experiments on Sorani Kurdish. In Proceedings of the 7th SaLTMiL workshop on creation and use of basic lexical resources for less-resourced languages (LREC 2010 workshop).

  • Wikipedia. (2016a). Kurdish population. https://en.wikipedia.org/wiki/Kurdish_population.

  • Wikipedia. (2016b). Wikipedia. http://www.wikipedia.org.

  • Wikipedia. (2016c). Zimanê kurdî. https://ku.wikipedia.org/wiki/Ziman%C3%AA_kurd%C3%AE.

Acknowledgements

We would like to express our warm appreciation to Dr. Dzejla Medjedovic, Assistant Professor and Vice Dean of the Graduate Program at the University Sarajevo School of Science and Technology (SSST), for reviewing this paper and providing insightful recommendations. We would also like to thank the anonymous reviewers for their constructive suggestions, valuable recommendations, and encouragement, which have considerably improved both the content and the structure of the paper.

Author information

Correspondence to Hossein Hassani.

About this article

Cite this article

Hassani, H. BLARK for multi-dialect languages: towards the Kurdish BLARK. Lang Resources & Evaluation 52, 625–644 (2018). https://doi.org/10.1007/s10579-017-9400-0

  • DOI: https://doi.org/10.1007/s10579-017-9400-0
