
A Fully Automated Approach for Arabic Slang Lexicon Extraction from Microblogs

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8403)

Abstract

With the rapid increase in the volume of Arabic opinionated posts on various social media forums comes an increased demand for Arabic sentiment analysis tools and resources. Social media posts, especially those written by the younger generation, are usually in colloquial Arabic and contain a great deal of slang, much of which evolves over time. While some work has been carried out to build Modern Standard Arabic sentiment lexicons, these need to be supplemented with dialectal terms and continuously updated with new slang. This paper proposes a fully automated approach, based on lexico-syntactic patterns, for building a dialectal/slang subjectivity lexicon for use in Arabic sentiment analysis. Because existing Arabic part-of-speech taggers and other morphological resources have been found to handle colloquial Arabic very poorly, the approach does not use any such tools, which allows it to generalize across dialects with only minor modifications. Results of experiments targeting Egyptian Arabic show that the approach can detect subjective internet slang, whether expressed as single words or as multi-word expressions, and can classify the polarity of these terms with a high degree of precision.
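To make the idea of pattern-based lexicon extraction concrete, here is a minimal Python sketch. The abstract does not give the authors' actual patterns, so everything in it is an assumption for illustration: the seed words in `SEEDS`, the single pattern used (two words joined by the Arabic conjunction و "and", which is written attached to the second word), and the function `extract_candidates` are all hypothetical. The sketch borrows the well-known heuristic that words coordinated by "and" tend to share polarity. Like the approach described above, it relies only on surface tokens and uses no part-of-speech tagger or morphological analyzer; unlike it, it handles single-word candidates only, not multi-word expressions.

```python
import re
from collections import Counter, defaultdict

# Hypothetical seed lexicon of Egyptian Arabic opinion words:
# word -> polarity (+1 positive, -1 negative).
SEEDS = {"حلو": 1, "جميل": 1, "زفت": -1, "بايخ": -1}

# Surface tokenization only: split on whitespace, no POS tags or stemming.
TOKEN = re.compile(r"\S+")


def tokenize(text):
    return TOKEN.findall(text)


def extract_candidates(posts, seeds, min_count=2):
    """Assign polarity to unseen words that are conjoined with seed words.

    Hypothetical pattern: "<w1> و<w2>", i.e. the second token starts with
    the attached conjunction و. A candidate conjoined with a seed inherits
    that seed's polarity as one vote.
    """
    votes = defaultdict(Counter)  # candidate -> Counter({+1: n, -1: m})
    for post in posts:
        tokens = tokenize(post)
        for left, right in zip(tokens, tokens[1:]):
            if len(right) < 2 or not right.startswith("و"):
                continue
            # Naive simplification: this also strips a word-initial و
            # that is part of the word itself.
            right = right[1:]
            if left in seeds and right not in seeds:
                votes[right][seeds[left]] += 1
            elif right in seeds and left not in seeds:
                votes[left][seeds[right]] += 1

    lexicon = {}
    for term, counter in votes.items():
        if sum(counter.values()) < min_count:
            continue  # too few pattern matches to trust this candidate
        score = counter[1] - counter[-1]
        if score != 0:  # skip candidates with tied votes
            lexicon[term] = 1 if score > 0 else -1
    return lexicon


if __name__ == "__main__":
    posts = [
        "الفيلم حلو وجامد",
        "الأكل حلو وجامد جدا",
        "الماتش بايخ وممل",
        "ممل وزفت",
        "حلو وتحفة",
    ]
    print(extract_candidates(posts, SEEDS))
```

Here "جامد" collects two positive votes and "ممل" two negative ones, while "تحفة" is matched only once and is dropped by the `min_count` threshold. A real system would need more patterns, a way to handle words that genuinely begin with و, and support for multi-word expressions.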




Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

ElSahar, H., El-Beltagy, S.R. (2014). A Fully Automated Approach for Arabic Slang Lexicon Extraction from Microblogs. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8403. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54906-9_7


  • DOI: https://doi.org/10.1007/978-3-642-54906-9_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54905-2

  • Online ISBN: 978-3-642-54906-9

  • eBook Packages: Computer Science (R0)
