
Information Extraction Model for Afan Oromo News Text

  • Conference paper
Information and Communication Technology for Development for Africa (ICT4DA 2019)

Abstract

Information Extraction (IE) is concerned with automatically extracting facts from text and storing them in a database so that the data are easy to use and manage. As the first research work on IE from Afan Oromo text, we designed a model that deals with the infrastructure news domain in the Oromo language. The proposed model has three main components: document preprocessing, learning and extraction, and post-processing. Recall, precision and F-measure are used as evaluation metrics for Afan Oromo Text Information Extraction (AOTIE). Trained and tested on a dataset of 3169 tokens, AOTIE achieved 79.5% precision, 80.5% recall and an 80% F-measure. These results serve as a baseline for further experiments on AOTIE. We set up two main experimental scenarios: the first is conducted by developing a gazetteer, and the second observes the influence of Afan Oromo grammatical structure. Both scenarios showed that the performance of AOTIE depends mostly on the grammatical structure of Afan Oromo.
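The abstract reports precision, recall and F-measure without stating the formulas. A minimal sketch, assuming the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN)) and a balanced F-measure (β = 1), shows how the reported 80% F-measure follows from 79.5% precision and 80.5% recall. The function names are hypothetical and are not taken from the paper.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of extracted items that are correct (assumed standard definition)."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of gold-standard items that were extracted (assumed standard definition)."""
    return tp / (tp + fn) if tp + fn else 0.0


def f_measure(p: float, r: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall; beta=1 gives the balanced F1."""
    if p == 0 and r == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    # Precision and recall as reported in the abstract for AOTIE.
    p, r = 0.795, 0.805
    print(f"F1 = {f_measure(p, r):.3f}")  # → F1 = 0.800
```

Because the harmonic mean of two close values lies just below their arithmetic mean, 79.5% and 80.5% combine to almost exactly 80%, consistent with the figure in the abstract.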



Author information

Correspondence to Sisay Abera or Tesfa Tegegne.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Abera, S., Tegegne, T. (2019). Information Extraction Model for Afan Oromo News Text. In: Mekuria, F., Nigussie, E., Tegegne, T. (eds) Information and Communication Technology for Development for Africa. ICT4DA 2019. Communications in Computer and Information Science, vol 1026. Springer, Cham. https://doi.org/10.1007/978-3-030-26630-1_28


  • DOI: https://doi.org/10.1007/978-3-030-26630-1_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26629-5

  • Online ISBN: 978-3-030-26630-1

  • eBook Packages: Computer Science, Computer Science (R0)
