Turkish labeled text corpus

Abstract:

A labeled text corpus of titles, abstracts, and keywords from Turkish papers is collected. The corpus covers 35 disciplines, with 200 documents per discipline. This study presents the collection and content of the corpus. The classification performance of Term Frequency-Inverse Document Frequency (TF-IDF) features and Latent Dirichlet Allocation (LDA) topic probabilities is compared on the corpus. The corpus is shared as open source so that it can be used in natural language processing applications for academic purposes.
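
As an illustration of the TF-IDF features mentioned in the abstract, the following is a minimal sketch in plain Python. It assumes one common weighting, tf = count / document length and idf = log(N / df); the abstract does not state which variant the study used. The function name `tfidf` and the toy tokenized documents are hypothetical and are not taken from the corpus.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    This is one common definition, not necessarily the one in the paper.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy "abstracts", already tokenized (not real corpus data).
toy_docs = [
    ["metin", "etiket", "derlem"],
    ["metin", "konu", "model"],
    ["derlem", "konu", "metin"],
]
toy_weights = tfidf(toy_docs)
```

With this weighting, a term that occurs in every document (here "metin") receives weight zero, while a term confined to a single document (here "etiket") receives the largest weight in that document. Vectors of this kind, or LDA topic probabilities, could then be passed to a classifier.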
Date of Conference: 23-25 April 2014
Date Added to IEEE Xplore: 12 June 2014
Electronic ISBN: 978-1-4799-4874-1
Print ISSN: 2165-0608
Conference Location: Trabzon, Turkey
