Abstract
The paper is devoted to the problem of short text classification, applied to free-text descriptions of books gathered by crawling the GoodReads portal. These descriptions are short, often incomplete, and highly biased towards the genre of their respective books, so establishing a notion of proximity between such texts is challenging. Each book is assigned multiple categories out of a total of 506, which makes the distribution of genres an important statistical aspect of the problem. In addition, the number of descriptions varies from genre to genre, so the data are imbalanced. To choose the best text classification method for this task, we examine several methods, including baseline naive Bayes models and semantic enrichment methods that use neural distributional models. The algorithms have been evaluated in terms of classification quality on a unique data set of almost two hundred thousand book descriptions.
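As a point of reference for the naive Bayes baseline mentioned above, the following is a minimal Python sketch of multi-label genre tagging that trains one multinomial naive Bayes classifier per genre (one-vs-rest). It is not the authors' implementation and does not cover the semantic enrichment methods. The tokenizer, the smoothing parameter `alpha`, the class names `BinaryMultinomialNB` and `OneVsRestNB`, and the toy descriptions are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of ASCII letters (a deliberately crude, assumed tokenizer).
    return re.findall(r"[a-z]+", text.lower())


class BinaryMultinomialNB:
    """Multinomial naive Bayes deciding 'has genre' (True) vs. 'lacks genre' (False)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # add-alpha (Laplace) smoothing for word counts

    def fit(self, docs, labels):
        # docs: list of token lists; labels: list of bools, one per document.
        self.vocab = {tok for doc in docs for tok in doc}
        self.counts = {True: Counter(), False: Counter()}
        for doc, label in zip(docs, labels):
            self.counts[label].update(doc)
        self.totals = {c: sum(self.counts[c].values()) for c in (True, False)}
        n, n_pos = len(labels), sum(labels)
        # Laplace-smoothed class priors, so a genre present in no book (or in
        # every book) does not produce log(0).
        self.log_prior = {
            True: math.log((n_pos + 1) / (n + 2)),
            False: math.log((n - n_pos + 1) / (n + 2)),
        }
        return self

    def log_score(self, tokens, c):
        # log P(c) + sum of log P(token | c); tokens unseen in training are skipped.
        denom = self.totals[c] + self.alpha * len(self.vocab)
        score = self.log_prior[c]
        for tok in tokens:
            if tok in self.vocab:
                score += math.log((self.counts[c][tok] + self.alpha) / denom)
        return score

    def predict(self, tokens):
        return self.log_score(tokens, True) > self.log_score(tokens, False)


class OneVsRestNB:
    """Multi-label wrapper: one independent binary classifier per genre."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, genre_sets):
        docs = [tokenize(t) for t in texts]
        genres = sorted(set().union(*genre_sets))
        self.models = {
            g: BinaryMultinomialNB(self.alpha).fit(docs, [g in gs for gs in genre_sets])
            for g in genres
        }
        return self

    def predict(self, text):
        tokens = tokenize(text)
        return {g for g, m in self.models.items() if m.predict(tokens)}


# Hypothetical toy descriptions (not GoodReads data); one book has two genres.
texts = [
    "a detective investigates a murder in london",
    "the detective hunts a killer through the fog",
    "a dragon and a young wizard fight dark magic",
    "an ancient magic kingdom ruled by a dragon queen",
    "a wizard detective solves a magic murder",
]
genre_sets = [{"mystery"}, {"mystery"}, {"fantasy"}, {"fantasy"}, {"mystery", "fantasy"}]

clf = OneVsRestNB().fit(texts, genre_sets)
print(clf.predict("a detective investigates a murder"))  # {'mystery'}
print(clf.predict("a young wizard and a dragon"))        # {'fantasy'}
```

Because every genre gets its own prior, a rarely used genre starts with a small P(genre); with 506 imbalanced categories, this is one reason a plain baseline of this kind may under-predict rare genres.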
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Sobkowicz, A., Kozłowski, M., Buczkowski, P. (2018). Reading Book by the Cover—Book Genre Detection Using Short Descriptions. In: Gruca, A., Czachórski, T., Harezlak, K., Kozielski, S., Piotrowska, A. (eds) Man-Machine Interactions 5. ICMMI 2017. Advances in Intelligent Systems and Computing, vol 659. Springer, Cham. https://doi.org/10.1007/978-3-319-67792-7_43
DOI: https://doi.org/10.1007/978-3-319-67792-7_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67791-0
Online ISBN: 978-3-319-67792-7
eBook Packages: Engineering, Engineering (R0)