Abstract
We present a supervised approach to visualizing collections of scientific documents. We have implemented a text classification module that produces class probability estimates, along with a dimensionality reduction technique that represents each class in 2-D space. Integrating these two procedures allows any collection of unlabelled documents to be visualized. The arXiv dataset was used to train the classification and visualization modules. We demonstrate the system’s functionality on a corpus of automatically detected publications from particular EU FP7 funding categories.
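The abstract does not say how the two modules are combined, so the following is only a minimal sketch of one plausible integration, not the authors' implementation. It assumes a multinomial Naive Bayes classifier with Laplace smoothing for the class probability estimates, and it places each document at the probability-weighted average of the class positions. The 2-D class coordinates in `class_xy` are hand-picked placeholders standing in for the output of the dimensionality reduction step, and all function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels):
    """Collect per-class word counts and class priors from labelled documents."""
    classes = sorted(set(labels))
    vocab = {word for doc in docs for word in doc.split()}
    counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        counts[label].update(doc.split())
    priors = {c: labels.count(c) / len(labels) for c in classes}
    return classes, vocab, counts, priors


def class_probabilities(model, doc):
    """Return normalised class probability estimates for one document."""
    classes, vocab, counts, priors = model
    log_scores = {}
    for c in classes:
        # Laplace smoothing: add one to every word count in the vocabulary.
        total = sum(counts[c].values()) + len(vocab)
        score = math.log(priors[c])
        for word in doc.split():
            if word in vocab:  # out-of-vocabulary words are ignored
                score += math.log((counts[c][word] + 1) / total)
        log_scores[c] = score
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    unnormalised = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnormalised.values())
    return {c: v / z for c, v in unnormalised.items()}


def project(probs, class_xy):
    """Place a document at the probability-weighted mean of class positions."""
    x = sum(p * class_xy[c][0] for c, p in probs.items())
    y = sum(p * class_xy[c][1] for c, p in probs.items())
    return x, y


# Toy training data (hypothetical, not taken from arXiv).
docs = [
    "quantum field particle",
    "particle collider energy",
    "neural network learning",
    "learning algorithm data",
]
labels = ["physics", "physics", "cs", "cs"]

# Placeholder class positions; a real system would derive these
# from a dimensionality reduction technique.
class_xy = {"physics": (0.0, 0.0), "cs": (1.0, 0.0)}

model = train_naive_bayes(docs, labels)
probs = class_probabilities(model, "neural learning data")
point = project(probs, class_xy)
```

In this sketch an unlabelled document that the classifier assigns mostly to one class lands close to that class's position, while an ambiguous document lands between classes.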
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Giannakopoulos, T., Dimitropoulos, H., Metaxas, O., Manola, N., Ioannidis, Y. (2013). Supervised Content Visualization of Scientific Publications: A Case Study on the ArXiv Dataset. In: Kłopotek, M.A., Koronacki, J., Marciniak, M., Mykowiecka, A., Wierzchoń, S.T. (eds) Language Processing and Intelligent Information Systems. IIS 2013. Lecture Notes in Computer Science, vol 7912. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38634-3_23
DOI: https://doi.org/10.1007/978-3-642-38634-3_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38633-6
Online ISBN: 978-3-642-38634-3
eBook Packages: Computer Science, Computer Science (R0)