Supervised Content Visualization of Scientific Publications: A Case Study on the ArXiv Dataset

Giannakopoulos, Theodoros; Dimitropoulos, Harry; Metaxas, Omiros; Manola, Natalia; Ioannidis, Yannis

doi:10.1007/978-3-642-38634-3_23

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7912)

Included in the following conference series:

Intelligent Information Systems Symposium

Abstract

A supervised approach to visualization of collections of scientific documents is presented. We have implemented a text classification module, which leads to class probability estimations, along with a dimensionality reduction technique which represents each class in the 2-D space. Integrating those two procedures, any collection of unlabelled documents can be visualized. The arXiv dataset has been adopted for training the classification and visualization modules. We demonstrate the system’s functionality on a corpus of automatically detected publications of particular EU FP7 funding categories.
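
The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration. A multinomial naive Bayes classifier stands in for the text classification module. The toy training sentences and the three class names are invented. The 2-D class coordinates in `CLASS_COORDS` are hand-picked, whereas the paper obtains them through a dimensionality reduction technique. Placing a document at the probability-weighted average of the class positions is only one plausible way to integrate the two procedures.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus; the paper trains on arXiv data instead.
TRAINING = [
    ("neural network learning algorithm", "cs"),
    ("compiler algorithm data structure", "cs"),
    ("quantum particle energy field", "physics"),
    ("particle collision energy detector", "physics"),
    ("theorem proof algebra group", "math"),
    ("proof topology manifold theorem", "math"),
]

# Hypothetical 2-D position of each class (hand-picked for this sketch).
CLASS_COORDS = {"cs": (0.0, 0.0), "physics": (1.0, 0.0), "math": (0.5, 1.0)}


def train_naive_bayes(labelled_docs, alpha=1.0):
    """Fit a multinomial naive Bayes model with add-alpha smoothing."""
    docs_per_class = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labelled_docs:
        tokens = text.lower().split()
        docs_per_class[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)

    total_docs = sum(docs_per_class.values())
    model = {"vocab": vocab, "log_prior": {}, "log_likelihood": {}}
    for label, n_docs in docs_per_class.items():
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        model["log_prior"][label] = math.log(n_docs / total_docs)
        model["log_likelihood"][label] = {
            w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
        }
    return model


def class_probabilities(model, text):
    """Return normalized posterior estimates P(class | text)."""
    # Words never seen in training are ignored.
    tokens = [t for t in text.lower().split() if t in model["vocab"]]
    log_scores = {
        label: prior + sum(model["log_likelihood"][label][t] for t in tokens)
        for label, prior in model["log_prior"].items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}


def place_document(probs, class_coords):
    """Map a document to 2-D as the probability-weighted mean of class positions."""
    x = sum(p * class_coords[label][0] for label, p in probs.items())
    y = sum(p * class_coords[label][1] for label, p in probs.items())
    return (x, y)


model = train_naive_bayes(TRAINING)
probs = class_probabilities(model, "quantum energy of a particle")
point = place_document(probs, CLASS_COORDS)
print(probs, point)
```

With this toy data the unlabelled query is assigned mostly to the `physics` class, so its point lands close to that class's coordinate. A document with evenly split probabilities would land near the centre of the triangle formed by the three class positions.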

Author information

Authors and Affiliations

Management of Data, Information, and Knowledge Group of the Department of Informatics & Telecommunications, University of Athens, 15784, Greece
Theodoros Giannakopoulos, Harry Dimitropoulos, Omiros Metaxas, Natalia Manola & Yannis Ioannidis

Editor information

Editors and Affiliations

Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248, Warsaw, Poland
Mieczysław A. Kłopotek, Jacek Koronacki, Małgorzata Marciniak & Agnieszka Mykowiecka
Institute of Computer Science, Polish Academy of Sciences, ul. Brzegi 55, 80-045, Gdańsk, Poland
Sławomir T. Wierzchoń

Cite this paper

Giannakopoulos, T., Dimitropoulos, H., Metaxas, O., Manola, N., Ioannidis, Y. (2013). Supervised Content Visualization of Scientific Publications: A Case Study on the ArXiv Dataset. In: Kłopotek, M.A., Koronacki, J., Marciniak, M., Mykowiecka, A., Wierzchoń, S.T. (eds) Language Processing and Intelligent Information Systems. IIS 2013. Lecture Notes in Computer Science, vol 7912. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38634-3_23

DOI: https://doi.org/10.1007/978-3-642-38634-3_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38633-6
Online ISBN: 978-3-642-38634-3
eBook Packages: Computer Science; Computer Science (R0)
