Semantic Clustering of Scientific Articles with Use of DBpedia Knowledge Base

Intelligent Tools for Building a Scientific Information Platform

Part of the book series: Studies in Computational Intelligence (SCI, volume 390)

Abstract

A case study of semantic clustering of scientific articles related to Rough Sets is presented. The proposed method groups the documents on the basis of their content, with the assistance of the DBpedia knowledge base. The text corpus is first processed with Natural Language Processing tools to produce vector representations of the content, and these are then matched against a collection of concepts retrieved from DBpedia. The result is a new representation that better reflects the semantics of the texts. With this new representation, the documents are hierarchically clustered to form a partition into groups of semantically related papers. The steps of textual data preparation, utilization of DBpedia, and clustering are explained and illustrated with experimental results. The clustering quality is assessed both by human experts and by comparison with a traditional approach.
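The pipeline outlined in the abstract (term vectors, matching against concepts, hierarchical clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the concept descriptions in `CONCEPTS` are hypothetical toy stand-ins for text that would be retrieved from DBpedia, and the plain term-frequency weighting, cosine similarity, and average-linkage merging are assumptions chosen only to keep the example self-contained.

```python
import math
from collections import Counter

# Hypothetical toy concepts; in the chapter these would come from DBpedia.
CONCEPTS = {
    "Rough set": "rough set approximation lower upper boundary indiscernibility",
    "Cluster analysis": "cluster clustering hierarchical partition similarity group",
    "Decision rule": "decision rule classifier attribute reduct",
}


def tf_vector(text):
    """Bag-of-words term-frequency vector (assumed weighting scheme)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def concept_vector(text, concepts=CONCEPTS):
    """Re-express a document by its similarity to each concept description."""
    doc = tf_vector(text)
    return {name: cosine(doc, tf_vector(desc)) for name, desc in concepts.items()}


def agglomerate(vectors, n_clusters):
    """Average-linkage agglomerative clustering on cosine distance."""
    clusters = [[i] for i in range(len(vectors))]

    def dist(a, b):
        pairs = [(i, j) for i in a for j in b]
        return sum(1.0 - cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Find the two closest clusters and merge them.
        a, b = min(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda p: dist(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


DOCS = [
    "lower and upper approximation of a rough set",
    "indiscernibility and boundary region in rough set theory",
    "hierarchical clustering builds a partition of similar documents",
    "similarity based cluster analysis",
]

groups = agglomerate([concept_vector(d) for d in DOCS], n_clusters=2)
print(groups)
```

In this toy corpus the last two documents share no words at all, so a purely term-based comparison treats them as unrelated, while the concept-based representation places both close to the same concept. That is the kind of effect the abstract attributes to the DBpedia-derived representation, shown here only on invented data.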

The authors are supported by grant N N516 077837 from the Ministry of Science and Higher Education of the Republic of Poland and by the National Centre for Research and Development (NCBiR) under Grant No. SP/I/1/77065/10 within the strategic scientific research and experimental development program: “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”.

Author information

Corresponding author

Correspondence to Marcin Szczuka.

Copyright information

© 2012 Springer-Verlag GmbH Berlin Heidelberg

About this chapter

Cite this chapter

Szczuka, M., Janusz, A., Herba, K. (2012). Semantic Clustering of Scientific Articles with Use of DBpedia Knowledge Base. In: Bembenik, R., Skonieczny, L., Rybiński, H., Niezgodka, M. (eds) Intelligent Tools for Building a Scientific Information Platform. Studies in Computational Intelligence, vol 390. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24809-2_5

  • DOI: https://doi.org/10.1007/978-3-642-24809-2_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24808-5

  • Online ISBN: 978-3-642-24809-2

  • eBook Packages: Engineering, Engineering (R0)
