DSCrank: A Method for Selection and Ranking of Datasets

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 672)

Abstract

Considerable effort has gone into building the Web of Data. One of the main challenges is identifying the most closely related datasets to link to. Another challenge is publishing a local dataset on the Web of Data in accordance with the Linked Data principles. The present work is based on the idea that a set of activities should guide the user through the publication of a new dataset on the Web of Data. It presents the specification and implementation of the first two of these activities: crawling and ranking a selected set of existing published datasets. The proposed implementation adapts the focused crawling approach to the Linked Data principles, and the dataset ranking is based on a quick glimpse into the content of the selected datasets. Finally, the paper presents a case study in the biomedical domain to validate the implemented approach; the results are promising with respect to scalability and performance.
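The abstract names the two activities but not their internals. As a minimal sketch, assuming a toy setting, a focused crawler can be adapted to Linked Data by treating datasets as the "pages", dataset-to-dataset links as the "hyperlinks", and estimating each dataset's relevance from a small sample of its literals (a glimpse) instead of a full download. The in-memory `CATALOG`, the functions `glimpse_score` and `focused_crawl`, the keyword-overlap score and the 0.1 threshold are all hypothetical and are not DSCrank's actual implementation; a real crawler would presumably obtain its samples from live sources (for example through SPARQL queries) rather than from a dictionary.

```python
import heapq
import re

# Hypothetical stand-in for a Linked Data catalog. Each entry holds a few
# sampled literals (the "glimpse") and the datasets it links to.
# Names and contents are illustrative only.
CATALOG = {
    "drugbank": {"sample": ["Drug target protein", "antibiotic resistance gene"],
                 "links": ["uniprot", "dbpedia"]},
    "uniprot": {"sample": ["protein sequence", "gene ontology annotation"],
                "links": ["go"]},
    "dbpedia": {"sample": ["football club", "city population"],
                "links": ["geonames"]},
    "go": {"sample": ["biological process", "gene product"], "links": []},
    "geonames": {"sample": ["mountain elevation", "country code"], "links": []},
}


def tokenize(text):
    """Lower-case alphabetic tokens; no stemming or stop-word removal."""
    return re.findall(r"[a-z]+", text.lower())


def glimpse_score(dataset, keywords):
    """Fraction of the dataset's sampled tokens that are topic keywords."""
    tokens = [t for literal in CATALOG[dataset]["sample"] for t in tokenize(literal)]
    if not tokens:
        return 0.0
    return sum(t in keywords for t in tokens) / len(tokens)


def focused_crawl(seeds, keywords, threshold=0.1, max_results=50):
    """Best-first crawl: expand the most promising dataset next and neither
    rank nor expand datasets whose glimpse score is below `threshold`."""
    keywords = {k.lower() for k in keywords}
    # heapq is a min-heap, so scores are negated to pop the best first.
    frontier = [(-glimpse_score(s, keywords), s) for s in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    ranked = []
    while frontier and len(ranked) < max_results:
        neg_score, dataset = heapq.heappop(frontier)
        score = -neg_score
        if score < threshold:
            continue  # irrelevant: pruned, its links are never followed
        ranked.append((dataset, score))
        for neighbour in CATALOG[dataset]["links"]:
            if neighbour in CATALOG and neighbour not in seen:
                seen.add(neighbour)
                heapq.heappush(frontier, (-glimpse_score(neighbour, keywords), neighbour))
    # Pop order is not globally sorted (a neighbour may outscore its parent),
    # so sort the final ranking by score.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    topic = {"gene", "protein", "antibiotic", "resistance"}
    for name, score in focused_crawl(["drugbank"], topic):
        print(f"{name}: {score:.2f}")
```

Run as a script, this prints drugbank: 0.67, uniprot: 0.40 and go: 0.25. Best-first ordering spends the crawl budget on the most promising neighbourhoods first, and pruning keeps unrelated regions of the link graph (here dbpedia, and therefore geonames behind it) from being explored at all.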

Notes

  1. http://www.w3.org/rdf.
  2. http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData.
  3. http://www.obofoundry.org/.
  4. https://www.w3.org/TR/sparql11-overview/.
  5. http://bioknowlogy.biowebdb.org.
  6. http://ckan.org/about/.
  7. http://download.bio2rdf.org/release/3/release.html.
  8. http://rdf4j.org/about.docbook?view.
  9. http://wordnet.princeton.edu/wordnet/download.
  10. http://info.ils.indiana.edu/~stevecox/unix/s603/man/grep.1.pdf.
  11. http://bioknowlogy.biowebdb.org/metaresistomedb/sparql-vt.php.
  12. http://ypublish.info/pdf-validation-table.pdf.
  13. http://linkeddatacatalog.dws.informatik.uni-mannheim.de/dataset?q=lifescience&sort=score+desc,+metadata_modified+desc.
  14. Ratio between relevant datasets retrieved and the number of top ranked datasets.
  15. http://download.openbiocloud.org/release/3/release.html.
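Note 14 describes the evaluation measure as the ratio between the relevant datasets retrieved and the number of top-ranked datasets, which is precision at k: P@k = |relevant datasets among the top k| / k. A minimal Python sketch follows; the function name `precision_at_k` and the choice to count missing slots as non-relevant when the ranking is shorter than k are assumptions, not details taken from the paper.

```python
def precision_at_k(ranking, relevant, k):
    """Relevant datasets among the top-k ranked ones, divided by k.

    If the ranking holds fewer than k datasets, the missing slots count as
    non-relevant (one of several possible conventions).
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    relevant = set(relevant)
    hits = sum(1 for dataset in ranking[:k] if dataset in relevant)
    return hits / k


if __name__ == "__main__":
    # Hypothetical ranking and ground truth, for illustration only.
    ranking = ["drugbank", "uniprot", "dbpedia", "go"]
    relevant = {"drugbank", "uniprot", "go"}
    for k in (1, 3, 4):
        print(k, round(precision_at_k(ranking, relevant, k), 2))
```

With this toy ranking, P@1 is 1.0, P@3 is 0.67 (dbpedia occupies one of the top three slots) and P@4 is 0.75.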

Acknowledgements

This work was partially funded by a CAPES scholarship, by CNPq (Proc. 307647/2012-9) and by FAPERJ (Proc. E-26/111.147/2011).

Author information

Correspondence to Yasmmin Cortes Martins.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Martins, Y.C., da Mota, F.F., Cavalcanti, M.C. (2016). DSCrank: A Method for Selection and Ranking of Datasets. In: Garoufallou, E., Subirats Coll, I., Stellato, A., Greenberg, J. (eds) Metadata and Semantics Research. MTSR 2016. Communications in Computer and Information Science, vol 672. Springer, Cham. https://doi.org/10.1007/978-3-319-49157-8_29

  • DOI: https://doi.org/10.1007/978-3-319-49157-8_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49156-1

  • Online ISBN: 978-3-319-49157-8

  • eBook Packages: Computer Science, Computer Science (R0)
