DSCrank: A Method for Selection and Ranking of Datasets

Martins, Yasmmin Cortes; da Mota, Fábio Faria; Cavalcanti, Maria Cláudia

doi:10.1007/978-3-319-49157-8_29

Conference paper
First Online: 04 November 2016

Part of the book series: Communications in Computer and Information Science (CCIS, volume 672)

Abstract

Considerable effort has gone into building the Web of Data. One of the main challenges is identifying the most closely related datasets to link to. Another is publishing a local dataset on the Web of Data in accordance with the Linked Data principles. This work is based on the idea that a set of activities should guide the user through the publication of a new dataset on the Web of Data. It presents the specification and implementation of the first two activities: crawling and ranking a selected set of existing published datasets. The implementation adapts the focused crawling approach to the Linked Data principles, and the dataset ranking is based on a quick glimpse into the content of the selected datasets. The paper also presents a case study in the biomedical area to validate the approach, which shows promising results with respect to scalability and performance.

Notes

1. http://www.w3.org/rdf.
2. http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData.
3. http://www.obofoundry.org/.
4. https://www.w3.org/TR/sparql11-overview/.
5. http://bioknowlogy.biowebdb.org.
6. http://ckan.org/about/.
7. http://download.bio2rdf.org/release/3/release.html.
8. http://rdf4j.org/about.docbook?view.
9. http://wordnet.princeton.edu/wordnet/download.
10. http://info.ils.indiana.edu/~stevecox/unix/s603/man/grep.1.pdf.
11. http://bioknowlogy.biowebdb.org/metaresistomedb/sparql-vt.php.
12. http://ypublish.info/pdf-validation-table.pdf.
13. http://linkeddatacatalog.dws.informatik.uni-mannheim.de/dataset?q=lifescience&sort=score+desc,+metadata_modified+desc.
14. Ratio between relevant datasets retrieved and the number of top ranked datasets.
15. http://download.openbiocloud.org/release/3/release.html.
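
The ratio defined in note 14 is commonly called precision at k. As a minimal sketch, assuming the ranking is available as an ordered list of dataset identifiers and the relevant datasets as a set, it could be computed as follows. The function name `precision_at_k` and the dataset names are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
from typing import Iterable, Sequence


def precision_at_k(ranked: Sequence[str], relevant: Iterable[str], k: int) -> float:
    """Share of the k top-ranked datasets that are relevant (note 14)."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    relevant_set = set(relevant)
    # Count how many of the first k ranked datasets are in the relevant set.
    hits = sum(1 for name in ranked[:k] if name in relevant_set)
    # Divide by k, the number of top-ranked datasets considered.
    return hits / k


# Hypothetical ranking and relevance judgments, for illustration only.
ranking = ["dataset_a", "dataset_b", "dataset_c", "dataset_d"]
judged_relevant = {"dataset_a", "dataset_c", "dataset_d"}
score = precision_at_k(ranking, judged_relevant, k=3)
```

Here two of the three top-ranked datasets are relevant, so `score` is 2/3.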

Acknowledgements

This work was partially funded by a CAPES scholarship, CNPq (proc. 307647/2012-9), and FAPERJ (proc. E-26/111.147/2011).

Author information

Authors and Affiliations

Military Institute of Engineering, Rio de Janeiro, Brazil
Yasmmin Cortes Martins & Maria Cláudia Cavalcanti
IOC/FIOCRUZ, Rio de Janeiro, Brazil
Fábio Faria da Mota
National Laboratory of Scientific Computing, Petrópolis, Brazil
Yasmmin Cortes Martins

Corresponding author

Correspondence to Yasmmin Cortes Martins.

Editor information

Editors and Affiliations

Alexander Technological Educational Institute of Thessaloniki, Thessaloniki, Greece
Emmanouel Garoufallou
Alexander Technological Educational Inst, Rome, Italy
Imma Subirats Coll
Sapienza University of Rome, Rome, Italy
Armando Stellato
Drexel University, Philadelphia, Pennsylvania, USA
Jane Greenberg

About this paper

Cite this paper

Martins, Y.C., da Mota, F.F., Cavalcanti, M.C. (2016). DSCrank: A Method for Selection and Ranking of Datasets. In: Garoufallou, E., Subirats Coll, I., Stellato, A., Greenberg, J. (eds) Metadata and Semantics Research. MTSR 2016. Communications in Computer and Information Science, vol 672. Springer, Cham. https://doi.org/10.1007/978-3-319-49157-8_29

DOI: https://doi.org/10.1007/978-3-319-49157-8_29
Published: 04 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49156-1
Online ISBN: 978-3-319-49157-8
eBook Packages: Computer Science, Computer Science (R0)
