ABSTRACT
From a computational point of view, the semantic annotation of large-scale data collections is an extremely expensive task. One way of dealing with this drawback is to distribute the execution of the annotation algorithm across several computing environments. In this paper, we describe how a large-scale collection of learning objects was semantically annotated. The terms related to each learning object were processed, and the output was an RDF graph computed from the DBpedia database. According to an initial study, a sequential implementation of the annotation algorithm would require more than 1600 CPU-years to process the whole set of learning objects (about 15 million). For this reason, a framework able to integrate a set of heterogeneous computing infrastructures was used to execute a new parallel version of the algorithm. As a result, the problem was solved in 178 days.
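To put these figures in perspective, the short Python sketch below derives two quantities that the abstract implies but does not state. The first is the average number of CPUs that must have been busy at the same time. The second is the average sequential CPU time per learning object. It is a back-of-the-envelope estimate under assumptions that are ours, not the paper's: a 365.25-day year, a parallel version that does roughly the same total work as the sequential estimate, and no scheduling or data-transfer overhead.

```python
"""Back-of-the-envelope arithmetic for the figures quoted in the abstract.

Assumptions (ours, not the paper's): a Julian year of 365.25 days, the
parallel version performing roughly the same total work as the sequential
estimate, and no scheduling or data-transfer overhead.
"""

DAYS_PER_YEAR = 365.25           # assumption: Julian year
SECONDS_PER_DAY = 24 * 60 * 60


def implied_average_cores(cpu_years: float, wall_clock_days: float) -> float:
    """Average number of CPUs busy at once to fit cpu_years of work into wall_clock_days."""
    return cpu_years * DAYS_PER_YEAR / wall_clock_days


def seconds_per_object(cpu_years: float, n_objects: int) -> float:
    """Average sequential CPU time per learning object, in seconds."""
    return cpu_years * DAYS_PER_YEAR * SECONDS_PER_DAY / n_objects


if __name__ == "__main__":
    cpu_years = 1600           # "more than 1600 CPU-years" (a lower bound)
    wall_days = 178            # reported wall-clock time
    n_objects = 15_000_000     # "about 15 million" learning objects

    cores = implied_average_cores(cpu_years, wall_days)
    per_obj = seconds_per_object(cpu_years, n_objects)
    print(f"implied average concurrency: {cores:.0f} CPUs")
    print(f"average CPU time per object: {per_obj / 60:.0f} minutes")
```

Under these assumptions the script reports roughly 3283 concurrently busy CPUs and about 56 minutes of CPU time per object. Because 1600 CPU-years is stated as a lower bound, both derived figures are best read as rough lower bounds as well.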
Index Terms
- A practical experience concerning the parallel semantic annotation of a large-scale data collection