DOI: 10.1145/2506182.2506191
research-article

A practical experience concerning the parallel semantic annotation of a large-scale data collection

Published: 04 September 2013

ABSTRACT

From a computational point of view, the semantic annotation of large-scale data collections is an extremely expensive task. One way of dealing with this cost is to distribute the execution of the annotation algorithm across several computing environments. In this paper, we describe how a large-scale collection of learning objects was semantically annotated. The terms related to each learning object were processed, and the output was an RDF graph computed from the DBpedia database. According to an initial study, a sequential implementation of the annotation algorithm would require more than 1600 CPU-years to process the whole set of learning objects (about 15 million). For this reason, a framework able to integrate a set of heterogeneous computing infrastructures was used to execute a new parallel version of the algorithm. As a result, the problem was solved in 178 days.
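To put the figures in the abstract in perspective, the following is a back-of-the-envelope sketch rather than anything taken from the paper. It derives the implied sequential cost per learning object and the ratio between the sequential estimate and the reported wall-clock time. The function names and the 365-day year are assumptions made here for illustration. Because the abstract says "more than 1600 CPU-years" and "about 15 million", both results are rough lower bounds.

```python
# Rough arithmetic on the figures quoted in the abstract.
# Assumption: a year is 365 days; the paper does not state the convention.
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365


def per_object_seconds(cpu_years: float, n_objects: int) -> float:
    """Average sequential CPU time per learning object, in seconds."""
    return cpu_years * DAYS_PER_YEAR * SECONDS_PER_DAY / n_objects


def effective_speedup(cpu_years: float, wall_clock_days: float) -> float:
    """Ratio of the sequential estimate to the observed wall-clock time."""
    return cpu_years * DAYS_PER_YEAR / wall_clock_days


cpu_years = 1600          # "more than 1600 CPU-years" (lower bound)
n_objects = 15_000_000    # "about 15 million" learning objects
wall_days = 178           # reported time to solve the problem

per_obj = per_object_seconds(cpu_years, n_objects)
speedup = effective_speedup(cpu_years, wall_days)

print(f"Sequential cost per object: about {per_obj / 60:.0f} minutes")
print(f"Sequential estimate / wall-clock time: about {speedup:.0f}x")
```

Under these assumptions, each learning object costs roughly 56 minutes of sequential CPU time, and the sequential estimate is roughly 3280 times the 178-day wall-clock time. This ratio should not be read as a core count: the abstract mentions a new parallel version of the algorithm, so part of the gain may come from changes to the algorithm itself rather than from parallel execution alone.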

Published in

I-SEMANTICS '13: Proceedings of the 9th International Conference on Semantic Systems
September 2013
158 pages
ISBN: 9781450319720
DOI: 10.1145/2506182

Copyright © 2013 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 40 of 182 submissions, 22%
