ABSTRACT
The Linked Open Data cloud consists of more than 26 billion triples, of which fewer than 3% are links between knowledge bases. However, such links play a central role in key tasks such as cross-ontology question answering, large-scale inferencing and link-based traversal query execution models. The sheer size of the Linked Open Data cloud makes manual linking impossible. Consequently, Link Discovery frameworks have been developed in recent years to detect links between knowledge bases automatically. Yet even the current runtime-optimized linking frameworks lead to unacceptable runtimes when presented with very large datasets. This paper addresses the time complexity of Link Discovery on very large datasets by presenting and evaluating the parallelization of the time-optimized LIMES framework by means of the MapReduce paradigm.
Index Terms
- Parallelizing LIMES for large-scale link discovery