Abstract
The ongoing trend towards open data embraced by the Semantic Web has started to produce a large number of data sources. These data sources are published using RDF vocabularies, and their graph topology makes it possible to navigate through the data. This paper presents LinksB2N, an algorithm that discovers information overlaps in RDF data repositories and integrates data sets that partially share the same domain, with no human intervention.
LinksB2N identifies equivalent RDF resources from different data sets with several degrees of confidence. The algorithm relies on a novel approach that uses clustering techniques to analyze the distribution of unique objects that contain overlapping information in different data graphs. Our contribution is illustrated in the context of the Market Blended Insight project by applying the LinksB2N algorithm to data sets on the order of hundreds of millions of RDF triples containing relevant information in the domain of business-to-business (B2B) marketing analysis.
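The idea of looking for overlapping, unique object values across two data graphs can be illustrated with a small sketch. This is not the LinksB2N algorithm itself: the abstract does not give its details, so the sketch swaps the clustering step for a plain set-overlap (Jaccard) score. All names here (`overlap_scores`, `candidate_links`, the sample URIs and values) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def objects_by_predicate(triples: Iterable[Triple]) -> Dict[str, Set[str]]:
    """Map each predicate to the set of distinct object values it carries."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for _subject, predicate, obj in triples:
        index[predicate].add(obj)
    return index


def overlap_scores(graph_a, graph_b) -> Dict[Tuple[str, str], float]:
    """Jaccard overlap of distinct objects for every cross-graph predicate pair."""
    a_index = objects_by_predicate(graph_a)
    b_index = objects_by_predicate(graph_b)
    scores = {}
    for pred_a, objs_a in a_index.items():
        for pred_b, objs_b in b_index.items():
            shared = objs_a & objs_b
            if shared:  # keep only pairs that overlap at all
                scores[(pred_a, pred_b)] = len(shared) / len(objs_a | objs_b)
    return scores


def unique_owners(triples: Iterable[Triple], predicate: str) -> Dict[str, str]:
    """Object values held by exactly one subject under the given predicate."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for subject, pred, obj in triples:
        if pred == predicate:
            owners[obj].add(subject)
    return {obj: next(iter(subs)) for obj, subs in owners.items() if len(subs) == 1}


def candidate_links(graph_a, graph_b, pred_a: str, pred_b: str) -> List[Tuple[str, str]]:
    """Pair subjects that share an object value that is unique in both graphs."""
    a_unique = unique_owners(graph_a, pred_a)
    b_unique = unique_owners(graph_b, pred_b)
    return sorted((a_unique[o], b_unique[o]) for o in a_unique.keys() & b_unique.keys())


# Hypothetical sample data: two tiny graphs describing companies.
graph_a = [
    ("a:1", "a:postcode", "SO17 1BJ"),
    ("a:2", "a:postcode", "EC1A 1BB"),
    ("a:1", "a:name", "Acme Ltd"),
]
graph_b = [
    ("b:x", "b:zip", "SO17 1BJ"),
    ("b:y", "b:zip", "W1A 0AX"),
    ("b:x", "b:label", "Acme Ltd"),
]

scores = overlap_scores(graph_a, graph_b)
links = candidate_links(graph_a, graph_b, "a:postcode", "b:zip")
```

In this toy setting, predicate pairs with a high overlap score suggest where two data sets describe the same kind of information, and values that are unique on both sides suggest which resources may be equivalent. A real system would also need approximate string matching and some way to combine evidence from several predicate pairs into a confidence level.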
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Salvadores, M., Correndo, G., Rodriguez-Castro, B., Gibbins, N., Darlington, J., Shadbolt, N.R. (2009). LinksB2N: Automatic Data Integration for the Semantic Web. In: Meersman, R., Dillon, T., Herrero, P. (eds) On the Move to Meaningful Internet Systems: OTM 2009. OTM 2009. Lecture Notes in Computer Science, vol 5871. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05151-7_27
DOI: https://doi.org/10.1007/978-3-642-05151-7_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05150-0
Online ISBN: 978-3-642-05151-7
eBook Packages: Computer Science (R0)