Playing LEGO with JSON: Probabilistic joins over attribute-value fragments

Abstract:

Information about an entity can hardly be assumed to be given in one single document created at a single point in time. Rather, it is reasonable to assume that information is spread over multiple documents and created or enriched over time, for instance through crowdsourced facts or facts mined from social network streams, one after the other. In this work, we consider the problem of assembling entity-centric information from input consisting of small pieces of information, provided in the form of JSON document snippets. The final goal is to create a document that (possibly fully) describes an entity by putting related fragments together. What makes this task challenging is the lack of evidence indicating which fragments belong together and, hence, can be safely combined. We focus on deciding this question using statistics of the already-seen fragments to judge whether a join is reasonable. We evaluate our approach on real-world datasets and show that it achieves high precision and recall.

Published in: 2016 IEEE 32nd International Conference on Data Engineering Workshops (ICDEW)

Date of Conference: 16-20 May 2016

Date Added to IEEE Xplore: 23 June 2016

Electronic ISBN: 978-1-5090-2109-3

DOI: 10.1109/ICDEW.2016.7495642

Conference Location: Helsinki, Finland