Abstract:
Information about an entity can hardly be assumed to be given in a single document created at a single instant in time. Rather, it is reasonable to assume that information is spread over multiple documents and is created or enriched over time, for instance through crowdsourced facts or facts mined from social network streams, one after the other. In this work, we consider the problem of assembling entity-centric information from input comprising small pieces of information, provided in the form of JSON document snippets. The final goal is to create a document that describes an entity, possibly in full, by putting related fragments together. What makes this task challenging is the lack of evidence telling which fragments belong together and, hence, can be safely combined. We focus on deciding this question using statistics of the already seen fragments to judge whether a join is reasonable. We evaluate our approach using real-world datasets and show that it can achieve high precision and recall.
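To make the setting concrete, the following is a minimal sketch of joining JSON fragments based on statistics of the fragments seen so far. The abstract does not say which statistics are used, so the rule here is an assumption made purely for illustration: two fragments are joined if they do not contradict each other and share at least one attribute-value pair that has been observed rarely. All names (`assemble`, `join_is_reasonable`, `max_count`) are hypothetical, and fragments are assumed to be flat objects with scalar values.

```python
from collections import Counter


def shared_pairs(a, b):
    """Return the (key, value) pairs that appear identically in both fragments."""
    return [(k, v) for k, v in a.items() if k in b and b[k] == v]


def conflicts(a, b):
    """True if the fragments disagree on the value of any shared key."""
    return any(k in b and b[k] != v for k, v in a.items())


def join_is_reasonable(a, b, pair_counts, max_count=2):
    """Assumed heuristic: join only on a rarely seen shared pair, never on a conflict."""
    if conflicts(a, b):
        return False
    return any(pair_counts[p] <= max_count for p in shared_pairs(a, b))


def assemble(fragments, max_count=2):
    """Process fragments one after the other, merging each into the first matching entity."""
    pair_counts = Counter()  # how often each (key, value) pair has been seen so far
    entities = []
    for frag in fragments:
        for pair in frag.items():
            pair_counts[pair] += 1
        for entity in entities:
            if join_is_reasonable(entity, frag, pair_counts, max_count):
                entity.update(frag)
                break
        else:
            # No existing entity is a plausible match: start a new one.
            entities.append(dict(frag))
    return entities


fragments = [
    {"name": "Ada Lovelace", "born": 1815},
    {"name": "Ada Lovelace", "field": "mathematics"},
    {"name": "Alan Turing", "field": "mathematics"},
    {"name": "Alan Turing", "born": 1912},
]
entities = assemble(fragments)
```

In this toy input the shared `name` values drive the joins, while the conflicting names keep the two people apart even though both fragments mention the same `field`. Lowering `max_count` makes the rule stricter: a value that has already been seen many times, such as a common country code, no longer counts as evidence that two fragments describe the same entity.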
Date of Conference: 16-20 May 2016
Date Added to IEEE Xplore: 23 June 2016
Electronic ISBN: 978-1-5090-2109-3