ABSTRACT
A significant number of web sites publish structured data about recognizable concepts (such as stock quotes, movies, restaurants, etc.). This creates an opportunity to build applications that rely on large amounts of data drawn from the Web. We present an automatic, domain-independent system that performs all the steps required to benefit from these data: it discovers data-intensive web sites containing information about an entity of interest, extracts and integrates the published data, and finally performs a probabilistic analysis to characterize the imprecision of the data and the accuracy of the sources. The results of the processing can be used to populate a probabilistic database.
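The final step described above (estimating the accuracy of sources and the probability of conflicting values) can be illustrated with a minimal iterative truth-finding sketch. This is a simplified illustration under assumed inputs (a `claims` dictionary mapping sources to the values they publish), not the paper's actual probabilistic model:

```python
def truth_finder(claims, iterations=10):
    """Iteratively estimate value probabilities and source accuracies.

    claims: dict mapping source name -> {object: claimed value}.
    Returns (confidence, accuracy), where confidence[obj][val] is the
    estimated probability of val and accuracy[src] the source's trust score.
    """
    # Start with a uniform prior trust in every source.
    accuracy = {source: 0.8 for source in claims}
    confidence = {}
    for _ in range(iterations):
        # Confidence of a value = sum of the accuracies of the sources
        # that assert it, normalized per object into a distribution.
        confidence = {}
        for source, facts in claims.items():
            for obj, val in facts.items():
                per_obj = confidence.setdefault(obj, {})
                per_obj[val] = per_obj.get(val, 0.0) + accuracy[source]
        for per_obj in confidence.values():
            total = sum(per_obj.values())
            for val in per_obj:
                per_obj[val] /= total
        # A source's accuracy = average confidence of the values it claims.
        for source, facts in claims.items():
            accuracy[source] = sum(
                confidence[obj][val] for obj, val in facts.items()
            ) / len(facts)
    return confidence, accuracy


# Hypothetical example: three sources disagree on a stock quote.
claims = {
    "siteA": {"AAPL": 101},
    "siteB": {"AAPL": 101},
    "siteC": {"AAPL": 99},
}
confidence, accuracy = truth_finder(claims)
```

Here the majority value ends up with the higher probability, and the dissenting source receives a lower accuracy score; the resulting per-value probabilities are exactly the kind of annotation a probabilistic database expects.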