DOI: 10.1145/3208352.3208357
research-article

TrueWeb: A Proposal for Scalable Semantically-Guided Data Management and Truth Finding in Heterogeneous Web Sources

Published: 10 June 2018

ABSTRACT

We envision a responsible web environment, termed TrueWeb, in which a user should be able to find out whether any sentence he or she encounters on the web is true or false, and to trace the provenance of any sentence or paragraph on the web. One target of TrueWeb is to compose factual knowledge about any subject of interest from Internet resources, present the collected knowledge in chronological order, organize the facts spatially and temporally, and assign a belief factor to each fact. Another important target of TrueWeb is to identify whether a given statement on the Internet is true or false. The aim is to create an Internet infrastructure that can determine the truthfulness (or the degree of truthfulness) of each piece of published information.
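The abstract leaves open how a belief factor would be computed. The Python sketch below is one illustrative possibility, not TrueWeb's method. It assumes each web statement can be reduced to a hypothetical `Claim` record (source, subject, attribute, value, year, place). It then swaps in a simplified iterative truth-discovery scheme: a value's belief is the trust-weighted share of the sources asserting it, and a source's trust is the average belief of the values it asserts. The names `Claim`, `fact_key`, `estimate_belief` and `timeline`, the prior trust of 0.8, and the sample data are all assumptions made for illustration. The `source` field keeps provenance, `timeline` gives the chronological view, and `place` is carried along for spatial grouping but is not used in scoring.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One statement extracted from a web page (hypothetical schema)."""
    source: str     # where the statement was found (provenance)
    subject: str    # what the statement is about
    attribute: str  # which property of the subject it asserts
    value: str      # the asserted value
    year: int       # when the asserted fact holds (temporal dimension)
    place: str      # where the fact applies (spatial dimension, unused in scoring)


def fact_key(c):
    # Claims conflict only if they give different values for the same
    # subject, attribute and year; facts at different times do not conflict.
    return (c.subject, c.attribute, c.year)


def estimate_belief(claims, iterations=10, prior_trust=0.8):
    """Iteratively estimate a belief per (fact, value) and a trust score per source."""
    trust = {c.source: prior_trust for c in claims}
    belief = {}
    for _ in range(iterations):
        # Belief of a value: its trust-weighted share of the votes within its fact.
        votes = defaultdict(float)
        for c in claims:
            votes[(fact_key(c), c.value)] += trust[c.source]
        totals = defaultdict(float)
        for (key, _value), weight in votes.items():
            totals[key] += weight
        belief = {kv: w / totals[kv[0]] for kv, w in votes.items()}
        # Trust of a source: average belief of the values it asserts.
        per_source = defaultdict(list)
        for c in claims:
            per_source[c.source].append(belief[(fact_key(c), c.value)])
        trust = {s: sum(bs) / len(bs) for s, bs in per_source.items()}
    return belief, trust


def timeline(claims, subject):
    """Claims about one subject in chronological order."""
    return sorted((c for c in claims if c.subject == subject), key=lambda c: c.year)


# Hypothetical sample data: two sources agree on the 2017 mayor, one disagrees.
claims = [
    Claim("siteA", "TownX", "mayor", "Alice", 2017, "TownX"),
    Claim("siteB", "TownX", "mayor", "Alice", 2017, "TownX"),
    Claim("siteC", "TownX", "mayor", "Bob", 2017, "TownX"),
    Claim("siteA", "TownX", "mayor", "Carol", 2012, "TownX"),
]
belief, trust = estimate_belief(claims)
for c in timeline(claims, "TownX"):
    b = belief[(fact_key(c), c.value)]
    print(c.year, c.value, f"belief={b:.2f}", "from", c.source)
```

Keying conflicts on (subject, attribute, year) lets facts that change over time, such as a town's mayor in different years, coexist without being treated as contradictions. A realistic system would also need to account for sources that copy one another, which this sketch ignores.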

Published in

SBD'18: Proceedings of the International Workshop on Semantic Big Data
June 2018
40 pages
ISBN: 9781450357791
DOI: 10.1145/3208352

Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: research-article, Research, Refereed limited

Acceptance Rates

SBD'18 paper acceptance rate: 5 of 9 submissions, 56%. Overall acceptance rate: 30 of 54 submissions, 56%.
