
Database and information-retrieval methods for knowledge discovery

Published: 01 April 2009


Comprehensive knowledge bases would tap the Web's deepest information sources and relationships to address questions beyond today's keyword-based search engines.



Published In

Communications of the ACM  Volume 52, Issue 4
A Direct Path to Dependable Software
April 2009
134 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Association for Computing Machinery

New York, NY, United States





  • Research-article
  • Popular
  • Refereed

