Extending Datalog Intelligence

Kimelfeld, Benny

doi:10.1007/978-3-319-22002-4_1

Benny Kimelfeld

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9209)

Included in the following conference series:

International Conference on Web Reasoning and Rule Systems

Abstract

Prominent sources of Big Data include technological and social trends such as mobile computing, blogging, and social networking. The means to analyse such data are becoming more accessible with the development of business models like cloud computing, open source, and crowdsourcing. However, such data have characteristics that pose challenges to traditional database systems. Because of the uncontrolled way in which these data are produced, much of the content is free text, often in informal natural language, which leads to computing environments with high levels of uncertainty and error. In this talk I will offer a vision of a database system that aims to facilitate the development of modern data-centric applications by naturally unifying key functionalities of databases, text analytics, machine learning, and artificial intelligence. I will also describe my past research towards this vision through extensions of Datalog, a well-studied rule-based programming paradigm that features an inherent integration with the database and has robust declarative semantics. These extensions allow for incorporating information extraction from text and for specifying statistical models by probabilistic programming.
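
As a rough illustration of the idea of combining Datalog rules with information extraction from text, the following is a minimal Python sketch. It is not the system or the syntax described in the talk. The documents, the `extract_follows` predicate, and the `Follows`/`Reaches` relations are all hypothetical names chosen for the example, and the recursive rule is evaluated with a naive bottom-up fixpoint.

```python
import re

# Hypothetical input: a tiny "database" of documents (id -> free text).
docs = {
    1: "alice follows bob. bob follows carol.",
    2: "carol follows dave.",
}


def extract_follows(text):
    """Hypothetical extraction predicate: yield (x, y) for each 'x follows y'."""
    for match in re.finditer(r"(\w+) follows (\w+)", text):
        yield (match.group(1), match.group(2))


# Rule 1 (extraction feeds a relation):
#   Follows(x, y) <- Doc(d, t), extract_follows(t, x, y)
follows = {pair for text in docs.values() for pair in extract_follows(text)}


def reaches(edges):
    """Naive bottom-up evaluation of the recursive rules
         Reaches(x, y) <- Follows(x, y)
         Reaches(x, z) <- Reaches(x, y), Follows(y, z)
    repeated until no new facts are derived (a fixpoint)."""
    result = set(edges)
    while True:
        derived = {(x, z) for (x, y) in result for (y2, z) in edges if y == y2}
        if derived <= result:  # nothing new: fixpoint reached
            return result
        result |= derived


closure = reaches(follows)
print(sorted(closure))
```

The extraction step populates an ordinary relation from text, after which the rules are evaluated exactly as they would be over stored facts. This is the sense in which the sketch mirrors the integration of text analytics with the database; the probabilistic extension mentioned in the abstract is not modelled here.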

Author information

Authors and Affiliations

Technion, Haifa, Israel
Benny Kimelfeld

Corresponding author

Correspondence to Benny Kimelfeld.

Editor information

Editors and Affiliations

University of California, Santa Cruz, USA
Balder ten Cate
National University of Ireland, Galway, Ireland
Alessandra Mileo

About this paper

Cite this paper

Kimelfeld, B. (2015). Extending Datalog Intelligence. In: ten Cate, B., Mileo, A. (eds) Web Reasoning and Rule Systems. RR 2015. Lecture Notes in Computer Science, vol 9209. Springer, Cham. https://doi.org/10.1007/978-3-319-22002-4_1

DOI: https://doi.org/10.1007/978-3-319-22002-4_1
Published: 22 July 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22001-7
Online ISBN: 978-3-319-22002-4
eBook Packages: Computer Science, Computer Science (R0)
