Extending Datalog Intelligence

  • Conference paper
Web Reasoning and Rule Systems (RR 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9209)

Abstract

Prominent sources of Big Data include technological and social trends such as mobile computing, blogging, and social networking. The means to analyse such data are becoming more accessible with the development of business models like cloud computing, open source, and crowdsourcing. However, these data have characteristics that pose challenges to traditional database systems. Because the data are produced in an uncontrolled manner, a large part is free text, often in informal natural language, which leads to computing environments with high levels of uncertainty and error. In this talk I will offer a vision of a database system that aims to facilitate the development of modern data-centric applications by naturally unifying key functionalities of databases, text analytics, machine learning, and artificial intelligence. I will also describe my past research towards this vision through extensions of Datalog, a well-studied rule-based programming paradigm that features an inherent integration with the database and has a robust declarative semantics. These extensions allow for incorporating information extraction from text and for specifying statistical models by probabilistic programming.

Author information

Correspondence to Benny Kimelfeld.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Kimelfeld, B. (2015). Extending Datalog Intelligence. In: ten Cate, B., Mileo, A. (eds) Web Reasoning and Rule Systems. RR 2015. Lecture Notes in Computer Science, vol 9209. Springer, Cham. https://doi.org/10.1007/978-3-319-22002-4_1

  • DOI: https://doi.org/10.1007/978-3-319-22002-4_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22001-7

  • Online ISBN: 978-3-319-22002-4

  • eBook Packages: Computer Science; Computer Science (R0)
