
Querying the Web with Statistical Machine Learning

  • Chapter
Towards the Internet of Services: The THESEUS Research Program

Part of the book series: Cognitive Technologies ((COGTECH))

Abstract

The traditional means of extracting information from the Web are keyword-based search and browsing. The Semantic Web adds structured information (i.e., semantic annotations and references) that supports both activities. One of the most interesting recent developments is Linked Open Data (LOD), where information is presented in the form of facts, often originating from published domain-specific databases, that can be accessed by both humans and machines via specific query endpoints. In this article, we argue that machine learning provides a new way to query web data, in particular LOD, by analyzing and exploiting statistical regularities. We outline the challenges of applying machine learning to the Web and present the particular learning approaches we have been pursuing in THESEUS. We then discuss a number of applications in which the Web is queried via machine learning and describe several extensions to our approaches.


Notes

  1.

    Although the world might be governed by scientific laws and logical constraints in general, at the level of abstraction at which we and our applications have to function, the world appears to be governed in part by probabilities and statistical patterns.

  2.

    http://challenge.semanticweb.org/2011/

  3.

    In particular, the probability that a relationship between two entities exists given the knowledge base KB is estimated as

    $$\hat{P}((\mathit{Jane},\mathit{likes},\mathit{Jack}) \mid \mathit{KB}) = \sum_{i=1}^{L} f_{i}^{\mathit{Jane}} f_{i}^{\mathit{likes},\mathit{Jack}}$$

    where \(\{f_{i}^{\mathit{Jane}}\}_{i=1}^{L}\) are the \(L\) factors describing Jane, and \(\{f_{i}^{\mathit{likes},\mathit{Jack}}\}_{i=1}^{L}\) are the \(L\) factors describing Jack in his role as an object of the predicate “likes”. There are a number of approaches for calculating the factors. In our work in the SUNS framework (Tresp et al. 2009; Huang et al. 2010), we have employed regularized factorization of the associated data matrices. In our three-way tensor approach RESCAL (Nickel et al. 2011), we estimate

    $$\hat{P}((\mathit{Jane},\mathit{likes},\mathit{Jack}) \mid \mathit{KB}) = \sum_{i=1}^{L}\sum_{j=1}^{L} f_{i}^{\mathit{Jane}} R_{i,j}^{\mathit{likes}} f_{j}^{\mathit{Jack}}\;.$$

    Each entity has a unique latent representation, here \(\{f_{i}^{\mathit{Jane}}\}_{i=1}^{L}\) and \(\{f_{i}^{\mathit{Jack}}\}_{i=1}^{L}\), and the relation-type-specific interaction is modeled by the \(L \times L\) matrix \(R^{\mathit{likes}}\).

  4.

    http://www.livejournal.com/

  5.

    https://twitter.com/

  6.

    http://www.larkc.eu/
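The two scoring rules described in note 3 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the SUNS or RESCAL implementation: the function names `pair_factor_score` and `bilinear_score` are hypothetical, the factor values are made up rather than learned, and the RESCAL-style score is read as the bilinear form in which the relation matrix couples every subject factor with every object factor.

```python
from typing import Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def pair_factor_score(f_subject: Vector, f_pred_obj: Vector) -> float:
    """Inner product of the subject factors with the factors of a
    (predicate, object) pair, as in the first formula of note 3."""
    if len(f_subject) != len(f_pred_obj):
        raise ValueError("factor vectors must have the same length L")
    return sum(s * o for s, o in zip(f_subject, f_pred_obj))


def bilinear_score(f_subject: Vector, R: Matrix, f_object: Vector) -> float:
    """Bilinear form f_subject^T R f_object: each entity has one latent
    vector and the relation-specific matrix R models their interaction."""
    L = len(f_subject)
    if len(f_object) != L or len(R) != L or any(len(row) != L for row in R):
        raise ValueError("R must be L x L and both vectors of length L")
    return sum(
        f_subject[i] * R[i][j] * f_object[j]
        for i in range(L)
        for j in range(L)
    )


# Made-up factor values with L = 2 (hypothetical; real factors are learned).
f_jane = [0.5, 1.0]
f_jack = [1.0, 0.0]
f_likes_jack = [0.2, 0.3]          # Jack as object of the predicate "likes"
R_likes = [[0.0, 1.0], [0.5, 0.0]]  # asymmetric relation matrix

score_pair = pair_factor_score(f_jane, f_likes_jack)
score_jane_likes_jack = bilinear_score(f_jane, R_likes, f_jack)
score_jack_likes_jane = bilinear_score(f_jack, R_likes, f_jane)
```

Because `R_likes` is not symmetric, the bilinear score for (Jane, likes, Jack) differs from the score for (Jack, likes, Jane), so a directed relation can be modeled even though each entity has a single latent vector. The raw scores in this sketch are not calibrated probabilities.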

References

  • S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, Z. Ives, DBpedia: a nucleus for a web of open data, in Proceedings of the 6th International Semantic Web Conference (ISWC’08), Karlsruhe. Volume 4825 of Lecture Notes in Computer Science (Springer, Berlin/Heidelberg/New York, 2008), pp. 722–735


  • M. Balduini, I. Celino, D. Dell’Aglio, E.D. Valle, Y. Huang, T. Lee, S.H. Kim, V. Tresp, Reality mining on micropost streams: deductive and inductive reasoning for personalized and location-based recommendations. Semant. Web Interoperability Usability Applicability 2, 1–16 (2013)


  • C. Bizer, T. Heath, T. Berners-Lee, Linked data – the story so far. Int. J. Semant. Web Inf. Syst. (IJSWIS) 5(3), 1–22 (2009)


  • D. Brickley, L. Miller, The Friend of a Friend (FOAF) project, http://www.foaf-project.org/

  • D. Fensel, F. van Harmelen, B. Andersson, P. Brennan, H. Cunningham, E.D. Valle, F. Fischer, Z. Huang, A. Kiryakov, T.K. il Lee, L. Schooler, V. Tresp, S. Wesner, M. Witbrock, N. Zhong, Towards LarKC: a platform for web-scale reasoning, in Proceedings of the IEEE International Conference on Semantic Computing, Santa Clara, Aug 2008, pp. 524–529


  • Y. Huang, V. Tresp, M. Bundschus, A. Rettinger, H.P. Kriegel, Multivariate structured prediction for learning on semantic web, in Proceedings of the 20th International Conference on Inductive Logic Programming (ILP’10), Florence, ed. by P. Frasconi, F.A. Lisi. Volume 6489 of Lecture Notes in Computer Science (Springer, Berlin/Heidelberg/New York, 2010), pp. 92–104


  • Y. Huang, V. Tresp, M. Nickel, A. Rettinger, H.P. Kriegel, A scalable approach for statistical learning in semantic graphs. Semant. Web Interoperability Usability Applicability 1, 1–18 (2013)


  • X. Jiang, Y. Huang, M. Nickel, V. Tresp, Combining information extraction, deductive reasoning and machine learning for relation prediction, in Proceedings of the 9th Extended Semantic Web Conference (ESWC’12), Heraklion, ed. by E. Simperl, P. Cimiano, A. Polleres, O. Corcho, V. Presutti. Volume 7295 of Lecture Notes in Computer Science (Springer, Berlin/Heidelberg/New York, 2012a), pp. 164–178. http://dblp.uni-trier.de/db/conf/esws/eswc2012.html#JiangHNT12

  • X. Jiang, V. Tresp, Y. Huang, M. Nickel, Link prediction in multi-relational graphs using additive models, in Proceedings of the 11th International Workshop on Semantic Technologies Meet Recommender Systems & Big Data, ed. by M. de Gemmis, T.D. Noia, P. Lops, T. Lukasiewicz, G. Semeraro. Volume 919 of CEUR Workshop Proceedings, 2012b, pp. 1–12, CEUR-WS.org. http://dblp.uni-trier.de/db/conf/semweb/sersy2012.html#JiangTHN12

  • X. Jiang, V. Tresp, Y. Huang, M. Nickel, H.P. Kriegel, Scalable relation prediction exploiting both intrarelational correlation and contextual information, in Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD’12), Bristol, ed. by P.A. Flach, T.D. Bie, N. Cristianini. Volume 7523 of Lecture Notes in Computer Science (Springer, Berlin/Heidelberg/New York, 2012c), pp. 601–616. http://dblp.uni-trier.de/db/conf/pkdd/pkdd2012-1.html#JiangTHNK12

  • M.G. Kann, Advances in translational bioinformatics: computational approaches for the hunting of disease genes. Brief. Bioinform. 11(1), 96–110 (2010). http://dblp.uni-trier.de/db/journals/bib/bib11.html#Kann10

  • M. Nickel, H.P. Kriegel, V. Tresp, A three-way model for collective learning on multi-relational data, in Proceedings of the 28th International Conference on Machine Learning (ICML’11), Bellevue, 2011


  • M. Nickel, V. Tresp, H.P. Kriegel, Factorizing YAGO: scalable machine learning for linked data, in Proceedings of the 21st International World Wide Web Conference, Lyon, ed. by A. Mille, F.L. Gandon, J. Misselis, M. Rabinovich, S. Staab (ACM, 2012), pp. 271–280. http://dblp.uni-trier.de/db/conf/www/www2012.html#NickelTK12

  • V. Tresp, Y. Huang, M. Bundschus, A. Rettinger, Materializing and querying learned knowledge, in Proceedings of the First ESWC Workshop on Inductive Reasoning and Machine Learning on the Semantic Web (IRMLeS’09), Heraklion, vol. 474 (RWTH Aachen, 2009)


  • V. Tresp, Y. Huang, X. Jiang, A. Rettinger, Graphical models for relations – modeling relational context, in Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (KDIR’11), Paris, Oct 2011



Author information


Corresponding author

Correspondence to Volker Tresp.


Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Tresp, V., Huang, Y., Nickel, M. (2014). Querying the Web with Statistical Machine Learning. In: Wahlster, W., Grallert, HJ., Wess, S., Friedrich, H., Widenka, T. (eds) Towards the Internet of Services: The THESEUS Research Program. Cognitive Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-06755-1_18


  • DOI: https://doi.org/10.1007/978-3-319-06755-1_18


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06754-4

  • Online ISBN: 978-3-319-06755-1

  • eBook Packages: Computer Science (R0)
