Searching the Big Data: Practices and Experiences in Efficiently Querying Knowledge Bases

Zhang, Wei Emma; Sheng, Quan Z.

doi:10.1007/978-3-319-49340-4_13


Abstract

Knowledge bases (KBs) are computer systems that store complex structured and unstructured facts, i.e., knowledge. KBs are described as open, shared databases of the world’s knowledge and typically use the entity-relationship model. Most existing knowledge bases make their data available in the RDF format. Tools for querying, inferencing, and reasoning over facts have been developed to consume this knowledge. In this chapter, we introduce a client-side caching framework that aims to accelerate the overall query response speed. In particular, we improve a suboptimal graph edit distance function to estimate the similarity of SPARQL queries and develop an approach to transform SPARQL queries into feature vectors. Machine learning algorithms use these feature vectors to identify similar queries that could potentially be the subsequent queries. We adapt multiple dimensionality reduction algorithms to reduce the identification time. We then prefetch and cache the results of these queries to improve the overall querying performance. We also develop a forecasting method, namely Modified Simple Exponential Smoothing, to implement cache replacement. Our approach has been evaluated on a very large set of real-world queries. The empirical results show that our approach has great potential to enhance the cache hit rate and accelerate querying on SPARQL endpoints.
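
To make the cache replacement idea more concrete, the following is a minimal sketch of a cache that scores entries with plain simple exponential smoothing, s_t = alpha * x_t + (1 - alpha) * s_{t-1}, and evicts the entry with the lowest score. It is not the chapter's Modified Simple Exponential Smoothing, whose details the abstract does not give. The class name `SmoothedCache`, the per-period update, the binary observation x_t, and the default `alpha = 0.5` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SmoothedCache:
    """Toy cache that evicts the entry with the lowest smoothed access score.

    Hypothetical illustration only: uses plain simple exponential smoothing,
    not the modified variant named in the abstract.
    """

    capacity: int
    alpha: float = 0.5  # smoothing factor (assumed value, 0 < alpha <= 1)
    scores: dict = field(default_factory=dict)   # query -> smoothed score
    results: dict = field(default_factory=dict)  # query -> cached result

    def end_period(self, accessed: set) -> None:
        """Apply s_t = alpha * x_t + (1 - alpha) * s_{t-1} to every entry.

        x_t is 1 if the query was accessed in the period that just ended,
        otherwise 0 (an assumed, deliberately simple observation model).
        """
        for key in self.scores:
            x_t = 1.0 if key in accessed else 0.0
            self.scores[key] = self.alpha * x_t + (1 - self.alpha) * self.scores[key]

    def put(self, key: str, result: object) -> None:
        """Insert a result, evicting the lowest-scored entry when full."""
        if key not in self.results and len(self.results) >= self.capacity:
            victim = min(self.scores, key=self.scores.get)
            del self.scores[victim]
            del self.results[victim]
        self.results[key] = result
        # A new entry starts as if it had been observed once from s_0 = 0.
        self.scores.setdefault(key, self.alpha)
```

In this sketch, recently and frequently accessed queries keep a high score, while queries that go unused decay geometrically and become eviction candidates. A larger `alpha` weights recent periods more heavily.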

Author information

Authors and Affiliations

The University of Adelaide, North Terrace, Adelaide, SA, 5005, Australia
Wei Emma Zhang & Quan Z. Sheng

Corresponding author

Correspondence to Quan Z. Sheng.

Editor information

Editors and Affiliations

School of Information Technologies, The University of Sydney, Sydney, New South Wales, Australia
Albert Y. Zomaya
The School of Computer Science, The University of New South Wales, Eveleigh, New South Wales, Australia
Sherif Sakr

About this chapter

Cite this chapter

Zhang, W.E., Sheng, Q.Z. (2017). Searching the Big Data: Practices and Experiences in Efficiently Querying Knowledge Bases. In: Zomaya, A., Sakr, S. (eds) Handbook of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-49340-4_13

DOI: https://doi.org/10.1007/978-3-319-49340-4_13
Published: 26 February 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49339-8
Online ISBN: 978-3-319-49340-4
eBook Packages: Computer Science (R0)