Abstract
Knowledge bases (KBs) are computer systems that store complex structured and unstructured facts, i.e., knowledge. KBs are described as open, shared databases of the world’s knowledge and typically use the entity-relational model. Most existing knowledge bases publish their data in the RDF format. Tools for querying, inferencing, and reasoning over facts have been developed to consume this knowledge. In this chapter, we introduce a client-side caching framework that aims to accelerate overall query response. In particular, we improve a suboptimal graph edit distance function to estimate the similarity of SPARQL queries and develop an approach to transform SPARQL queries into feature vectors. Machine learning algorithms use these feature vectors to identify similar queries that could potentially be the subsequent queries. We adapt multiple dimensionality reduction algorithms to reduce the identification time. We then prefetch and cache the results of these queries to improve overall querying performance. We also develop a forecasting method, namely Modified Simple Exponential Smoothing, to implement cache replacement. Our approach has been evaluated on a very large set of real-world queries. The empirical results show that our approach has great potential to enhance the cache hit rate and accelerate querying on SPARQL endpoints.
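To make the cache replacement idea concrete, the following is a minimal sketch of a cache that ranks entries with standard simple exponential smoothing, s_t = alpha * x_t + (1 - alpha) * s_{t-1}, where x_t is 1 if the query was accessed in the current time step and 0 otherwise. It is not the Modified Simple Exponential Smoothing method of the chapter, whose details the abstract does not give. The class name `SmoothedCache`, its methods, and the default `alpha` are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class SmoothedCache:
    """Hypothetical query-result cache that evicts the entry with the lowest smoothed score."""

    capacity: int
    alpha: float = 0.5  # smoothing factor in (0, 1); an assumed default
    scores: dict = field(default_factory=dict)   # query -> smoothed access score
    results: dict = field(default_factory=dict)  # query -> cached result

    def tick(self, accessed_queries):
        """Advance one time step and update every cached entry's score."""
        accessed = set(accessed_queries)
        for query, previous in self.scores.items():
            x = 1.0 if query in accessed else 0.0
            # Standard simple exponential smoothing update.
            self.scores[query] = self.alpha * x + (1 - self.alpha) * previous

    def put(self, query, result):
        """Cache a result, evicting the lowest-scored entry when the cache is full."""
        if query not in self.results and len(self.results) >= self.capacity:
            victim = min(self.scores, key=self.scores.get)
            del self.scores[victim]
            del self.results[victim]
        self.results[query] = result
        # A new entry starts with the score of a single access.
        self.scores.setdefault(query, self.alpha)

    def get(self, query):
        """Return the cached result, or None on a cache miss."""
        return self.results.get(query)
```

In this sketch, a query that keeps being accessed sees its score approach 1, while an idle one decays geometrically toward 0, so recently and frequently used results tend to survive eviction.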
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Zhang, W.E., Sheng, Q.Z. (2017). Searching the Big Data: Practices and Experiences in Efficiently Querying Knowledge Bases. In: Zomaya, A., Sakr, S. (eds) Handbook of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-49340-4_13
DOI: https://doi.org/10.1007/978-3-319-49340-4_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49339-8
Online ISBN: 978-3-319-49340-4
eBook Packages: Computer Science, Computer Science (R0)