Abstract
By avoiding the ‘data not invented here’ (NIH) syndrome (a mindset that consists in relying solely on data created inside the walls of a business; see https://urlz.fr/9Yo9), companies have realized the benefit of including external sources in their data cubes. In this context, Linked Open Data (LOD) is a promising external source that may contain both valuable data and query logs that materialize the exploration of data by end users. Paradoxically, the datasets of this external source are structured whereas the logs are “ugly”; if the logs are turned into rich structured data, they can contribute to building valuable data cubes. In this paper, we claim that the NIH syndrome must also be considered for query logs. Consequently, we propose an approach that investigates the particularities of SPARQL query logs performed on the LOD, and augmented by the LOD, to discover multidimensional patterns for leveraging and enriching a data cube. To show the effectiveness of our approach, different scenarios are proposed and evaluated using DBpedia.
Notes
- 9. Property path expressions are negligible in our corpus, so our study does not focus on them.
- 10. Details about the operations can be found in the W3C recommendation: https://urlz.fr/9Yqk.
- 11. A SPARQL endpoint is an HTTP-based query service that executes SPARQL queries over the linked dataset, e.g., http://dbpedia.org/sparql.
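To make note 11 concrete, the following minimal sketch shows how a SPARQL query could be addressed to an HTTP endpoint. It only builds the request URL and sends nothing over the network. The endpoint URL comes from the note; the `query` parameter name follows the SPARQL 1.1 Protocol; the helper name `build_sparql_get_url` and the example query are assumptions made for illustration and are not taken from the paper.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint URL as given in note 11.
ENDPOINT = "http://dbpedia.org/sparql"


def build_sparql_get_url(endpoint: str, query: str) -> str:
    """Return the GET URL that carries `query` to a SPARQL endpoint.

    The SPARQL 1.1 Protocol passes the query text in the `query`
    parameter; urlencode percent-encodes it for use in a URL.
    (Hypothetical helper, named here for illustration only.)
    """
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical aggregate query, for illustration only (not from the paper).
EXAMPLE_QUERY = """
SELECT ?country (COUNT(?city) AS ?cities)
WHERE { ?city a <http://dbpedia.org/ontology/City> ;
              <http://dbpedia.org/ontology/country> ?country . }
GROUP BY ?country
LIMIT 5
"""

url = build_sparql_get_url(ENDPOINT, EXAMPLE_QUERY)

# Decoding the URL recovers the original query text unchanged.
decoded = parse_qs(urlsplit(url).query)["query"][0]

print(url)
```

Requests of this kind, as received by the endpoint, are what a SPARQL query log records; an HTTP client would be needed to actually submit the URL and read the results.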
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Khouri, S., Lanasri, D., Saidoune, R., Boudoukha, K., Bellatreche, L. (2019). LogLInc: LoG Queries of Linked Open Data Investigator for Cube Design. In: Hartmann, S., Küng, J., Chakravarthy, S., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2019. Lecture Notes in Computer Science, vol 11706. Springer, Cham. https://doi.org/10.1007/978-3-030-27615-7_27
DOI: https://doi.org/10.1007/978-3-030-27615-7_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27614-0
Online ISBN: 978-3-030-27615-7
eBook Packages: Computer Science, Computer Science (R0)