Abstract
The key discovery problem has recently been investigated for symbolic RDF data and tested on large datasets such as DBpedia and YAGO. The advantage of such methods is that they automatically extract combinations of properties that uniquely identify every resource in a dataset (i.e., ontological rules). However, none of the existing approaches can handle real-world numerical data. In this paper we propose a novel approach to key discovery that handles numerical RDF datasets. We test the significance of our approach in the context of an oenological application, using a wine dataset that represents the different chemical-based flavourings. Discovering keys in this context contributes to the investigation of complementary flavours that make it possible to distinguish the various sorts of wine from one another.
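To make the notion of a key concrete, the following is a minimal sketch and not the method of the paper, which the abstract does not describe. It assumes that numerical values are first grouped into quantile bins (an assumption made here purely for illustration) and then checks which combinations of properties uniquely identify every resource. The property names `acidity` and `sugar`, the toy values, and all function names are hypothetical.

```python
from bisect import bisect_right
from itertools import combinations


def quantile_cuts(values, n_bins):
    """Cut points splitting the sorted values into n_bins groups of similar size."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * i) // n_bins] for i in range(1, n_bins)]


def discretize(rows, n_bins):
    """Replace each numerical value by the index of its quantile bin (assumed preprocessing)."""
    props = list(rows[0].keys())
    cuts = {p: quantile_cuts([r[p] for r in rows], n_bins) for p in props}
    return [{p: bisect_right(cuts[p], r[p]) for p in props} for r in rows]


def is_key(rows, props):
    """A property set is a key if no two resources agree on all of its properties."""
    seen = set()
    for r in rows:
        signature = tuple(r[p] for p in props)
        if signature in seen:
            return False
        seen.add(signature)
    return True


def minimal_keys(rows):
    """Enumerate minimal keys by increasing size (exponential; only for toy data)."""
    props = sorted(rows[0].keys())
    found = []
    for size in range(1, len(props) + 1):
        for combo in combinations(props, size):
            # Skip supersets of keys already found: they are not minimal.
            if any(set(k) <= set(combo) for k in found):
                continue
            if is_key(rows, combo):
                found.append(combo)
    return found


# Hypothetical toy data: four wines described by two numerical properties.
wines = [
    {"acidity": 3.1, "sugar": 1.2},
    {"acidity": 3.2, "sugar": 6.5},
    {"acidity": 5.8, "sugar": 1.3},
    {"acidity": 5.9, "sugar": 6.4},
]

raw_keys = minimal_keys(wines)                     # every value is distinct
binned_keys = minimal_keys(discretize(wines, 2))   # low/high bins per property
```

On the raw values each property alone already separates the four wines, because no two measurements coincide exactly. After binning into two groups per property, only the combination of both properties still distinguishes every wine. This is one plausible illustration of why numerical data needs dedicated treatment; the abstract itself does not give the reason.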
Acknowledgments
The third author acknowledges the support of ANR grants ASPIQ (ANR-12-BS02-0003), QUALINCA (ANR-12-0012) and DURDUR (ANR-13-ALID-0002). The work of the third author was carried out as part of the research delegation at INRA MISTEA Montpellier and INRA IATE CEPIA Axe 5 Montpellier.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Symeonidou, D. et al. (2016). Key Discovery for Numerical Data: Application to Oenological Practices. In: Haemmerlé, O., Stapleton, G., Faron Zucker, C. (eds) Graph-Based Representation and Reasoning. ICCS 2016. Lecture Notes in Computer Science, vol 9717. Springer, Cham. https://doi.org/10.1007/978-3-319-40985-6_17
DOI: https://doi.org/10.1007/978-3-319-40985-6_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40984-9
Online ISBN: 978-3-319-40985-6
eBook Packages: Computer Science, Computer Science (R0)