Abstract
Object identification is a crucial step in most information systems. Entities can be identified in many different ways, such as surrogates, keys, and object identifiers, but not all of them guarantee the identity of the entity. Many approaches have been proposed in the literature for discovering meaningful identifiers (i.e., identifiers that guarantee the entity identity according to the semantics of the universe of discourse), but all of them work at either the logical or the data level, and they share the limitations inherent to each kind of approach. Working at the logical level, we may miss some important data dependencies, while the cost of identifying data dependencies purely at the data level may not be affordable. In this paper, we propose an approach for discovering meaningful identifiers driven by domain ontologies. Our approach guides the process at the conceptual level and introduces a set of pruning rules that improve performance by reducing the number of identifier hypotheses that are generated and must be verified against the data. Finally, we present a simulation over a case study to show the feasibility of our method.
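To make the idea of "identifier hypotheses verified with data" concrete, the following is a minimal sketch, not the method of the paper. It assumes a hypothetical table of car rentals and uses a generic minimality pruning (any superset of an already verified identifier is skipped) as a stand-in for the ontology-driven pruning rules, which are not detailed in the abstract. The names `is_identifier`, `minimal_identifiers`, and the sample data are illustrative assumptions.

```python
from itertools import combinations


def is_identifier(rows, attrs):
    """Return True if no two rows share the same values on attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def minimal_identifiers(rows, attributes):
    """Generate hypotheses by increasing size and verify them against the data.

    Assumed pruning rule (generic, not taken from the paper): a candidate
    that contains an already verified identifier is never checked.
    """
    found = []
    for size in range(1, len(attributes) + 1):
        for candidate in combinations(attributes, size):
            if any(set(f) <= set(candidate) for f in found):
                continue  # pruned without touching the data
            if is_identifier(rows, candidate):
                found.append(candidate)
    return found


# Hypothetical sample data, for illustration only.
rentals = [
    {"car": "A", "day": 1, "branch": "X"},
    {"car": "A", "day": 2, "branch": "X"},
    {"car": "B", "day": 1, "branch": "Y"},
]

print(minimal_identifiers(rentals, ["car", "day", "branch"]))
```

Note that uniqueness in a data sample only shows that a hypothesis is consistent with the data; whether it is a meaningful identifier still depends on the semantics of the domain, which is where conceptual-level guidance comes in.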
Cite this article
Abelló, A., Romero, O. Ontology driven search of compound IDs. Knowl Inf Syst 32, 191–216 (2012). https://doi.org/10.1007/s10115-011-0418-0
DOI: https://doi.org/10.1007/s10115-011-0418-0