Ontology driven search of compound IDs

Abelló, Alberto; Romero, Oscar

doi:10.1007/s10115-011-0418-0

Regular Paper
Published: 28 May 2011

Volume 32, pages 191–216, (2012)
Knowledge and Information Systems

Alberto Abelló¹ &
Oscar Romero¹

Abstract

Object identification is a crucial step in most information systems. Nowadays, there are many different ways to identify entities, such as surrogates, keys, and object identifiers. However, not all of them guarantee the entity identity. Many approaches have been proposed in the literature for discovering meaningful identifiers (i.e., identifiers that guarantee the entity identity according to the semantics of the universe of discourse), but all of them work at either the logical or the data level, and they share some constraints inherent to that kind of approach. Working at the logical level, we may miss some important data dependencies, while the cost of identifying data dependencies purely at the data level may not be affordable. In this paper, we propose an approach for discovering meaningful identifiers driven by domain ontologies. In our approach, we guide the process at the conceptual level, and we introduce a set of pruning rules that improve performance by reducing the number of identifier hypotheses that are generated and must be verified against data. Finally, we present a simulation over a case study to show the feasibility of our method.
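
To make the notion of an "identifier hypothesis verified with data" concrete, the following is a minimal sketch of a purely data-level search for compound identifiers. It is not the authors' method: the ontology-driven pruning rules are not described in this abstract, so the only pruning shown here is the generic one of skipping supersets of identifiers that were already found. The function names, the attribute names, and the sample rows are hypothetical.

```python
from itertools import combinations


def is_identifier(rows, attrs):
    """Return True if no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def minimal_identifiers(rows, attributes):
    """Level-wise search for minimal attribute sets that are unique in rows."""
    found = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # Generic pruning (an assumption of this sketch, not a rule from
            # the paper): a superset of a known identifier is never minimal,
            # so it does not need to be checked against the data.
            if any(set(f) <= set(combo) for f in found):
                continue
            if is_identifier(rows, combo):
                found.append(combo)
    return found


# Hypothetical sample data, invented for illustration only.
rentals = [
    {"car": "C1", "customer": "U1", "day": 1},
    {"car": "C1", "customer": "U2", "day": 2},
    {"car": "C2", "customer": "U1", "day": 2},
    {"car": "C2", "customer": "U2", "day": 1},
    {"car": "C1", "customer": "U1", "day": 3},
]

result = minimal_identifiers(rentals, ["car", "customer", "day"])
print(result)
```

A check of this kind only shows that a combination is unique in the sample at hand; it cannot tell whether the combination is meaningful in the universe of discourse. The number of candidate combinations also grows exponentially with the number of attributes, which is the cost that the abstract's pruning rules aim to reduce.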

Author information

Authors and Affiliations

Department of Enginyeria de Serveis i Sistemes d’Informació, Universitat Politècnica de Catalunya, 08034, Barcelona, Spain
Alberto Abelló & Oscar Romero

Corresponding author

Correspondence to Alberto Abelló.

About this article

Cite this article

Abelló, A., Romero, O. Ontology driven search of compound IDs. Knowl Inf Syst 32, 191–216 (2012). https://doi.org/10.1007/s10115-011-0418-0

Received: 24 November 2010
Revised: 08 March 2011
Accepted: 11 May 2011
Published: 28 May 2011
Issue Date: July 2012
DOI: https://doi.org/10.1007/s10115-011-0418-0
