Ontology driven search of compound IDs

  • Regular Paper
Knowledge and Information Systems

Abstract

Object identification is a crucial step in most information systems. Entities can be identified in many different ways, such as surrogates, keys, and object identifiers, but not all of these guarantee entity identity. Many approaches in the literature discover meaningful identifiers (i.e., identifiers that guarantee entity identity according to the semantics of the universe of discourse), but all of them work at either the logical or the data level, and they share constraints inherent to that kind of approach. Working at the logical level may miss some important data dependencies, while identifying data dependencies purely at the data level may be unaffordably costly. In this paper, we propose an approach for discovering meaningful identifiers driven by domain ontologies. Our approach guides the process at the conceptual level and introduces a set of pruning rules that improve performance by reducing the number of identifier hypotheses that are generated and then verified against the data. Finally, we present a simulation over a case study to show the feasibility of our method.
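To make the cost argument concrete, the following minimal sketch illustrates what verifying identifier hypotheses purely at the data level might look like. It is not the method proposed in the paper: the function names, the toy rental rows, and the single pruning rule (skip any superset of an identifier already found, so only minimal identifiers are reported) are all assumptions chosen for illustration. The ontology-driven pruning rules themselves are not described in the abstract and are not reproduced here.

```python
from itertools import combinations


def is_identifier(rows, attrs):
    """Return True if no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def minimal_identifiers(rows, attributes):
    """Test attribute subsets by increasing size against the data.

    Generic minimality pruning (an assumption, not the paper's rules):
    a candidate that contains an already-found identifier is skipped.
    Returns the minimal identifiers and the number of hypotheses
    actually verified against the rows.
    """
    found = []
    checked = 0
    for size in range(1, len(attributes) + 1):
        for candidate in combinations(attributes, size):
            if any(set(f) <= set(candidate) for f in found):
                continue  # pruned: a subset already identifies the rows
            checked += 1
            if is_identifier(rows, candidate):
                found.append(candidate)
    return found, checked


# Hypothetical toy data: one row per rental event.
rows = [
    {"branch": "B1", "car": "C1", "day": 1},
    {"branch": "B1", "car": "C2", "day": 1},
    {"branch": "B2", "car": "C1", "day": 2},
    {"branch": "B1", "car": "C1", "day": 3},
]

found, checked = minimal_identifiers(rows, ["branch", "car", "day"])
print(found, checked)
```

With n attributes there are 2^n - 1 non-empty subsets to consider, and each surviving hypothesis requires a pass over the data, which is why reducing the number of hypotheses before touching the data matters.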

Author information

Correspondence to Alberto Abelló.

About this article

Cite this article

Abelló, A., Romero, O. Ontology driven search of compound IDs. Knowl Inf Syst 32, 191–216 (2012). https://doi.org/10.1007/s10115-011-0418-0

  • DOI: https://doi.org/10.1007/s10115-011-0418-0
