Abstract
Clustering can be considered the most important unsupervised learning technique for finding similar behaviors (clusters) in large collections of data. Data warehouses (DWs) help users analyze stored data because they contain data that has already been preprocessed for analysis. Furthermore, the multidimensional (MD) model of a DW intuitively represents the underlying system. However, most clustering data-mining techniques are applied at a low level of abstraction to complex unstructured data. Although there are several approaches to clustering on DWs, there is still no conceptual model for clustering that facilitates modeling with this technique on the MD model of a DW. Here, we propose (i) a conceptual model for clustering that helps to focus the data-mining process at the adequate abstraction level and (ii) an extension of the Unified Modeling Language (UML), by means of the UML profiling mechanism, that allows us to design clustering data-mining models on top of the MD model of a DW. This avoids duplicating the time-consuming preprocessing stage and simplifies clustering design on top of DWs, improving the discovery of knowledge.
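As a rough illustration of what it means to specify clustering at the level of the MD model rather than on raw data, the following Python sketch describes a clustering task in terms of a dimension (the case to be clustered) and fact measures (the inputs), and then runs a plain k-means over the aggregated measures. It is not the UML profile proposed in the paper: `ClusteringSpec`, `aggregate`, `run`, and the `sales` rows are hypothetical names and data introduced only for this example, and k-means is used merely as a familiar stand-in algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusteringSpec:
    """Hypothetical conceptual-level description of a clustering task."""
    case: str          # dimension whose members are clustered
    inputs: tuple      # names of the fact measures used as inputs
    num_clusters: int  # number of clusters to look for


def aggregate(fact_rows, spec):
    """Roll fact rows up to one vector of summed measures per case member."""
    totals = {}
    for row in fact_rows:
        vec = totals.setdefault(row[spec.case], [0.0] * len(spec.inputs))
        for i, name in enumerate(spec.inputs):
            vec[i] += row[name]
    return totals


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def run(fact_rows, spec):
    """Apply the specification to the fact rows; return member -> cluster id."""
    totals = aggregate(fact_rows, spec)
    members = sorted(totals)
    labels = kmeans([totals[m] for m in members], spec.num_clusters)
    return dict(zip(members, labels))


# Hypothetical fact rows: one customer dimension and two measures.
sales = [
    {"customer": "ana", "quantity": 1, "amount": 10.0},
    {"customer": "ana", "quantity": 2, "amount": 15.0},
    {"customer": "bob", "quantity": 2, "amount": 30.0},
    {"customer": "cy", "quantity": 40, "amount": 900.0},
    {"customer": "dee", "quantity": 45, "amount": 950.0},
]
spec = ClusteringSpec(case="customer", inputs=("quantity", "amount"), num_clusters=2)
assignment = run(sales, spec)
```

Because the specification refers only to dimension and measure names, the same `spec` could in principle be reused over any fact table exposing those names, which is the kind of reuse of already-preprocessed DW data that the abstract argues for.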
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zubcoff, J., Pardillo, J., Trujillo, J. (2007). Integrating Clustering Data Mining into the Multidimensional Modeling of Data Warehouses with UML Profiles. In: Song, I.Y., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2007. Lecture Notes in Computer Science, vol 4654. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74553-2_18
DOI: https://doi.org/10.1007/978-3-540-74553-2_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74552-5
Online ISBN: 978-3-540-74553-2
eBook Packages: Computer Science (R0)