Multi Level Mining of Warehouse Schema

  • Conference paper
Networked Digital Technologies (NDT 2011)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 136)

Abstract

Data Mining and Data Warehousing are two mature disciplines with broadly the same objectives. Yet they have developed largely separately, so each discipline relies on different techniques. It has been recognized that mining techniques developed for pattern recognition, such as clustering and visualization, can assist in designing data warehouse schema. However, a suitable methodology is required to integrate mining methods seamlessly into the design of warehouse schema. In previous work, we presented a methodology that employs hierarchical clustering to derive a tree structure that a data warehouse designer can use to build a schema. We believe that, to strengthen the decision-making process, there is a strong need for a method that automatically extracts the knowledge present at different levels of abstraction in a warehouse. We demonstrate with examples how mining at different levels of a hierarchical warehouse schema can give new insights into the underlying data clusters, which not only helps in building more meaningful dimensions and facts for data warehouse design but can also improve the decision-making process.
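The idea of reading one cluster hierarchy at several levels of abstraction can be illustrated with a minimal sketch. It assumes single-linkage agglomerative clustering over a handful of made-up numeric records. The abstract does not state the linkage, distance measure, or dataset used in the paper, so the function names `agglomerate` and `cut` and the sample points are hypothetical.

```python
from itertools import combinations


def agglomerate(points):
    """Single-linkage agglomerative clustering (an assumed, illustrative choice).

    Returns one partition per level of the hierarchy: the first partition
    puts every record in its own cluster, the last holds a single cluster.
    Clusters are frozensets of record indices.
    """
    def euclid(i, j):
        return sum((x - y) ** 2 for x, y in zip(points[i], points[j])) ** 0.5

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(euclid(i, j) for i in a for j in b)

    clusters = [frozenset([i]) for i in range(len(points))]
    levels = [list(clusters)]
    while len(clusters) > 1:
        # Merge the two closest clusters and record the new partition.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        levels.append(list(clusters))
    return levels


def cut(levels, k):
    """Return the partition of the hierarchy that has exactly k clusters."""
    return levels[len(levels) - k]


# Hypothetical records with two numeric attributes each.
points = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9), (9.0, 1.0)]
levels = agglomerate(points)

# Inspect the same hierarchy at a coarse and at a finer level of abstraction.
for k in (2, 3):
    print(k, sorted(sorted(c) for c in cut(levels, k)))
```

Each call to `cut` yields a different grouping of the same records, so a designer could compare candidate dimensions at a coarse level with those at a finer one. Mixed numeric and nominal attributes would need a different distance function than the Euclidean one assumed here.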

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Usman, M., Pears, R. (2011). Multi Level Mining of Warehouse Schema. In: Fong, S. (eds) Networked Digital Technologies. NDT 2011. Communications in Computer and Information Science, vol 136. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22185-9_34

  • DOI: https://doi.org/10.1007/978-3-642-22185-9_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22184-2

  • Online ISBN: 978-3-642-22185-9

  • eBook Packages: Computer Science, Computer Science (R0)
