Abstract
This paper introduces an approach for systematically generating database summaries using MASSON, a distributed query discovery system. Our approach first creates an object view, then partitions the database, based on that object view, into clusters of objects with similar properties, and finally generates a summary for each cluster. For this purpose, we propose a framework for representing data sets and introduce a suitable similarity measure framework. The paper also describes the techniques used to generalize the primitive summary descriptions generated by MASSON and to improve the system's performance using clustered computers and CORBA.
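The clustering pipeline described above can be illustrated with a minimal Python sketch. It assumes a hypothetical flat table of (object id, attribute, value) rows. The sketch builds an object view by grouping rows per object so that multi-valued attributes become sets. It then compares objects with a Gower-style average of per-attribute Jaccard similarities, a common choice for mixed and set-valued attributes that is not necessarily the measure proposed in the paper. Objects are grouped by single-linkage threshold clustering, and the attribute values shared by every member of a cluster stand in for that cluster's summary. The sketch does not reproduce MASSON's query discovery, its generalization step, or its CORBA-based distribution. All names (`ROWS`, `build_object_view`, `similarity`, `cluster`, `summarize`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical flat data: (object_id, attribute, value) rows, e.g. the result of a
# one-to-many join. Objects 1 and 2 take several "course" values (multi-valued).
ROWS = [
    (1, "dept", "CS"), (1, "course", "DB"), (1, "course", "AI"),
    (2, "dept", "CS"), (2, "course", "DB"), (2, "course", "AI"),
    (3, "dept", "Math"), (3, "course", "Algebra"),
    (4, "dept", "Math"), (4, "course", "Algebra"), (4, "course", "Topology"),
]


def build_object_view(rows):
    """Group rows by object id; every attribute maps to a set of values."""
    view = defaultdict(lambda: defaultdict(set))
    for oid, attr, value in rows:
        view[oid][attr].add(value)
    return {oid: dict(attrs) for oid, attrs in view.items()}


def jaccard(a, b):
    """Set similarity; for single-valued attributes it reduces to an equality test."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(x, y):
    """Gower-style mean of per-attribute similarities (assumed measure, not the paper's)."""
    attrs = set(x) | set(y)
    if not attrs:
        return 1.0
    return sum(jaccard(x.get(k, set()), y.get(k, set())) for k in attrs) / len(attrs)


def cluster(view, threshold):
    """Single-linkage clustering: link any pair with similarity >= threshold (union-find)."""
    ids = sorted(view)
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for idx, a in enumerate(ids):
        for b in ids[idx + 1:]:
            if similarity(view[a], view[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for i in ids:
        groups[find(i)].append(i)
    return sorted(groups.values())


def summarize(view, members):
    """Stand-in summary: (attribute, value) pairs shared by all cluster members."""
    common = None
    for oid in members:
        pairs = {(k, v) for k, vs in view[oid].items() for v in vs}
        common = pairs if common is None else common & pairs
    return sorted(common or set())


if __name__ == "__main__":
    view = build_object_view(ROWS)
    for members in cluster(view, threshold=0.5):
        print(members, summarize(view, members))
```

With `threshold=0.5`, the sketch prints two clusters: `[1, 2]`, whose shared values are the CS department and the DB and AI courses, and `[3, 4]`, whose shared values are the Math department and the Algebra course. Raising the threshold to 0.9 splits objects 3 and 4, whose similarity is 0.75. Single linkage was chosen only because it fits in a few lines; any clustering algorithm that accepts a pairwise similarity could take its place.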
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ryu, T.W., Eick, C.F. (2004). A Systematic Database Summary Generation Using the Distributed Query Discovery System. In: Laganá, A., Gavrilova, M.L., Kumar, V., Mun, Y., Tan, C.J.K., Gervasi, O. (eds) Computational Science and Its Applications – ICCSA 2004. ICCSA 2004. Lecture Notes in Computer Science, vol 3046. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24768-5_20
DOI: https://doi.org/10.1007/978-3-540-24768-5_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22060-2
Online ISBN: 978-3-540-24768-5
eBook Packages: Springer Book Archive