A Systematic Database Summary Generation Using the Distributed Query Discovery System

Ryu, Tae W.; Eick, Christoph F.

doi:10.1007/978-3-540-24768-5_20

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3046)

Included in the following conference series:

International Conference on Computational Science and Its Applications

Abstract

This paper introduces an approach for systematically generating database summaries using MASSON, a distributed query discovery system. The approach first creates an object-view of the database, then uses that object-view to partition the database into clusters of objects with similar properties, and finally generates a summary for each cluster. To support this, we propose a data set representation framework and introduce a suitable similarity measure framework. The paper also describes techniques for generalizing the primitive summary descriptions that MASSON generates, and for improving the system's performance using clustered computers and CORBA.
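
The abstract only outlines the pipeline (object-view, similarity-based partitioning, per-cluster summary), so the following is a minimal illustrative sketch, not the authors' method. It assumes that each object in the object-view is a flat record whose attributes are numeric, categorical, or multi-valued (a set of values gathered from related tuples). Per-attribute similarities are averaged in the style of Gower's general coefficient; the Jaccard measure for set-valued attributes, the greedy leader clustering, and the range/value-set "summary" are stand-ins chosen for brevity. MASSON itself discovers summaries as queries via genetic programming, which is not reproduced here. All function names and the sample data are hypothetical.

```python
from typing import Dict, List, Union

# Hypothetical object-view record: numeric, categorical, or multi-valued attributes.
Value = Union[float, str, frozenset]
Obj = Dict[str, Value]


def attr_similarity(a: Value, b: Value, value_range: float) -> float:
    """Similarity of one attribute in [0, 1] (assumed Gower-style components)."""
    if isinstance(a, frozenset) and isinstance(b, frozenset):
        # Multi-valued attribute: Jaccard overlap of the two value sets.
        union = a | b
        return 1.0 if not union else len(a & b) / len(union)
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        # Numeric attribute: 1 minus the range-normalised absolute difference.
        return 1.0 if value_range == 0 else 1.0 - abs(a - b) / value_range
    # Categorical attribute: exact match only.
    return 1.0 if a == b else 0.0


def numeric_ranges(objs: List[Obj]) -> Dict[str, float]:
    """Observed range (max - min) of every attribute that is numeric in all objects."""
    ranges: Dict[str, float] = {}
    for key in objs[0]:
        vals = [o[key] for o in objs]
        if all(isinstance(v, (int, float)) for v in vals):
            ranges[key] = max(vals) - min(vals)
    return ranges


def similarity(x: Obj, y: Obj, ranges: Dict[str, float]) -> float:
    """Unweighted average of the per-attribute similarities."""
    keys = list(x.keys())
    return sum(attr_similarity(x[k], y[k], ranges.get(k, 0.0)) for k in keys) / len(keys)


def leader_cluster(objs: List[Obj], threshold: float) -> List[List[Obj]]:
    """Greedy leader clustering: join the first cluster whose leader is similar enough."""
    ranges = numeric_ranges(objs)
    clusters: List[List[Obj]] = []
    for o in objs:
        for c in clusters:
            if similarity(o, c[0], ranges) >= threshold:
                c.append(o)
                break
        else:
            clusters.append([o])  # no similar leader found: start a new cluster
    return clusters


def summarize(cluster: List[Obj]) -> Dict[str, object]:
    """Primitive per-cluster description: numeric ranges and observed value sets."""
    summary: Dict[str, object] = {}
    for key in cluster[0]:
        vals = [o[key] for o in cluster]
        if all(isinstance(v, (int, float)) for v in vals):
            summary[key] = (min(vals), max(vals))
        elif all(isinstance(v, frozenset) for v in vals):
            summary[key] = frozenset().union(*vals)
        else:
            summary[key] = set(vals)
    return summary


# Hypothetical object-view: one record per customer, with purchased products
# aggregated into a multi-valued attribute.
customers: List[Obj] = [
    {"age": 25.0, "region": "west", "products": frozenset({"tv", "radio"})},
    {"age": 27.0, "region": "west", "products": frozenset({"tv"})},
    {"age": 61.0, "region": "east", "products": frozenset({"car"})},
    {"age": 65.0, "region": "east", "products": frozenset({"car", "boat"})},
]

if __name__ == "__main__":
    for i, cluster in enumerate(leader_cluster(customers, threshold=0.6)):
        print(i, summarize(cluster))
```

With a threshold of 0.6 the four sample customers fall into two clusters, each summarized by an age range, a region, and the union of purchased products. Averaging per-attribute similarities lets mixed attribute types share one measure; a real system would also weight attributes and produce more general descriptions than raw ranges.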

Author information

Authors and Affiliations

Department of Computer Science, California State University, Fullerton, CA, 92834, USA
Tae W. Ryu
Department of Computer Science, University of Houston, Houston, TX, 77204, USA
Christoph F. Eick

Editor information

Editors and Affiliations

Department of Chemistry, University of Perugia, Via Elce di Sotto, 8, I-06123, Perugia, Italy
Antonio Laganá
Department of Computer Science, University of Calgary, 2500 University Drive N.W., T2N 1N4, Calgary, AB, Canada
Marina L. Gavrilova
William Norris Professor, Head of the Computer Science and Engineering Department, University of Minnesota, USA
Vipin Kumar
School of Computing, Soongsil University, Seoul, Korea
Youngsong Mun
OptimaNumerics Ltd., Cathedral House, 23-31 Waring Street, BT1 2DX, Belfast, UK
C. J. Kenneth Tan
Department of Mathematics and Computer Science, University of Perugia, via Vanvitelli, 1, I-06123, Perugia, Italy
Osvaldo Gervasi

About this paper

Cite this paper

Ryu, T.W., Eick, C.F. (2004). A Systematic Database Summary Generation Using the Distributed Query Discovery System. In: Laganá, A., Gavrilova, M.L., Kumar, V., Mun, Y., Tan, C.J.K., Gervasi, O. (eds) Computational Science and Its Applications – ICCSA 2004. ICCSA 2004. Lecture Notes in Computer Science, vol 3046. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24768-5_20

DOI: https://doi.org/10.1007/978-3-540-24768-5_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22060-2
Online ISBN: 978-3-540-24768-5
eBook Packages: Springer Book Archive
