Incremental Mining of Schema for Semistructured Data

Zhou, Aoying; Jinwen; Shuigeng, Zhou; Tian, Zenping

doi:10.1007/3-540-48912-6_22

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1574)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Semistructured data is characterized by the lack of any fixed and rigid schema, even though some implicit structure typically appears in the data. The large number of online applications makes it important to mine the schema of semistructured data, both for users (e.g., to gather useful information and facilitate querying) and for systems (e.g., to optimize access). The critical problem is to discover the implicit structure in the semistructured data. Current methods for extracting the structure of Web data are either general and independent of the application background [8], [9], or bound to a concrete environment such as HTML [13], [14], [15]. Both kinds are expensive and have difficulty keeping up with the frequent and complicated changes in Web data. In this paper, we first address the problem of incrementally mining the schema of semistructured data after the raw data has been updated. We provide an algorithm for incrementally mining the schema of semistructured data, together with some experimental results, which show that our incremental mining is more efficient than non-incremental mining.

This work was supported by the National Natural Science Foundation of China and the National Doctoral Subject Foundation of China.

References

[1] U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996.
[2] M.S. Chen, J.H. Han, and P.S. Yu. Data Mining: An Overview from a Database Perspective. IEEE Trans. KDE, Vol. 8, No. 6, pp. 866–883, December 1996.
[3] R. Agrawal, T. Imielinski, and A. Swami. Mining Association Rules between Sets of Items in Large Databases. In Proc. of the ACM SIGMOD Conference on Management of Data, Washington, D.C., May 1993.
[4] R. Agrawal, R. Srikant. Fast Algorithms for Mining Association Rules. In Proc. of the 20th Int’l Conference on Very Large Databases, Santiago, Chile, Sept. 1994.
[5] R. Srikant, R. Agrawal. Mining Generalized Association Rules. In Proc. of the 21st Int’l Conference on Very Large Databases, Zurich, Switzerland, Sept. 1995.
[6] Y. Fu and J. Han. Meta-rule-guided mining of association rules in relational databases. In Proc. of 1st Int’l Workshop on Integration of Knowledge Discovery with Deductive and Object-Oriented Databases (KDOOD’95), pp. 39–46, Singapore, Dec. 1995.
[7] K. Koperski and J. Han. Discovery of Spatial Association Rules in Geographic Information Databases. In Advances in Spatial Databases, Proceedings of 4th Symposium, SSD’95 (Aug. 6–9, Portland, Maine). Springer-Verlag, Berlin.
[8] S. Nestorov, S. Abiteboul, and R. Motwani. Inferring Structure in Semistructured Data. (http://www.cs.stanford.edu/-rajeev)
[9] K. Wang, H.Q. Liu. Schema Discovery for Semistructured Data. In Proc. of KDD’97.
[10] Y. Papakonstantinou, H. Garcia-Molina, and J. Widom. Object Exchange Across Heterogeneous Information Sources. In Proc. of ICDE, pp. 251–260, Taiwan, March 1995.
[11] R. Agrawal, R. Srikant. Fast Algorithms for Mining Association Rules. In Proc. of the 20th Int’l Conference on Very Large Databases, Santiago, Chile, Sept. 1994.
[12] D.W. Cheung, J. Han, and C.Y. Wong. Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique. In Proc. of ICDE, New Orleans, LA, Feb. 1996.
[13] G.O. Arocena and A.O. Mendelzon. “WebOQL: Restructuring Documents, Databases and Webs”. In Proc. of ICDE, Orlando, Florida, USA, February 1998.
[14] L. Lakshmanan, F. Sadri, and I. Subramanian. “A Declarative Language for Querying and Restructuring the Web”. In Proc. of 6th Int’l Workshop on Research Issues in Data Engineering, New Orleans, 1996.
[15] A.O. Mendelzon, G. Mihaila, and T. Milo. “Querying the World Wide Web”. In Proc. of PDIS’96, Miami, December 1996.

Author information

Authors and Affiliations

Department of Computer Science, Fudan University, 200433, P.R.China
Aoying Zhou, Jinwen, Zhou Shuigeng & Zenping Tian

Editor information

Editors and Affiliations

Department of Computer Science and Systems Engineering, Yamaguchi University, Tokiwa-Dai, 2557, Ube, 755, Japan
Ning Zhong
Department of Computer Science and Technology, Tsinghua University, Beijing, China
Lizhu Zhou

About this paper

Cite this paper

Zhou, A., Jinwen, Shuigeng, Z., Tian, Z. (1999). Incremental Mining of Schema for Semistructured Data. In: Zhong, N., Zhou, L. (eds) Methodologies for Knowledge Discovery and Data Mining. PAKDD 1999. Lecture Notes in Computer Science, vol 1574. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48912-6_22

DOI: https://doi.org/10.1007/3-540-48912-6_22
Published: 24 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65866-5
Online ISBN: 978-3-540-48912-2
eBook Packages: Springer Book Archive
