Extracting local schema from semistructured data based on graph-oriented semantic model

  • Correspondence
Journal of Computer Science and Technology

Abstract

Many modern applications (e-commerce, digital libraries, etc.) require integrated access to various information sources, from traditional RDBMSs to semistructured Web repositories. Extracting a schema from semistructured data is a prerequisite for integrating heterogeneous information sources. The traditional method, which extracts a global schema, may require time (and space) that grows exponentially with the number of objects and edges in the source. This paper presents a new method that extracts a local schema instead. The algorithm keeps the extracted schema within the “schema diameter” by examining the semantic distance of the target set and by using the Hash class and its path distance operation. This bound is very efficient at restraining the schema from expanding. A prototype validates the new approach.
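The abstract does not define the semantic distance or the Hash class, so the following is only a minimal sketch of the general idea of bounding schema extraction by a diameter, not the authors' algorithm. It assumes an OEM-style labeled graph, approximates semantic distance by the number of edges from a target object, and collects the distinct label paths within that bound. The function name `local_label_paths`, the `graph` layout, and the sample data are all hypothetical.

```python
from collections import deque


def local_label_paths(graph, targets, diameter):
    """Collect distinct label paths of length <= diameter from any target node.

    graph:    dict mapping node id -> list of (edge_label, child_id) pairs
    targets:  iterable of node ids to start from (the "target set")
    diameter: maximum path length in edges (an assumed stand-in for the
              paper's "schema diameter")
    """
    paths = set()
    queue = deque((t, ()) for t in targets)
    seen = set(queue)  # (node, label_path) states already enqueued
    while queue:
        node, path = queue.popleft()
        if len(path) == diameter:
            continue  # do not expand beyond the diameter
        for label, child in graph.get(node, []):
            new_path = path + (label,)
            paths.add(new_path)
            state = (child, new_path)
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return paths


# Hypothetical sample data with a cycle (a1 -> b1) to show the bound holds.
graph = {
    "root": [("book", "b1"), ("book", "b2")],
    "b1": [("title", "t1"), ("author", "a1")],
    "b2": [("title", "t2")],
    "a1": [("name", "n1"), ("wrote", "b1")],
}

schema = local_label_paths(graph, {"root"}, 2)
```

Because every explored path is capped at `diameter` edges, the traversal terminates even on cyclic data, and the number of collected paths depends on the neighborhood of the target set rather than on the whole source.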

Author information

Corresponding author

Correspondence to Wang Tengjiao.

Additional information

This work is supported by the NKBRSF under grant No.G1999032705.

WANG Tengjiao was born in 1973. He received the B.E. and M.E. degrees in computer software from Shandong University in 1996 and 1999, respectively. He is currently a Ph.D. candidate at the Department of Computer Science & Technology of Peking University. His research interests include Web mining, databases and information integration.

TANG Shiwei was born in 1939. He is a professor and Ph.D. supervisor at the Department of Computer Science & Technology of Peking University. His research interests include data warehouse, information integration and DBMS.

YANG Dongqing was born in 1945. She is a professor and Ph.D. supervisor at the Department of Computer Science & Technology of Peking University. Her research interests include DBMS, Web mining and information integration.

LIU Yunfeng was born in 1973. Her research interests include information integration, data warehouse and information security.

LIN Bin was born in 1975. His research interests include DBMS, data warehouse and Web mining.

About this article

Cite this article

Wang, T., Tang, S., Yang, D. et al. Extracting local schema from semistructured data based on graph-oriented semantic model. J. Comput. Sci. & Technol. 16, 560–566 (2001). https://doi.org/10.1007/BF02943240

  • DOI: https://doi.org/10.1007/BF02943240
