Using Wide Table to manage web data: a survey

Yang, Bin; Qian, Weining; Zhou, Aoying

doi:10.1007/s11704-008-0050-7

Review Article
Published: 12 August 2008

Volume 2, pages 211–223 (2008)

Frontiers of Computer Science in China

Bin Yang¹,
Weining Qian² &
Aoying Zhou²

Abstract

With the development of the World Wide Web (WWW), storing and utilizing web data has become a major challenge for the data management research community. Web data are essentially heterogeneous and their schemas may change frequently, so the traditional relational data model is inappropriate for web data management. A new data model, called Wide Table (WT for short), has been introduced for this task. The WT model has several characteristics. First, a WT is usually very sparsely populated, so that most data can fit into a single line or record. Second, queries involve only a small subset of the attributes. Consequently, existing query processing and optimization techniques for relational databases with normalized tables no longer work efficiently. Furthermore, a WT is usually of extremely large volume, and it is thought that only large-scale distributed storage can accommodate such a massive data set. In this paper, the requirements and challenges of web data management are discussed, and existing techniques for WT, including logical representation, physical storage, and query processing, are introduced and analyzed in detail.
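The two properties highlighted above, heavy sparsity and queries that touch only a few attributes, can be illustrated with a minimal in-memory sketch. It contrasts a row layout that stores each record as attribute-value pairs with a column layout that keeps one list of (row id, value) pairs per attribute. The class names, method names, and sample data are hypothetical; this is a generic single-node illustration under those assumptions, not the survey's own design, and it ignores distribution, compression, and indexing.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]
Result = List[Tuple[int, Record]]


class SparseRowStore:
    """Hypothetical row layout: each record stores only its non-null
    attributes as attribute -> value pairs."""

    def __init__(self) -> None:
        self.rows: List[Record] = []

    def insert(self, record: Record) -> int:
        # Nulls are not materialized, so a sparse record stays small
        # even when the table has many possible attributes.
        self.rows.append({a: v for a, v in record.items() if v is not None})
        return len(self.rows) - 1

    def select(self, attrs: Iterable[str]) -> Result:
        wanted = list(attrs)
        result: Result = []
        # Every stored record is visited, even if it lacks all wanted attributes.
        for rid, row in enumerate(self.rows):
            projected = {a: row[a] for a in wanted if a in row}
            if projected:  # keep records with at least one requested value
                result.append((rid, projected))
        return result


class SparseColumnStore:
    """Hypothetical column layout in the spirit of a decomposition storage
    model: one (row_id, value) list per attribute, non-null values only."""

    def __init__(self) -> None:
        self.columns: Dict[str, List[Tuple[int, Any]]] = defaultdict(list)
        self.n_rows = 0

    def insert(self, record: Record) -> int:
        rid = self.n_rows
        self.n_rows += 1
        for attr, value in record.items():
            if value is not None:
                self.columns[attr].append((rid, value))
        return rid

    def select(self, attrs: Iterable[str]) -> Result:
        merged: Dict[int, Record] = defaultdict(dict)
        # Only the columns named in the query are read; all others are skipped.
        for attr in attrs:
            for rid, value in self.columns.get(attr, []):
                merged[rid][attr] = value
        # Reassemble the projected records in row-id order.
        return sorted(merged.items())


if __name__ == "__main__":
    listings = [  # hypothetical heterogeneous web items
        {"title": "Camera X", "price": 199, "megapixels": 12},
        {"title": "Used car", "price": 5000, "mileage": 90000},
        {"title": "Flat", "bedrooms": 2, "price": None},
    ]
    rows, cols = SparseRowStore(), SparseColumnStore()
    for item in listings:
        rows.insert(item)
        cols.insert(item)
    print(rows.select(["price"]))
    print(cols.select(["price"]))
```

Both layouts return the same answer, but the row layout visits every record for a query on one or two attributes, whereas the column layout reads only the requested columns. That difference in work, rather than this particular code, is what makes column-oriented physical storage attractive for sparse wide tables.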

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Fudan University, Shanghai, 200433, China
Bin Yang
Institute of Massive Computing, East China Normal University, Shanghai, 200062, China
Weining Qian & Aoying Zhou

Corresponding author

Correspondence to Weining Qian.

About this article

Cite this article

Yang, B., Qian, W. & Zhou, A. Using Wide Table to manage web data: a survey. Front. Comput. Sci. China 2, 211–223 (2008). https://doi.org/10.1007/s11704-008-0050-7

Received: 10 May 2008
Accepted: 21 July 2008
Published: 12 August 2008
Issue Date: September 2008
DOI: https://doi.org/10.1007/s11704-008-0050-7

Similar content being viewed by others

Web Data Management

WDFed: Exploiting Cloud Databases Using Metadata and RESTful APIs

Research on Web Table Positioning Technology Based on Table Structure and Heuristic Rules