Recent progress on selected topics in database research — A report by nine young Chinese researchers working in the United States

Journal of Computer Science and Technology

Abstract

The study of database technologies, or more generally the technologies of data and information management, is an important and active research field in which many exciting results have recently been reported. In this fast-growing field, Chinese researchers play increasingly active roles. Research papers from Chinese scholars, both in China and abroad, appear in prestigious academic forums.

In this paper, we, nine young Chinese researchers working in the United States, present concise surveys of the fields we work in and report our recent progress. Although the paper covers only a small number of topics and the selection is far from balanced, we hope this effort will attract more researchers, especially those in China, to the frontiers of database research and promote collaboration. For obvious reasons, the authors are listed alphabetically, and the sections are arranged in the order of the author list.


Author information

Corresponding authors

Correspondence to Zhiyuan Chen or Jian Pei.

Additional information

Zhiyuan Chen is currently a post-doctoral researcher at Microsoft Research, Redmond, WA. He received his B.S. (1995) and M.S. (1997) degrees from Fudan University, China. From 1997 to 2002, he was a graduate student at Cornell University under the guidance of Prof. Johannes Gehrke and Praveen Seshadri. He received his Ph.D. degree in August 2002. His research interests include automatic database tuning and administration, database compression, selectivity estimation, and XML indexing.

Chen Li is an assistant professor in the School of Information and Computer Science at the University of California, Irvine. He received his Ph.D. degree in computer science from Stanford University in 2001, and his B.S. degree in computer science from Tsinghua University, China. His research interests are in the fields of database and information systems, including data integration, data warehouses, data cleansing, multimedia databases, and XML.

Jian Pei received the B.Eng and the M.Eng degrees, both in computer science, from Shanghai Jiaotong University, China, in 1991 and 1993, respectively, and the Ph.D. degree in Computing Science from Simon Fraser University, Canada, in 2002. He was a Ph.D. candidate at Peking University in 1997–1999. He is currently an Assistant Professor of Computer Science and Engineering at the State University of New York at Buffalo, USA. His research interests include data mining, data warehousing, online analytical processing, database systems, and bio-informatics. He is a member of the editorial board of the ACM SIGMOD Digital Symposium Collection (DiSC) and a guest editor of Journal of Computer Science and Technology. He has served on the program committees of international conferences and workshops, and has been a reviewer for some leading academic journals in his fields. He is a member of the ACM, the ACM SIGMOD, the ACM SIGKDD and the IEEE Computer Society, and a professional member of the ASEE. He is the Chair of the East Coast Regional Chapter of ACM SIGKDD.

Yufei Tao became a research associate in the Department of Computer Science, Hong Kong University of Science and Technology, after obtaining his Ph.D. degree from the same department in July 2002. Currently he is a visiting scientist at the CS Department of Carnegie Mellon University, USA. His research mainly focuses on the development of efficient query algorithms in spatio-temporal databases, as well as the application of related techniques to other areas such as temporal databases, spatial databases, and data warehouses. He has been awarded the Hong Kong Young Scientist Award 2002 (in physical and mathematical science) by the Hong Kong Institution of Science (HKIS) for his work on spatio-temporal data.

Haixun Wang received his Ph.D. degree in computer science from UCLA in 2000. He also holds B.S. and M.S. degrees in computer science from Shanghai Jiao Tong University. In 2000, he joined IBM T. J. Watson Research Center as a research staff member. His research interests include data mining, machine learning, database language and systems, bioinformatics, and XML.

Wei Wang received the M.S. degree from the State University of New York at Binghamton in 1995 and the Ph.D. degree in computer science from the University of California at Los Angeles in 1999. She is currently an assistant professor in the Department of Computer Science at the University of North Carolina at Chapel Hill. Her research interests include data mining, database systems and bioinformatics. She is a member of the editorial board of the Journal of Data Management and has published more than 50 research papers in international journals and refereed conference proceedings.

Jiong Yang is currently a visiting assistant professor at the University of Illinois at Urbana-Champaign. His current research interests include data mining, data integration, bio-informatics, mobile computing, and sensor networks. Dr. Yang received the B.S. degree from the University of California at Berkeley in 1994, and the M.S. and Ph.D. degrees in computer science from the University of California, Los Angeles in 1996 and 1999, respectively. He is the author of more than forty research papers.

Jun Yang is an assistant professor of computer science at Duke University. He received his Ph.D. degree in computer science from Stanford University in the area of data warehousing, and his B.A. in computer science from the University of California at Berkeley. He is broadly interested in database and information management, with special emphasis on the management of derived data.

Donghui Zhang's primary research area is database systems. He has conducted research on temporal, spatial and spatio-temporal database indexing, aggregation queries and join processing, and on efficiently storing and querying XML documents that evolve over time. His current research interests include data streams, indexing and querying moving objects, image database processing, and biological data processing. Prof. Zhang earned his Ph.D. degree from the University of California, Riverside in August 2002. His dissertation, titled “Aggregation Computation over Complex Objects”, addressed the problem of computing aggregates over a large set of temporal and spatial objects. He proposed algorithms that compute such queries in logarithmic time, whereas previous approaches take linear time.

About this article

Cite this article

Chen, Z., Li, C., Pei, J. et al. Recent progress on selected topics in database research — A report by nine young Chinese researchers working in the United States. J. Comput. Sci. & Technol. 18, 538–552 (2003). https://doi.org/10.1007/BF02947114

  • DOI: https://doi.org/10.1007/BF02947114
