Exploiting sequential access when declustering data over disks and MEMS-based storage

Distributed and Parallel Databases

Abstract

Because seek time is much larger than transfer time in current disk technology, it is advantageous to perform a large I/O as a single sequential access rather than as multiple small random accesses. However, prior work on optimal cost and data placement for range queries over two-dimensional datasets does not consider this property. In particular, these techniques do not address sequential data placement when multiple I/O blocks must be retrieved from a single device. In this paper, we reevaluate the optimal cost of range queries when two-dimensional datasets are declustered over multiple devices, and prove that, in general, the new optimal cost cannot be achieved. This is because disks cannot support the two-dimensional sequential access that the new optimal cost requires. We then revisit existing data allocation schemes under the new optimal cost and show that none of them achieves it. Fortunately, MEMS-based storage is being developed to reduce I/O cost. We first show that the two-dimensional sequential access requirement cannot be satisfied by simply modeling MEMS-based storage as conventional disks. We then propose a new placement scheme that exploits the physical properties of MEMS-based storage to solve this problem. Our theoretical analysis and experimental results show that the new scheme achieves almost optimal I/O costs.
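The gap between the traditional cost measure (blocks per device) and a sequential-aware one can be illustrated with a toy model. The sketch below is not the paper's analysis or its placement scheme. It assumes a simple cost of one seek per non-contiguous run plus one transfer per block, uses the classic disk-modulo allocation `(i + j) mod m` purely as an example, and lays out each device's tiles in row-major order. The function names, the time units, and the lower bound of one seek plus `ceil(k / m)` transfers are all assumptions made for illustration.

```python
from math import ceil


def disk_modulo(i, j, m):
    """Illustrative allocation: tile (i, j) goes to device (i + j) mod m."""
    return (i + j) % m


def device_layouts(n, m):
    """Store each device's tiles in row-major order on an n x n grid.

    Returns a dict mapping tile (i, j) to (device, position on that device).
    """
    next_pos = [0] * m
    layout = {}
    for i in range(n):
        for j in range(n):
            d = disk_modulo(i, j, m)
            layout[(i, j)] = (d, next_pos[d])
            next_pos[d] += 1
    return layout


def query_cost(layout, m, rows, cols, t_seek, t_xfer):
    """Return (max blocks on any device, max time on any device) for a range query.

    Assumed time model: each run of consecutive positions costs one seek,
    and each block costs one transfer.
    """
    positions = [[] for _ in range(m)]
    for i in rows:
        for j in cols:
            d, p = layout[(i, j)]
            positions[d].append(p)
    max_blocks, max_time = 0, 0
    for ps in positions:
        if not ps:
            continue
        ps.sort()
        # A new run starts wherever the next position is not adjacent.
        runs = 1 + sum(1 for a, b in zip(ps, ps[1:]) if b != a + 1)
        max_blocks = max(max_blocks, len(ps))
        max_time = max(max_time, runs * t_seek + len(ps) * t_xfer)
    return max_blocks, max_time


def sequential_lower_bound(num_tiles, m, t_seek, t_xfer):
    """Assumed ideal: one seek, then ceil(k / m) blocks read sequentially."""
    return t_seek + ceil(num_tiles / m) * t_xfer


layout = device_layouts(n=8, m=4)
# 4 x 4 square query: 16 tiles over 4 devices, with seek = 50 and transfer = 1.
print(query_cost(layout, 4, range(0, 4), range(0, 4), 50, 1))
print(sequential_lower_bound(16, 4, 50, 1))
```

In this toy setting the square query is optimal under the block-count measure (4 blocks per device, which equals `ceil(16 / 4)`), yet every device needs four separate seeks. A query covering one full row is served with a single seek per device, while a query down one column is not. That is the flavor of the two-dimensional sequential access problem the abstract describes.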


Author information

Correspondence to Hailing Yu.

Additional information

Recommended by: Sunil Prabhakar

This research is supported by NSF grants IIS-0220152 and CNF-0423336.

About this article

Cite this article

Yu, H., Agrawal, D. & El Abbadi, A. Exploiting sequential access when declustering data over disks and MEMS-based storage. Distrib Parallel Databases 19, 147–168 (2006). https://doi.org/10.1007/s10619-006-8485-z

  • DOI: https://doi.org/10.1007/s10619-006-8485-z