Deduplication in the Backup System with Information Storage in a Database

Automatic Control and Computer Sciences

Abstract

Backup is one of the main safeguards against losing data stored on digital media. It can be performed manually, by copying data to external media, or automatically on a schedule using dedicated software. There are also remote backup systems, in which data are saved over a network to a remote repository. Such systems are multi-user and process large amounts of data, so the shared storage may hold files that contain identical fragments. Deduplication eliminates this repeated data: it is a form of compression in which copies are searched for across the entire dataset rather than within a single file. The main advantage of the technique is a significant saving of disk space. However, eliminating repeated data can significantly reduce the speed of saving and restoring information. This paper addresses the problem of implementing such a mechanism in a backup system that stores information in a relational database. We consider an example implementation of such a system that operates in two modes: with and without data deduplication. The paper presents a class diagram for the client part of the application and describes the tables and their relationships in the database that forms the backend. The author proposes an algorithm for saving data with deduplication and reports the results of comparative tests of the speed of the saving and recovery algorithms with relational database management systems from various vendors.
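
To make the idea of dataset-wide deduplication in a relational store concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes fixed-size blocks, SHA-256 digests as block identifiers, and SQLite as the database; the table names `blocks` and `file_blocks` and the functions `open_store`, `save_file`, and `restore_file` are hypothetical names chosen for illustration.

```python
import hashlib
import sqlite3

BLOCK_SIZE = 4096  # assumed fixed block size; not taken from the paper


def open_store(path=":memory:"):
    """Create a hypothetical schema: unique blocks plus per-file block lists."""
    db = sqlite3.connect(path)
    db.executescript(
        """
        CREATE TABLE IF NOT EXISTS blocks (
            digest TEXT PRIMARY KEY,   -- SHA-256 of the block content
            data   BLOB NOT NULL
        );
        CREATE TABLE IF NOT EXISTS file_blocks (
            file_name TEXT    NOT NULL,
            seq       INTEGER NOT NULL,  -- position of the block in the file
            digest    TEXT    NOT NULL REFERENCES blocks(digest),
            PRIMARY KEY (file_name, seq)
        );
        """
    )
    return db


def save_file(db, file_name, content):
    """Split content into blocks and store each distinct block only once."""
    with db:  # one transaction per saved file
        db.execute("DELETE FROM file_blocks WHERE file_name = ?", (file_name,))
        for seq, start in enumerate(range(0, len(content), BLOCK_SIZE)):
            block = content[start:start + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # A block already present anywhere in the store is not written again.
            db.execute(
                "INSERT OR IGNORE INTO blocks (digest, data) VALUES (?, ?)",
                (digest, block),
            )
            db.execute(
                "INSERT INTO file_blocks (file_name, seq, digest) VALUES (?, ?, ?)",
                (file_name, seq, digest),
            )


def restore_file(db, file_name):
    """Reassemble a file by joining its block list with the shared block table."""
    rows = db.execute(
        "SELECT b.data FROM file_blocks AS f "
        "JOIN blocks AS b ON b.digest = f.digest "
        "WHERE f.file_name = ? ORDER BY f.seq",
        (file_name,),
    )
    return b"".join(row[0] for row in rows)
```

In this sketch, two files that share a block consume storage for that block once, while every save pays for a hash computation and an index lookup per block and every restore pays for a join. This is one plausible source of the slowdown in saving and restoring that the abstract mentions.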

Author information

Corresponding author

Correspondence to S. M. Taranin.

Additional information

Translated by K. Lazarev

About this article

Cite this article

Taranin, S.M. Deduplication in the Backup System with Information Storage in a Database. Aut. Control Comp. Sci. 52, 608–614 (2018). https://doi.org/10.3103/S0146411618070246

  • DOI: https://doi.org/10.3103/S0146411618070246
