Sync your data: update propagation for heterogeneous protein databases

  • Regular Paper
The VLDB Journal

Abstract

The traditional model of bench (wet) chemistry in many life sciences domains is today actively complemented by computer-based discoveries that draw on the growing number of online data sources. A typical computer-based discovery scenario for many life scientists includes the creation of local caches of pertinent information from multiple online resources such as Swissprot [Nucleic Acids Res. 28(1), 45–48 (2000)], PIR [Nucleic Acids Res. 28(1), 41–44 (2000)], and PDB [The Protein DataBank. Wiley, New York (2003)] to enable efficient data analysis. This local caching of data, however, exposes their research and eventual results to the problem of data staleness: cached data may quickly become obsolete or incorrect, depending on the updates made to the source data. This represents a significant challenge to the scientific community, forcing scientists to remain continuously aware of the frequent changes made to public data sources and, more importantly, of the potential effects on their own derived data sets during the course of their research. To address this challenge, we present an approach for handling update propagation between heterogeneous databases that guarantees data freshness for scientists irrespective of their choice of data source and its underlying data model or interface. We propose a middle-layer-based solution in which the change in the online data source is first translated to a sequence of changes in the middle layer; next, each change in the middle layer is propagated through an algebraic representation of the translation between the source and the target; and finally, the net change is translated to a set of changes that are then applied to the local cache.
We describe the algebraic model that represents the mapping of the online resource to the local cache, as well as an adaptive propagation algorithm that incrementally propagates both schema and data changes from the source to the cache in a data-model-independent manner. A case study based on an ongoing joint project with our collaborators in the Chemistry Department at UMass-Lowell illustrates the approach.
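
The three stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm or their algebraic model. The `Change` record, the function names, and the use of a plain dictionary as a stand-in for the source-to-cache mapping are all assumptions made for illustration, as is the choice to model an update as a delete followed by an insert.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """Hypothetical change record; not taken from the paper."""
    op: str              # "insert", "delete", or "update"
    path: str            # location of the affected element
    value: object = None


def to_middle_layer(source_change):
    """Stage 1: translate one source change into middle-layer changes.

    Assumption: an update is modeled as a delete followed by an insert.
    """
    if source_change.op == "update":
        return [
            Change("delete", source_change.path),
            Change("insert", source_change.path, source_change.value),
        ]
    return [source_change]


def propagate(changes, mapping):
    """Stage 2: push each change through the source-to-cache mapping.

    Assumption: the mapping is a dict from source paths to cache keys.
    Changes to unmapped source paths do not affect the cache and are dropped.
    """
    propagated = []
    for change in changes:
        target = mapping.get(change.path)
        if target is not None:
            propagated.append(Change(change.op, target, change.value))
    return propagated


def apply_to_cache(cache, changes):
    """Stage 3: apply the resulting changes to the local cache in order."""
    for change in changes:
        if change.op == "delete":
            cache.pop(change.path, None)
        elif change.op == "insert":
            cache[change.path] = change.value
    return cache


def sync(cache, source_change, mapping):
    """Run all three stages for a single source change."""
    middle = to_middle_layer(source_change)
    return apply_to_cache(cache, propagate(middle, mapping))
```

For example, with a mapping of `{"entry/sequence": "protein.seq"}`, calling `sync` with an update to `entry/sequence` rewrites the cached `protein.seq` value, while an update to an unmapped path leaves the cache untouched. A real mapping between data models would carry far more structure than a dictionary lookup.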

References

  1. GDB: The Genome Database. http://gdbwww.gdb.org/

  2. UniProt – the universal protein resource. http://www.uniprot.org (2002)

  3. Bairoch, A., Apweiler, R.: The SWISS-PROT protein sequence databank and its supplement TrEMBL. Nucleic Acids Res. 28(1), 45–48 (2000)

  4. Etzold, T., Ulyanov, A., Argos, P.: SRS: information retrieval system for molecular biology data banks. Meth. Enzymol. 266, 114–128 (1996)

  5. García-Molina, H., Hammer, J., Ireland, K. et al.: Integrating and accessing heterogeneous information sources in TSIMMIS. In: AAAI Spring Symposium on Information Gathering (1995)

  6. Baker, P.G., Brass, A., Bechhofer, S., Goble, C., Paton, N., Stevens, R.: TAMBIS: transparent access to multiple bioinformatics information sources: an overview. In: Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology ISMB98 (1998)

  7. Haas, L.M., Kodali, P., Rice, J.E., Schwarz, P.M., Swope, W.C.: Integrating life sciences data—with a little garlic. In: IEEE International Symposium on Bio-Informatics and Biomedical Engineering (BIBE), pp. 5–12. ACM Press (2000)

  8. European Bioinformatics Institute: http://www.ebi.ac.uk (2002)

  9. Barker, B.C., Garavelli, J.S., Huang, H., McGarvey, P.B., Orcutt, B., Srinivasarao, G.Y., Xiao, C., Yeh, L.S., Ledley, R.S., Janda, J.F., Pfeiffer, F., Mewes, H.W., Tsugita, A., Wu, C.: The protein information resource (PIR). Nucleic Acids Res. 28(1), 41–44 (2000)

  10. Swiss-Prot Release 42.11 Statistics. http://au.expasy.org/sprot/relnotes/relstat.html (2004)

  11. Bry, F., Kroger, P.: A computational biology database digest: data, data analysis, and data management. Int. J. Dist. Parallel Databases, special issue on bioinformatics 13(42) (2003)

  12. Claypool, K.T., Rundensteiner, E.A.: Sangam: a framework for modeling heterogeneous database transformations. In: Proceedings of International Conference on Enterprise Information Systems (ICEIS), pp. 219–224, Angers, France (2003)

  13. Claypool, K.T., Rundensteiner, E.A.: Sangam: a transformation modeling framework. In: DASFAA. Kyoto, Japan (2003)

  14. Claypool, K.T., Rundensteiner, E.A., Zhang, X., Su, H., Kuno, H., Lee, W.-C., Mitchell, G.: SANGAM: a solution to support multiple data models, their mappings and maintenance. In: Demo Session Proceedings of SIGMOD'01 (2001)

  15. W3C. XQuery: A Query Language for XML. http://www.w3.org/TR/xquery/ (2001)

  16. Gross, J., Yellen, J.: Graph Theory and Its Applications. CRC Press, Boca Raton (1998)

  17. Claypool, K.T., Rundensteiner, E.A.: AUP: adaptive change propagation across data model boundaries. In: Proceedings of 21st British National Conference on Databases (BNCOD), pp. 72–83. Edinburgh, Scotland (2004)

  18. Gupta, A., Blakeley, J.A.: Using partial information to update materialized views. Inform. Syst. 20(8), 641–662 (1995)

  19. Keller, A.: Updates to relational database through views involving joins. In: Scheuermann (1982)

  20. Koeller, A., Rundensteiner, E.A.: Incremental maintenance of schema-restructuring views. In: Proceedings of International Conference on Extending Database Technology (EDBT) (2002)

  21. Altschul, S., Gish, W., Miller, W., Myers, E., Lipman, D.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403–410 (1990)

  22. Cobena, G., Abiteboul, S., Marian, A.: Detecting changes in XML documents. In: Proceedings of ICDE, pp. 41–52. San Jose, California, IEEE (2002)

  23. Chawathe, S.S., Rajaraman, A., Garcia-Molina, H., Widom, J.: Change detection in hierarchically structured information. Proceedings of SIGMOD. ACM SIGMOD Rec. 25(2), 493–504 (1996)

  24. Nguyen, B., Abiteboul, S., Cobena, G., Preda, M.: Monitoring XML data on the web. In: Proceedings of SIGMOD, pp. 437–448 (2001)

  25. Wang, Y., DeWitt, D.J., Cai, J.-Y.: X-Diff: An effective change detection algorithm for XML documents. In: Proceedings of ICDE, pp. 519–530. Bangalore, India, IEEE (March 2003)

  26. Xu, H., Wu, Q., Wang, H., Yang, G., Jia, Y.: KF-Diff+: highly efficient change detection algorithm for XML documents. In: Meersman, R., Tari, Z. et al. (eds.) Confederated International Conferences CoopIS, DOA, and ODBASE 2002. Lecture Notes in Computer Science 2519, pp. 1273–1286. Springer-Verlag, Berlin Heidelberg New York (2002)

  27. Atzeni, P., Torlone, R.: Management of multiple models in an extensible database design tool. In: Apers, P.M.G. et al. (eds.) Proceedings of International Conference on Extending Database Technology (EDBT), LNCS. Springer, Berlin Heidelberg New York (1996)

  28. Bernstein, P.A., Rahm, E.: Data warehouse scenarios for model management. In: International Conference on Conceptual Modeling (2000)

  29. Göbel, S., Lutze, K.: Development of meta databases for geospatial data in the WWW. In: ACM-GIS, pp. 94–99 (1998)

  30. Mark, L., Roussopoulos, N.: Integration of data, schema and meta-schema in the context of self-documenting data models. In: Davis, C.G., Jajodia, S., Ng, P.A., Yeh, R.T. (eds.) Proceedings of the 3rd International Conference on Entity-Relationship Approach (ER'83), pp. 585–602. North-Holland (1983)

  31. Papazoglou, M.P., Russell, N.: A semantic meta-modeling approach to schema transformation. In: CIKM '95, pp. 113–121. ACM (1995)

  32. Claypool, K.T.: Managing change in databases. Ph.D. thesis, Worcester Polytechnic Institute (2002)

  33. Agrawal, D., El Abbadi, A., Singh, A., Yurek, T.: Efficient view maintenance at data warehouses. In: Proceedings of SIGMOD, pp. 417–427 (1997)

  34. Blakeley, J.A., Larson, P.-E., Tompa, F.W.: Efficiently updating materialized views. In: Proceedings of SIGMOD, pp. 61–71 (1986)

  35. Gupta, A., Mumick, I.S.: Maintenance of materialized views: problems, techniques, and applications, special issue on materialized views and warehousing. IEEE Data Eng. Bull. 18(2), 3–19 (1995)

  36. Mohania, M.K., Konomi, S., Kambayashi, Y.: Incremental maintenance of materialized views. In: Database and Expert Systems Applications (DEXA), pp. 551–560 (1997)

  37. Zhuge, Y., García-Molina, H., Hammer, J., Widom, J.: View maintenance in a warehousing environment. In: Proceedings of SIGMOD, pp. 316–327 (1995)

  38. Uniprot Documents: http://www.ebi.uniprot.org/support/docs/uniprot.xsd (2004)

  39. PDBj Home Page: http://www.pdbj.org (2004)

  40. Claypool, K.T., Jin, J., Rundensteiner, E.A.: SERF: schema evolution through an extensible, re-usable and flexible framework. In: Proceedings of International Conference on Information and Knowledge Management, pp. 314–321 (1998)

  41. Florescu, D., Kossmann, D.: Storing and querying XML data using an RDBMS. Bulletin of the Technical Committee on Data Engineering, pp. 27–34 (1999)

  42. Haas, L.M., Miller, R.J., Niswonger, B., Roth, M.T., Schwarz, P., Wimmers, E.L.: Transforming heterogeneous data with database middleware: beyond integration. IEEE Data Eng. Bull. 22(1), 31–36 (1999)

  43. Miller, R.J., Ioannidis, Y., Ramakrishnan, R.: The use of information capacity in schema integration and translation. In: Proceedings of the Nineteenth International Conference on Very Large Data Bases (VLDB), pp. 120–133. Dublin, Ireland (1993)

  44. Milo, T., Zohar, S.: Using schema matching to simplify heterogeneous data translation. In: International Conference on Very Large Data Bases, pp. 122–133 (1998)

  45. Rosenthal, A., Reiner, D.: Theoretically sound transformations for practical database design. In: March, S.T. (ed.) Entity-Relationship Approach, Proceedings of the Sixth International Conference on Entity-Relationship Approach, pp. 115–131. New York, USA (1987)

  46. Melnik, S., Rahm, E., Bernstein, P.: Rondo: A programming platform for generic model management. In: Proceedings of SIGMOD, pp. 193–204 (2003)

  47. Hanson, E.N., Noronha, L.: Timer-driven database triggers and alerters: semantics and a challenge. SIGMOD Rec. 28(4), 11–16 (1999)

  48. Chawathe, S.S., Abiteboul, S., Widom, J.: Representing and querying changes in semistructured data. In: Proceedings of the International Conference on Data Engineering, pp. 4–13 (1998)

  49. Chien, S.-Y., Tsotras, V.J., Zaniolo, C.: Efficient management of multiversion documents by object referencing. In: Proceedings of the 27th International Conference on Very Large Data Bases (VLDB '01), Orlando, pp. 291–300. Morgan Kaufmann (2001)

  50. Marian, A., Abiteboul, S., Cobéna, G., Mignet, L.: Change-centric management of versions in an XML warehouse. In: Proceedings of the 27th International Conference on Very Large Data Bases (VLDB), pp. 581–590. Orlando (2001)

  51. Papakonstantinou, Y., García-Molina, H., Widom, J.: Object exchange across heterogeneous information sources. In: Proceedings of IEEE International Conference on Data Engineering, pp. 251–260 (1995)

  52. Wang, F., Zaniolo, C.: Temporal queries in XML document archives and web warehouses. In: Proceedings of 10th International Symposium on Temporal Representation and Reasoning and Fourth International Conference on Temporal Logic, pp. 47–55. Cairns, Queensland, Australia, IEEE (July 2003)

  53. Tatarinov, I., Ives, Z.G., Halevy, A.Y., Weld, D.S.: Updating XML. In: Proceedings of SIGMOD, pp. 413–424. New York, NY, USA, ACM Press (2001)

  54. Braganholo, V.P., Davidson, S.B., Heuser, C.A.: On the updatability of XML views over relational databases. In: Proceedings of the International Workshop on the Web and Databases (WebDB), pp. 31–36. San Diego, CA (2003)

  55. Kuno, H.A., Rundensteiner, E.A.: Incremental maintenance of materialized object-oriented views in MultiView: strategies and performance evaluation. IEEE Trans. Knowledge Data Eng. (TKDE) 10(5), 768–792 (1998)

  56. Gupta, A., Mumick, I.S., Subrahmanian, V.S.: Maintaining views incrementally. In: Proceedings of SIGMOD, pp. 157–166 (1993)

  57. Liefke, H., Davidson, S.B.: View maintenance for hierarchical semistructured data. In: Proceedings of 2nd International Conference on Data Warehousing and Knowledge Discovery (DaWaK), Greenwich, UK, Lecture Notes in Computer Science Vol. 1874. pp. 114–125. Springer-Verlag, Berlin Heidelberg New York (2000)

  58. Abiteboul, S., McHugh, J., Rys, M., Vassalos, V., Wiener, J.: Incremental maintenance for materialized views over semistructured data. In: International Conference on VLDB, pp. 38–49 (1998)

  59. Zhuge, Y., García-Molina, H.: Graph structured views and their incremental maintenance. In: Proceedings of the 14th International Conference on Data Engineering, pp. 116–125. Orlando, Florida (1998)

  60. Griffin, T., Libkin, L.: Incremental maintenance of views with duplicates. In: SIGMOD, pp. 328–339 (1995)

  61. Koeller, A., Rundensteiner, E.A.: Incremental maintenance of schema-restructuring views. In: Proceedings of International Conference on Extending Database Technology (EDBT), pp. 354–371 (2002)

  62. Koeller, A., Rundensteiner, E.A.: Incremental maintenance of schema-restructuring views in SchemaSQL. IEEE Trans. Knowledge Data Eng. (TKDE) (2004)

  63. Jagadish, H.V., Al-Khalifa, S., Chapman, A., Lakshmanan, L.V.S. et al.: TIMBER: a native XML database. VLDB J.: Very Large Data Bases 11(4), 274–291 (2002)

  64. Jagadish, H.V., Lakshmanan, L.V.S., Srivastava, D., Thompson, K.: TAX: a tree algebra for XML. Lecture Notes Comput. Sci. 2397, 149–164 (2002)

  65. PDB Team. The Protein DataBank. Wiley, New York (2003)

Author information

Corresponding author

Correspondence to Kajal T. Claypool.

About this article

Cite this article

Claypool, K.T., Rundensteiner, E.A. Sync your data: update propagation for heterogeneous protein databases. The VLDB Journal 14, 300–317 (2005). https://doi.org/10.1007/s00778-005-0155-7

  • DOI: https://doi.org/10.1007/s00778-005-0155-7
