research-article

Provenance in Collaborative in Silico Scientific Research: a Survey

Published: 10 December 2020

Abstract

Science is a collaborative activity by definition. Research is usually conducted by several scientists working together, and this trend has intensified in recent years. Furthermore, experiments are increasingly performed in silico, which demands proper support tools. Provenance-aware Workflow Management Systems and script-based tools have been popular ways of running in silico experiments, but these tools often neglect the collaboration aspect. Even solutions that aim at collaborative experiments do not always address the collaborators' needs. The literature includes surveys on subjects related to in silico experiments. However, they either focus on provenance collection and applications, treating collaboration as just another possible application, or focus on Workflow Management Systems, listing collaboration only as a possible challenge. This article surveys available tools and approaches that aim to help scientists conduct collaborative in silico experiments. In particular, we focus on challenges related to the provenance of these collaborative experiments. We devise a taxonomy of the aspects of collaboration in scientific research and discuss each of these aspects. We also identify gaps in the literature that provide opportunities for future work.



  • Published in

    ACM SIGMOD Record, Volume 49, Issue 2
    June 2020
    57 pages
    ISSN: 0163-5808
    DOI: 10.1145/3442322

    Copyright © 2020 Copyright is held by the owner/author(s)

    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 10 December 2020


