Abstract
Science is a collaborative activity by definition. Research is usually conducted by several scientists working together, and this trend has intensified in recent years. Furthermore, experiments are increasingly performed in silico, which demands proper support tools. Provenance-aware Workflow Management Systems and script-based tools have been popular ways of running in silico experiments, but these tools often neglect the collaboration aspect. Even solutions that target collaborative experiments do not always address the collaborators' needs. Existing surveys discuss subjects related to in silico experiments. However, they either focus on provenance collection and its applications, treating collaboration as just another possible application, or focus on Workflow Management Systems, listing collaboration only as a possible challenge. This article surveys available tools and approaches that help scientists conduct collaborative in silico experiments. In particular, we focus on challenges related to the provenance of these collaborative experiments. We devise a taxonomy of the aspects of collaboration in scientific research and discuss each of these aspects. We also identify gaps in the literature that point to opportunities for future work.