Abstract
Data analysis is an important part of the scientific process carried out by domain experts in data-intensive science. Although many software tools and systems are available, combining them to conduct complex analyses remains very difficult for non-IT experts. The main contribution of this paper is an open architectural framework based on service-oriented computing (SOC) principles, called the Ad-hoc DAta Grid Environment (ADAGE) framework, that can guide the development of domain-specific problem-solving environments or systems supporting data analysis activities. Through an application of the ADAGE framework and a prototype implementation that supports the analysis of financial news and market data, the paper demonstrates that systems built on the framework allow users to express common analysis processes effectively. The paper also outlines some limitations and avenues for future research.
About this article
Cite this article
Rabhi, F.A., Yao, L. & Guabtni, A. ADAGE: a framework for supporting user-driven ad-hoc data analysis processes. Computing 94, 489–519 (2012). https://doi.org/10.1007/s00607-012-0193-0
DOI: https://doi.org/10.1007/s00607-012-0193-0
Keywords
- SOA applications
- Data-intensive science
- Business process modeling
- User-driven composition
- High-frequency data
- Financial market data
- Time series analysis
- Event processing systems
- Thomson Reuters
- ADAGE