ADAGE: a framework for supporting user-driven ad-hoc data analysis processes

Computing

Abstract

Data analysis is an important part of the scientific process carried out by domain experts in data-intensive science. Although several software tools and systems are available, combining them to conduct complex types of analyses is very difficult for non-IT experts. The main contribution of this paper is the Ad-hoc DAta Grid Environment (ADAGE) framework, an open architectural framework based on service-oriented computing (SOC) principles that can be used to guide the development of domain-specific problem-solving environments or systems to support data analysis activities. Through an application of the ADAGE framework and a prototype implementation that supports the analysis of financial news and market data, this paper demonstrates that systems developed based on the framework allow users to effectively express common analysis processes. This paper also outlines some limitations as well as avenues for future research.


Author information

Corresponding author

Correspondence to Lawrence Yao.

About this article

Cite this article

Rabhi, F.A., Yao, L. & Guabtni, A. ADAGE: a framework for supporting user-driven ad-hoc data analysis processes. Computing 94, 489–519 (2012). https://doi.org/10.1007/s00607-012-0193-0

  • DOI: https://doi.org/10.1007/s00607-012-0193-0
