The Integrated Delivery of Large-Scale Data Mining: The ACSys Data Mining Project

  • Conference paper
Large-Scale Parallel Data Mining

Abstract

Data Mining draws on many technologies to deliver novel and actionable discoveries from very large collections of data. The Australian Government’s Cooperative Research Centre for Advanced Computational Systems (ACSys) is a link between industry and research focusing on the deployment of high performance computers for data mining. We present an overview of the work of the ACSys Data Mining projects where the use of large-scale, high performance computers plays a key role. We highlight the use of large-scale computing within three complementary areas: the development of parallel algorithms for data analysis, the deployment of virtual environments for data mining, and issues in data management for data mining. We also introduce the Data Miner’s Arcade, which provides simple abstractions to integrate these components, providing high performance data access for a variety of data mining tools communicating through XML.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Williams, G. et al. (2002). The Integrated Delivery of Large-Scale Data Mining: The ACSys Data Mining Project. In: Zaki, M.J., Ho, CT. (eds) Large-Scale Parallel Data Mining. Lecture Notes in Computer Science, vol 1759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46502-2_2

  • DOI: https://doi.org/10.1007/3-540-46502-2_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67194-7

  • Online ISBN: 978-3-540-46502-7

  • eBook Packages: Springer Book Archive
