Abstract
Data mining draws on many technologies to deliver novel and actionable discoveries from very large collections of data. The Australian Government’s Cooperative Research Centre for Advanced Computational Systems (ACSys) links industry and research, focusing on the deployment of high performance computers for data mining. We present an overview of the ACSys Data Mining projects in which large-scale, high performance computers play a key role. We highlight the use of large-scale computing within three complementary areas: the development of parallel algorithms for data analysis, the deployment of virtual environments for data mining, and issues in data management for data mining. We also introduce the Data Miner’s Arcade, which provides simple abstractions that integrate these components and gives high performance data access to a variety of data mining tools communicating through XML.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Williams, G. et al. (2002). The Integrated Delivery of Large-Scale Data Mining: The ACSys Data Mining Project. In: Zaki, M.J., Ho, CT. (eds) Large-Scale Parallel Data Mining. Lecture Notes in Computer Science, vol 1759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46502-2_2
DOI: https://doi.org/10.1007/3-540-46502-2_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67194-7
Online ISBN: 978-3-540-46502-7
eBook Packages: Springer Book Archive