Abstract
Recently we presented a new approach [5, 6] to the classification problem arising in data mining. It is based on the regularization network approach, but in contrast to other methods, which employ ansatz functions associated with data points, we use basis functions defined on a grid in the usually high-dimensional feature space for the minimization process. To cope with the curse of dimensionality, we employ so-called sparse grids. More precisely, we use the sparse grid combination technique [11], in which the classification problem is discretized and solved on a sequence of conventional grids with uniform mesh sizes in each dimension. The sparse grid solution is then obtained as a linear combination of these partial solutions. The method scales only linearly with the number of data points and is well suited for data mining applications where the amount of data is very large but the dimension of the feature space is moderately high. The computations on the individual grids are independent of one another and can therefore be carried out in parallel at a coarse-grained level. A second, fine-grained level of parallelization can be introduced on each grid through threading on shared-memory multiprocessor computers.
We describe the sparse grid combination technique for the classification problem, discuss the two levels of parallelization, and report results for a 10-dimensional data set.
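To make the coarse-grained level of parallelism concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the commonly used form of the combination formula, in which the combined solution is the sum over q = 0, ..., d-1 of (-1)^q * binom(d-1, q) times the partial solutions on all grids whose level vector l (with every l_t >= 1) satisfies l_1 + ... + l_d = n + (d-1) - q. The names `compositions`, `combination_grids`, `solve_on_grid` and `combine` are hypothetical, and `solve_on_grid` is only a stand-in for the actual discretization and solution of the classification problem on one grid.

```python
from concurrent.futures import ThreadPoolExecutor
from math import comb


def compositions(total, parts):
    """Yield all tuples of `parts` positive integers that sum to `total`."""
    if parts == 1:
        if total >= 1:
            yield (total,)
        return
    for first in range(1, total - parts + 2):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def combination_grids(d, n):
    """Return (level_vector, coefficient) pairs for dimension d and level n.

    Assumption: levels start at 1 and the grids with
    |l|_1 = n + (d - 1) - q carry the coefficient (-1)^q * binom(d-1, q).
    """
    grids = []
    for q in range(d):
        coeff = (-1) ** q * comb(d - 1, q)
        for levels in compositions(n + (d - 1) - q, d):
            grids.append((levels, coeff))
    return grids


def solve_on_grid(levels):
    """Hypothetical stand-in for the solver on one anisotropic full grid.

    A real solver would assemble and solve the discrete problem on the grid
    with mesh size 2**-l_t in dimension t. Here the "solution" is just the
    constant 1.0, which lets us check that the coefficients sum to one.
    """
    return 1.0


def combine(d, n, workers=4):
    """Solve all partial problems independently, then combine them linearly."""
    grids = combination_grids(d, n)
    # Coarse-grained parallelism: one independent task per grid.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(solve_on_grid, [lv for lv, _ in grids]))
    return sum(c * f for (_, c), f in zip(grids, partial))


if __name__ == "__main__":
    for levels, coeff in combination_grids(2, 3):
        print(levels, coeff)
    print(combine(3, 3))
```

Because the tasks share no data until the final sum, the thread pool in this sketch could be replaced by processes or by separate machines; the fine-grained, per-grid threading mentioned in the abstract would live inside the solver and is not shown here.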
References
1. R. Balder. Adaptive Verfahren für elliptische und parabolische Differentialgleichungen auf dünnen Gittern, Dissertation, Technische Universität München, 1994.
2. M. J. A. Berry and G. S. Linoff. Mastering Data Mining, Wiley, 2000.
3. H.-J. Bungartz. Dünne Gitter und deren Anwendung bei der adaptiven Lösung der dreidimensionalen Poisson-Gleichung, Dissertation, Institut für Informatik, Technische Universität München, 1992.
4. K. Cios, W. Pedrycz, and R. Swiniarski. Data Mining Methods for Knowledge Discovery, Kluwer, 1998.
5. J. Garcke and M. Griebel. Data mining with sparse grids using simplicial basis functions, in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001, also as SFB 256 Preprint 713, Universität Bonn, 2001.
6. J. Garcke, M. Griebel, and M. Thess. Data mining with sparse grids, Computing, 2001, (to appear), also as SFB 256 Preprint 675, Institut für Angewandte Mathematik, Universität Bonn, 2000.
7. F. Girosi. An equivalence between sparse approximation and support vector machines, Neural Computation, 10(6), 1455–1480, 1998.
8. F. Girosi, M. Jones, and T. Poggio. Regularization theory and neural networks architectures, Neural Computation, 7, 219–265, 1995.
9. M. Griebel. The combination technique for the sparse grid solution of PDEs on multiprocessor machines, Parallel Processing Letters, 2(1), 61–70, 1992, also as SFB Bericht 342/14/91 A, Institut für Informatik, TU München, 1991.
10. M. Griebel, W. Huber, T. Störtkuhl, and C. Zenger. On the parallel solution of 3D PDEs on a network of workstations and on vector computers, in A. Bode and M. Dal Cin, (eds.), Parallel Computer Architectures: Theory, Hardware, Software, Applications, Lecture Notes in Computer Science, 732, Springer Verlag, 276–291, 1993.
11. M. Griebel, M. Schneider, and C. Zenger. A combination technique for the solution of sparse grid problems, in P. de Groen and R. Beauwens, (eds.), Iterative Methods in Linear Algebra, IMACS, Elsevier, North Holland, 263–281, 1992, also as SFB Bericht, 342/19/90 A, Institut für Informatik, TU München, 1990.
12. G. Melli. Datgen: A program that creates structured data. Website. http://www.datasetgenerator.com.
13. A. N. Tikhonov and V. A. Arsenin. Solutions of ill-posed problems, W.H. Winston, Washington D.C., 1977.
14. G. Wahba. Spline models for observational data, Series in Applied Mathematics, 59, SIAM, Philadelphia, 1990.
15. C. Zenger. Sparse grids, in W. Hackbusch, (ed.), Parallel Algorithms for Partial Differential Equations, Proceedings of the Sixth GAMM-Seminar, Kiel, 1990, Notes on Num. Fluid Mech., 31, Vieweg-Verlag, 1991.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Garcke, J., Griebel, M. (2001). On the Parallelization of the Sparse Grid Approach for Data Mining. In: Margenov, S., Waśniewski, J., Yalamov, P. (eds) Large-Scale Scientific Computing. LSSC 2001. Lecture Notes in Computer Science, vol 2179. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45346-6_2
DOI: https://doi.org/10.1007/3-540-45346-6_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43043-8
Online ISBN: 978-3-540-45346-8
eBook Packages: Springer Book Archive