Abstract:
Examining massive multidimensional data corrupted by high levels of noise and interference, by extracting the embedded multi-way factors, has long been an important issue in various disciplines. With the rapid growth of data scale and dimensionality in the big data era, research challenges arise in (1) reflecting the dynamics of large tensors without introducing significant distortion in the factorization procedure and (2) handling the influence of noise in sophisticated applications. A hierarchical parallel processing framework over a GPU cluster, namely H-PARAFAC, has been developed to enable scalable factorization of large tensors based on a “divide-and-conquer” theory for Parallel Factor Analysis (PARAFAC). The H-PARAFAC framework incorporates a coarse-grained model for coordinating the processing of sub-tensors and a fine-grained parallel model for computing each sub-tensor and fusing sub-factors. Experimental results indicate that (1) the proposed method removes the limitation on the scale of multidimensional data that can be factorized and dramatically outperforms its traditional counterparts in both scalability and efficiency, e.g., the runtime grows on the order of n^2 when the data volume grows on the order of n^3, (2) H-PARAFAC shows potential for suppressing the influence of significant noise, and (3) H-PARAFAC is far superior to conventional window-based counterparts in preserving the features of multiple modes of large tensors.
Published in: IEEE Transactions on Parallel and Distributed Systems (Volume: 28, Issue: 4, 01 April 2017)
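
For readers unfamiliar with PARAFAC, the following is a minimal single-machine sketch of the building block the abstract refers to: a rank-R PARAFAC factorization of a 3-way tensor by alternating least squares (ALS), applied independently to sub-tensors obtained by splitting along the first mode. It is not the H-PARAFAC framework: there is no GPU code and no fusion of sub-factors, and ALS is a standard fitting method assumed here for illustration rather than taken from the abstract. All function names (`khatri_rao`, `unfold`, `cp_als`, `factorize_blocks`) are hypothetical.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product; rows are ordered with U's index major."""
    R = U.shape[1]
    return (U[:, None, :] * V[None, :, :]).reshape(-1, R)

def unfold(X, mode):
    """Mode-m unfolding: the chosen mode becomes the rows (C-order columns)."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def reconstruct(factors):
    """Rebuild the 3-way tensor from its PARAFAC factor matrices."""
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def cp_als(X, rank, n_iter=100, seed=0):
    """Fit a rank-`rank` PARAFAC model to a 3-way tensor with plain ALS."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((n, rank)) for n in X.shape]
    for _ in range(n_iter):
        for m in range(3):
            # Fix the other two factor matrices and solve a linear
            # least-squares problem for the factor of mode m.
            others = [factors[k] for k in range(3) if k != m]
            kr = khatri_rao(others[0], others[1])
            factors[m] = unfold(X, m) @ np.linalg.pinv(kr).T
    return factors

def relative_error(X, factors):
    """Frobenius-norm reconstruction error relative to the norm of X."""
    return np.linalg.norm(X - reconstruct(factors)) / np.linalg.norm(X)

def factorize_blocks(X, rank, n_blocks, **kwargs):
    """'Divide' step only (assumption): split along mode 0 and factorize
    each sub-tensor independently. Fusing the sub-factors is not shown."""
    blocks = np.array_split(X, n_blocks, axis=0)
    return [cp_als(block, rank, **kwargs) for block in blocks]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    truth = [rng.standard_normal((n, 2)) for n in (8, 5, 4)]
    X = reconstruct(truth)                    # synthetic rank-2 tensor
    print("full tensor error:", relative_error(X, cp_als(X, rank=2)))
    sub_factors = factorize_blocks(X, rank=2, n_blocks=2)
    print("number of sub-tensor factor sets:", len(sub_factors))
```

In this sketch each sub-tensor is an independent task, which is the property a coarse-grained scheduler can exploit. The sub-factors returned for different blocks are only determined up to column permutation and scaling, so a real fusion step has to align them before combining.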