Abstract:
Modern databases with "Big Dimensionality" are a growing trend. Existing approaches that require the calculation of pairwise feature correlations in their algorithmic designs scale poorly on such databases, since computing the full correlation matrix (quadratic in the dimensionality) is computationally intensive: a million features translate to a trillion correlations. This poses a notable challenge that has received relatively little attention in machine learning and data mining research. This paper presents a study to fill this gap. Our findings on several established big-dimensional databases spanning a wide spectrum of domains indicate that only an extremely small fraction of the feature pairs contributes significantly to the underlying interactions, and that highly correlated feature groups exist. Inspired by these observations, we introduce a novel learning approach that exploits the sparsity of correlations to efficiently identify informative and correlated feature groups from big-dimensional data, reducing the complexity from $O(m^2 n)$ to $O(m\log m + \mathcal{K}_a mn)$, where $\mathcal{K}_a \ll \min(m,n)$ generally holds. In particular, the proposed approach explicitly incorporates linear and nonlinear correlation measures as constraints in the learning model. An efficient embedded feature selection strategy, designed to filter out the large number of non-contributing correlations that could otherwise confuse the classifier while identifying the correlated and informative feature groups, forms one of the highlights of our approach. We also demonstrate the proposed method on one-class learning, where notable speedups are observed when solving one-class problems on big-dimensional data. Further, to identify robust informative features with minimal sampling bias, our feature selection strategy embed...
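The complexity reduction claimed in the abstract can be made concrete with a rough sketch. The snippet below is a minimal, hypothetical Python illustration (not the paper's actual algorithm): it assumes that a single cheap ordering of the $m$ features (costing $O(m\log m)$ after an $O(mn)$ scoring pass) lets each feature be correlated against only $\mathcal{K}_a$ candidate partners, giving $O(\mathcal{K}_a mn)$ correlation work instead of the $O(m^2 n)$ needed for the full matrix. The function name, the neighbour heuristic, and the 0.9 threshold are all illustrative assumptions.

```python
import numpy as np

def sparse_correlation_groups(X, y, k_a=5):
    """Illustrative sketch only (not the method of the paper).

    Avoids the full m x m correlation matrix by ordering features once
    (O(m log m) sort after an O(mn) scoring pass) and computing Pearson
    correlations only against k_a neighbours per feature (O(k_a * m * n)).

    X : (n_samples, m_features) data matrix
    y : (n_samples,) labels, used here only to order the features
    k_a : candidate partners examined per feature, assumed k_a << min(m, n)
    """
    n, m = X.shape
    Xc = X - X.mean(axis=0)
    Xn = Xc / (np.linalg.norm(Xc, axis=0) + 1e-12)   # column-normalised features

    # Cheap per-feature score (|corr(feature, y)|), then one sort: O(mn + m log m).
    yc = y - y.mean()
    yc = yc / (np.linalg.norm(yc) + 1e-12)
    order = np.argsort(-np.abs(Xn.T @ yc))

    # Correlate each feature only with its k_a neighbours in the ordering,
    # O(k_a * m * n) overall, instead of all m*(m-1)/2 pairs.
    pairs = []
    for rank, j in enumerate(order):
        for j2 in order[rank + 1: rank + 1 + k_a]:
            rho = float(Xn[:, j] @ Xn[:, j2])        # Pearson correlation
            if abs(rho) > 0.9:                       # keep only strongly correlated pairs
                pairs.append((int(j), int(j2), rho))
    return pairs

if __name__ == "__main__":
    # Tiny synthetic example: 1000 features, one planted highly correlated pair.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 1000))
    X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=200)
    y = (X[:, 0] > 0).astype(float)
    print(sparse_correlation_groups(X, y, k_a=5)[:3])
```

This sketch only conveys the cost trade-off; the paper's approach additionally embeds linear and nonlinear correlation measures as constraints in the learning model rather than thresholding raw Pearson correlations.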
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence ( Volume: 38, Issue: 12, 01 December 2016)