research-article

Unsupervised feature selection for linked social media data

Authors:

Huan Liu

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 904 - 912

https://doi.org/10.1145/2339530.2339673

Published: 12 August 2012

Abstract

The prevalent use of social media produces mountains of unlabeled, high-dimensional data. Feature selection has been shown to be effective in handling high-dimensional data for efficient data mining. Feature selection for unlabeled data remains challenging because there is no label information by which feature relevance can be assessed. The unique characteristics of social media data further complicate unsupervised feature selection: for example, part of social media data is linked, which invalidates the independent and identically distributed (i.i.d.) assumption and poses new challenges for traditional unsupervised feature selection algorithms. In this paper, we study the differences between social media data and traditional attribute-value data, investigate whether the relations revealed in linked data can help select relevant features, and propose a novel unsupervised feature selection framework, LUFS, for linked social media data. We perform experiments with real-world social media datasets to evaluate the effectiveness of the proposed framework and probe the working of its key components.
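To make the idea of using links as a substitute for labels concrete, the following is a minimal sketch and not the LUFS framework itself, whose formulation is not given in the abstract. It assumes a simplified, Laplacian-score-style criterion: a feature is preferred when linked instances tend to agree on it, relative to the feature's overall spread. The function names `link_smoothness_scores` and `select_features`, and the toy data, are hypothetical.

```python
def link_smoothness_scores(X, edges):
    """Score each feature by how much it varies across linked instances.

    X is a list of rows (instances) of equal length; edges is a list of
    (u, v) index pairs. Lower scores mean linked instances tend to agree
    on that feature, relative to the feature's overall spread.
    This is an illustrative criterion, not the method of the paper.
    """
    n, d = len(X), len(X[0])
    scores = []
    for j in range(d):
        col = [row[j] for row in X]
        mean = sum(col) / n
        # Total squared deviation of the feature over all instances.
        spread = sum((x - mean) ** 2 for x in col)
        # Squared disagreement of the feature across each link.
        across_links = sum((col[u] - col[v]) ** 2 for u, v in edges)
        # A constant feature carries no information; rank it last.
        scores.append(across_links / spread if spread > 0 else float("inf"))
    return scores


def select_features(X, edges, k):
    """Return the indices of the k features with the lowest scores."""
    scores = link_smoothness_scores(X, edges)
    return sorted(range(len(scores)), key=lambda j: scores[j])[:k]


# Toy data: instances 0, 1, 2 and 3, 4, 5 form two linked groups.
X = [
    [1.0, 0.0, 5.0],
    [1.0, 1.0, 5.0],
    [1.0, 0.0, 5.0],
    [0.0, 1.0, 5.0],
    [0.0, 0.0, 5.0],
    [0.0, 1.0, 5.0],
]
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]

scores = link_smoothness_scores(X, edges)
best = select_features(X, edges, 1)
```

In this toy setting the first feature is identical within each linked group, so it receives the lowest score and is selected first; the second feature alternates regardless of links, and the third is constant and therefore ranked last. No labels are used at any point, which is the sense in which link structure can stand in for label information.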

Index Terms

Unsupervised feature selection for linked social media data
1. Computing methodologies
  1. Machine learning
    1. Machine learning algorithms
      1. Feature selection

Published In

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining

August 2012

1616 pages

ISBN:9781450314626

DOI:10.1145/2339530

General Chair:
Qiang Yang
Hong Kong University of Science and Technology
,
Program Chairs:
Deepak Agarwal
LinkedIn
,
Jian Pei
Simon Fraser University

Copyright © 2012 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

KDD '12

KDD '12: The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 12 - 16, 2012

Beijing, China

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
