DOI: 10.1145/2339530.2339673

Unsupervised feature selection for linked social media data

Published: 12 August 2012

ABSTRACT

The prevalent use of social media produces mountains of unlabeled, high-dimensional data. Feature selection has been shown effective in dealing with high-dimensional data for efficient data mining. Feature selection for unlabeled data remains a challenging task due to the absence of label information, by which feature relevance is usually assessed. The unique characteristics of social media data further complicate the already challenging problem of unsupervised feature selection (e.g., part of social media data is linked, which invalidates the independent and identically distributed assumption), bringing about new challenges to traditional unsupervised feature selection algorithms. In this paper, we study the differences between social media data and traditional attribute-value data, investigate whether the relations revealed in linked data can be used to help select relevant features, and propose a novel unsupervised feature selection framework, LUFS, for linked social media data. We perform experiments with real-world social media datasets to evaluate the effectiveness of the proposed framework and probe the workings of its key components.
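
To make the abstract's core idea concrete, here is a minimal illustrative sketch, not the LUFS algorithm itself: it treats the link graph as a stand-in for missing labels by embedding it spectrally into a few "social dimensions" and ranking features by how strongly they correlate with that embedding. All function names below are hypothetical, and the correlation-based scoring rule is chosen only for illustration.

    import numpy as np

    def rank_features_with_links(X, A, n_dims=5):
        """Rank columns of X (n_samples x n_features) using a symmetric link graph A (n x n)."""
        n = A.shape[0]
        # Symmetrically normalized graph Laplacian: L = I - D^{-1/2} A D^{-1/2}.
        deg = np.asarray(A.sum(axis=1), dtype=float)
        d_inv_sqrt = np.zeros(n)
        d_inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
        L = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

        # "Social dimensions": eigenvectors of L with the smallest eigenvalues,
        # skipping the first (near-constant) one -- a standard spectral embedding.
        _, eigvecs = np.linalg.eigh(L)
        S = eigvecs[:, 1:n_dims + 1]                        # shape (n, n_dims)

        # Score each feature by its summed squared correlation with the embedding:
        # features that vary consistently with the link structure score higher.
        Xc = X - X.mean(axis=0)
        Sc = S - S.mean(axis=0)
        Xc = Xc / (np.linalg.norm(Xc, axis=0, keepdims=True) + 1e-12)
        Sc = Sc / (np.linalg.norm(Sc, axis=0, keepdims=True) + 1e-12)
        scores = ((Xc.T @ Sc) ** 2).sum(axis=1)             # shape (n_features,)
        return np.argsort(scores)[::-1]                      # best features first

    # Hypothetical usage: keep the 50 highest-ranked features.
    # X, A = load_linked_dataset()   # placeholder loader, not a real API
    # X_reduced = X[:, rank_features_with_links(X, A)[:50]]

The full LUFS framework described in the paper is more involved; this sketch only shows, under the stated assumptions, how link information could in principle substitute for labels when assessing feature relevance.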


Supplemental Material

311a_t_talk_8.mp4 (mp4, 163.9 MB)


    • Published in

      KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2012
      1616 pages
      ISBN: 9781450314626
      DOI: 10.1145/2339530

      Copyright © 2012 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%

