Three challenges in data mining

  • Review Article
Frontiers of Computer Science in China

Abstract

In this article, I discuss three challenges facing the data mining field today: the transfer learning challenge, the social learning challenge, and the mobile context mining challenge. I pick these three because I think the time is ripe for each to be addressed in a major way in the near future, given the current technological and societal readiness to tackle them. I also believe that addressing each of them will help move the science and engineering of data mining forward and have a great impact on society.

Corresponding author

Correspondence to Qiang Yang.

About this article

Cite this article

Yang, Q. Three challenges in data mining. Front. Comput. Sci. China 4, 324–333 (2010). https://doi.org/10.1007/s11704-010-0102-7

  • DOI: https://doi.org/10.1007/s11704-010-0102-7
