Three challenges in data mining

Yang, Qiang

doi:10.1007/s11704-010-0102-7

Review Article
Published: 10 August 2010

Volume 4, pages 324–333, (2010)
Frontiers of Computer Science in China

Qiang Yang¹

Abstract

In this article, I discuss three challenges in today’s data mining field: the transfer learning challenge, the social learning challenge, and the mobile context mining challenge. I pick these three challenges because I think the time is ripe for each of them to be addressed in a major way in the near future, given the current technological and societal readiness to tackle them. I also believe that addressing each of these challenges will help move the science and engineering of data mining forward and will have a great impact on society.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
Qiang Yang

Corresponding author

Correspondence to Qiang Yang.

About this article

Cite this article

Yang, Q. Three challenges in data mining. Front. Comput. Sci. China 4, 324–333 (2010). https://doi.org/10.1007/s11704-010-0102-7

Received: 12 May 2010
Accepted: 04 June 2010
Published: 10 August 2010
Issue Date: September 2010
DOI: https://doi.org/10.1007/s11704-010-0102-7
