
Scalable and High Performing Learning and Mining in Large-Scale Networked Environments: A State-of-the-art Survey

  • Chapter
Transactions on Computational Collective Intelligence X

Part of the book series: Lecture Notes in Computer Science (TCCI, volume 7776)

Abstract

Scalability is a major issue in applying machine learning and data mining to large-scale networked environments. While there has been important progress in the learnability of models for medium-sized datasets, large-scale systems remain a considerable challenge. In particular, with the evolution of distributed and networked environments, the complexity of the learning and mining process has grown because more data can now be integrated into the learning process. This paper surveys the state of the art in methods and algorithms for enhancing the scalability of machine learning and data mining in large-scale networked systems.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Trandafili, E., Biba, M. (2013). Scalable and High Performing Learning and Mining in Large-Scale Networked Environments: A State-of-the-art Survey. In: Nguyen, NT., Kołodziej, J., Burczyński, T., Biba, M. (eds) Transactions on Computational Collective Intelligence X. Lecture Notes in Computer Science, vol 7776. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38496-7_11

  • DOI: https://doi.org/10.1007/978-3-642-38496-7_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38495-0

  • Online ISBN: 978-3-642-38496-7

  • eBook Packages: Computer Science, Computer Science (R0)
