Data mining service recommendation based on dataset features

  • Original Research Paper
Service Oriented Computing and Applications

Abstract

Quality of service (QoS)-based web service selection has been studied in the service computing community for some time. However, the characteristics of the input dataset that the web service will process are usually not considered in the selection process, even though they can affect the QoS values of the service. For example, latency is higher on a larger dataset than on a smaller one, and one service may take longer than another to process the same dataset. To address this issue, we take dataset features into consideration in the QoS-based service recommendation process, and we focus on data mining services because their QoS values can be highly dependent on dataset features. We propose two approaches for data mining service recommendation and compare their performance. In the first approach, we use a meta-learning algorithm to incorporate dataset features into the recommendation process and study the use of different machine learning algorithms (both classification and regression models) as meta-learners for recommending data mining services for a given dataset. We also investigate the impact of the number of dataset features on the performance of the meta-learners. In the second approach, we propose a novel technique that uses factor analysis for web service recommendation: we use a decomposition technique to identify latent features of the input dataset and then recommend services by exploiting these latent variables. Our proposed approach of web service recommendation based on latent features was shown to be more robust than meta-feature-based recommendation, with an accuracy of 85%.
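
The meta-learning idea in the first approach can be illustrated with a minimal sketch. It is not the authors' implementation: the three meta-features (log of instance count, attribute count, class entropy), the k-nearest-neighbour meta-learner, the meta-base values, and the service names `service_a` and `service_b` are all assumptions chosen for illustration. The sketch describes each dataset by a small meta-feature vector and recommends the service that performed best on the most similar previously seen datasets.

```python
import math
from collections import Counter


def meta_features(rows, labels):
    """Describe a non-empty dataset by a few simple meta-features.

    The choice of features is an assumption for this sketch, not the
    feature set used in the paper.
    """
    n_instances = len(rows)
    n_attributes = len(rows[0])
    counts = Counter(labels)
    # Shannon entropy of the class distribution, in bits.
    entropy = -sum(
        (c / n_instances) * math.log2(c / n_instances) for c in counts.values()
    )
    return (math.log10(n_instances), float(n_attributes), entropy)


def recommend(meta_base, query, k=1):
    """k-NN meta-learner: majority vote over the k most similar datasets.

    meta_base is a list of (meta_feature_vector, best_service) pairs.
    Features are used unscaled here; a real system would normalise them.
    """
    ranked = sorted(meta_base, key=lambda entry: math.dist(entry[0], query))
    votes = Counter(best for _, best in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical meta-base: meta-features of past datasets and the service
# that achieved the best QoS on each (values invented for illustration).
meta_base = [
    ((2.0, 4.0, 1.0), "service_a"),
    ((5.0, 50.0, 2.5), "service_b"),
    ((2.3, 6.0, 0.9), "service_a"),
]

# A new dataset: 150 instances, 4 attributes, two balanced classes.
rows = [[0.1, 0.2, 0.3, 0.4]] * 150
labels = ["x"] * 75 + ["y"] * 75

query = meta_features(rows, labels)
print(recommend(meta_base, query))
```

A classifier is used as the meta-learner here; the abstract also mentions regression models, which would instead predict a QoS value per service and recommend the service with the best prediction.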

Author information

Corresponding author

Correspondence to Bayan I. Alghofaily.

About this article

Cite this article

Alghofaily, B.I., Ding, C. Data mining service recommendation based on dataset features. SOCA 13, 261–277 (2019). https://doi.org/10.1007/s11761-019-00272-y

  • DOI: https://doi.org/10.1007/s11761-019-00272-y
