A Web service search engine for large-scale Web service discovery based on the probabilistic topic modeling and clustering

  • Original Research Paper
  • Service Oriented Computing and Applications

Abstract

With the ever-increasing number of Web services, discovering the Web service that matches a user's request has become a vital yet challenging task, and a scalable, efficient search engine is needed to handle the large volume of Web services. The aim of this approach is to provide a search engine that retrieves the most relevant Web services in a short time. The proposed Web service search engine (WSSE) integrates probabilistic topic modeling and clustering techniques so that they support each other, discovering the semantic meaning of Web services and reducing the search space. Latent Dirichlet allocation (LDA) is used to extract topics from Web service descriptions, and these topics are used to group similar Web services together. Each Web service description is represented as a topic vector, which makes the topic model an efficient technique for reducing the dimensionality of word vectors and for discovering the semantic meaning hidden in Web service descriptions. Each Web service description is also represented as a word vector to address the drawbacks of the keyword-based search system. The accuracy of the proposed WSSE is compared with that of the keyword-based search system, and the precision and recall metrics are used to evaluate the performance of both. The results show that the proposed WSSE based on LDA and clustering outperforms the keyword-based search system.
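
The idea of searching over topic vectors within a reduced search space, and of scoring the results with precision and recall, can be illustrated with a minimal sketch. It is not the authors' implementation: the service names, the three-topic vectors, the cluster assignment, the use of cosine similarity, and the centroid-matching step are all assumptions made for illustration. In a real pipeline the topic vectors would come from an LDA model fitted on the service descriptions and the clusters from a clustering algorithm.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


# Hypothetical topic vectors (3 topics) standing in for LDA output.
services = {
    "weather-forecast": [0.80, 0.10, 0.10],
    "climate-data":     [0.70, 0.20, 0.10],
    "card-payment":     [0.10, 0.80, 0.10],
    "invoice-billing":  [0.05, 0.85, 0.10],
    "route-planner":    [0.10, 0.10, 0.80],
}

# Hypothetical cluster assignment standing in for the clustering step.
clusters = {
    0: ["weather-forecast", "climate-data"],
    1: ["card-payment", "invoice-billing"],
    2: ["route-planner"],
}


def search(query_vec, top_k=2):
    """Pick the cluster whose centroid is closest to the query,
    then rank only the services inside that cluster."""
    centroids = {
        cid: centroid([services[name] for name in members])
        for cid, members in clusters.items()
    }
    best = max(centroids, key=lambda cid: cosine(query_vec, centroids[cid]))
    ranked = sorted(
        clusters[best],
        key=lambda name: cosine(query_vec, services[name]),
        reverse=True,
    )
    return best, ranked[:top_k]


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# A query already expressed as a topic vector (assumed for illustration).
query = [0.10, 0.75, 0.15]
cluster_id, results = search(query)
p, r = precision_recall(results, ["card-payment", "invoice-billing"])
print(cluster_id, results, p, r)
```

Only the services in the selected cluster are compared against the query, which is one way a clustering step can reduce the search space; the trade-off is that a relevant service assigned to a different cluster is never considered.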

Author information

Correspondence to Afnan Bukhari.

About this article

Cite this article

Bukhari, A., Liu, X. A Web service search engine for large-scale Web service discovery based on the probabilistic topic modeling and clustering. SOCA 12, 169–182 (2018). https://doi.org/10.1007/s11761-018-0232-6

  • DOI: https://doi.org/10.1007/s11761-018-0232-6
