Enhancing web service clustering using Length Feature Weight Method for service description document vector space representation
Introduction
Many business applications today are built on web services because of benefits such as data integration, information exchange, code reuse, versatility, and cost savings. Web services are growing rapidly as vendors such as IBM, Amazon, and Microsoft adopt web service standards and provide tools and software tailored to customer requirements. Web services are software components based on standards for communication, data transfer, and service description. In general, two types of web services exist: (1) SOAP-based services and (2) REST-based services. The power of SOAP-based services lies in XML and three core technologies: UDDI (Universal Description, Discovery and Integration), WSDL (Web Services Description Language), and SOAP (Simple Object Access Protocol). First, the vendor creates a service and publishes its description file in WSDL format to UDDI, a repository for storing web services. The WSDL document includes the service’s functionality, name, bindings, message types, and other details that a customer needs in order to learn about the service. The customer looks up a web service in UDDI according to its requirements and communicates with the vendor through SOAP messages to use the service’s functionality, as illustrated in Fig. 1 (Bhardwaj & Sharma, 2015). REST-based services (Web APIs) use HTTP for message transmission and URIs for identifying resources. The functionality of these services is described using XML or plain natural-language text.
In recent years, text mining and web mining techniques have attracted considerable attention from researchers because of the rapid growth of data and the need to manage it. Matching web services to a customer’s needs through syntactic analysis alone is not effective. Semantic web services have therefore evolved to interpret the functionality and capability of a web service more accurately. Several models have been proposed for semantically specifying web services, such as the Web Service Modeling Language (WSML) (De Bruijn, Lausen, Polleres, & Fensel, 2006), the Web Service Modeling Ontology (WSMO) (Fensel et al., 2006), and the Web Ontology Language for Services (OWL-S) (Martin et al., 2004). In practice, however, many web services lack explicit semantic information in the form of ontological annotations. Manually annotating these web services requires appropriate knowledge of the domain to which each service belongs. Because new web services and ontologies are created every day at a rapidly growing rate, and each ontology contains thousands of concepts and relationships, manually finding the appropriate domain and annotating this mass of accessible web services is tedious and burdensome (Nisa and Qamar, 2015, Aznag et al., 2013). Given these limitations, many researchers prefer to apply text mining techniques to WSDL documents and Web APIs, and various methods have been proposed for extracting semantic meaning from web services.
With the rapid proliferation of web services, there is a need for an intelligent system that can retrieve relevant results for customers’ web service queries. Services can be discovered efficiently when they are stored in repositories in clustered form. For web service clustering, the web service documents are first preprocessed, and the services are then represented as vectors so that clusters can be formed according to their similarity. For this representation, many researchers prefer the TF-IDF (Term Frequency-Inverse Document Frequency) method (Elshater et al., 2015, Sharma et al., 2014). In WSDL files and Web APIs, however, services are described in short text. Because few terms occur frequently within a service and TF-IDF cannot capture how terms are distributed across all services, it does not work well for web service representation. Researchers have therefore tried to enhance the TF-IDF method used to represent web services as vectors for discovery and clustering. An enhanced method for the vector representation of text documents has been proposed in which terms are more easily discriminated, which improves the performance of text document clustering (Abualigah, 2019).
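To make the short-text problem concrete, the following minimal sketch computes a textbook TF-IDF weighting over three tiny service descriptions. It is an illustration only: the function name `tf_idf`, the particular TF and IDF variants (relative term frequency and an unsmoothed logarithmic IDF), and the example token lists are assumptions, not taken from the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        if not doc:  # an empty description has no weights
            vectors.append({})
            continue
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical, already-preprocessed service descriptions.
services = [
    ["weather", "forecast", "city"],
    ["weather", "temperature", "city"],
    ["currency", "exchange", "rate"],
]
vectors = tf_idf(services)
```

In such short descriptions every term occurs once, so the term-frequency factor is the same for all terms of a service and the weights are decided by document frequency alone. This is one way to see why TF-IDF discriminates poorly between terms in service descriptions.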
The main aim of this paper is to overcome the limitations of the TF-IDF method in web service clustering so that the results of clustering algorithms can be improved. We propose the LFW+K method, in which the Length Feature Weight (LFW) method is used to determine the most informative terms of a service, followed by K-Means clustering. The method assigns a weight to each term according to its importance and its distribution across the services. The contribution of the paper is to improve the performance of web service clustering by enhancing the representation of services in vector space. As a result, web services can be discovered efficiently in large repositories, and the approach can serve as an intelligent system capable of handling service queries.
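The clustering stage of LFW+K can be pictured with a plain K-Means loop over service vectors. The sketch below is a generic textbook K-Means with Euclidean distance, not the authors' implementation; the function names, the seeded random initialization, and the toy vectors are assumptions. The default of 10000 iterations follows the setting reported in the experiment setup.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=10000, seed=0):
    """Cluster dense vectors; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy service vectors forming two well-separated groups.
vectors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centroids = k_means(vectors, k=2)
```

The loop stops as soon as the assignments no longer change, so `max_iter` acts only as an upper bound.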
This paper is organized as follows. Section 2 reviews related work on web service discovery and clustering using various similarity approaches. Section 3 presents the proposed methodology and defines the web service document, feature extraction, preprocessing steps, and clustering method. Section 4 presents a comparative analysis of the TF-IDF and LFW methods with K-Means clustering. Limitations of the proposed methodology are discussed in Section 5. Section 6 concludes the paper and outlines future work.
Section snippets
Related work
There are three main approaches to discovering web services in large repositories (Bhardwaj and Sharma, 2015, Bukhari and Liu, 2018):
- i. Discovery in a directory such as UDDI.
- ii. Discovery in web portals.
- iii. Generic search engines.
Discovery in UDDI is a traditional approach that is mainly based on keyword searching. It is inefficient because it provides only syntactic information, and many UDDI registries are no longer available. In the second approach, different web portals are
Proposed methodology
In this section, we describe the proposed methodology i.e. LFW+K which includes the following steps:
- The extraction of needed features from WSDL/Web API documents of web services.
- Necessary preprocessing steps to remove irrelevant features from extracted features.
- LFW method for representing preprocessed features of web service documents in the vector space.
- K-Means clustering to group similar web services.
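The first two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: which WSDL elements are extracted, the camel-case splitting rule, the stopword list, and the sample document are all hypothetical choices. Only the Python standard library is used.

```python
import re
import xml.etree.ElementTree as ET

WSDL_NS = "{http://schemas.xmlsoap.org/wsdl/}"  # WSDL 1.1 namespace
# Illustrative stopword list; a real one would be much larger.
STOPWORDS = {"get", "set", "by", "the", "of", "request", "response", "soap"}


def extract_names(wsdl_text):
    """Collect the name attributes of selected WSDL elements (assumed feature set)."""
    root = ET.fromstring(wsdl_text)
    names = []
    for tag in ("service", "portType", "operation", "message"):
        for elem in root.iter(WSDL_NS + tag):
            name = elem.get("name")
            if name:
                names.append(name)
    return names


def preprocess(names):
    """Split identifiers into lowercase words and drop stopwords."""
    tokens = []
    for name in names:
        # Matches acronyms ("SOAP") and capitalized or lowercase words ("Weather").
        for part in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", name):
            token = part.lower()
            if token not in STOPWORDS and len(token) > 1:
                tokens.append(token)
    return tokens


# Hypothetical WSDL fragment used only for this example.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="WeatherService">
  <message name="GetWeatherByCityRequest"/>
  <portType name="WeatherPortType">
    <operation name="GetWeatherByCity"/>
  </portType>
  <service name="WeatherService"/>
</definitions>"""

tokens = preprocess(extract_names(SAMPLE_WSDL))
```

The resulting token list is the kind of short, preprocessed service document that a weighting method and a clustering algorithm would then operate on.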
Experiment setup and results
We performed this experiment in a Windows 10 environment on a machine with an i7 processor and 8 GB of RAM. Python 3.7 was used to implement the proposed methodology, i.e., LFW+K, discussed in Section 3. To represent web services or text documents in vector space, the TF-IDF and LFW methods are used, and their effectiveness is tested using the K-Means clustering algorithm. For the execution of the K-Means Clustering algorithm, we have taken 10000 iterations and the algorithm is
Limitations
The results show that the proposed methodology improves clustering performance. By considering the dimension of the service document, the frequency of terms in other documents, and the highest frequency of a term in the document, we are able to represent services efficiently in vector space. The main limitation of the proposed work is that it cannot capture semantic relations among words; for example, it cannot identify synonyms or antonyms.
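The three quantities named above can be computed directly from tokenized service documents, as the sketch below shows. How LFW combines them into a weight is not given in this excerpt, so the sketch stops at the raw statistics. The function name, the dictionary keys, the reading of "dimension" as token count, and the example documents are assumptions.

```python
from collections import Counter


def term_statistics(docs):
    """Per-document statistics over tokenized documents.

    length       : number of tokens (assumed meaning of document dimension)
    max_tf       : highest frequency of any term in the document
    tf           : raw frequency of each term in the document
    df_elsewhere : number of OTHER documents that contain each term
    """
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    stats = []
    for doc in docs:
        counts = Counter(doc)
        stats.append({
            "length": len(doc),
            "max_tf": max(counts.values(), default=0),
            "tf": dict(counts),
            "df_elsewhere": {term: df[term] - 1 for term in counts},
        })
    return stats


# Hypothetical preprocessed service documents.
docs = [
    ["weather", "forecast", "weather", "city"],
    ["weather", "city"],
    ["currency", "rate"],
]
stats = term_statistics(docs)
```

None of these statistics relate "forecast" to a synonym such as "prediction", which matches the limitation stated above: the representation is purely frequency-based.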
Conclusion and future work
In this paper, an improved technique for the vector representation of web service description documents is applied, which overcomes the shortcomings of the basic method, i.e., TF-IDF. After vector space representation, the performance of the K-Means clustering approach is analyzed. Web service clustering is a challenging task today because of the rapid increase in the number of web services provided by different vendors. Results show that our proposed method, i.e., LFW+K, has enhanced the performance
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (29)
- et al. How much can k-means be improved by using better initialization and repeats? Pattern Recognition (2019)
- et al. Multi-co-training for document classification using various document representations: TF-IDF, LDA, and Doc2Vec. Information Sciences (2019)
- et al. Enhancing service discovery using cat swarm optimisation based web service clustering. Perspectives in Science (2016)
- et al. Discovering web services in social web service repositories using deep variational autoencoders. Information Processing & Management (2020)
- et al. A comparative study of TF*IDF, LSI and multi-words for text classification. Expert Systems with Applications (2011)
- Feature selection and enhanced krill herd algorithm for text document clustering (2019)
- et al. Probabilistic topic models for web services clustering and discovery
- et al. Machine learning in efficient and effective web service discovery. Journal of Web Engineering (2015)
- et al. A web service search engine for large-scale web service discovery based on the probabilistic topic modeling and clustering. Service Oriented Computing and Applications (2018)
- et al. The web service modeling language WSML: An overview
- Clustering WSDL documents to bootstrap the discovery of web services
- godiscovery: Web service discovery made efficient
- Enabling semantic web services: The web service modeling ontology
- Hierarchical clustering based web service discovery