Semantics-based Web service classification using morphological analysis and ensemble learning techniques
S. Sowmya Kamath
V. S. Ananthanarayana
Department of Information Technology, National Institute of Technology Karnataka, Surathkal, Mangalore 575 025, India
With the emergence of the Programmable Web paradigm, the World Wide Web is evolving into a Web of Services, where data and services can be effectively reused across applications. Given the wide diversity and scale of published Web services, service discovery is a major challenge for service-based application development. This is further compounded by the limited availability of intelligent categorization and service management frameworks. This paper presents an approach that extends service similarity analysis with morphological analysis and machine learning techniques to capture the functional semantics of real-world Web services and thereby facilitate effective categorization. To capture the functional diversity of the services, different feature vector selection techniques are used to represent a service in vector space, with the aim of finding the optimal set of features. Using these feature vector models, services are classified by domain using ensemble machine learning methods. Experiments were performed to validate the classification accuracy with respect to the various service feature vector models designed, and the results highlight the effectiveness of the proposed approach.
Web service classification; Supervised machine learning; Natural language processing (NLP); Semantic analysis; Knowledge discovery
1 Introduction
Service-oriented computing (SOC) is a distributed computing
paradigm that employs fundamental computing entities called
services as the constituent elements of complex business
systems [29]. Under SOC, a business landscape composed of
service-centric applications, called a service-oriented
architecture (SOA), allows business applications and
infrastructure to be reorganized as a set of reusable
services. In domains such as e-commerce, e-government and
business-to-business (B2B) systems, Web services are the most
popular way of achieving service orientation. Web services use
the Extensible Markup Language (XML) standard for encapsulating
the data to be exchanged between diverse business platforms.
XML-based protocols are also used for data transfer (Simple
Object Access Protocol, SOAP) and for describing service
capabilities (Web Services Description Language, WSDL). Most
applications in business ecosystems are complex, so service
orientation can help in designing new applications faster by
reusing existing functionality exposed as services [1]. Hence,
the main advantage of service-oriented application development
is that services can be exposed as discoverable software
components, thus promoting reusability.
For service-based application development, a designer
either creates new services or tries to find appropriate
existing services for performing the individual tasks as per a
defined business workflow. The process of finding existing
services capable of performing a particular task is called
service discovery [14]. Despite considerable research effort in
simplifying this process, service discovery is still
challenging because the search for appropriate services is
primarily keyword-based. A unified service registry such as the Universal
Business Registry is no longer available, and Web services
are currently available in some service portals such as
ProgrammableWeb and BioCatalogue or directly from service
providers’ websites [21]. These service portals mostly
provide keyword searching and manual categorization, so
finding the most relevant services for a given task remains
difficult. Services already developed by third parties may be
well suited to the given task and yet never appear in the
search results because of these limitations.
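The limitation of keyword-based search described above can be illustrated with a minimal sketch. The catalogue entries, the `Service` class and the `keyword_search` function below are hypothetical and invented purely for illustration; they do not reproduce the search logic of any actual service portal.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Service:
    name: str
    description: str


# Hypothetical catalogue entries, invented purely for illustration.
CATALOGUE = [
    Service("WeatherForecastService", "Returns the weather forecast for a city."),
    Service("ClimatePredictionService", "Predicts temperature and rainfall for a location."),
    Service("CurrencyConverter", "Converts an amount between two currencies."),
]


def keyword_search(query: str, services: list[Service]) -> list[str]:
    """Return the names of services whose text contains every query keyword."""
    keywords = query.lower().split()
    hits = []
    for service in services:
        # Plain substring matching over the name and description only.
        text = f"{service.name} {service.description}".lower()
        if all(keyword in text for keyword in keywords):
            hits.append(service.name)
    return hits


print(keyword_search("weather forecast", CATALOGUE))
# prints ['WeatherForecastService']
```

The second entry offers closely related functionality, but it is not returned because its name and description share no keyword with the query.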
The problem of adding semantics and machine
understanding to Web service capabilities to support automated
dynamic discovery, matchmaking, composition and
recommendation [9] has remained an area of active research
interest. The primary motivation for semanticizing data and
services on the Web is to facilitate seamless interoperation
and knowledge discovery over the Web [2,12]. However, at
present, very few published services are semantically
enhanced, and adding semantics to those that lack them may
prove to be a monumental task in terms of time and cost.
Therefore, alternative methods are needed that do not depend on
the immediate availability of semantic markup, but can still
overcome the problems associated with keyword-based
service discovery.
In this paper, we use different feature vector selection
techniques to represent a service document in vector space,
with the aim of finding (...truncated)