Research Article
Web Service Matching Based on Natural Semantic Annotation
Laiwu Vocational and Technical College, Laiwu 271100, China
Yingming Li
Semantic similarity is central to natural language processing and information retrieval and is widely used in areas such as text segmentation, image retrieval and automatic hyperlinking. The semantic similarity of natural words is usually obtained from the similarity of their concepts (Li et al., 2003). Many semantics-based Web service discovery methods have been shown to be feasible and effective (Ankolekar et al., 2002; Wang and Stroulia, 2003), in particular those based on semantic annotation of Web services. From a practical point of view, semantic annotation enriches the semantic information carried by a service (Oliva et al., 2011).
WSDL (Web Services Description Language) conforms to the W3C standard and is used to describe Web service interface information (Daniela et al., 2005). A WSDL file contains the following elements: Type, Message, Part, Operation, Port Type/Interface, Binding, Port/Endpoint and Service. WSDL is widely used, but it lacks semantic information. To address this problem, this paper proposes WSDL-NS (WSDL with Natural Semantics), an extension of WSDL based on natural semantic annotation. In addition, a method for computing semantic similarity is improved, and matching services are found by computing service similarity. Finally, a simulation example illustrates the correctness and effectiveness of the proposed method.
A NEW NATURAL SEMANTIC SIMILARITY
WordNet (McHale, 1998) is a lexical database of English in which words are organized into a network according to their semantics. Much work on semantics uses it as a database. Because a natural language forms and evolves over a long period, the data in WordNet can be regarded as an objective reflection of that language.
The words in WordNet are organized as a tree. The semantic similarity between two concepts is affected by two major factors. One is the minimum distance between the concepts, denoted l. The other is the depth of their minimum public concept (their deepest common ancestor), denoted h. The minimum distance between the concepts equals the sum of their distances to the minimum public concept. Li et al. (2003) use formula (1) to calculate the similarity between two concepts, which is optimal when α = 0.2 and β = 0.6:
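$$\mathrm{sim}(C_1, C_2) = e^{-\alpha l} \cdot \frac{e^{\beta h} - e^{-\beta h}}{e^{\beta h} + e^{-\beta h}} \tag{1}$$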
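The measure of Li et al. (2003) is commonly written as exp(-αl) multiplied by tanh(βh). The following is a minimal sketch under that reading; the function name `li_similarity` is chosen here for illustration, and the defaults are the α = 0.2 and β = 0.6 quoted above.

```python
import math


def li_similarity(l: float, h: float, alpha: float = 0.2, beta: float = 0.6) -> float:
    """Similarity of two concepts from path length l and subsumer depth h.

    Assumes formula (1) has the form exp(-alpha * l) * tanh(beta * h).
    """
    if l < 0 or h < 0:
        raise ValueError("l and h must be non-negative")
    # The first factor decays with distance; the second saturates with depth.
    return math.exp(-alpha * l) * math.tanh(beta * h)
```

Under this form the similarity decreases as the path length l grows and increases, with saturation, as the common ancestor lies deeper in the tree.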
The similarity of some word pairs is shown in Table 1.
However, computing concept similarity from the minimum distance alone can be improved. For example, for concepts C1 and C2 whose distances to the minimum public concept are l1 and l2, respectively, the minimum distance is l1+l2.
Table 1: Similarity calculated by formula (1)
Table 2: Similarity calculated by formula (2)
Formula (1) gives the same similarity for any l1 and l2 with the same sum l1+l2, however the total is split between the two distances. Moreover, similarity does not change linearly as concept depth increases, and formula (1) does not reflect this. The method is therefore extended to take the individual values of l1 and l2 into account, giving the new formula (2):
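$$\mathrm{sim}(C_1, C_2) = e^{-\alpha \ell} \cdot \frac{e^{\beta h} - e^{-\beta h}}{e^{\beta h} + e^{-\beta h}} \tag{2}$$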
where $\ell = \gamma \cdot \left( \frac{4 l_1 l_2}{(l_1 + l_2)^2} - 0.5 \right) + l$, $0 \le \gamma \le 1$, $l_1 \ne 0$ and $l_2 \ne 0$.
Table 2 lists the distances from each word of a pair to the minimum public concept, together with the new similarity calculated by formula (2). γ is a discrete parameter. Formulae (1) and (2) are equivalent when γ = 0.
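The adjusted distance ℓ can be sketched as follows. The sketch assumes that formula (2) keeps the exponential and hyperbolic-tangent form of Li et al. (2003) with ℓ substituted for l, which agrees with the statement that the two formulae coincide when γ = 0. The function names are illustrative, and no value of γ is assumed because the text does not give one.

```python
import math


def adjusted_distance(l1: float, l2: float, gamma: float) -> float:
    """Return gamma * (4*l1*l2 / (l1 + l2)**2 - 0.5) + l, with l = l1 + l2."""
    if l1 <= 0 or l2 <= 0:
        raise ValueError("l1 and l2 must be non-zero (positive) distances")
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    l = l1 + l2
    # The ratio is 1 when l1 == l2 and shrinks as the split becomes uneven.
    balance = 4.0 * l1 * l2 / (l * l)
    return gamma * (balance - 0.5) + l


def improved_similarity(l1: float, l2: float, h: float, gamma: float,
                        alpha: float = 0.2, beta: float = 0.6) -> float:
    """Assumed form of formula (2): formula (1) evaluated at the adjusted distance."""
    return math.exp(-alpha * adjusted_distance(l1, l2, gamma)) * math.tanh(beta * h)
```

With γ > 0, two pairs with the same total distance, such as (l1, l2) = (2, 2) and (1, 3), no longer receive the same similarity: the evenly split pair gets the larger adjusted distance and hence the lower score.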
NATURAL SEMANTIC ANNOTATION OF WEB SERVICES
There are various methods for adding semantics to Web services. Several semantic service representation languages have been proposed (Klusch et al., 2009), such as OWL-S, WSMO and METEOR. However, each of these languages relies on a particular semantic description framework for Web services (Tocha et al., 2011), and all of them abandon, to some degree, the intrinsic semantic information of natural language. In contrast, natural semantic annotation of Web services is expressed in natural language and does not require users to learn a dedicated semantic description framework. It can therefore promote the adoption of the Semantic Web.
An annotated Web service can be represented as follows:
S = (L,s)
where S is the Web service with natural semantic annotation, L is the natural language annotation information of the Web service and s is the static Web service without semantics.
The "verb + noun" pattern is used to add natural semantics to static Web services, for example "search + ticket" or "book + hotel". Natural language annotation information can therefore be described formally as follows:
L = (n,v)
where n represents the noun and v represents the verb. The similarity of any two annotations L1 (n1, v1) and L2 (n2, v2) can be represented as follows:
sim = sim (L1, L2) = (sim (n), sim (v))
sim (n) = sim (n1, n2) represents the similarity between n1 and n2. sim (v) = sim (v1, v2) represents the similarity between v1 and v2.
When n1 and v1 are the father concepts of n2 and v2, respectively, L1 (n1, v1) is the father annotation of L2 (n2, v2) and L2 (n2, v2) is the son annotation of L1 (n1, v1). For two annotation similarities, sim1 = (sim (n1), sim (v1)) and sim2 = (sim (n2), sim (v2)), sim1 = sim2 if and only if sim (n1) = sim (n2) and sim (v1) = sim (v2).
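The pairwise definition above can be illustrated with a small sketch. The `Annotation` class and the `exact_match` word similarity are hypothetical stand-ins introduced here; in practice the word similarity would be a WordNet-based measure such as formula (1) or (2).

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Annotation:
    """A natural language annotation L = (n, v)."""
    noun: str
    verb: str


def annotation_similarity(a: Annotation, b: Annotation,
                          word_sim: Callable[[str, str], float]) -> Tuple[float, float]:
    """Return the pair (sim(n), sim(v)) for two annotations."""
    return (word_sim(a.noun, b.noun), word_sim(a.verb, b.verb))


def exact_match(w1: str, w2: str) -> float:
    """Placeholder word similarity (assumption): 1.0 for identical words, else 0.0."""
    return 1.0 if w1 == w2 else 0.0


search_ticket = Annotation(noun="ticket", verb="search")
book_ticket = Annotation(noun="ticket", verb="book")
sim = annotation_similarity(search_ticket, book_ticket, exact_match)
```

Keeping the noun and verb similarities as separate components, rather than collapsing them into one number, mirrors the definition sim = (sim (n), sim (v)).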
Some parameters of Web services can also be annotated, such as the pre-condition of invoking services (p), service input (i), service output (o) and service effect (e). The natural language annotation information Ls of Web services can be represented as follows:
Ls = (Lf , Lp, Li, Lo, Le)
where Lf, Lp, Li, Lo and Le are the semantic annotation information of the Web service function, the precondition of invoking the service, the service input, the service output and the service effect, respectively.
The semantic similarity of annotated Web services is defined as follows. Given two Web services s1 and s2, let their descriptions be Ls1 and Ls2, respectively. The semantic similarities of their functions, preconditions, inputs, outputs and effects are simf = sim (Lf1, Lf2), simp = sim (Lp1, Lp2), simi = sim (Li1, Li2), simo = sim (Lo1, Lo2) and sime = sim (Le1, Le2), respectively. Thus, the similarity of the two Web services is defined as follows:
simservice = sim (s1, s2) = sim (Ls1, Ls2)
= (simf, simp, simi, simo, sime)
If Lf1, Lp1, Li1, Lo1 and Le1 are the father descriptions of Lf2, Lp2, Li2, Lo2 and Le2, respectively, then Ls1 is the father description of Ls2, Ls2 is the son description of Ls1, s1 is the father service of s2 and s2 is the son service of s1.
Given two similarities, simservice = (simf, simp, simi, simo, sime) and sim'service = (sim'f, sim'p, sim'i, sim'o, sim'e), it follows from the previous definition that simservice ≥ sim'service if and only if simf ≥ sim'f, simp ≥ sim'p, simi ≥ sim'i, simo ≥ sim'o and sime ≥ sim'e.
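The component-wise comparison of service similarities can be sketched as below. The dictionary layout keyed by f, p, i, o and e, the sample values, and the reading of "≥" on a (sim (n), sim (v)) pair as holding for both entries are assumptions made for illustration.

```python
from typing import Dict, Tuple

Pair = Tuple[float, float]          # (sim(n), sim(v)) for one part of a service
PARTS = ("f", "p", "i", "o", "e")   # function, precondition, input, output, effect


def pair_ge(a: Pair, b: Pair) -> bool:
    """Assumed ordering on pairs: both the noun and the verb similarity are >=."""
    return a[0] >= b[0] and a[1] >= b[1]


def service_ge(sim_a: Dict[str, Pair], sim_b: Dict[str, Pair]) -> bool:
    """True when sim_a >= sim_b in every one of the five components."""
    return all(pair_ge(sim_a[k], sim_b[k]) for k in PARTS)


# Hypothetical similarity vectors for illustration only.
high = {k: (0.9, 0.9) for k in PARTS}
low = {k: (0.5, 0.6) for k in PARTS}
mixed = dict(low, f=(1.0, 0.1))     # better noun match, worse verb match on f
```

This is a partial order: `high` dominates `low`, but `mixed` and `low` are incomparable because neither is at least as large as the other in every component.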
WSDL-NS AND MATCHING ALGORITHM
WSDL: WSDL plays a very important role in the practical application of Web services. It is often used for data interaction between two heterogeneous systems. Its elements appear as tags in real WSDL files. It mainly consists of the following elements:
• Type: Data type definitions (such as string or int) expressed in some grammar (such as XML).
• Message: The data to be transferred, such as input parameters and output parameters. The tag representation of Message elements is shown as follows:
• Part: Message parameters.
• Operation: The abstract description of the operations supported by a service. The tag representation of Operation elements is given in the following:
• Port Type/Interface: An abstract set of operations supported by one or several endpoints. The name changed between WSDL versions (portType in WSDL 1.1, interface in WSDL 2.0), so either may be encountered.
• Binding: The concrete protocol and data format specification for a particular port type.
• Port/Endpoint: The combination of a binding and a network address. The name changed between WSDL versions (port in WSDL 1.1, endpoint in WSDL 2.0), so either may be encountered.
• Service: A set of related endpoints, together with their related ports, operations and messages.
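To make the element list concrete, the sketch below embeds a minimal WSDL 1.1 fragment and reads its message, part and operation elements with Python's standard library. The service, message and operation names and the example.org namespace are invented for illustration and do not come from the original listings.

```python
import xml.etree.ElementTree as ET

# WSDL 1.1 namespace; the sample document itself is hypothetical.
WSDL = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xmlns:tns="http://example.org/tickets"
             targetNamespace="http://example.org/tickets">
  <message name="SearchTicketRequest">
    <part name="destination" type="xsd:string"/>
  </message>
  <message name="SearchTicketResponse">
    <part name="ticketId" type="xsd:int"/>
  </message>
  <portType name="TicketPortType">
    <operation name="searchTicket">
      <input message="tns:SearchTicketRequest"/>
      <output message="tns:SearchTicketResponse"/>
    </operation>
  </portType>
</definitions>"""

root = ET.fromstring(SAMPLE_WSDL)

# Map each message name to the names of its parts.
messages = {
    m.get("name"): [p.get("name") for p in m.findall("wsdl:part", WSDL)]
    for m in root.findall("wsdl:message", WSDL)
}

# Collect the operations declared under the port type.
operations = [op.get("name")
              for op in root.findall("wsdl:portType/wsdl:operation", WSDL)]
```

Nothing in such a fragment says what searchTicket means beyond its name, which is the gap that semantic annotation is intended to fill.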
WSDL-NS: Because WSDL itself carries no semantics, the format of WSDL files must be extended to support natural language annotation when semantic information is added. The resulting language is called WSDL-NS in this study.
To describe Web services semantically, WSDL is extended as follows. A new tag <description> has a verb description attribute, verb, and a noun description attribute, noun. In this way, the <description> tag can describe the semantic information of Web services.
For the tag <operation>, one <description> tag is added to its three existing sub-tags to represent the semantic description of the Web service function. The detailed description is shown as follows:
To describe the semantic information of preconditions and effects, the new tags <preConditionDescription> and <effectDescription> are added to the tag <operation>. They describe the semantic information of service preconditions and effects, respectively. Multiple <description> tags can be added under each of the two new tags. The detailed description is given as follows:
Although the input/output parameter tags of Web services are defined in the tag <operation>, they refer to information contained in the tag <message>. The semantic description of input/output parameters should therefore be placed in <message>; in other words, a <description> tag is added to the tag <message>. The detailed description is as follows:
WSDL-NS is stricter than WSDL, and the WSDL namespace is no longer suitable for WSDL-NS.
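The extension can be illustrated with a hypothetical WSDL-NS fragment. The tag names <description>, <preConditionDescription> and <effectDescription> and the attributes verb and noun come from the text; the exact nesting, the sample annotations and the use of un-namespaced elements (the WSDL-NS namespace is not given) are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical WSDL-NS document; element placement is an assumption.
SAMPLE_WSDL_NS = """<definitions>
  <message name="SearchTicketRequest">
    <description verb="provide" noun="destination"/>
    <part name="destination" type="string"/>
  </message>
  <portType name="TicketPortType">
    <operation name="searchTicket">
      <description verb="search" noun="ticket"/>
      <preConditionDescription>
        <description verb="login" noun="user"/>
      </preConditionDescription>
      <effectDescription>
        <description verb="list" noun="ticket"/>
      </effectDescription>
      <input message="SearchTicketRequest"/>
    </operation>
  </portType>
</definitions>"""


def read_pairs(parent):
    """Return (verb, noun) for each <description> that is a direct child of parent."""
    return [(d.get("verb"), d.get("noun")) for d in parent.findall("description")]


root = ET.fromstring(SAMPLE_WSDL_NS)
operation = root.find("portType/operation")

function = read_pairs(operation)
preconditions = read_pairs(operation.find("preConditionDescription"))
effects = read_pairs(operation.find("effectDescription"))
message_annotations = {m.get("name"): read_pairs(m) for m in root.findall("message")}
```

Because `read_pairs` only looks at direct children, the function annotation on <operation> is kept separate from the annotations nested under the precondition and effect tags.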
Matching algorithm: In the following, a matching algorithm of Web services is given based on WSDL-NS and natural semantics:
Algorithm 1: Matching algorithm of Web services
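The listing of Algorithm 1 is not reproduced here, so the following is only a sketch of one plausible threshold-based matcher and not the authors' algorithm. It compares the function annotation of a request with that of each candidate and keeps candidates whose noun and verb similarities both reach caller-supplied thresholds. The similarity table, service names and the `match_services` signature are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Annotation = Tuple[str, str]  # (noun, verb)


def match_services(request: Annotation,
                   candidates: Dict[str, Annotation],
                   word_sim: Callable[[str, str], float],
                   noun_threshold: float,
                   verb_threshold: float) -> List[Tuple[str, float, float]]:
    """Return (name, sim_n, sim_v) for candidates that meet both thresholds."""
    matches = []
    for name, (noun, verb) in candidates.items():
        sim_n = word_sim(request[0], noun)
        sim_v = word_sim(request[1], verb)
        if sim_n >= noun_threshold and sim_v >= verb_threshold:
            matches.append((name, sim_n, sim_v))
    return matches


# Toy similarity values (assumption); a real system would use WordNet.
TOY_SIM = {("ticket", "pass"): 0.85, ("search", "find"): 0.95,
           ("ticket", "hotel"): 0.3, ("search", "book"): 0.4}


def toy_sim(a: str, b: str) -> float:
    if a == b:
        return 1.0
    return TOY_SIM.get((a, b), TOY_SIM.get((b, a), 0.0))


candidates = {"findPass": ("pass", "find"),
              "bookHotel": ("hotel", "book"),
              "searchTicket": ("ticket", "search")}
result = match_services(("ticket", "search"), candidates, toy_sim, 0.8, 0.9)
```

A fuller matcher would apply the same test to the precondition, input, output and effect annotations; only the function annotation is compared here to keep the sketch short.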
EXPERIMENT AND ANALYSIS
PanSchema is developed independently by the PanSoft company. It is a software development platform supporting a new generation of software development technology. It is built on MDA (Model Driven Architecture) and is used to develop application software and to generate large business components.
Data interaction in PanSchema involves many Web services, so we use it to verify the methods proposed in this paper. First, the standard semantic description of each service is given and the Web services are annotated with natural semantic information. Then the required Web services are matched using Algorithm 1. The experimental results are shown in Fig. 1 and 2, where the x axis represents sim (n), the y axis represents sim (v) and the z axis represents the precision ratio in Fig. 1 and the recall ratio in Fig. 2. The average recall ratio and precision ratio are computed for different threshold values.
Fig. 1: The surface chart of precision ratio in the experiment
Fig. 2: The surface chart of recall ratio in the experiment
The precision ratio is best when verb similarity sim (v)≥0.9 and noun similarity sim (n)≥0.8. The recall ratio is best when verb similarity sim (v)≤0.7 and noun similarity sim (n)≤0.6.
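For reference, the precision ratio and recall ratio of one matching run can be computed as in the sketch below, using their standard definitions. The argument names `retrieved` (services returned by the matcher) and `relevant` (services that truly satisfy the request) are illustrative, and returning 0.0 for an empty set is a convention chosen here.

```python
from typing import Iterable, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Precision: share of returned services that are relevant.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: share of relevant services that were returned.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall
```

Raising the similarity thresholds typically shrinks the retrieved set, which tends to raise precision and lower recall; this is consistent with precision peaking at high thresholds and recall at low ones.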
This paper improves the method for computing natural semantic similarity. Compared with the traditional method, the proposed method increases the precision of semantic similarity. To add semantic information to Web services, Web services with natural semantic information are defined. WSDL-NS is constructed by extending WSDL, and a new Web service discovery algorithm based on natural semantics is proposed. This lays a foundation for the practical application of Web services. Simulation experiments are carried out on the PanSchema software platform, and the recall ratio and precision ratio are analyzed. However, the method does not achieve a very high recall ratio, and services annotated with errors cannot be recognized. Future research will therefore focus on the consistency between annotations and services in order to obtain better results.
This work is supported by the National Natural Science Foundation of China under grant 61170078 and by the Specialized Research Fund for the Doctoral Program of Higher Education of China under grant 20113718110004.