Tuesday, August 23, 2011

XPath 2.0 and XSD schemas: sharing experiences

I was just playing with XPath 2.0 and thought of sharing my observations about a specific use case.

We start with the following XSD schema document,
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    
    <xs:element name="X">
       <xs:complexType>
          <xs:sequence>
             <xs:element name="a" type="xs:integer"/>
          </xs:sequence>
          <xs:attribute name="att1" type="xs:boolean"/>
       </xs:complexType>
    </xs:element> 
</xs:schema>

This schema is intended to validate an XML instance document like the following,
<X att1="0">
  <a>100</a>
</X>

I wrote the following XPath 2.0 expression [1],

/X[if (@att1) then true() else false()]/a/text()

and ran it after enabling validation of the input document.

I thought that this would not return any result (i.e. an empty sequence).

But the XPath expression above ([1]) returns the result "100". At first, I was a little surprised by this. I thought that since the attribute "att1" is declared with type xs:boolean in the schema, and its value "0" denotes the boolean value false, the "if condition" should evaluate to 'false' in this case. But that's not the correct interpretation of the XPath expression [1]. Here is a little more explanation.

The reference @att1 in the XPath expression above (i.e. if (@att1) ..) is a node reference (an attribute node) and not a boolean value. I initially, and incorrectly, thought that atomization of the expression @att1 would take place in this case (more about this below).

The XPath 2.0 spec says that if the first item of a sequence is a node, then the effective boolean value of that sequence is 'true' (this holds whether or not the input XML document was validated with the XSD schema). In an expression like the one above (i.e. if (@att1) ..), the effective boolean value of the sequence {@att1} determines whether the "if condition" is 'true'. Here the sequence has exactly one item, the attribute node named "att1", so its effective boolean value is 'true' and hence the XPath predicate evaluates to 'true'. I think this explains why the "if condition" {if (@att1)} returns 'true' for the above XML instance document, even if it was validated by the schema given above and the XPath 2.0 expression [1] was run in schema-aware mode.
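To make the effective boolean value rule concrete, here is a small Python sketch. It is only my own toy model of the rule as I read it in the spec, not code from any XPath processor. The names AttributeNode and effective_boolean_value are made up for this illustration, and plain Python values stand in for XPath atomic values.

```python
from dataclasses import dataclass


@dataclass
class AttributeNode:
    """Hypothetical stand-in for an XPath attribute node."""
    name: str
    lexical_value: str


def effective_boolean_value(seq):
    """Toy model of the XPath 2.0 effective boolean value rules."""
    if len(seq) == 0:
        return False                  # empty sequence -> false
    first = seq[0]
    if isinstance(first, AttributeNode):
        return True                   # first item is a node -> true; its value is never inspected
    if len(seq) > 1:
        raise TypeError("no effective boolean value for this sequence")
    if isinstance(first, bool):       # checked before numbers, since bool is an int in Python
        return first
    if isinstance(first, str):
        return len(first) > 0         # zero-length string -> false
    if isinstance(first, (int, float)):
        return not (first == 0 or first != first)   # zero and NaN -> false
    raise TypeError("no effective boolean value for this item")


att1 = AttributeNode("att1", "0")
print(effective_boolean_value([att1]))    # True: a node, regardless of the value "0"
print(effective_boolean_value([False]))   # False: an atomic boolean value
print(effective_boolean_value([]))        # False: the attribute is absent
```

The first call corresponds to if (@att1) in expression [1]: the attribute node exists, so the condition is 'true' even though the attribute's value is "0".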

To write the XPath expression as I intended (i.e. the "if condition" should be 'true' if the instance document has the value true/1 for the attribute and 'false' otherwise, given that XSD validation of the instance document took place before the XPath expression was evaluated), the expression needs to be modified to either of the following styles [2],

/X[if (data(@att1)) then true() else false()]/a/text()

OR

/X[if (@att1 = true()) then true() else false()]/a/text()

To understand why the expressions given above ([2]) work correctly, one needs to understand two things described by the XPath 2.0 spec. For the first variant, it is the XPath 2.0 "data" function, which returns the typed value of its argument (here the xs:boolean value of the validated attribute). For the second variant, it is the process of atomization: the "=" comparison atomizes the attribute node "att1", which yields a sequence of kind {xs:boolean}.
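The two variants in [2] can also be modelled with a small Python sketch. Again, this is only a toy model under my own assumptions: the function names are hypothetical, the attribute is assumed to be declared as xs:boolean, and the "validated" flag stands for whether schema validation took place. The non-validated branch reflects my reading of the spec, where the attribute's typed value is xs:untypedAtomic, and a general comparison casts such a value to the type of the other operand.

```python
def cast_to_boolean(lexical):
    """Cast a lexical form to xs:boolean (true/1 and false/0 are the valid forms)."""
    text = lexical.strip()            # approximates whitespace collapsing
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError("not a valid xs:boolean: " + repr(lexical))


def typed_value(lexical, validated):
    """Toy model of data(@att1): returns a (type name, value) pair."""
    if validated:
        return ("xs:boolean", cast_to_boolean(lexical))
    return ("xs:untypedAtomic", lexical)


def if_data(lexical, validated):
    """Models: if (data(@att1)) then true() else false()"""
    kind, value = typed_value(lexical, validated)
    if kind == "xs:boolean":
        return value                  # effective boolean value of a boolean is itself
    return len(value) > 0             # untyped value: any non-empty string is true


def equals_true(lexical, validated):
    """Models: if (@att1 = true()) then true() else false()"""
    kind, value = typed_value(lexical, validated)
    if kind == "xs:untypedAtomic":
        value = cast_to_boolean(value)   # untyped operand is cast to the other operand's type
    return value is True


print(if_data("0", validated=True))       # False, as intended
print(equals_true("0", validated=True))   # False, as intended
print(if_data("0", validated=False))      # True: without validation, "0" is just a non-empty string
```

In this model both variants give the intended answer once the document has been validated, which is the case discussed in this post. The last line shows why validation matters for the first variant.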

That's all about this. I hope that my experience may be helpful to someone (to understand this, one just has to know the XPath 2.0 spec well, and how it interacts with XSD schemas!).

Thanks for reading this post.

@2011-11-11: updated in place, to correct a few factual errors.