Sunday, December 13, 2009

XSD 1.1: few more assertions and CTA use cases

In my quest to test Xerces-J's XSD 1.1 implementation, I've come up with another example using XSD 1.1 assertions and CTA (type alternatives), which I'd like to share here.

Here's a fictitious use case; some discussion and analysis of the XSD technical options for solving it follow later in this post.

XML document [1]:
  <shapes>
    <polygon kind="square">
      <a>10</a>  
      <b>10</b>
      <c>10</c>
      <d>10</d>
    </polygon>
    <polygon kind="rectangle">
      <a>10</a>  
      <b>8</b>
      <c>10</c>
      <d>8</d>
    </polygon>
    <polygon kind="triangle">
      <a>5</a>  
      <b>10</b>
      <c>15</c>
    </polygon>
  </shapes>

XML document [2]:
  <shapes xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <polygon kind="square" xsi:type="Quadrilateral">
      <a>10</a>  
      <b>10</b>
      <c>10</c>
      <d>10</d>
    </polygon>
    <polygon kind="rectangle" xsi:type="Quadrilateral">
      <a>10</a>  
      <b>8</b>
      <c>10</c>
      <d>8</d>
    </polygon>
    <polygon kind="triangle">
      <a>5</a>  
      <b>10</b>
      <c>15</c>
    </polygon>
  </shapes>

XSD 1.1 Schema [3]:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="shapes">
       <xs:complexType>
         <xs:sequence>
           <xs:element name="polygon" type="Triangular" maxOccurs="unbounded">
             <xs:alternative test="@kind = ('square', 'rectangle')" type="Quadrilateral" />
           </xs:element>
         </xs:sequence>   
       </xs:complexType>    
    </xs:element>

    <xs:complexType name="Triangular">
       <xs:sequence>
         <xs:element name="a" type="xs:positiveInteger" />
         <xs:element name="b" type="xs:positiveInteger" />
         <xs:element name="c" type="xs:positiveInteger" />
       </xs:sequence>
       <xs:attribute name="kind" type="xs:string" use="required" />    
    </xs:complexType>

    <xs:complexType name="Quadrilateral">
       <xs:complexContent>
         <xs:extension base="Triangular">
           <xs:sequence>
             <xs:element name="d" type="xs:positiveInteger" />
           </xs:sequence>
           <xs:assert test="if (@kind = 'square') then (a = b and b = c and c = d) else true()" />
           <xs:assert test="if (@kind = 'rectangle') then (a = c and b = d) else true()" />
         </xs:extension>
       </xs:complexContent>
    </xs:complexType>

  </xs:schema>

XSD 1.1 Schema [4]:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="shapes">
       <xs:complexType>
         <xs:sequence>
           <xs:element name="polygon" type="Polygon" maxOccurs="unbounded" />
         </xs:sequence>   
       </xs:complexType>    
    </xs:element>

    <xs:complexType name="Polygon">
       <xs:sequence>
         <xs:element name="a" type="xs:positiveInteger" />
         <xs:element name="b" type="xs:positiveInteger" />
         <xs:element name="c" type="xs:positiveInteger" />
         <xs:element name="d" type="xs:positiveInteger" minOccurs="0" />
       </xs:sequence>
       <xs:attribute name="kind" type="xs:string" use="required" />
       <xs:assert test="if (@kind = 'triangle') then not(d) else true()" />
       <xs:assert test="if (@kind = 'square') then (a = b and b = c and c = d) else true()" />
       <xs:assert test="if (@kind = 'rectangle') then (a = c and b = d) else true()" />    
    </xs:complexType>

  </xs:schema>

XSD 1.1 Schema [5]:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="shapes">
       <xs:complexType>
         <xs:sequence>
           <xs:element name="polygon" type="Polygon" maxOccurs="unbounded" />
         </xs:sequence>   
       </xs:complexType>
    </xs:element>

    <xs:complexType name="Polygon">
       <xs:sequence>
         <xs:element name="a" type="xs:positiveInteger" />
         <xs:element name="b" type="xs:positiveInteger" />
         <xs:element name="c" type="xs:positiveInteger" />
         <xs:element name="d" type="xs:positiveInteger" minOccurs="0" />
       </xs:sequence>
       <xs:attribute name="kind" use="required">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:enumeration value="square" />
              <xs:enumeration value="rectangle" />
              <xs:enumeration value="triangle" />
            </xs:restriction>
          </xs:simpleType>
       </xs:attribute>
    </xs:complexType>

  </xs:schema>

XSD 1.1 Schema [6]:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="shapes">
      <xs:complexType>
         <xs:sequence>
           <xs:element name="polygon" type="Polygon" maxOccurs="unbounded" />
         </xs:sequence>   
       </xs:complexType>    
    </xs:element>

    <xs:complexType name="Polygon">
       <xs:sequence>
         <xs:element name="a" type="xs:positiveInteger" />
         <xs:element name="b" type="xs:positiveInteger" />
         <xs:element name="c" type="xs:positiveInteger" />
       </xs:sequence>
       <xs:attribute name="kind" use="required">
         <xs:simpleType>
           <xs:restriction base="xs:string">
             <xs:enumeration value="square" />
             <xs:enumeration value="rectangle" />
             <xs:enumeration value="triangle" />
           </xs:restriction>
         </xs:simpleType>
        </xs:attribute>    
    </xs:complexType>
 
    <xs:complexType name="Quadrilateral">
       <xs:complexContent>
         <xs:extension base="Polygon">
           <xs:sequence>
             <xs:element name="d" type="xs:positiveInteger" />
           </xs:sequence>
         </xs:extension>
       </xs:complexContent>
    </xs:complexType>

  </xs:schema>

The goal of this use case is the following:
To define an XSD content model for the XML document [1].

Solution of the use case, and analysis:
The XSD 1.1 way of solving this would be schema [3] or [4] (these are the possible solutions that come to my mind; there could be other solutions too).

Schema [3] uses both CTA and assertions, whereas Schema [4] uses only assertions. To solve this particular use case, I would likely prefer Schema [4], because its content model is simpler and smaller: it defines fewer schema types (only the 'Polygon' type) and achieves the remaining validation objectives by defining assertions within this type.

Schema [3] is also a useful solution to this problem, though. In my view it shows better XSD type modularity, and it offers better possibilities for reusing the types defined here in other contexts or use cases.

But my gut feeling is to go for Schema [4] for this use case :)

Next, I tried to think about how to solve this use case the XSD 1.0 way. Here are the things that come to my mind (with some of my analysis):
1. Write an XSD schema like number [5] above. This is close to the desired solution of the use case described in this post. But this schema doesn't solve the problem completely, as it doesn't strictly enforce the properties of a triangle (has 3 sides), a square (has 4 sides, and all sides are equal) or a rectangle (has 4 sides, and opposite sides are equal). This is where XSD assertions are really needed, if we want to specify XML validation entirely in the XSD layer. (I think specifying much of the XML validation in the XSD layer is good from an application design point of view: constraints specified with assertions are entirely declarative, and can be easily specified or modified by the people responsible for maintaining business rules, without having to write procedural code for these kinds of validations in imperative/OO languages like Java.)
2. Modify the XML instance to something like [2], i.e., make use of the XSD 1.0 construct xsi:type (which needs to be specified in the XML instance document), and validate it with a schema like [6]. This solution again doesn't enforce the properties of the different kinds of polygons described in point 1 (and I think with XSD 1.0 we cannot do so). It also makes the XML document XSD specific (because it contains the xsi:type attribute from the XML Schema instance namespace), which makes it inconvenient to use such an XML document in environments where XSD is not available, or where XSD processing is not needed.
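For comparison, here is a rough Python sketch of the kind of procedural check that the three assertions in Schema [4] replace. It is only an illustration of the rules, not how Xerces evaluates assertions; the function name polygon_is_valid is hypothetical.

```python
import xml.etree.ElementTree as ET

def polygon_is_valid(polygon):
    """Mirror the three xs:assert rules of Schema [4] for one <polygon> element."""
    kind = polygon.get("kind")
    # Side lengths keyed by element name; "d" is absent for a triangle.
    sides = {child.tag: int(child.text) for child in polygon}
    a, b, c, d = (sides.get(name) for name in "abcd")
    if kind == "triangle":
        return d is None                      # not(d)
    if kind == "square":
        return d is not None and a == b == c == d
    if kind == "rectangle":
        return d is not None and a == c and b == d
    return True  # Schema [4] has no assertion for other kinds

SHAPES = """<shapes>
  <polygon kind="square"><a>10</a><b>10</b><c>10</c><d>10</d></polygon>
  <polygon kind="rectangle"><a>10</a><b>8</b><c>10</c><d>8</d></polygon>
  <polygon kind="triangle"><a>5</a><b>10</b><c>15</c></polygon>
</shapes>"""

results = [polygon_is_valid(p) for p in ET.fromstring(SHAPES)]
print(results)  # [True, True, True]
```

With assertions, the same rules live in the schema as three one-line XPath tests instead of application code.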

The solutions presented in this post are some of the possible ways in which the given problem might be solved. I can imagine that there could be a few other possibilities too (from an XSD syntax point of view) to solve such a use case.

That's all I wanted to write at the moment :)

I hope that this post was useful!

Thursday, December 10, 2009

Xerces-J: XSD 1.1 assertions implementation updates

There have been some improvements lately to the XSD 1.1 assertions support in Xerces-J.

Here is a summary of the recent assertion implementation changes in Xerces-J:

1) XPath 2 expressions in assertion facets should not access the XPath 2 context, because the XPath context is "undefined" during assertion facet evaluation

This implies that the right way to write assertion facets is as follows:
  <xs:simpleType>
    <xs:restriction base="xs:int">
      <xs:assertion test="$value mod 2 = 0" />
    </xs:restriction>
  </xs:simpleType>

(i.e., we need to use the XPath "dynamic context" variable $value to access the XSD simple type value.)

If an attempt is made to access the XPath context in the above XPath expression, say as follows (using the expression "." here):
<xs:assertion test=". mod 2 = 0" />

Xerces returns an error message like the following:
test.xml:4:21:cvc-assertion.4.3.15.3: Assertion evaluation ('. mod 2 = 0') for element 'x (attribute => a)' with type '#anonymous' did not succeed (undefined context).

An XPath expression like the following:
./@a mod 2 = 0

would result in a similar error.

A special error message (indicating "undefined context" to the user) was constructed in Xerces for this use case.

2) Ability to evaluate assertions, on XML attributes

If attributes in an XML document use user-defined XSD simple types, then assertions also apply to attributes, as they do for XML elements.

Following is a little example of this use case.

XML document:
  <Example>
    <x a="210">101</x>
  </Example>

Corresponding XSD 1.1 schema:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    
    <xs:element name="Example">
       <xs:complexType>
         <xs:sequence>
           <xs:element name="x" type="X_Type" maxOccurs="unbounded" />
         </xs:sequence>
       </xs:complexType>
    </xs:element>
 
    <xs:complexType name="X_Type">
       <xs:simpleContent>
          <xs:extension base="xs:int">
             <xs:attribute name="a">
                <xs:simpleType>
                   <xs:restriction base="xs:int">
                     <xs:assertion test="$value mod 2 = 0" />
                   </xs:restriction>
                </xs:simpleType>
             </xs:attribute>
          </xs:extension>
       </xs:simpleContent>
     </xs:complexType>

  </xs:schema>

Please note how we specify a user-defined XSD simple type for attribute "a" above, and an assertion facet on the simple type (there could be 0-n assertion facets here, as we have seen earlier).

The assertion facet XPath expression, $value mod 2 = 0, operates on the context variable $value (which is the attribute's value). Such an assertion facet doesn't have access to the XPath context (Xerces flags an "undefined context" error if an attempt is made to access it).
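One way to picture this is a function that receives only the value, never the node that carries it. The Python sketch below assumes that mental model; the name is_even is hypothetical and the code only mirrors the facet, it does not call Xerces.

```python
import xml.etree.ElementTree as ET

def is_even(value):
    # Plays the role of the facet test "$value mod 2 = 0": it receives the
    # typed value only, never the element or attribute node that holds it.
    return value % 2 == 0

doc = ET.fromstring('<Example><x a="210">101</x></Example>')

# The facet sits on the anonymous simple type of attribute "a", so it is
# applied to each "a" value. The element content (101) has plain xs:int type
# and is not constrained by the facet.
attribute_results = [is_even(int(x.get("a"))) for x in doc.findall("x")]
print(attribute_results)  # [True]
```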

I hope that this post was useful.

Saturday, December 5, 2009

XPath 2.0: PsychoPath XPath processor update

I've just run all the W3C test-suite tests for the PsychoPath XPath 2 processor (an Eclipse Web Tools, Source Editing sub-project), and here are the results:

Tests: 8143
Errors: 0
Failures: 0

So it seems the PsychoPath XPath engine passes 100% of the W3C XPath 2.0 test suite, and some of its own tests.

This should be a moment of cheer, and wow!

It also looks like the upcoming Xerces-J release, 2.10.0 (ref. http://wiki.apache.org/xerces/November2009), will be getting an almost compliant XPath 2.0 engine for XSD 1.1 assertions and CTA.

Ref: An earlier post about PsychoPath status: http://mukulgandhi.blogspot.com/2009/09/psychopath-xpath-20-processor-update.html.

Saturday, November 28, 2009

Xerces-J: XSD 1.1 assertions on simple types

I'm trying to put up a post here with a few examples of assertions on XSD simple types, and also on complex types with simple content, and testing them with the Xerces-J XSD 1.1 implementation. The previous couple of posts on this blog described assertions on XSD complex types having complex content (i.e., elements having "element only" or mixed content, and/or attributes).

1) Here's an example taken from Roger L. Costello's collection of XSD 1.1 examples, which he has published on his web site:

XML document [1]:
  <Example>
    <even-integer>100</even-integer>        
  </Example>

XSD 1.1 document [2]:
  <schema xmlns="http://www.w3.org/2001/XMLSchema"
          elementFormDefault="qualified">

    <element name="Example">
       <complexType>
          <sequence>
             <element name="even-integer">
                <simpleType>
                  <restriction base="integer">
                     <assertion test="$value mod 2 = 0" />
                  </restriction>
                </simpleType>
             </element>
          </sequence>
       </complexType>
    </element>

  </schema>

The above XSD 1.1 schema [2] constrains the XSD integer values to only even ones (this works fine with Xerces!). XSD 1.1 defines a new facet named assertion, applicable to XSD simple types, which the above example demonstrates.

Please note that the "assertion" facet (applicable both to XSD simple types and to complex types with simple content) is conceptually different from the "assert" constraint on complex types (some explanation of this is given below as well).

The XSD 1.1 spec mentions that the XPath 2 "dynamic context" for assertions gets augmented with a variable, $value. The XSD type of the variable $value is that of the base simple type (in this example, the type of $value is xs:integer). The detailed rules for using the variable $value in XSD 1.1 schemas are described in the XSD 1.1 spec.

It looks to me that the ability to have an assertion facet on simple types significantly enhances the XSD author's capability to provide many new constraints on simple type values, which could not be expressed directly in XSD 1.0 (e.g., in XSD 1.0 there was no way to state the arithmetic condition "this integer is even"; at best one could constrain the lexical form with a pattern facet).

For the above example, we could also specify assertions like the following:
<assertion test="$value mod 2 = 0" />
<assertion test="$value lt 500" />
(i.e., a set of two assertion facet instances)

Or, if the user wishes, a single assertion facet instance such as <assertion test="($value mod 2 = 0) and ($value lt 500)" />, which realizes the same objective.

This enforces that the simple type value should be even, and also less than 500. There is no limit on the number of assertion facet instances that can be specified. In my opinion, the ability to specify an unlimited number of assertion facets (and also assert constraints on complex types) makes assertions a tremendously useful XSD validation construct.

Notes: Interestingly, the following facet definition achieves the same result as the 2nd assertion facet instance described above:
<maxExclusive value="500" />
(this was available in XSD 1.0 as well)
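The equivalence between two separate facets and one combined facet can be sketched in Python as follows. This is just an illustration under the assumption that a value is valid only when every facet holds; the names separate_facets, combined_facet and satisfies_all are hypothetical.

```python
# Each lambda stands in for one <assertion> facet on the restricted integer type.
separate_facets = [
    lambda value: value % 2 == 0,  # $value mod 2 = 0
    lambda value: value < 500,     # $value lt 500 (same effect as maxExclusive 500)
]

def combined_facet(value):
    # ($value mod 2 = 0) and ($value lt 500)
    return value % 2 == 0 and value < 500

def satisfies_all(value, facets):
    # A value is valid only if every facet holds.
    return all(facet(value) for facet in facets)

for sample in (100, 101, 498, 500):
    print(sample, satisfies_all(sample, separate_facets), combined_facet(sample))
```

Both forms accept and reject the same values, so the choice between them is mostly about readability and about how specific the error message is for the failing facet.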

2) Complex types with simple contents, using assertions:
XML document [3]:
  <root>
    <x label="a">2</x>
    <x label="b">4</x>
  </root>

Here, the element "x" should have an attribute "label" of type xs:string, but the content of element "x" is simple (of type xs:int in this example).
Additionally, we want the simple content value of "x" to be an even number.

The XSD document for these validation constraints is as follows [4]:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 
   <xs:element name="root">
     <xs:complexType>
       <xs:sequence>
         <xs:element name="x" maxOccurs="unbounded" type="X_Type" />
       </xs:sequence>
     </xs:complexType>
   </xs:element>
   
   <xs:complexType name="X_Type">
     <xs:simpleContent>
        <xs:extension base="xs:int">    
          <xs:attribute name="label" type="xs:string" />
          <xs:assert test="$value mod 2 = 0" />
        </xs:extension>
     </xs:simpleContent>
   </xs:complexType>
  
  </xs:schema>

This example stresses the use of the xs:assert instruction.

It's interesting to see what happens if we change the value of one of the "x" elements as follows:
<x label="a">21</x>
(I changed the first "x")

Xerces fails the validation of the XML instance, and returns the following error message to the user:
test.xml:2:22:cvc-assertion.3.13.4.1: Assertion evaluation ('$value mod 2 = 0') for element 'x' with type 'X_Type' did not succeed.

Here, the XML validation did not succeed because the value 21 is not an even number.
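The effect of the assertion on the two versions of the document can be imitated with a small Python sketch. It only reproduces the "even content" rule, and the helper name odd_x_values is hypothetical.

```python
import xml.etree.ElementTree as ET

def odd_x_values(xml_text):
    """Return the content of every <x> whose integer value is not even."""
    root = ET.fromstring(xml_text)
    return [x.text for x in root.findall("x") if int(x.text) % 2 != 0]

valid_doc = '<root><x label="a">2</x><x label="b">4</x></root>'
changed_doc = '<root><x label="a">21</x><x label="b">4</x></root>'

print(odd_x_values(valid_doc))    # []
print(odd_x_values(changed_doc))  # ['21']
```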

3) The last example of this post is the following:
This describes the scenario of complex types with simple content, but here the simple content gets its value by "restriction of a complex type". The previous example described complex types with simple content, using derivation by extension.

The XML file remains the same [3], while the new XSD document is the following [5]:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 
   <xs:element name="root">
     <xs:complexType>
       <xs:sequence>
         <xs:element name="x" maxOccurs="unbounded" type="X_Type" />
       </xs:sequence>
     </xs:complexType>
   </xs:element>
   
   <xs:complexType name="X_Type">
     <xs:simpleContent>
        <xs:restriction base="x_base">      
           <xs:assertion test="$value mod 2 = 0" />
           <xs:assert test="@label = ('a','b')" />
        </xs:restriction>
     </xs:simpleContent>
   </xs:complexType>
   
   <xs:complexType name="x_base">
     <xs:simpleContent>
        <xs:extension base="xs:int">    
          <xs:attribute name="label" type="xs:string" />
        </xs:extension>
     </xs:simpleContent>
   </xs:complexType>
  
 </xs:schema>

Please notice how assertions are specified on the complex type "X_Type". Here, we have two assertion instructions (xs:assertion and xs:assert). In this example, xs:assertion is a facet for the atomic value of the complex type (the value of the complex type is simple in this case!), while xs:assert is the assertion instruction on the complex type (which has access to the element tree).

The complexType -> simpleContent -> restriction type definition can specify assertions with the following grammar:
... assertion*, ..., assert* (i.e., 0-n xs:assertion components can be followed by 0-n xs:assert components; this ordering is significant, otherwise the XSD 1.1 processor will flag an error).
There could be other constructs as well before xs:assertion here, and some after it, but anything after xs:assertion* needs to come before the trailing xs:assert elements. This is described in the relevant XSD 1.1 grammar at http://www.w3.org/TR/2009/CR-xmlschema11-1-20090430/#dcl.ctd.ctsc.

Notes: The XML Schema WG decided to have two different names for assertion instructions (xs:assertion and xs:assert) for this particular scenario, so that XSD schema authors can decide whether they are writing assertions as a facet for simple values, or assertions for complex types (which have access to the element tree). If this naming distinction had not been made in XSD 1.1, the specification of assertions in XSD documents would have been ambiguous in this case (i.e., the XSD 1.1 processor could not tell which assertion is a facet and which is an assertion for the complex type).
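The difference in what each instruction can "see" can be sketched in Python: the facet gets a bare value, the assert gets the element. This is only an analogy for schema [5]; facet_ok, assert_ok and x_is_valid are hypothetical names.

```python
import xml.etree.ElementTree as ET

def facet_ok(value):
    # xs:assertion: a facet, evaluated against the simple value alone.
    return value % 2 == 0                      # $value mod 2 = 0

def assert_ok(element):
    # xs:assert: evaluated against the element, so attributes are visible.
    return element.get("label") in ("a", "b")  # @label = ('a','b')

def x_is_valid(element):
    return facet_ok(int(element.text)) and assert_ok(element)

for text in ('<x label="a">2</x>', '<x label="c">2</x>', '<x label="a">3</x>'):
    print(text, x_is_valid(ET.fromstring(text)))
```

Only the first element passes: the second fails the xs:assert rule on the label, and the third fails the xs:assertion facet on the value.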

Acknowledgements:
I must mention that the XSD 1.1 examples shared by Roger L. Costello helped us fix quite a few bugs in the Xerces assertions implementation. Our sincere thanks are due to Roger.

References:
1. Readers could also find this article about XSD 1.1 co-occurrence constraints useful, http://www.ibm.com/developerworks/library/x-xml11pt2/, which describes the XSD 1.1 assertions facility in detail.

I hope that this post was useful.

Friday, November 27, 2009

XSD 1.1: another assertions example with Xerces-J !

Here's another XSD 1.1 assertions example, which I came up with today :)

The XML document is something like the one below:
  <person_db>
    <person id="1">
      <fname>john</fname>
      <lname>backus</lname>
      <dob>1995-12-10</dob>
    </person>
    <person id="2">
      <fname>rick</fname>
      <lname>palmer</lname>
      <dob>2001-11-09</dob>
    </person>
    <person id="3">
      <fname>neil</fname>
      <lname>cooks</lname>
      <dob>1998-11-10</dob>
    </person>
  </person_db>

Other than constraining the XML document to a structure like the above, the XSD schema should specify the following additional validation constraints as well:
1) Each person's dob field should specify a date which must be later than or equal to the date 1900-01-01.
2) The "person" elements should be sorted numerically according to the "id" attribute, in ascending order.

I wanted to achieve these validation objectives completely with XSD 1.1 assertions. Here's the XSD 1.1 document, which I find works fine with Xerces-J:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 
     <xs:element name="person_db">
       <xs:complexType>
          <xs:sequence>
            <xs:element name="person" maxOccurs="unbounded" type="Person" />
          </xs:sequence>
          <xs:assert test="every $p in person[position() lt last()] satisfies
                            ($p/@id lt $p/following-sibling::person[1]/@id)" />
       </xs:complexType>
     </xs:element>
   
     <xs:complexType name="Person">
        <xs:sequence>
          <xs:element name="fname" type="xs:string" />
          <xs:element name="lname" type="xs:string" />
          <xs:element name="dob" type="xs:date" />
        </xs:sequence>
        <xs:attribute name="id" type="xs:int" use="required" />
        <xs:assert test="dob ge xs:date('1900-01-01')" />
     </xs:complexType>
  
   </xs:schema>

Notes: It also seems that the above XSD validation requirements could be met with the following changes as well:
1. Remove the assertion from the complex type "Person".
2. Have an additional assertion on the element "person_db", which will now look something like the following:
<xs:assert test="every $p in person[position() lt last()] satisfies
($p/@id lt $p/following-sibling::person[1]/@id)" />
<xs:assert test="every $p in person satisfies ($p/dob ge xs:date('1900-01-01'))" />

i.e., we'll now have two assertions on the element "person_db" (which are actually specified on the element's schema type).

Though, I like the first solution more, as it seems more elegant to me, and more logically in place.

I am happy that this particular example worked as I expected with Xerces.
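To make the two rules concrete, here is a rough Python equivalent of the ordering and date checks. It is a sketch only: the function names are hypothetical, and date.fromisoformat handles the plain YYYY-MM-DD dates of the sample document, not the time zone suffix that xs:date also allows.

```python
import xml.etree.ElementTree as ET
from datetime import date

PERSON_DB = """<person_db>
  <person id="1"><fname>john</fname><lname>backus</lname><dob>1995-12-10</dob></person>
  <person id="2"><fname>rick</fname><lname>palmer</lname><dob>2001-11-09</dob></person>
  <person id="3"><fname>neil</fname><lname>cooks</lname><dob>1998-11-10</dob></person>
</person_db>"""

def ids_ascending(person_db):
    # every $p in person[position() lt last()] satisfies
    #   $p/@id lt $p/following-sibling::person[1]/@id
    ids = [int(p.get("id")) for p in person_db.findall("person")]
    return all(current < following for current, following in zip(ids, ids[1:]))

def dobs_in_range(person_db):
    # dob ge xs:date('1900-01-01'), checked for every person
    earliest = date(1900, 1, 1)
    return all(date.fromisoformat(p.findtext("dob")) >= earliest
               for p in person_db.findall("person"))

db = ET.fromstring(PERSON_DB)
print(ids_ascending(db), dobs_in_range(db))  # True True
```

Note that the comparison is strict (lt), so two persons with the same id also fail the ordering rule.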

I hope that this post was useful.

Friday, November 20, 2009

XSD 1.1: some CTA samples with Xerces-J

I've been trying to write a few XSD 1.1 Conditional Type Assignment (CTA) samples, and trying to run them with the current Xerces-J schema development SVN code.

To start with, here's the first example (a very simple one indeed), which I find runs fine with Xerces-J:

XML document [1]:
  <root>
    <x>hello</x>
    <x kind="int">10</x>
  </root>

XSD 1.1 document [2]:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

     <xs:element name="root">
       <xs:complexType>
         <xs:sequence>
           <xs:element name="x" type="xs:anyType" maxOccurs="unbounded">
             <xs:alternative test="@kind='int'" type="xInt_Type" />
             <xs:alternative type="xString_Type" />
           </xs:element>
         </xs:sequence>
       </xs:complexType>
     </xs:element>

     <xs:complexType name="xInt_Type">
       <xs:simpleContent>
         <xs:extension base="xs:int">
           <xs:attribute name="kind" type="xs:string" />
         </xs:extension>
       </xs:simpleContent>
     </xs:complexType>

     <xs:complexType name="xString_Type">
       <xs:simpleContent>
         <xs:extension base="xs:string">
           <xs:attribute name="kind" type="xs:string" />
         </xs:extension>
       </xs:simpleContent>
     </xs:complexType>

  </xs:schema>

Please note the presence of the XSD 1.1 instruction xs:alternative (newly introduced in XSD 1.1, it makes this XSD schema a type alternative scenario) within the declaration of element "x" in the above schema [2]. If the value of the "kind" attribute on element "x" is 'int', then the schema type "xInt_Type" is assigned to element "x". If the attribute "kind" is not present on element "x", or if it's present and its value is not 'int', the schema type xString_Type gets assigned to element "x".

Xerces-J successfully validates the above XML document [1] with the given XSD 1.1 Schema [2].

If we introduce the following change to the XML document:
<x kind="int">not an int</x>

Xerces-J would display the following error message:
cvc-datatype-valid.1.2.1: 'not an int' is not a valid value for 'integer'.

The above error message is correct, because the value 'not an int' in the XML document is not of type xs:int.
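The selection logic of schema [2] can be sketched in Python as "the first alternative whose test is true wins, and an alternative without a test acts as the default". The code below is an illustration only, with hypothetical names (select_type, content_is_valid); int() is a rough stand-in for xs:int and ignores its 32-bit range.

```python
def select_type(attributes):
    """Pick the type for element x the way the alternatives of schema [2] would."""
    alternatives = [
        (lambda attrs: attrs.get("kind") == "int", "xInt_Type"),  # @kind='int'
        (None, "xString_Type"),  # alternative without a test: the default
    ]
    for test, type_name in alternatives:
        if test is None or test(attributes):
            return type_name
    # Not reached here because a default alternative exists; without one,
    # the declared type of the element would apply.
    return "xs:anyType"

def content_is_valid(type_name, text):
    if type_name == "xInt_Type":
        try:
            int(text)  # rough stand-in for xs:int
        except ValueError:
            return False
    return True

for attrs, text in [({}, "hello"), ({"kind": "int"}, "10"), ({"kind": "int"}, "not an int")]:
    chosen = select_type(attrs)
    print(attrs, text, chosen, content_is_valid(chosen, text))
```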

Notes:
The schema types specified on xs:alternative instructions need to validly derive from the default type specified on the element, which is xs:anyType in this example (this is also referred to as "type substitutable" in the XSD 1.1 spec), or the type on xs:alternative could be xs:error. xs:error is a new schema type defined in the XSD 1.1 spec, and is particularly useful with XSD type alternatives: it has an empty lexical and value space, and any XML element or attribute which has this type will always be invalid.

So for example, if we write an element declaration like the following (to demonstrate the derivation requirement for types specified on xs:alternative instructions):
  <xs:element name="x" type="xs:string" maxOccurs="unbounded">
    <xs:alternative test="@kind='int'" type="xInt_Type" />
  ...

Xerces-J would return the following error message:
e-props-correct.7: Type alternative 'xInt_Type' is not xs:error or is not validly derived from the type definition, 'string', of element 'x'.

Making use of type xs:error in CTAs:
Let's assume that the XML document remains the same as document [1], and the declaration of element "x" is now written like the following:
  <xs:element name="x" type="xs:anyType" maxOccurs="unbounded">
    <xs:alternative test="@kind='int'" type="xInt_Type" />
    <xs:alternative type="xs:error" />
  </xs:element>

Now Xerces returns an error message like the following:
cvc-datatype-valid.1.2.1: 'hello' is not a valid value for 'error'.

For this particular example, this error would occur if the attribute "kind" is not present, or if the attribute "kind" is present and its value is not 'int'.
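Since xs:error has an empty value space, using it as the default alternative amounts to "reject everything that did not match an earlier alternative". A minimal Python sketch of that behaviour, with the hypothetical name x_is_valid and int() as a rough stand-in for xs:int:

```python
def x_is_valid(attributes, text):
    """Mimic the element declaration with xInt_Type plus an xs:error default."""
    if attributes.get("kind") == "int":
        # First alternative selected: content must look like an integer.
        try:
            int(text)
        except ValueError:
            return False
        return True
    # Default alternative is xs:error: no value is ever valid for it.
    return False

print(x_is_valid({}, "hello"))             # False
print(x_is_valid({"kind": "int"}, "10"))   # True
print(x_is_valid({"kind": "str"}, "abc"))  # False
```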

Xerces-J CTA implementation, using the PsychoPath XPath 2 engine:
The XSD 1.1 spec defines a small XPath 2 language subset to be used by XSD 1.1 CTA instructions. Xerces-J has a native implementation of this XPath 2 subset (implemented by Hiranya Jayathilaka, a fellow Xerces-J committer), which gets selected by Xerces as the default XPath 2 processor if the CTA XPath 2 expressions conform to this subset. This was designed into Xerces to make XPath 2 evaluation efficient for the CTA subset, since evaluating every XPath 2 expression with the PsychoPath engine could have been computationally expensive.

But if the XSD CTA XPath 2 expressions cannot be compiled by the native Xerces-J CTA XPath 2 subset implementation, Xerces will attempt to use the PsychoPath XPath engine to evaluate them, as a fallback option (and also to enable users to use the full XPath 2 language with the Xerces CTA implementation, if they want to).

To test that the PsychoPath engine does work with the Xerces CTA implementation, I modified the type alternative instruction of the XSD example [2] above to the following:
<xs:alternative test="@kind='int' and (tokenize('xxx xx', '\s+')[1] eq 'xxx')" type="xInt_Type" />
I added a dummy XPath "and" clause, which can only succeed with Xerces if the PsychoPath engine evaluates this XPath expression. This additional "and" clause doesn't make any difference to the validity of the XML document [1], as in this example it always evaluates to a boolean "true". If we introduce an error into the above XPath expression, say by changing it to the following:
tokenize('xxx xx', '\s+')[1] eq 'xx' (please note the change from eq 'xxx' to eq 'xx', which causes this XPath expression to evaluate to a boolean "false"), Xerces reports an XML validity error, which is exactly what is expected of the Xerces CTA implementation.
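For readers less familiar with XPath 2 functions, the dummy clause can be mirrored in Python with re.split. This is only an analogy for what the expression computes; the function name dummy_clause is hypothetical.

```python
import re

def dummy_clause(expected):
    # XPath: tokenize('xxx xx', '\s+')[1] eq expected
    # XPath positions start at 1, so [1] is the first token (index 0 here).
    tokens = re.split(r"\s+", "xxx xx")
    return tokens[0] == expected

print(dummy_clause("xxx"))  # True
print(dummy_clause("xx"))   # False
```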

I hope that this post was useful.

Wednesday, November 18, 2009

XSD 1.1: some XSD 1.1 samples running with Xerces-J

I was thinking lately of functionally stress testing the upcoming Xerces-J XSD 1.1 preview release (using the SVN code we have now, and later using the public binaries which will be provided by the Xerces project). I'm curious to know whether there are any non-compliant parts in the Xerces-J XSD 1.1 implementation that I can find, which could serve as inputs for improving the Xerces-J XSD 1.1 code base. To start with, I'll try to write a few XSD 1.1 schemas using the XSD 1.1 assertions and "Conditional Type Assignment (CTA)/type alternative" instructions.

Assertions examples

Example 1
Sample XML [1]
  <x a="xyz">
    <foo>5</foo>
    <bar>10</bar>
  </x>

XSD 1.1 Schema [2]
(Use Case: "the value of the foo element must be less than or equal to the value of the bar element")
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 
    <xs:element name="x">
      <xs:complexType>
         <xs:sequence>
           <xs:element name="foo" type="xs:int" />
           <xs:element name="bar" type="xs:int" />
         </xs:sequence>
         <xs:attribute name="a" type="xs:string" use="required" />
         <xs:assert test="foo le bar" />
      </xs:complexType>
    </xs:element>
  
  </xs:schema>

Using the Xerces-J XSD 1.1 validator, the XML document [1] above validates fine with the given XSD document [2].

If the assertion is written as follows (a false assertion, just to check the handling of false assertions and the error messages):
<xs:assert test="(foo + 10) le bar" />

then that makes the XML instance document ([1] above) invalid, and the following error message is returned by Xerces:
test.xml:4:5:cvc-assertion.3.13.4.1: Assertion evaluation ('(foo + 10) le bar') for element 'x' with type '#anonymous' did not succeed.
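The arithmetic behind the two assertions is easy to check by hand; the short Python sketch below does the same comparison on the sample document (variable names are hypothetical, and nothing here calls Xerces).

```python
import xml.etree.ElementTree as ET

x = ET.fromstring('<x a="xyz"><foo>5</foo><bar>10</bar></x>')
foo = int(x.findtext("foo"))
bar = int(x.findtext("bar"))

first_assertion = foo <= bar          # foo le bar
second_assertion = (foo + 10) <= bar  # (foo + 10) le bar

print(first_assertion, second_assertion)  # True False
```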

Use Case: "if the value of the attribute "a" is xyz, then the bar and baz elements are required, but otherwise they are optional".

With the element names of this example (foo and bar), this would require the following assertion definition:
<xs:assert test="if (@a eq 'xyz') then (foo and bar) else true()" />

This works fine with Xerces-J.

Acknowledgements: Thanks to Douglass A Glidden for contributing these use cases on the xml-dev list.

Example 2
Sample XML [3]
  <Example>
    <x>hi</x>
    <y>there</y>
    <ASomeNameSuffix/>
  </Example>

XSD 1.1 Schema [4]
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    
    <xs:element name="Example" type="myType" />
 
    <xs:complexType name="myType">
      <xs:sequence>
        <xs:element name="x" type="xs:string" />
        <xs:element name="y" type="xs:string" />
        <xs:any processContents="lax" />
      </xs:sequence>
      <xs:assert test="starts-with(local-name(*[3]), 'A')" />
    </xs:complexType>

  </xs:schema>

In this particular example (Example 2), the immediate sibling element of element "y" is defined via the XSD wild-card instruction, <xs:any/>. The assertion in XSD Schema [4] enforces that the name of the sibling element that appears after element "y" must start with the letter "A". I think this (i.e., defining a constraint on the name of an element matched by an xs:any wild-card) could not have been accomplished with XSD 1.0.
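What the assertion checks can be imitated in Python with ElementTree. This is a sketch of the rule only; local_name and third_child_starts_with_a are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

def local_name(element):
    # ElementTree writes namespaced tags as "{uri}local"; keep the local part.
    return element.tag.rsplit("}", 1)[-1]

def third_child_starts_with_a(example):
    # starts-with(local-name(*[3]), 'A'): *[3] is the third child element.
    # With fewer than three children, local-name() yields "" and the test fails.
    children = list(example)
    return len(children) >= 3 and local_name(children[2]).startswith("A")

sample = ET.fromstring("<Example><x>hi</x><y>there</y><ASomeNameSuffix/></Example>")
print(third_child_starts_with_a(sample))  # True
```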

Example 3
Sample XML [5]
  <record>
    <wins>20</wins>
    <losses>15</losses>
    <ties>8</ties>
    <!--
      Zero or more well-formed elements are allowed here
      by the XSD wildcard, <xs:any />
    -->
  </record>

XSD 1.1 Schema [6]
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:complexType name="Record">
      <xs:sequence>
        <xs:element name="wins" type="xs:nonNegativeInteger"/>
        <xs:element name="losses" type="xs:nonNegativeInteger"/>
        <xs:element name="ties" type="xs:nonNegativeInteger" minOccurs="0"/>
        <xs:any minOccurs="0" maxOccurs="unbounded" namespace="##any" processContents="lax"/>   
      </xs:sequence>
      <xs:assert test="every $x in ties/following-sibling::* satisfies
                     not(empty(index-of(('x','y','z'), local-name($x))))" />
    </xs:complexType>

    <xs:element name="record" type="Record"/>

  </xs:schema>

The XSD schema [6] validates the XML document [5]. The <xs:any ../> wildcard in this schema allows zero or more well-formed XML elements after the element "ties". Wildcards themselves were available in XSD 1.0 as well, but this particular schema would have been invalid there: the optional element "ties" is followed by a wildcard that can also match "ties", which XSD 1.0 rejects under the UPA (Unique Particle Attribution) constraint. XSD 1.1 weakens wildcards so that an element declaration takes precedence over a competing wildcard, which makes schema [6] valid. An example of this is given in the article at http://www.ibm.com/developerworks/xml/library/x-xml11pt3/index.html#N10122.

The assertion in this schema ([6]) requires that every element after "ties" that the xs:any wildcard admits have a local name (that is, a name without a namespace prefix) from the list ('x', 'y', 'z'). Something like this was not possible with XSD 1.0, and in my opinion this is nice :)
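To make the effect of the assertion concrete, here is a rough Java equivalent that walks the DOM instead of evaluating XPath 2.0. It is a sketch under assumptions: the class and method names are invented, elements are matched on local name only, and nothing here reflects how Xerces-J or PsychoPath evaluate the expression.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TrailingElementNames {
    static final Set<String> ALLOWED = new HashSet<String>(Arrays.asList("x", "y", "z"));

    // Mirrors: every $x in ties/following-sibling::* satisfies local-name($x) is in ('x','y','z')
    static boolean trailingNamesAllowed(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed for getLocalName() to return a value
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            // Locate the "ties" child of the root element, if present.
            Node ties = null;
            for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE && "ties".equals(n.getLocalName())) {
                    ties = n;
                    break;
                }
            }
            // Without "ties" the sequence ties/following-sibling::* is empty,
            // and "every ... satisfies" over an empty sequence is true.
            if (ties == null) {
                return true;
            }
            for (Node n = ties.getNextSibling(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE && !ALLOWED.contains(n.getLocalName())) {
                    return false;
                }
            }
            return true;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String prefix = "<record><wins>20</wins><losses>15</losses><ties>8</ties>";
        System.out.println(trailingNamesAllowed(prefix + "<x/><z/></record>")); // true
        System.out.println(trailingNamesAllowed(prefix + "<q/></record>"));     // false
    }
}
```

Note that when "ties" is omitted (it is optional in the schema), the assertion places no constraint on the wildcard elements at all.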

PS: more examples to follow, in the next few posts :)

References:
XSD 1.1 Part 1: Structures
XSD 1.1 Part 2: Datatypes

I must acknowledge (it is a long acknowledgement, but I must make it anyway :)) that Xerces assertions are powered by the PsychoPath XPath 2 engine. The credit for bringing PsychoPath to almost 100% compliance with the W3C XPath 2.0 test suite (as of now, PsychoPath passes more than 99% of it) should largely go to Dave Carver and Jesper Steen Møller. I was fortunate to contribute somewhat to the PsychoPath XPath implementation; the freedom given to me as an Eclipse Source Editing project committer (thanks to Dave Carver for this) helped me drive Xerces assertions development quickly. Needless to say, the original PsychoPath code contribution to the Eclipse Foundation by Andrea Bittau and his team also deserves mention. The numerous reviews and improvements suggested by Khaled Noaman, and the general design advice from Michael Glavassevich (both are Xerces committers), helped tremendously while developing Xerces assertions. I must also mention Ken Cai, who wrote the original Xerces-PsychoPath interface along with an initial implementation of it.

Saturday, November 14, 2009

Xerces-J XSD 1.1 update: bug fixes and enhancements

The Xerces-J team made a few enhancements to the XSD 1.1 implementation that resolve some important XSD namespace URI issues affecting the Xerces assertions and Conditional Type Alternatives (CTA) implementations. These changes went into the Xerces-J SVN repository today.

Here is a summary of these improvements:
1) The Xerces-J XSD 1.1 implementation can now pass the XSD language namespace prefix (as declared on the XSD <schema> element), together with the XSD language URI, as a prefix-URI binding to the PsychoPath XPath 2.0 engine. This allows the prefix declared on the schema document's <schema> element to be used in assertion and CTA XPath 2.0 expressions, for example:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" ...>
    ...
    <xs:assert test="xs:string(test) eq 'xxx'" />
    ...
  </xs:schema>

OR say,

  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" ...>
    ...
    <xsd:assert test="xsd:string(test) eq 'xxx'" />
    ...
  </xsd:schema>

The earlier code in Xerces SVN (before today's commit) hardcoded the XML Schema prefix to the string "xs" when communicating with the PsychoPath XPath 2 engine interface. As a result, XPath 2 expressions in assertions and CTA that used any other XSD prefix, say "xsd", did not evaluate correctly, even when that prefix was bound to the XSD namespace on the XSD root element, <schema>. Before this fix, such assertions always evaluated to false.

This was a significant Xerces assertions and CTA bug. It was fixed today, and the fix is now available in the Xerces-J XSD 1.1 development SVN repository.

2) Another enhancement that went into the Xerces-J SVN repository today is the ability to declare the XPath 2.0 F&O namespace on the XSD document root element, <schema>.

This enhancement makes an XSD 1.1 Schema like the following valid:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions" ...>
    ...
     <xs:assert test="xs:string(test) eq fn:string('xxx')" />
    ...
  </xs:schema>

Here the XML Schema author can qualify XPath 2 function calls in assertion XPath expressions with the XPath 2 F&O namespace prefix, as in fn:string('xxx') above. The prefix must be bound to the F&O namespace URI, "http://www.w3.org/2005/xpath-functions", for such an XSD Schema to be valid.

The following XSD 1.1 Schema is also valid (this already worked correctly before this Xerces SVN commit):
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" ...>
    ...
     <xs:assert test="xs:string(test) eq string('xxx')" />
    ...
  </xs:schema>

Here the XML Schema author uses an XPath 2 function in a Xerces assertion without any prefix, as in string('xxx') above. Unprefixed calls work for all the XPath 2.0 built-in functions in Xerces assertion XPath 2 expressions.
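The underlying point, that a prefix is only a local alias and what matters is the URI it is bound to, can be illustrated with the JDK's XPath 1.0 API. This is a sketch, not the Xerces-J or PsychoPath code: the class name, the helper method and the tiny schema document are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class PrefixBindingSketch {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    // Counts top-level element declarations, using whatever prefix the
    // caller chooses to bind to the XSD namespace in the XPath expression.
    static double countElementDecls(String schemaXml, final String prefix) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(schemaXml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String p) {
                    return prefix.equals(p) ? XSD_NS : XMLConstants.NULL_NS_URI;
                }
                // Reverse lookups are not needed for XPath evaluation in this sketch.
                public String getPrefix(String uri) { throw new UnsupportedOperationException(); }
                public Iterator<String> getPrefixes(String uri) { throw new UnsupportedOperationException(); }
            });
            String expr = "count(/" + prefix + ":schema/" + prefix + ":element)";
            return (Double) xpath.evaluate(expr, doc, XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String schema = "<xsd:schema xmlns:xsd='" + XSD_NS + "'>"
                + "<xsd:element name='a'/><xsd:element name='b'/></xsd:schema>";
        // The document uses "xsd"; the expression may use any prefix bound to the same URI.
        System.out.println(countElementDecls(schema, "xs"));  // 2.0
        System.out.println(countElementDecls(schema, "xsd")); // 2.0
    }
}
```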

World community grid

There is a nice initiative called the "world community grid". I think IBM sponsors this community computing grid. I have been participating in it for quite a few days now, and it really works! I believe it does make a difference for the common good.

The grid is made up of ordinary computers, such as personal computers at home or in the office, or any computer that can connect to the web. When a grid client is connected to the web and the user has authenticated, the client computer participates in numerous public computing projects. Joining the grid lets us donate our computer's processing power to the computations these public projects need, normally massive computing simulations that must finish in a short time.

Joining the grid doesn't disrupt normal user activity on the client computer. The grid client uses memory sparingly (normally as little as 5-10 MB while it works) and uses the CPU without getting in the way of the user's own activities. It is also possible to configure how much of one's CPU the grid may use: one can stay in the default mode or give more CPU time to the grid project tasks. The default mode works well for me.

All these details, and much more, are available on the "world community grid" web page.

Friday, November 13, 2009

XML spec and XSD

A few days ago, I started off a pretty lengthy discussion on the xml-dev mailing list about the following topic:

"Should the W3C XML specification specify XML Schema (a.k.a XSD) also as a XML validation language, as it specifies DTD (Document Type Definition)."

The XML spec seems to convey that an XML document is valid *only* if it is valid according to a DTD. I disagreed with this point and started a debate on the xml-dev list about it. I argued that, since there are now newer XML validation languages such as XSD, RelaxNG and Schematron, the XML spec could modify its definition of validity to refer to other XML schema languages as well, rather than saying that an XML document is valid *only* if a DTD is associated with it.

Unfortunately, many people who spoke on xml-dev, and who have been working with XML for a long time, did not agree with this idea. But alas, I still feel I had (and have) a valid point about this :(

I am referring to this discussion thread again here for the record. Please follow this link if you want to read the whole discussion.

Sunday, November 1, 2009

XSLT 1.0: Regular expression string tokenization, and Xalan-J

Some time ago, XSLT folks were debating on xsl-list (ref, http://www.biglist.com/lists/lists.mulberrytech.com/xsl-list/archives/200910/msg00365.html) how to implement string tokenizer functionality in XSLT. XPath 2.0 (and therefore XSLT 2.0) has a built-in function for this, fn:tokenize, which takes a string and a tokenizing regular expression pattern as arguments. This cannot be done natively in XSLT 1.0, where we need to write a recursive, tokenizing named template. Such a named template has a limitation, though: it cannot natively accept an arbitrary regular expression as the tokenizing delimiter.

I got motivated enough to write a Java extension for regular-expression-based string tokenization in XSLT 1.0 stylesheets, using the Xalan-J XSLT 1.0 engine.

Here's the Java code and a sample XSLT stylesheet for this functionality:

String tokenizer Xalan-J Java extension:
package org.apache.xalan.xslt.ext;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.apache.xpath.NodeSet;
import org.w3c.dom.Document;

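public class XalanUtil {
    /**
     * Splits str around matches of the regular expression regExp and returns
     * the tokens as a node-set of text nodes, for use from an XSLT 1.0 stylesheet.
     */
    public static NodeSet tokenize(String str, String regExp) throws ParserConfigurationException {
      String[] tokens = str.split(regExp);
      NodeSet nodeSet = new NodeSet();
       
      // A DOM Document is needed only as a factory for the text nodes.
      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
      DocumentBuilder docBuilder = dbf.newDocumentBuilder();
      Document document = docBuilder.newDocument();
       
      // Wrap each token in a text node and add it to the result node-set.
      for (int nodeCount = 0; nodeCount < tokens.length; nodeCount++) {
        nodeSet.addElement(document.createTextNode(tokens[nodeCount]));   
      }
       
      return nodeSet;
    }
}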
Sample XSLT stylesheet, using the above Java extension (named, test.xsl):
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0"                                                    
                xmlns:java="http://xml.apache.org/xalan/java"
                exclude-result-prefixes="java">
                 
   <xsl:output method="xml" indent="yes" />
   
   <xsl:param name="str" />
   
   <xsl:template match="/">
     <words>
       <xsl:for-each select="java:org.apache.xalan.xslt.ext.XalanUtil.tokenize($str, '\s+')">
         <word>
           <xsl:value-of select="." />
         </word>
       </xsl:for-each>
     </words>
   </xsl:template>
   
 </xsl:stylesheet>
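Now, for example, when the above stylesheet is run with Xalan as follows: java -classpath <path to the extension java class> org.apache.xalan.xslt.Process -in test.xsl -xsl test.xsl -PARAM str "hello world" (the stylesheet reads only the str parameter, so any well-formed XML document can serve as the input; here test.xsl itself is used), the following output is produced: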
<?xml version="1.0" encoding="UTF-8"?>
<words>
 <word>hello</word>
 <word>world</word>
</words>

This illustrates that regular-expression-based string tokenization works as designed in an XSLT 1.0 environment.

The above Java extension should run fine on JRE 1.4 or later, as it relies on the method java.lang.String.split(String regex), which has been available since JDK 1.4.
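One behaviour of String.split is worth knowing when using this extension: a delimiter match at the start of the input produces an empty first token (which the stylesheet would turn into an empty word element), while trailing empty tokens are dropped. The sketch below shows this, together with a hypothetical nonEmptyTokens helper (not part of the extension above) that filters empty tokens out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitBehaviour {
    // Same call the extension makes.
    static String[] tokens(String str, String regExp) {
        return str.split(regExp);
    }

    // Hypothetical variant that discards empty tokens.
    static List<String> nonEmptyTokens(String str, String regExp) {
        List<String> result = new ArrayList<String>();
        for (String token : str.split(regExp)) {
            if (token.length() > 0) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Leading whitespace yields an empty first token; trailing whitespace does not.
        System.out.println(Arrays.toString(tokens(" hello  world ", "\\s+"))); // [, hello, world]
        System.out.println(nonEmptyTokens(" hello  world ", "\\s+"));          // [hello, world]
    }
}
```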

PS: To reduce verbosity, the package name in the above Java extension class may be omitted, in which case the corresponding XSLT instruction is written like the following:
xsl:for-each select="java:XalanUtil.tokenize(...
I would personally prefer this coding style for production Java XSLT extensions. This should not matter much, though, and in my opinion the decision can be left to individual XSLT developers.

I hope that this was useful.

Sunday, October 25, 2009

Mozilla Firefox and the XPath namespace axis

C. M. Sperberg-McQueen shared with us that the Mozilla Firefox browser doesn't implement the XPath namespace axis (ref, http://cmsmcq.com/mib/?p=757). CMSMcQ has encouraged us to cast a vote on the Mozilla forum, to push the Firefox team to implement the XPath namespace axis. I agree with CMSMcQ, and also find that the namespace axis is quite critical functionality for the XPath data model. This is certainly true for XPath 1.0 (which Mozilla implements). In XPath 2.0 the namespace axis is deprecated, but namespace nodes are still a core part of the XPath 2.0 data model.

I have already cast my vote in support of this at https://bugzilla.mozilla.org/show_bug.cgi?id=94270.

Others might follow, please.

Saturday, October 24, 2009

Martin Fowler: UML Distilled, 3rd Edition

I have been reading Martin Fowler's book, "UML Distilled, 3rd Edition", for the last few months (my book reading has been very slow, given the time I spend on the web these days to do most of my learning).

This is a great UML book (it has only 175 pages, but it is very good), and I recommend it to anybody wanting to learn about UML (Unified Modeling Language).

How XPath compares values in predicates

A user asked a question similar to the following on the IBM developerWorks XQuery and XPath forum:

What does A = B and A != B mean in XPath expressions?

Michael Kay provided a very nice explanation of this:
The operators "=" and "!=" in XPath use "implicit existential quantification". So A=B is shorthand for "some $a in A, $b in B satisfies $a eq $b" (the longhand form is legal in XPath 2.0), while A!=B is shorthand for "some $a in A, $b in B satisfies $a ne $b".

So, not(A=B) is true if there is no pair of items from A and B that are equal, while (A!=B) is true if there is a pair of values that are not equal. In practice, you nearly always want not(A=B).
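The difference is easy to see when the two quantified forms are spelled out as loops. The following Java sketch models A and B as lists of strings; the class and method names are invented for illustration, and the comparison is plain string equality rather than XPath's typed comparison rules.

```java
import java.util.Arrays;
import java.util.List;

class GeneralComparison {
    // A = B  :  some $a in A, $b in B satisfies $a eq $b
    static boolean generalEq(List<String> a, List<String> b) {
        for (String x : a) {
            for (String y : b) {
                if (x.equals(y)) {
                    return true;
                }
            }
        }
        return false;
    }

    // A != B  :  some $a in A, $b in B satisfies $a ne $b
    static boolean generalNe(List<String> a, List<String> b) {
        for (String x : a) {
            for (String y : b) {
                if (!x.equals(y)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("1", "2");
        List<String> b = Arrays.asList("2", "3");
        System.out.println(generalEq(a, b));  // true:  the pair (2, 2) is equal
        System.out.println(generalNe(a, b));  // true:  the pair (1, 2) is unequal
        System.out.println(!generalEq(a, b)); // false: not(A = B)
    }
}
```

With these inputs both A = B and A != B are true, which is why not(A = B) is usually the test one actually wants.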

Sunday, September 27, 2009

OO multiple inheritance, and Java

I have been thinking again about multiple inheritance and why Java doesn't support it. I wrote a bit about this topic some time ago.

There are many, many resources on the web about this, and it's actually very easy to find the answer via a simple web search. Here is the article from which I started looking for an answer, http://www.javaworld.com/javaqa/2002-07/02-qa-0719-multinheritance.html, which pointed me to this white paper by James Gosling and Henry McGilton. I had not read this white paper by Java's creators before (it never caught my eye :)), in spite of being familiar with Java and working with it for a long time. Sometimes we find gems on the web in unexpected ways (I mean, this paper is a gem for me :)). I'll try to read this paper (hopefully in full, and understanding it) over the next few days.

And here is a link within this white paper that explains why Java doesn't support multiple inheritance. The following white paper link is also interesting; it gives a complete overview of the C and C++ features that were omitted from the Java language (Java was influenced by C and C++).

My personal opinion is that if we really need multiple inheritance, we should just write the programs in C++. On the other hand, my experience of using Java for about a decade convinces me that Java is suitable for solving almost any business application problem, and that the absence of multiple inheritance in Java is not a hindrance to designing good programming abstractions for a problem domain. Advantages like Java's byte code portability and web friendliness far outweigh any disadvantages caused by the absence of multiple inheritance. On numerous occasions, I have created Java byte code on Windows and used it without modification on Unix-based systems (and vice versa). This is built into Java, and it is cool!

Sunday, September 20, 2009

Xerces-J: XML Schema 1.1 Conditional Type Alternatives (CTA), enhancements

The Apache Xerces-J team has made enhancements to the way XML Schema 1.1 Conditional Type Alternatives (CTA) are evaluated.

Following is a summary of the recent XML Schema 1.1 CTA enhancements made in Xerces-J:

1) The XML Schema 1.1 spec allows implementations to support a smaller XPath 2.0 subset for CTA; they may also provide the full XPath 2.0 language for CTA evaluation.

Xerces-J has had the smaller XPath 2.0 subset implemented for CTA for quite some time (it was contributed by Hiranya Jayathilaka), and that was previously the only XPath 2.0 support in the Xerces-J CTA implementation. The Xerces-J team recently added full XPath 2.0 support for XML Schema 1.1 CTA as well, using the PsychoPath XPath 2.0 engine (which is also used by the Xerces-J assertions facility).

If the user writes XPath 2.0 expressions that adhere to the XPath 2.0 subset for CTA, the native XPath 2.0 implementation in Xerces-J processes them. If the native Xerces-J XPath 2.0 processor fails to parse an expression, Xerces-J falls back to the PsychoPath processor for XPath evaluation, which lets users use the full XPath 2.0 language for CTA.

2) Some time ago, Xerces-J also implemented the XSD 1.1 data type xs:error, which is useful in XML Schema 1.1 CTA.

It's been quite a pleasure working on some of these patches.

Xerces-J: XML Schema 1.1 assertions enhancements

The XML Schema 1.1 language defines an assertions facility (xs:assert and xs:assertion) that constrains XML Schema simple and complex types.

Apache Xerces-J implements XML Schema 1.1 assertions. As described in the XSD 1.1 spec, an assertion has the following XML representation:
  <assert
    id = ID
    test = an XPath expression
    xpathDefaultNamespace = (anyURI | (##defaultNamespace | ##targetNamespace | ##local))
    {any attributes with non-schema namespace . . .}>
    Content: (annotation?)
  </assert>

For XML Schema simple type facets, the assertion element is named xs:assertion (as opposed to xs:assert for complex types); the rest of the assertion's content is the same.

Some background on XML Schema 1.1 assertions and their implementation in Xerces-J can be found in the following blog post, which I wrote some time ago.

During the last week, we enhanced Xerces-J assertions to support the assertion attribute 'xpathDefaultNamespace'. I contributed this patch to Xerces-J, and it's now available in the Xerces-J SVN repository.

The following section of the XML Schema 1.1 specification describes how the assertion attribute 'xpathDefaultNamespace' works (please see the section "XML Mapping Summary for XPath Expression Property Record {default namespace}").

Here's a simple example of how 'xpathDefaultNamespace' functions in XML Schema 1.1 assertions:

XML document [1]:
  <X xmlns="http://xyz">
    <message>hello</message>
  </X>

XSD 1.1 document [2]:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
             targetNamespace="http://xyz"
             elementFormDefault="qualified">

     <xs:element name="X">
       <xs:complexType>
          <xs:sequence>
             <xs:element name="message" type="xs:string" />
          </xs:sequence>
          <xs:assert test="message = 'hello'" xpathDefaultNamespace="##targetNamespace" />
       </xs:complexType>
     </xs:element>

  </xs:schema>

In the XML document [1] above, the element "message" belongs to the namespace "http://xyz" (by virtue of the default namespace declaration, xmlns="http://xyz", on element "X"). The XPath 2.0 expression message = 'hello' on the xs:assert instruction therefore returns true only if the name "message" in the expression is also resolved in the namespace "http://xyz". This namespace information is provided to the XPath engine via the 'xpathDefaultNamespace' attribute on the xs:assert instruction. If the attribute is omitted for the above XML instance document [1], the XPath engine treats "message" as being in no namespace, the expression message = 'hello' returns false, and the element instance becomes invalid against such an XSD 1.1 Schema.
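The same effect can be observed with the JDK's XPath 1.0 engine. XPath 1.0 has no default element namespace, so in the sketch below a prefix p bound to "http://xyz" stands in for what 'xpathDefaultNamespace' does in XSD 1.1. The class name and the prefix are assumptions made for illustration, and this is not the Xerces-J or PsychoPath code path.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DefaultNamespaceSketch {
    static final String DOC = "<X xmlns='http://xyz'><message>hello</message></X>";

    // Evaluates expr with element X as the context node, as an assertion on X would.
    static boolean test(String expr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(DOC)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "p".equals(prefix) ? "http://xyz" : XMLConstants.NULL_NS_URI;
                }
                // Reverse lookups are not needed for XPath evaluation in this sketch.
                public String getPrefix(String uri) { throw new UnsupportedOperationException(); }
                public Iterator<String> getPrefixes(String uri) { throw new UnsupportedOperationException(); }
            });
            return (Boolean) xpath.evaluate(expr, doc.getDocumentElement(), XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Unqualified name: looked up in no namespace, so nothing matches.
        System.out.println(test("message = 'hello'"));   // false
        // Name resolved in http://xyz: matches the element in the document.
        System.out.println(test("p:message = 'hello'")); // true
    }
}
```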

Supporting the 'xpathDefaultNamespace' attribute on XML Schema 1.1 assertions further increases their usefulness, because the XPath expressions on assertions can now be XML namespace aware.

The implementation of the 'xpathDefaultNamespace' attribute on assertions required enhancing the PsychoPath XPath 2.0 engine as well. The updated PsychoPath library JAR has been copied to the Xerces-J SVN repository.

Saturday, September 12, 2009

Xerces-J: XML Schema 1.1 inheritable attributes and CTA

I just had my XML Schema 1.1 inheritable attributes patch, and its integration with the XML Schema 1.1 Conditional Type Assignment (CTA) facility, committed to the Xerces-J XML Schema 1.1 SVN dev stream. I had written about this earlier on this blog.

Credit for the Xerces-J inheritable attributes work should also go to Khaled Noaman (a fellow Xerces-J committer), who reviewed my patch and suggested many improvements. The work builds on earlier work on XML Schema 1.1 Conditional Type Assignment (CTA) by Hiranya Jayathilaka (also a fellow Xerces-J committer).

Tuesday, September 8, 2009

PsychoPath XPath 2.0 processor update

Some time ago, I wrote a progress update about the compliance of the PsychoPath XPath 2.0 processor (an Eclipse Web Tools Source Editing subproject) with the W3C XPath 2.0 test suite.

As of today, the test suite results for the PsychoPath engine are as follows:

Tests: 8137
Failures: 163
Errors: 5

This reflects a success rate of 97.9%.

I think these recent PsychoPath improvements against the W3C XPath 2.0 test suite reflect a much improved quality of the PsychoPath product.

I am hoping we will reach 100% test suite compliance with PsychoPath in the near future!

Credit for most of the recent PsychoPath improvements should go to Dave Carver and Jesper Steen Møller.

Update on 2009-09-18: I took an update of the PsychoPath source code from the Eclipse CVS servers today, and the W3C XPath 2.0 test results are as follows:
Tests: 8137
Failures: 132
Errors: 5

Update on 2009-10-17: As of today, the W3C XPath 2.0 test results are as follows:
Tests: 8137
Failures: 81
Errors: 1

This reflects a W3C XPath 2.0 test suite pass rate of about 99% for the PsychoPath engine. It seems we are moving closer to a 100% test success rate for PsychoPath. Dave Carver has been working a lot on these improvements during the last few days.
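The percentages in this post can be reproduced from the three sets of counts. The sketch below assumes that a passing test is one that is neither a failure nor an error, which is consistent with the 97.9% figure quoted for the first run; the class and method names are made up for the example.

```java
import java.util.Locale;

class PassRate {
    // Pass rate in percent, assuming passed = tests - failures - errors.
    static double passRate(int tests, int failures, int errors) {
        return 100.0 * (tests - failures - errors) / tests;
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT, "%.2f", passRate(8137, 163, 5))); // 97.94
        System.out.println(String.format(Locale.ROOT, "%.2f", passRate(8137, 132, 5))); // 98.32
        System.out.println(String.format(Locale.ROOT, "%.2f", passRate(8137, 81, 1)));  // 98.99
    }
}
```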

Saturday, September 5, 2009

Roger L. Costello: About XML Schema 1.1

Roger L. Costello, a brilliant thinker on XML and related technologies, has recently posted extensive study material on XML Schema 1.1 on his web site (ref, http://www.xfront.com/xml-schema-1-1/).

Roger mentions Saxon 9.2 as one of the products that implement XML Schema 1.1 features.

From the information I have from the Apache Xerces-J developers' forum, I am happy to share with the community that Apache Xerces-J is planning to release a preview of its XML Schema 1.1 implementation around the end of this year (i.e., Dec 2009). The latest snapshot of the Xerces-J XML Schema 1.1 source code repository and related dependencies is located at https://svn.apache.org/repos/asf/xerces/java/branches/xml-schema-1.1-dev/. Users are welcome to build binaries from the sources available there and try the XML Schema 1.1 features currently implemented in Apache Xerces-J. Most of the XML Schema 1.1 features are implemented in this snapshot.

References:
1) XML Schema Definition Language (XSD) 1.1 Part 1: Structures
2) XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes

Saturday, August 22, 2009

XML document validation, while parsing with Java DOM API

I spent a few hours discovering this while working with the DOM XML parsing API, using it with Xerces-J in a Java program.

I wanted to parse an XML document in Java using a plain DOM parser and validate it at the same time, using either W3C XML Schema or a DTD.

Following is the sequence of instructions that needs to be written for this:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setValidating(true);
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema ...
dbf.setSchema(schema);
DocumentBuilder docBuilder = dbf.newDocumentBuilder();
docBuilder.parse(..


These statements are all that is necessary to accomplish this task. But there are a few catches here, which I wish to share.

1) If dbf.setValidating(true) is specified, then a DTD is mandatory. Even if a W3C XML Schema is provided with dbf.setSchema .., validation fails when no DTD is present, because dbf.setValidating(true) requests DTD validation.

2) If we only want to validate with W3C XML Schema, then we shouldn't call dbf.setValidating(true), which is required only for DTD validation.
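Putting the second point into practice, a schema-only validating DOM parse might look like the sketch below. The inline schema, the count element and the class name are assumptions made so that the example is self-contained; in a real program the schema and the document would normally come from files or URLs.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DomSchemaValidation {
    // Hypothetical one-element schema used only for this illustration.
    static final String XSD = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='count' type='xs:int'/></xs:schema>";

    // Parses xml with a DOM parser and reports whether it is valid against XSD.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            // No dbf.setValidating(true) here: that switch is for DTD validation.
            dbf.setSchema(schema);
            DocumentBuilder builder = dbf.newDocumentBuilder();

            // Validation errors are reported to the error handler, so record them.
            final boolean[] valid = { true };
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { valid[0] = false; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return valid[0];
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<count>42</count>"));  // true
        System.out.println(isValid("<count>abc</count>")); // false
    }
}
```

Registering an ErrorHandler matters here: without one, validation errors are not surfaced to the calling code as a result of the parse.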

I spent a few hours discovering this, and thought that somebody might benefit from this post.

Saturday, August 8, 2009

XML Schema 1.1: inheritable attributes, and its implementation in Apache Xerces-J

The XML Schema 1.1 language has defined a new facility for declaring attributes as inheritable.

In version 1.1 of the XML Schema language, an attribute declaration can specify an additional property, inheritable (of schema type xs:boolean), which indicates that all descendant elements of the element carrying the inheritable attribute can access that attribute by its name.

A reader of the XML Schema 1.1 spec might at first think that inheritable attributes are something that can be physically present (i.e., as a copy) on descendant elements. But this is not the correct interpretation of the inheritable attributes concept. I'll try to illustrate this point with a few examples in this post.

Please consider the following XML Schema 1.1 fragment:
  <xs:element name="X">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Y" type="xs:string" />
      </xs:sequence>
      <xs:attribute name="attr" type="xs:int" inheritable="true" />
    </xs:complexType>
  </xs:element>

This corresponds to an XML structure like the following:

  <X attr="1">
    <Y>hello</Y>
  </X>

The above XML Schema 1.1 fragment indicates that the attribute "attr" is inheritable. The word inheritable seems to convey that the following XML fragment could be valid as well for the above XML Schema 1.1 fragment:

  <X attr="1">
    <Y attr="1">hello</Y>
  </X>

But this interpretation of inheritable attributes is not correct. Inheritable attributes cannot be physically copied to the descendant elements. In the above examples, the schema type of element "Y" is a simple type (i.e., xs:string), so how could Y have an attribute "attr"? By definition, elements with simple types cannot have attributes; only XML Schema complex types can specify attributes. XML Schema 1.1 inheritable attributes do not change the nature of XML Schema simple types and simple content. The presence of attributes on any XML element is governed only by the attribute declarations on the complex type definition of that element, and this is preserved in XML Schema 1.1 as well.

It is then interesting to ask what the use of declaring an attribute as inheritable could be, when it cannot be physically present on the descendant elements.

Inheritable attributes are useful in an XML Schema 1.1 facility like Conditional Type Assignment (CTA) / type alternatives.

Please consider the following XML Schema 1.1 example, which defines an XML element and its schema type using CTA and inheritable attributes:

  <xs:element name="X">
     <xs:complexType>
       <xs:sequence>
         <xs:element name="Y" type="xs:anyType">
           <xs:alternative test="@attr = 'INT'" type="xs:int" />
           <xs:alternative type="xs:error" />
         </xs:element>
       </xs:sequence>
       <xs:attribute name="attr" type="xs:string" inheritable="true" />
     </xs:complexType>
   </xs:element>

As per the above Schema (using type alternatives), the following XML instance is valid:

  <X attr="INT">
    <Y>100</Y>
  </X>

But the following XML instance would be invalid:

  <X attr="INT">
    <Y>hello</Y>
  </X>
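One way to picture how the test on Y can see an attribute that only exists on X is a lookup that walks up the ancestor chain. The Java sketch below is purely conceptual: the class and method names are invented, every attribute is treated as inheritable, xs:int is approximated with Integer.parseInt, and none of this reflects how Xerces-J implements CTA.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class InheritedAttributeLookup {
    // Returns the value of attrName on the element itself or, failing that,
    // on its nearest ancestor; null when no such attribute exists.
    static String effectiveAttribute(Element e, String attrName) {
        for (Node n = e; n instanceof Element; n = n.getParentNode()) {
            Element el = (Element) n;
            if (el.hasAttribute(attrName)) {
                return el.getAttribute(attrName);
            }
        }
        return null;
    }

    // Mimics the two alternatives: an integer type when attr is 'INT', xs:error otherwise.
    static boolean yIsValid(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element y = (Element) doc.getElementsByTagName("Y").item(0);
            if (!"INT".equals(effectiveAttribute(y, "attr"))) {
                return false; // xs:error: no content is ever valid
            }
            try {
                Integer.parseInt(y.getTextContent().trim()); // rough stand-in for xs:int
                return true;
            } catch (NumberFormatException ex) {
                return false;
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(yIsValid("<X attr=\"INT\"><Y>100</Y></X>"));   // true
        System.out.println(yIsValid("<X attr=\"INT\"><Y>hello</Y></X>")); // false
    }
}
```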

Inheritability is also particularly useful for declaring the attribute xml:lang as inheritable on XML elements.

I got to know these facts after raising a query last week on the W3C XML Schema comments forum.

I am thankful to the following gentlemen for answering my queries on the W3C XML Schema forum:

C. M. Sperberg-McQueen
Noah Mendelsohn
Michael Kay

What I really wanted to share in this blog post (other than what XML Schema inheritable attributes are used for) is that I've written an implementation of inheritable attributes for Apache Xerces-J's XML Schema 1.1 validator. I've submitted a patch for this to the Apache Xerces-J JIRA issue tracking system.

This patch currently has a full implementation of the attribute syntax changes (i.e., the presence of the inheritable attribute itself, and its binding to the XML Schema type xs:boolean).

I'm in the process of enhancing the Xerces-J implementation of the Conditional Type Assignment (CTA) facility to be able to use inheritable attributes. I hope to complete the CTA changes in Xerces-J for inheritable attributes in the near future.

After Xerces-J committers have done all the necessary reviews of this patch, I hope the inheritable attributes implementation will go into the Xerces-J SVN repository, and will most likely become part of an official future release of Xerces-J.

2009-08-14: Today, I submitted all the Conditional Type Assignment (CTA) related changes for inheritable attributes to the Apache Xerces-J JIRA issue tracking system. I would say the XML Schema 1.1 inheritable attributes work, and its integration with CTA, is complete for Xerces-J. I'm feeling good about it :)

Thursday, July 30, 2009

Grady Booch: about Linux, and TTY interfaces

Reading through Grady Booch's latest blog post, I found that Grady has shared interesting information about TTY interfaces in Linux and other UNIX-based systems.

I read the article that Grady pointed to almost completely, and found it a great read.

Something interesting to share, I thought!

Saturday, July 18, 2009

Niklaus Wirth: On current state of software development, and future

Navigating from Dr. Niklaus Wirth's Wikipedia page, I found a very interesting interview with Dr. Wirth on the following web site (ref, http://www.eptacom.net/pubblicazioni/pub_eng/wirth.html).

This interview dates from 1997. I found Dr. Wirth's views in it quite good to read.

Something interesting to share, I thought!

Sunday, July 12, 2009

Niklaus Wirth: On recursive algorithms

I have started reading the computer science classic "ALGORITHMS + DATA STRUCTURES = PROGRAMS" by Niklaus Wirth. Dr. Wirth wrote this text in 1975. It's a great book.

Though recursive algorithms have long been widely known in the computer science community, I could still find some good advice in Dr. Wirth's book on their usage.

Dr. Wirth mentions:
"An object is said to be recursive if it partially consists or is defined in terms of itself. Recursion is a particularly powerful means in mathematical definitions. The power of recursion evidently lies in the possibility of defining an infinite set of objects by a finite statement. In the same manner, an infinite number of computations can be described by a finite recursive program, even if this program contains no explicit repetitions."

We all know what recursive algorithms are; recursion is a widely known programming technique. But I found the advice on "when not to use recursion" in Dr. Wirth's book particularly worthwhile to apply.

Dr. Wirth further mentions:
"Recursive algorithms are particularly appropriate when the underlying problem or the data to be treated are defined in recursive terms. This does not mean, however, that such recursive definitions guarantee that a recursive algorithm is the best way to solve the problem.

Programs in which the use of algorithmic recursion is to be avoided can be characterized by a schema which exhibits the pattern of their composition. Such schemas can be described as follows:

[1]
P => if B then (S; P)

or, equivalently

P => (S; if B then P)
"

Dr. Wirth illustrates this principle with the well-known recursive definition of the factorial computation (shown below):

F(0) = 1
F(i+1) = (i + 1) * F(i)

Dr. Wirth maps the factorial problem onto the recursive anti-pattern he defines ([1] above):

[2]
P => if I < n then (I := I + 1; F := I * F; P)
I := 0; F := 1; P

In the above definition [2], B (ref, [1]) is the condition I < n, and S refers to:
I := I + 1; F := I * F

In the book, Dr. Wirth then gives the following iterative definition of the factorial computation:

I := 0;
F := 1;
while I < n do
begin I := I + 1; F := I * F
end


Dr. Wirth says, "The lesson to draw is to avoid the use of recursion when there is an obvious solution by iteration.
This, however, should not lead to shying away from recursion at any price. The fact that implementations of recursive procedures on essentially non-recursive machines exist proves that for practical purposes every recursive program can be transformed into a purely iterative one. This, however, involves the explicit handling of a recursion stack, and these operations will often obscure the essence of a program to such an extent that it becomes most difficult to comprehend. The lesson is that algorithms which by their nature are recursive rather than iterative should be formulated as recursive procedures."
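
To make the comparison concrete, here is a minimal Java sketch of the two factorial formulations discussed above. The class and method names are my own (they are not from Dr. Wirth's book), and the sketch uses long arithmetic, so it is only meaningful for n up to 20.

```java
final class Factorial {

    // Recursive form, mirroring F(0) = 1 and F(i+1) = (i + 1) * F(i).
    static long recursive(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be >= 0");
        }
        return n == 0 ? 1L : n * recursive(n - 1);
    }

    // Iterative form, mirroring the while loop: I := I + 1; F := I * F.
    static long iterative(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be >= 0");
        }
        long f = 1L;
        int i = 0;
        while (i < n) {
            i = i + 1;
            f = i * f;
        }
        return f;
    }

    public static void main(String[] args) {
        // Both formulations compute the same value; long overflows beyond n = 20.
        System.out.println(recursive(10));
        System.out.println(iterative(10));
    }
}
```

The iterative version needs no call stack at all, which is the point of the advice: the single call of P at the end of the schema carries no information that a simple loop cannot carry.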

I just thought of sharing a bit of text from this nice book, and encouraging readers to read it!

Saturday, July 4, 2009

PsychoPath XPath 2.0 processor update

Dave Carver and I have been trying to improve the Eclipse XPath 2.0 processor (a.k.a. PsychoPath) during the last couple of weeks. My motivation for continuing to work on the PsychoPath engine is a desire to help the Eclipse and Apache communities (Apache Xerces-J uses the PsychoPath engine for XML Schema 1.1 processing) have a highly compliant XPath 2.0 engine.

Dave wrote a progress update on PsychoPath development today, on this blog. I feel we now have a pretty good XPath 2.0 implementation in PsychoPath. We are continuing to work on the remaining non-compliant items. In my opinion, the remaining non-compliance cases are mostly edge cases that users rarely run into. But we will keep working through them over the coming days and weeks.

Update on 2009-07-11: During the last couple of days, Dave Carver has made quite a few useful improvements to PsychoPath and to the W3C XPath 2.0 test suite within Eclipse. I updated to the latest PsychoPath sources and XPath 2.0 test suite today, and the latest test results are as follows:

Total tests: 8137
Failures: 811
Errors: 48

This reflects a test pass rate of about 89.4%. I think this is quite good. A lot of the credit for these improvements should go to Dave Carver. Dave has single-handedly created a JUnit version of the full W3C XPath 2.0 test suite, which is in itself a great feat! Having JUnit tests helps us tremendously in running the XPath 2.0 tests from within Eclipse.

Update on 2009-08-09: The current PsychoPath test data is as follows:
Total tests: 8137
Failures: 386
Errors: 24
This reflects a test suite pass rate of about 95%, which looks very impressive. The test suite's code coverage is about 75-80%.
A lot of the credit for the latest PsychoPath improvements should go to Jesper S Møller, who has recently volunteered to help improve PsychoPath with the XPath 2.0 test suite. Dave Carver is also putting in his time on PsychoPath improvements.
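
For readers who want to check the pass rates quoted in the two updates, here is a small Java sketch of the arithmetic. It assumes that failures and errors are disjoint and that every other test passed; the class and method names are hypothetical.

```java
final class PassRate {

    // Percentage of tests that neither failed nor errored.
    static double passPercent(int total, int failures, int errors) {
        return 100.0 * (total - failures - errors) / total;
    }

    public static void main(String[] args) {
        // Figures from the 2009-07-11 and 2009-08-09 updates.
        System.out.printf("%.1f%%%n", passPercent(8137, 811, 48));
        System.out.printf("%.1f%%%n", passPercent(8137, 386, 24));
    }
}
```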

Friday, June 26, 2009

Multiple inheritance in Java

I have always missed true multiple inheritance in Java (as in C++). For example, we cannot define a class as follows in Java:

class X extends A, B, C {

}

Though I do not see any inheritance use case that cannot be solved with the current Java facilities, I would love to have this facility in Java. I think the latest Java version (1.7) doesn't have this feature.

One workaround I can see for multiple inheritance is to define a class like the following:

class X {
private A a;
private B b;
private C c;
}

i.e., we could create private members inside X for the classes whose functionality we want to use in X.

Though this might serve the purpose in some cases, it's not true multiple inheritance! I think this is actually the aggregation pattern.

Of course, Java has multiple inheritance of interfaces. But that is inheritance of method signatures, and not of implementation.

I guess that by keeping the number of base classes to one, Java is much simpler syntactically and has a simpler compiler implementation. I do agree that a simple syntax (like the current Java inheritance facilities) that is powerful enough to solve many use cases is better than a complex syntactic facility that might serve even more use cases but could also lead to semantically difficult programs, which become hard to maintain and debug as the complexity of the problem domain increases.
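
The workaround and the interface facility can be combined: a class implements several interfaces (inheriting the signatures) and delegates to private members (reusing the implementations). The following is only a sketch with hypothetical names (Walker, Swimmer, Legs, Fins, Duck), not a claim about how this should always be done.

```java
final class CompositionDemo {

    interface Walker { String walk(); }
    interface Swimmer { String swim(); }

    // Two independent implementations we would like to "inherit" from.
    static final class Legs implements Walker {
        @Override public String walk() { return "walking"; }
    }

    static final class Fins implements Swimmer {
        @Override public String swim() { return "swimming"; }
    }

    // Duck gets both contracts via interfaces and both behaviours via delegation.
    static final class Duck implements Walker, Swimmer {
        private final Walker walker = new Legs();
        private final Swimmer swimmer = new Fins();

        @Override public String walk() { return walker.walk(); }
        @Override public String swim() { return swimmer.swim(); }
    }

    public static void main(String[] args) {
        Duck duck = new Duck();
        System.out.println(duck.walk());
        System.out.println(duck.swim());
    }
}
```

The cost is the boilerplate of forwarding each method by hand, which is exactly what implementation inheritance would have given for free.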

Sunday, June 21, 2009

Primitive long a subtype of float

The Java language specification defines primitive "long" as a subtype of primitive "float" (ref, http://java.sun.com/docs/books/jls/third_edition/html/typesValues.html#4.10.1).

But the XML Schema Datatypes spec shows no relationship between xs:float and xs:long (ref: XML Schema 1.0 data types, XML Schema 1.1 data types).

I'm a little confused about which concept is correct (Java's definition of this data-type inheritance, or XML Schema's). I tend to favor the XML Schema definition. But perhaps the XML Schema type system is meant for XML-oriented data, while the Java type system is meant for a wider class of applications. I'm not sure, though, whether this is the reason for the differing definitions in the Java language spec and XML Schema.
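
One way to look at the difference: in Java, the relation shows up as an implicit widening conversion from long to float, and that conversion is allowed even though it can lose precision. So Java's notion here does not mean that every long value is also a float value. The sketch below (class and method names are mine) demonstrates this with the smallest positive long that a float cannot represent exactly.

```java
final class LongToFloat {

    // Assign a long to a float (no cast needed) and convert it back.
    static long roundTrip(long value) {
        float f = value; // implicit widening primitive conversion
        return (long) f;
    }

    public static void main(String[] args) {
        long exact = 1L << 24;    // 16777216 fits in float's 24-bit significand
        long inexact = exact + 1; // 16777217 does not
        System.out.println(roundTrip(exact) == exact);
        System.out.println(roundTrip(inexact) == inexact);
    }
}
```

XML Schema derives xs:long from xs:decimal by restriction, so there every xs:long value really is an xs:decimal value, and xs:float sits in a separate branch.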

Saturday, June 20, 2009

Became Eclipse WTP committer

I was nominated as an Eclipse WTP project committer, for the WTP Source Editing subproject.

Following the voting process for becoming an Eclipse project committer, the Eclipse WTP Source Editing team granted me project committership on June 18, 2009.

My contributions to the PsychoPath XPath 2.0 engine (which is one of the components in the WTP Source Editing tooling) helped me become a committer on this project.

I am happy to be included in the Eclipse WTP team. I look forward to contributing more to PsychoPath and other WTP components. Apart from PsychoPath, I look forward to working on the WTP XSL components in the near future.

Monday, June 15, 2009

Running first XSLT 2.0 stylesheet with IBM XSLT 2.0 engine

I ran my first XSLT 2.0 stylesheet with the IBM XSLT 2.0 engine (ref, WAS XML Feature Pack Open Beta).

I tried the following XSLT 2.0 stylesheet, which uses the xsl:for-each-group instruction, and it worked well with the IBM XSLT engine.


<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:output method="xml" indent="yes" />

  <xsl:template match="books">
    <books>
      <xsl:for-each-group select="book" group-by="author">
        <author name="{current-grouping-key()}">
          <xsl:for-each select="current-group()">
            <book>
              <xsl:copy-of select="name" />
              <xsl:copy-of select="publisher" />
            </book>
          </xsl:for-each>
        </author>
      </xsl:for-each-group>
    </books>
  </xsl:template>

</xsl:stylesheet>

Saturday, June 13, 2009

XML Schema validation with Xerces-J

Hiranya Jayathilaka raised an interesting discussion some time ago on the xerces-dev list about how to validate an XML Schema 1.0 document using Xerces-J. Hiranya was looking for a solution using a Java API with Xerces-J. The goal was to verify the correctness of the schema document itself, not to validate an XML instance document.

I'm providing a summary of the discussion we had on the list, and the conclusions we reached.

There are basically three ways of doing this:

1. Using a JAXP SchemaFactory
Using this technique, we do something like below:

SchemaFactory sf = SchemaFactory.newInstance ..
sf.setErrorHandler ..
Schema s = sf.newSchema(new StreamSource(schemapath));

The 'SchemaFactory.newSchema' call would not succeed if XML Schema has a grammar error.

2. Using XSLoader
Using this technique, we do something like below:

XSLoaderImpl xsLoader = new XSLoaderImpl();
XSModel xsModel = xsLoader.loadURI(xsdUri);

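Michael Glavassevich suggested how we could add an error handler to this mechanism:

DOMErrorHandler myErrorHandler = ...;

// 'registry' is a DOMImplementationRegistry; the system property points it at the Xerces XS implementation source
System.setProperty(DOMImplementationRegistry.PROPERTY, "org.apache.xerces.dom.DOMXSImplementationSourceImpl");
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();

XSImplementation xsImpl = (XSImplementation) registry.getDOMImplementation("XS-Loader");
XSLoader xsLoader = xsImpl.createXSLoader(null);

DOMConfiguration config = xsLoader.getConfig();
config.setParameter("error-handler", myErrorHandler); // <-- set the error handler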

3. Using XMLGrammarPreparser
Using this technique, we do something like below (thanks to Hiranya Jayathilaka for sharing this code):

XMLGrammarPreparser preparser = new XMLGrammarPreparser();
preparser.registerPreparser(XMLGrammarDescription.XML_SCHEMA, null);
preparser.setFeature("http://xml.org/sax/features/namespaces", true);
preparser.setFeature("http://xml.org/sax/features/validation", true);
preparser.setFeature("http://apache.org/xml/features/validation/schema", true);
preparser.setErrorHandler(new MyErrorHandler());
Grammar g = preparser.preparseGrammar(XMLGrammarDescription.XML_SCHEMA, new XMLInputSource(null, xsdUrl, null));

Michael Glavassevich provided a nice comparison of these three approaches:
SchemaFactory - it is an entry point into the JAXP Validation API for loading schemas for validation. If it was a user asking I'd recommend SchemaFactory of the three choices since it's in Java 5+ and would work in environments where Xerces isn't available.
XSLoader - it is an entry point into the XML Schema API for obtaining an XSModel for analysis/processing of the component model.
XMLGrammarPreparser - it provides an API for preparsing schemas and DTDs for use in grammar caching (i.e. a lower-level alternative to SchemaFactory).
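
Since SchemaFactory is the recommended choice, here is a slightly fuller sketch of approach 1 that uses only the JAXP API shipped with the JDK. The class name SchemaCheck, the method check and the inline sample schemas are my own assumptions for illustration; a real program would load the schema from a file or URL.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class SchemaCheck {

    // Returns the problems found in the schema document; an empty list means it loaded cleanly.
    static List<String> check(String xsdText) {
        final List<String> problems = new ArrayList<>();
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        sf.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* warnings are ignored here */ }
            @Override public void error(SAXParseException e) { problems.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXException {
                problems.add(e.getMessage());
                throw e;
            }
        });
        try {
            sf.newSchema(new StreamSource(new StringReader(xsdText)));
        } catch (SAXException e) {
            // Make sure a failure is reported even if the handler was never called.
            if (problems.isEmpty()) {
                problems.add(String.valueOf(e.getMessage()));
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String good = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                + "<xs:element name='a' type='xs:string'/></xs:schema>";
        String bad = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                + "<xs:element name='a' type='xs:nosuchtype'/></xs:schema>";
        System.out.println(check(good));
        System.out.println(check(bad));
    }
}
```

Collecting the messages in the error handler, rather than stopping at the first exception, makes it possible to report several schema errors in one pass.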

Saturday, May 30, 2009

Function parameters vs global variables

I bumped into this problem recently, and thought of sharing my experience here.

There are occasions where I have to write an XSLT function and simply use it. For example (this is just an illustration; we could have more function parameters and a different return type):

<xsl:function name="fn:somefunction" as="xs:boolean">
<xsl:param name="pName" as="xs:string" />

<!-- use $pName and the variable, $someList (defined below) -->
<xsl:sequence select="something.." />
</xsl:function>


The evaluation of this function also depends on some data other than the parameters being passed. This external information, on which the function depends, could be a global variable. For example:

<xsl:variable name="someList" as="element()+">
<x>a</x>
<x>b</x>
..
..
</xsl:variable>


In my case, this data (the variable $someList) is fairly static, and is needed for the evaluation of the above function.

I could see two options for how the function may use external data ($someList in this case):
1) Keep the external data in a global variable (as illustrated above)
2) Supply the data as an additional function parameter

I was in a sort of dilemma recently, where I had to decide whether to go for option 1) or 2).

In my case, I opted for option 1), i.e., the global variable.

I can think of a few pros and cons of the two options:
1. Having a global variable: This is good if the external information is fairly static and perhaps is a big chunk of data. A global variable could also be useful if the data is shared between multiple functions.
2. Having a parameter for the data: This option looks good from the point of view of the principle of composability. Functional programming advocates like this idea. I think that in classical computer science theory, a function (a callable module) is an abstraction which takes some input and produces some output. The notion of a function accessing data which exists outside its body is, I think, a mechanism devised by specific programming languages, and not as such defined by computer science theory. From this point of view, having a parameter for the data is a good option. In fact, I would also support this option as far as possible.
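
The same trade-off can be sketched in Java, which may make it easier to compare the two options side by side. The names below (LookupStyles, SOME_LIST, isKnownGlobal, isKnown) are hypothetical and only mirror the XSLT illustration above.

```java
import java.util.Arrays;
import java.util.List;

final class LookupStyles {

    // Option 1: the fairly static data lives in shared, global state.
    private static final List<String> SOME_LIST = Arrays.asList("a", "b");

    static boolean isKnownGlobal(String name) {
        return SOME_LIST.contains(name);
    }

    // Option 2: the data is passed in, so the result depends only on the arguments.
    static boolean isKnown(String name, List<String> someList) {
        return someList.contains(name);
    }

    public static void main(String[] args) {
        System.out.println(isKnownGlobal("a"));
        System.out.println(isKnown("a", Arrays.asList("x", "y")));
    }
}
```

Option 2 is easier to test and reuse with different data, while option 1 keeps every call site shorter when the data never changes.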

In my case, I was working with XSLT. But I guess these concepts apply to many other programming languages as well.

This topic could turn into a discussion about how we should write good computer programs.

Any ideas are welcome.

Tuesday, May 26, 2009

PsychoPath XPath 2.0 processor update

We recently implemented quite a few built-in XSD numeric data types in the PsychoPath XPath 2.0 processor (ref, http://www.w3.org/TR/xmlschema-2/#built-in-datatypes). Now all (I mean, really all :)) the data types in the xs:decimal hierarchy are available in PsychoPath, and these should be available in Eclipse WTP 3.2 M1 (which should be released soon after the Eclipse Galileo release, around the end of June '09).

Now almost all the major built-in Schema types are available in PsychoPath, except for a few subtypes of xs:string (like xs:normalizedString, xs:token etc.). These shouldn't be very difficult to add.

Dave Carver reported that the improvements we have made recently in PsychoPath have significantly improved its compliance with the W3C XPath 2.0 test suite.

The PsychoPath processor version has now been raised from 1.0 to 1.1.

Sunday, May 17, 2009

Xerces-J XSD 1.1 assertions and PsychoPath XPath 2.0 processor update

I recently contributed a few patches to the Eclipse PsychoPath XPath 2.0 engine, to support schema-aware XPath 2.0 expressions. These patches enhance schema-aware support in the PsychoPath XPath 2.0 engine for element and attribute nodes, for the XML Schema primitive types.

With these enhancements, the PsychoPath engine can evaluate XPath expressions like the following:

person/@dob eq xs:date('2006-12-10') // if dob is an attribute, and of schema type xs:date

person/dob eq xs:date('2006-12-10') // if dob is an element, and of schema type xs:date

@max ge @min // this would work if 'max' and 'min' have say schema types, xs:int

The patch in my local environment already exhibits these improvements. As promised by Dave Carver (the PsychoPath engine project lead), users will likely get these improvements in Eclipse WTP (Web Tools Project) 3.2.
2009-05-24: These changes are now committed to the Eclipse CVS server, and the improvements are flagged to be delivered in Eclipse WTP 3.2 M1, which should be out quite soon. Thanks to Dave Carver for testing all my patches, and committing them to the server.

The PsychoPath engine already has a framework (thanks to Andrea Bittau and his team) for supporting schema awareness (based on the Xerces-J XSD schema model). I just added small pieces of code to the attribute and element node implementations (in particular, improving the "typed value" of attribute and element nodes for built-in XSD schema types) to enhance schema-aware support.

I think we are gradually moving to more mature schema-aware support in PsychoPath.

I'm also using these new PsychoPath processor capabilities to implement schema-aware XPath 2.0 evaluation in the Xerces-J XSD assertions support.

I'm currently working on constructing a typed XPath data model instance for XSD 1.1 assertion evaluation. Having this capability would allow users to write XPath expressions like the following:

@max ge @min

or

person/@dob eq xs:date('2006-12-10')

In the absence of this (i.e., typed XDM nodes), users currently have to write explicit casts, like the following:

xs:int(@max) ge xs:int(@min)

or

xs:date(person/@dob) eq xs:date('2006-12-10')

The XML Schema 1.1 assertions spec recommends a typed XDM instance.
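
Why the type matters: in an XPath 2.0 value comparison such as ge, untyped operands are compared as strings, so max="10" and min="9" would compare the wrong way round. The Java sketch below is only an analogy for that behaviour (it does not use PsychoPath or Xerces-J), and the class and method names are hypothetical.

```java
final class TypedComparison {

    // Analogue of comparing untyped values: a string comparison.
    static boolean geAsStrings(String max, String min) {
        return max.compareTo(min) >= 0;
    }

    // Analogue of xs:int(@max) ge xs:int(@min): a numeric comparison.
    static boolean geAsInts(String max, String min) {
        return Integer.parseInt(max) >= Integer.parseInt(min);
    }

    public static void main(String[] args) {
        System.out.println(geAsStrings("10", "9")); // string order puts "10" before "9"
        System.out.println(geAsInts("10", "9"));
    }
}
```

With a typed XDM instance, the processor already knows that the attributes are xs:int, so the plain expression behaves like the numeric comparison without any casts.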

We hope to provide this capability within Xerces-J, in line with the XML Schema 1.1 assertions specification.

2009-05-23: These improvements are now implemented, and I've submitted the code improvements to the Apache Xerces-J JIRA server. I'm hoping we'll have these improvements committed on the Xerces-J SVN server some time soon.

Friday, May 8, 2009

Became Apache Xerces-J committer

As per the voting process for becoming an Apache project committer, the Apache Xerces-J team granted me the committer status for the Xerces-J project on May 5, 2009.

This gives me an opportunity to contribute to the Xerces-J codebase, in a more direct way.

It's indeed a privilege for me to be part of the core Xerces team. Going from being a Xerces user (since a long time ago :)) to becoming a project committer has been a rewarding journey in numerous ways.

Sunday, May 3, 2009

Apache Xerces-J assertions implementation and PsychoPath XPath 2.0 processor

I wrote some time back on this blog about the work I am doing on XML Schema 1.1 assertions support in Xerces-J. XML Schema 1.1 assertions processing requires an XPath 2.0 processor for performing schema validation.

The Xerces-J team has opted to use the open source XPath 2.0 processor, PsychoPath. PsychoPath was developed by Andrea Bittau and his team. The PsychoPath team donated the PsychoPath code base to the Eclipse community, where it is now formally used in the Eclipse Web Tools Platform project. Future enhancements to PsychoPath are now taking place at Eclipse.

Since Xerces-J is using the PsychoPath XPath 2.0 engine, we would ideally like PsychoPath to be 100% compliant with the XPath 2.0 specification, so that Xerces-J users can use as many of the facilities of the XPath 2.0 language as possible when writing XML Schema 1.1 assertions.

After looking at the PsychoPath source code and using it quite a bit, my personal observation is that PsychoPath has a pretty good XPath 2.0 implementation. Please refer to this documentation to learn more about PsychoPath and its current compliance status.

The Eclipse WTP team is working actively to resolve the remaining non-compliant items in PsychoPath. Incidentally, I have been working recently to help improve PsychoPath's compliance with the XPath 2.0 spec, and have contributed a few patches to Eclipse.

We are also planning to run the W3C XPath 2.0 test suite on PsychoPath, with the target of PsychoPath passing 100% of the test suite. This should give PsychoPath adopters more confidence while using it.

Thursday, April 23, 2009

WAS XML Feature Pack Open Beta

There was an announcement recently from IBM (http://webspherecommunity.blogspot.com/2009/04/was-open-xml-feature-pack-beta.html) about the availability of the "WAS XML Feature Pack Open Beta", supporting XPath 2.0, XSLT 2.0 and XQuery 1.0. It was good to know this.

Users will therefore be able to use XPath 2.0, XSLT 2.0 and XQuery 1.0 in a WAS environment, using IBM's own processors for these languages.

This is an early preview release, with more enhancements expected to come later.

I'm looking forward to trying these language processors myself.

Tuesday, April 21, 2009

Xerces-J: XML Schema 1.1 assertions support

This post is related to my earlier blog post, http://mukulgandhi.blogspot.com/2008/07/assertions.html, about the XML Schema 1.1 assertions implementation in Xerces-J.

Today, I reached an important milestone: all the development for assertions in Xerces-J is finished, and I have submitted an Apache JIRA issue for review.

Here is a small example of what XML Schema 1.1 assertions look like:
<xs:complexType name="book">
  <xs:sequence>
    <xs:element name="name" type="xs:string" />
    <xs:element name="author" type="xs:string" />
    <xs:element name="price" type="xs:string" />
    <xs:element name="publisher" type="xs:string" />
    <xs:element name="pub-date" type="xs:date" />
  </xs:sequence>
  <xs:assert test="ends-with(price, 'USD')" />
  <xs:assert test="pub-date > xs:date('2007-12-31')" />
</xs:complexType>

With this XML Schema 1.1 fragment, the user wants a validation constraint that the price string should end with the literal 'USD', and that pub-date should be later than the date 2007-12-31. This is a very simple example, but it does show the usefulness of the assertions syntax. A schema type (which could be a simple type or a complex type) can have any number (0-n) of assertions. In a complex type they are written as xs:assert elements, while in a simple type the corresponding facet is named xs:assertion. The value of the 'test' attribute is an XPath 2.0 expression. All the assertions have to evaluate to boolean "true" for an element to be locally valid.
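
To spell out the semantics of the two assertions in another notation, here is a plain-Java analogue. This is not how Xerces-J evaluates assertions (that is done by an XPath 2.0 engine), the names are hypothetical, and the date check assumes an xs:date lexical value without a timezone.

```java
import java.time.LocalDate;

final class BookAssertions {

    // Analogue of: ends-with(price, 'USD')
    static boolean priceEndsWithUsd(String price) {
        return price.endsWith("USD");
    }

    // Analogue of: pub-date > xs:date('2007-12-31')
    static boolean publishedAfter2007(String pubDate) {
        return LocalDate.parse(pubDate).isAfter(LocalDate.of(2007, 12, 31));
    }

    // All assertions must hold for the element to be locally valid.
    static boolean isLocallyValid(String price, String pubDate) {
        return priceEndsWithUsd(price) && publishedAfter2007(pubDate);
    }

    public static void main(String[] args) {
        System.out.println(isLocallyValid("20 USD", "2008-06-01"));
        System.out.println(isLocallyValid("20 EUR", "2008-06-01"));
    }
}
```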

There could be many other scenarios for writing assertions in XML Schema 1.1, some of them quite complex (for example, assertions present in a schema type hierarchy). It's difficult to cover all of them here. I'd ask the reader to read the article [2] below to learn about many other XML Schema 1.1 assertions scenarios.

With assertions in the XML Schema 1.1 language, we can express much more involved XML validation constraints, which were almost impossible to specify in XML Schema 1.0. Using assertions, we can specify relationships between elements (like element names, contents etc.), between elements and attributes, between attributes, and perhaps much more.

The assertions processing in XML Schema 1.1 works as follows:
When an XML Schema (1.1) processor encounters an element in the XML instance document, it must validate the element (if the user has requested validation) against its associated type in the schema, which could be a simple type or a complex type. The element's type declaration could be anonymous, or it could be a named type (one that has a "name" attribute and is defined globally in the schema). The XML Schema processor builds an XPath data model (XDM) tree rooted at this element. With Xerces, an XDM tree is built only if one or more assertions are associated with the element's type. If the schema types of XML attributes have assertion facets, these facets work on the attribute's value, and no XDM tree is constructed in that case. The XDM tree consists of the root element, its attributes and all its descendants. While Xerces validates an element, assertion evaluation takes place as part of the validation process. Each assertion is evaluated on the XDM tree rooted at the given context element. Therefore, any attempt by the assert XPath expression to access a node outside this element tree will not succeed.

We also have a wiki page for the Xerces assertions implementation, http://wiki.apache.org/xerces/XML_Schema_1.1_Assertions. It describes some of the implementation details of assertions in Xerces.

I'm happy to share that we expect Xerces-J to support the whole of the assertions feature in a near future release. And of course, Xerces will support a lot of other XML Schema 1.1 features as well.

The following are a few nice articles related to XML Schema 1.1, which are worth reading:
1. Overview of XML Schema 1.1 language
2. XML Schema 1.1 co-occurence constraints using XPath 2.0