Tuesday, July 26, 2011

[revisiting] Xerces-J XSModel serializer

I've started playing a bit with the Xerces-J XSSerializer utility, and thought of writing something about its features. It serializes an in-memory Xerces XSModel instance into lexical XSD syntax. It's actually a sample within Xerces-J and was introduced in Xerces-J 2.10.0; the version in SVN is slightly better and will be released with a future Xerces release.

The XSModel serializer has the following two important serialization features/options (currently the only ones):
1. Selecting the XSD language version that the XSModel serializer should work with. By default this is XSD 1.0, but it can be set to XSD 1.1 via the command-line parameter {-version 1.1}. The XSModel serializer currently supports very few XSD 1.1 features, and we'll try to add more in future. The XSD 1.0 support in Xerces's XSModel serializer, however, is fairly complete.
2. Configuring the XSD language prefix used in the serialized output, with the option {-prefix <prefix-value>}. For example, "-prefix xsd". If this option is not specified, the prefix "xs" is generated by default during XSModel instance serialization.

I made a few interesting observations while using the Xerces XSSerializer (illustrated with small examples below):

1. I supplied the following XSD document (only the element declaration is shown, since this is the focus of this point) to the XSModel serializer,
<xs:element name="E1">
   <xs:simpleType>
      <xs:list>
         <xs:simpleType>
            <xs:restriction base="xs:string">
               <xs:minLength value="5"/>
            </xs:restriction>
         </xs:simpleType>
      </xs:list>
   </xs:simpleType>
</xs:element>

and the XSModel serializer echoed this element declaration (it converted the lexical schema into an XSModel instance, and then serialized the XSModel back to lexical XSD syntax) as follows,
<xs:element name="E1">
   <xs:simpleType>
      <xs:list>
         <xs:simpleType>
            <xs:restriction base="xs:string">
               <xs:whiteSpace value="preserve"/>
               <xs:minLength value="5"/>
            </xs:restriction>
         </xs:simpleType>
      </xs:list>
   </xs:simpleType>
</xs:element>

The interesting thing I notice in this example is the generation of the built-in facet "whiteSpace" for the XSD type xs:string.

2. Serializing the following XSD element,
<xs:element name="E1">
   <xs:simpleType>
      <xs:list>
         <xs:simpleType>
            <xs:restriction base="xs:integer">
               <xs:minInclusive value="5"/>
            </xs:restriction>
         </xs:simpleType>
      </xs:list>
   </xs:simpleType>
</xs:element>
produces the following round-trip output with the XSModel serializer,
<xs:element name="E1">
   <xs:simpleType>
      <xs:list>
         <xs:simpleType>
            <xs:restriction base="xs:integer">
               <xs:whiteSpace value="collapse"/>
               <xs:fractionDigits value="0"/>
               <xs:minInclusive value="5"/>
               <xs:pattern value="[\-+]?[0-9]+"/>
            </xs:restriction>
         </xs:simpleType>
      </xs:list>
   </xs:simpleType>
</xs:element>
This shows the built-in facets for the XSD type xs:integer ("whiteSpace", "fractionDigits" and others).

I personally like this feature of the XSModel serializer: it is able to generate certain hidden properties of XML Schema components, which schema authors normally don't specify while writing schema documents for applications.
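Since these generated facets only spell out what xs:integer already implies, the original and the echoed declarations from point 2 should accept exactly the same instances. Here is a minimal sketch to check that. It does not use XSSerializer itself, only the JDK's standard javax.xml.validation API, and the class name FacetRoundTripCheck and its helper methods are my own hypothetical names for illustration. The serialized facets are copied by hand from the output shown above.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class (not part of Xerces): compares validation
// outcomes of the hand-written schema and the serializer's echoed schema.
public class FacetRoundTripCheck {

    // Facets as written by the schema author.
    static final String ORIGINAL = schema("<xs:minInclusive value=\"5\"/>");

    // Facets as echoed by the serializer (copied from the post).
    static final String SERIALIZED = schema(
          "<xs:whiteSpace value=\"collapse\"/>"
        + "<xs:fractionDigits value=\"0\"/>"
        + "<xs:minInclusive value=\"5\"/>"
        + "<xs:pattern value=\"[\\-+]?[0-9]+\"/>");

    // Wraps the given facets in the E1 declaration used in point 2.
    static String schema(String facets) {
        return "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
             + "<xs:element name=\"E1\"><xs:simpleType><xs:list><xs:simpleType>"
             + "<xs:restriction base=\"xs:integer\">" + facets + "</xs:restriction>"
             + "</xs:simpleType></xs:list></xs:simpleType></xs:element>"
             + "</xs:schema>";
    }

    // Returns true if the instance document is valid against the schema text.
    static boolean isValid(String xsd, String xml) throws Exception {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a validation error was reported
        }
    }

    public static void main(String[] args) throws Exception {
        String[] instances = { "<E1>5 6 7</E1>", "<E1>5 4</E1>", "<E1>5.0</E1>" };
        for (String xml : instances) {
            System.out.println(xml + "  original=" + isValid(ORIGINAL, xml)
                                   + "  serialized=" + isValid(SERIALIZED, xml));
        }
    }
}
```

For each instance, the two columns should agree: the list of integers not below 5 is accepted by both schemas, and the other two instances are rejected by both.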

3. I provided the following XSD schema fragment to the XSModel serializer (a complexType referring to a model group),
<xs:element name="E1">
  <xs:complexType>
     <xs:group ref="gp1"/>
  </xs:complexType>
</xs:element>
   
<xs:group name="gp1">
   <xs:sequence>
      <xs:element name="x" type="xs:string"/>
      <xs:element name="y" type="xs:string"/>
   </xs:sequence>
</xs:group>

and the XSModel serializer generated the following round-trip serialization result,
<xs:element name="E1">
   <xs:complexType>
      <xs:sequence>
         <xs:element name="x" type="xs:string"/>
         <xs:element name="y" type="xs:string"/>
      </xs:sequence>
   </xs:complexType>
</xs:element>

<xs:group name="gp1">
   <xs:sequence>
      <xs:element name="x" type="xs:string"/>
      <xs:element name="y" type="xs:string"/>
   </xs:sequence>
</xs:group>
The global "model group" is serialized as expected. But the complexType within the element declaration was serialized with its element declarations expanded. The lexical group reference is not present in the serialized output.

At first, the absence of the model group reference in the serialized output may look odd. But the fact is that a Xerces XSModel instance, in its complete compiled form, doesn't know whether a group particle (in this case xs:sequence) came from a group reference, and I had to live with this XSModel serialization characteristic. The serialized schema in this example is nevertheless equivalent, from a validation perspective, to the original schema document that was supplied to the XSModel serializer. The global group definition in the output is redundant from that perspective, since nothing in the output refers to it; emitting it is just a characteristic of the XSModel serializer currently.
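The equivalence claim for point 3 can be checked the same way. The following is a minimal sketch, again using only the JDK's javax.xml.validation API rather than XSSerializer; the class name GroupRoundTripCheck and its helpers are hypothetical names of mine, and both schema texts are copied from the fragments above and wrapped in an xs:schema element with no target namespace (an assumption, since the post shows only fragments).

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class (not part of Xerces): checks that the schema
// with a group reference and the expanded, serialized schema agree.
public class GroupRoundTripCheck {

    static final String SEQUENCE =
          "<xs:sequence>"
        + "<xs:element name=\"x\" type=\"xs:string\"/>"
        + "<xs:element name=\"y\" type=\"xs:string\"/>"
        + "</xs:sequence>";

    static final String GROUP = "<xs:group name=\"gp1\">" + SEQUENCE + "</xs:group>";

    // Original: the complexType refers to the global group.
    static final String ORIGINAL = schema("<xs:group ref=\"gp1\"/>");

    // Serialized: the group's content is expanded in place.
    static final String SERIALIZED = schema(SEQUENCE);

    // Builds a schema whose E1 complexType has the given content,
    // followed by the global group definition.
    static String schema(String content) {
        return "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
             + "<xs:element name=\"E1\"><xs:complexType>" + content
             + "</xs:complexType></xs:element>"
             + GROUP
             + "</xs:schema>";
    }

    // Returns true if the instance document is valid against the schema text.
    static boolean isValid(String xsd, String xml) throws Exception {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a validation error was reported
        }
    }

    public static void main(String[] args) throws Exception {
        String[] instances = {
            "<E1><x>a</x><y>b</y></E1>",   // matches the sequence
            "<E1><y>b</y><x>a</x></E1>",   // wrong order
            "<E1><x>a</x></E1>"            // y is missing
        };
        for (String xml : instances) {
            System.out.println(xml + "  original=" + isValid(ORIGINAL, xml)
                                   + "  serialized=" + isValid(SERIALIZED, xml));
        }
    }
}
```

A handful of instances is of course not a proof of equivalence, but it makes the point concrete: both schemas accept the x, y sequence and reject the other two documents.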

That's all I have to say now. Thanks for reading this post.