Sunday, September 25, 2016

XML Schema 1.1 : accessing an XML tree structure during validation

In this post, I'll discuss an XML Schema validity constraint that spans sibling elements in an XML document. Implementing this has become possible with XML Schema 1.1, through its new co-occurrence constraint facility.

Here's the XML document that needs to be validated by an XML Schema document:

<?xml version="1.0" encoding="UTF-8"?>
<X>
    <a>1</a>
    <b>2</b>
    <c>3</c>
    <d>4</d>
    <e>5</e>
</X>

The validation requirement is: write an XML Schema document that meets the following condition:
Element "X" is valid if the sum of the values of its child elements is greater than 7 (a hypothetical number chosen for this problem).

The following XML Schema document solves this problem:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">
         <xs:complexType>
             <xs:sequence>
                 <xs:element name="a" type="xs:int"/>
                 <xs:element name="b" type="xs:int"/>
                 <xs:element name="c" type="xs:int"/>
                 <xs:element name="d" type="xs:int"/>
                 <xs:element name="e" type="xs:int"/>
            </xs:sequence>         
            <xs:assert test="sum(*) gt 7"/>
            <!-- this assert also does the same thing : <xs:assert test="sum(a | b | c | d | e) gt 7"/> -->
         </xs:complexType>
    </xs:element>
   
</xs:schema>
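
To see the assertion reject a document, here is a hypothetical instance (not part of the original example) in which the child values sum to 5. Assuming the schema above, sum(*) gt 7 should evaluate to false, so element "X" should be reported as invalid:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- hypothetical instance: 1 + 1 + 1 + 1 + 1 = 5, which is not greater than 7 -->
<X>
    <a>1</a>
    <b>1</b>
    <c>1</c>
    <d>1</d>
    <e>1</e>
</X>
```

Each child element is still a valid xs:int on its own, so only the <xs:assert> is expected to fail for this document.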

This post and a few earlier ones illustrate how useful the XML Schema 1.1 <assert> (and also <assertion>) constructs are. Their simplicity comes from the fact that <assert> / <assertion> can use the whole 'schema type aware' XPath 2.0 language. In the case of <assert>, the XPath expressions work on a context tree, rooted at the element being validated, on which that particular set of schema <assert>'s operates. Remember that <assertion> is a facet (just like <minInclusive>, for example) that has access only to the value being validated, through the $value variable.

Please don't be misled by the title of this post, "accessing an XML tree structure during validation", into thinking that the phrase applies only to <assert>. It could equally describe, for example, the complex type definition of an XML element (in which we define the XML structure as a tree below a specific XML element). This post uses the phrase for the trees that XML Schema 1.1 <assert> works on, and not for the other kinds of trees just mentioned.

Saturday, September 24, 2016

XML Schema 1.1 assertion facet on a simple type list and union

Here's some more information on using the XML Schema 1.1 <assertion> facet when the simple type has variety list or union. Note that an XSD simple type can be defined in the following 3 ways:

1) <xs:simpleType>
        <xs:restriction base="some-type"> ... </xs:restriction>
   </xs:simpleType>

2) <xs:simpleType>
       <xs:list itemType="some-type"/>
   </xs:simpleType>

3) <xs:simpleType>
       <xs:union memberTypes="type-1 type-2 ..."/>
   </xs:simpleType>
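
As a minimal sketch, the three kinds could be written out in full as below. The type names SmallInt, SmallIntList and SmallIntOrDate, and the maxInclusive bound, are hypothetical and chosen only for illustration. Note that memberTypes takes a whitespace-separated list of type names:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <!-- 1) restriction of an existing type (hypothetical upper bound) -->
   <xs:simpleType name="SmallInt">
        <xs:restriction base="xs:int">
            <xs:maxInclusive value="100"/>
        </xs:restriction>
   </xs:simpleType>

   <!-- 2) variety list: a whitespace-separated list of SmallInt values -->
   <xs:simpleType name="SmallIntList">
        <xs:list itemType="SmallInt"/>
   </xs:simpleType>

   <!-- 3) variety union: a value is either a SmallInt or an xs:date -->
   <xs:simpleType name="SmallIntOrDate">
        <xs:union memberTypes="SmallInt xs:date"/>
   </xs:simpleType>

</xs:schema>
```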


Example of an XML Schema simple type with variety list:
XML document:
<?xml version="1.0" encoding="UTF-8"?>
<X>1 2 3 4 5</X>

Write an XML Schema 1.1 document that reports an XML document as valid when the following condition is met:
The element "X" has a simple type with variety list, such that the item type of the list is a simple type that validates even numbers.

The following XML Schema 1.1 document is the solution for this requirement:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:element name="X">
        <xs:simpleType>
            <xs:list itemType="EvenNum"/>
        </xs:simpleType>
   </xs:element>
  
   <xs:simpleType name="EvenNum">
        <xs:restriction base="xs:int">
            <xs:assertion test="$value mod 2 = 0"/>
        </xs:restriction>
   </xs:simpleType>
 
</xs:schema>

In this example, the XML document has the following invalid values in the list: 1, 3 & 5. When validated with Apache Xerces, the following XML Schema validation outcome is reported:

[Error] list.xml:2:17: cvc-assertions-valid: Value '1' is not facet-valid with respect to assertion '$value mod 2 = 0'.
[Error] list.xml:2:17: cvc-assertion: Assertion evaluation ('$value mod 2 = 0') for element 'X' on schema type 'EvenNum' did not succeed. Assertion failed for an xs:list member value '1'.
[Error] list.xml:2:17: cvc-assertions-valid: Value '3' is not facet-valid with respect to assertion '$value mod 2 = 0'.
[Error] list.xml:2:17: cvc-assertion: Assertion evaluation ('$value mod 2 = 0') for element 'X' on schema type 'EvenNum' did not succeed. Assertion failed for an xs:list member value '3'.
[Error] list.xml:2:17: cvc-assertions-valid: Value '5' is not facet-valid with respect to assertion '$value mod 2 = 0'.
[Error] list.xml:2:17: cvc-assertion: Assertion evaluation ('$value mod 2 = 0') for element 'X' on schema type 'EvenNum' did not succeed. Assertion failed for an xs:list member value '5'.

A valid XML document would be, for example: <X>2 4</X>.

Example of an XML Schema simple type with variety union (it's called union because the value space of the simple type is the union of the value spaces of 2 or more simple types):
XML document:
<?xml version="1.0" encoding="UTF-8"?>
<X>
    <val>3</val>
    <val>2017-12-05</val>
</X>

Write an XML Schema 1.1 document that reports an XML document as valid when the following conditions are met:
The element "X" has an XSD complex type with the following description:
It's a sequence of "val" elements (let's say its maxOccurs is 5, or it could be unbounded if you wish). The value of element "val" is defined by the following simple type:
It's a union of 2 simple types T1 and T2 with the following definitions:

<!-- a simple type validating even numbers -->
<xs:simpleType name="T1">
        <xs:restriction base="xs:int">
            <xs:assertion test="$value mod 2 = 0"/>
        </xs:restriction>
</xs:simpleType>  

<!-- a simple type that validates specific date values; values that are less than a specific date -->
<xs:simpleType name="T2">
      <xs:restriction base="xs:date">
          <xs:assertion test="$value lt xs:date('2016-01-01')"/>
      </xs:restriction>
</xs:simpleType>

The following complete XML Schema 1.1 document is a solution for this requirement:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:element name="X">
        <xs:complexType>
            <xs:sequence>
               <xs:element name="val" maxOccurs="5">
                  <xs:simpleType>
                      <xs:union memberTypes="T1 T2"/>
                  </xs:simpleType>
               </xs:element>
            </xs:sequence>
        </xs:complexType>
   </xs:element>
  
   <xs:simpleType name="T1">
        <xs:restriction base="xs:int">
            <xs:assertion test="$value mod 2 = 0"/>
        </xs:restriction>
   </xs:simpleType>
  
   <xs:simpleType name="T2">
        <xs:restriction base="xs:date">
            <xs:assertion test="$value lt xs:date('2016-01-01')"/>
        </xs:restriction>
   </xs:simpleType>
 
</xs:schema>

For the XML document given (an invalid one), the following validation outcomes are reported by Apache Xerces's XML Schema 1.1 validator:

[Error] union.xml:3:15: cvc-assertions-valid-union-elem: Value '3' is not facet-valid with respect to the specified assertions, on type '#AnonType_valX' on element 'val'.
[Error] union.xml:3:15: cvc-datatype-valid.1.2.3: '3' is not a valid value of union type '#AnonType_valX'.
[Error] union.xml:3:15: cvc-type.3.1.3: The value '3' of element 'val' is not valid.
[Error] union.xml:4:24: cvc-assertions-valid-union-elem: Value '2017-12-05' is not facet-valid with respect to the specified assertions, on type '#AnonType_valX' on element 'val'.
[Error] union.xml:4:24: cvc-datatype-valid.1.2.3: '2017-12-05' is not a valid value of union type '#AnonType_valX'.
[Error] union.xml:4:24: cvc-type.3.1.3: The value '2017-12-05' of element 'val' is not valid.

It should be fairly easy to come up with a valid XML document for this requirement.
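
For instance, the following hypothetical document should satisfy the schema above: 4 is an even xs:int (so it matches T1), and 2015-12-05 is an xs:date earlier than 2016-01-01 (so it matches T2):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- hypothetical valid instance: each "val" matches one member type of the union -->
<X>
    <val>4</val>
    <val>2015-12-05</val>
</X>
```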

XML Schema 1.1 assertion facet revisited

Consider the following XML document:

<?xml version="1.0" encoding="UTF-8"?>
<X>
    <a>1</a>
    <a>2</a>
    <a>3</a>
    <a>4</a>
    <a>5</a> 
</X>

We have the following requirement for XML Schema validation: the element "X" is considered valid when the value of each element "a" within it is an even number. In the example above, the following three values of elements "a" make the element "X" invalid: 1, 3 & 5. The following XML Schema 1.1 document implements these requirements using the <assertion> facet (it's a facet just like the XML Schema 1.0 facets "minInclusive" etc.):

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:element name="X">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="a" maxOccurs="10">
                    <xs:simpleType>
                        <xs:restriction base="xs:int">
                            <xs:assertion test="$value mod 2 = 0"/>
                        </xs:restriction>
                    </xs:simpleType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
   </xs:element>
 
</xs:schema>

The natural way to implement this requirement is the <assertion> facet, since it lets us use the XPath 2.0 "mod" operator to test for even values.

When using Apache Xerces as an XML Schema 1.1 validation engine, we get the following outputs for the validation attempt:
[Error] x1.xml:3:11: cvc-assertions-valid: Value '1' is not facet-valid with respect to assertion '$value mod 2 = 0'.
[Error] x1.xml:3:11: cvc-assertion: Assertion evaluation ('$value mod 2 = 0') for element 'a' on schema type '#AnonType_aX' did not succeed.
[Error] x1.xml:5:11: cvc-assertions-valid: Value '3' is not facet-valid with respect to assertion '$value mod 2 = 0'.
[Error] x1.xml:5:11: cvc-assertion: Assertion evaluation ('$value mod 2 = 0') for element 'a' on schema type '#AnonType_aX' did not succeed.
[Error] x1.xml:7:11: cvc-assertions-valid: Value '5' is not facet-valid with respect to assertion '$value mod 2 = 0'.
[Error] x1.xml:7:11: cvc-assertion: Assertion evaluation ('$value mod 2 = 0') for element 'a' on schema type '#AnonType_aX' did not succeed.

This is a really nice capability of XML Schema 1.1, I think. Also note that the error messages show the type name as '#AnonType_aX'. This is fine: it's the long-standing Apache Xerces convention for reporting anonymous XML Schema types, i.e. types that don't have a name. Had we given the type a specific name, like "TypeX", that name would have appeared in the error messages whenever errors occur during XML document validation.
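
As a sketch of the named-type variant, the anonymous simple type of element "a" could be pulled out into a top-level definition. The name EvenInt is hypothetical, and the exact wording of the resulting Xerces messages is not shown here; the expectation is only that they would mention the type's name instead of '#AnonType_aX':

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:element name="X">
        <xs:complexType>
            <xs:sequence>
                <!-- "a" now refers to the named simple type below -->
                <xs:element name="a" type="EvenInt" maxOccurs="10"/>
            </xs:sequence>
        </xs:complexType>
   </xs:element>

   <!-- EvenInt is a hypothetical name chosen for this sketch -->
   <xs:simpleType name="EvenInt">
        <xs:restriction base="xs:int">
            <xs:assertion test="$value mod 2 = 0"/>
        </xs:restriction>
   </xs:simpleType>

</xs:schema>
```

A named type like this can also be reused by other element or attribute declarations, which the anonymous form doesn't allow.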


Saturday, September 10, 2016

dual booting Windows 10 professional with Linux. Is this doable?

I've tried, to some extent, dual booting Windows 10 Professional with Linux, and wish to share my experience. In a nutshell: it shouldn't be done if you already have Windows 10 Professional installed on your machine and you wish to retain it after the dual-boot installation. Here's what I found:
1) First of all, do this only if you really need a dual boot. Don't do it just for fun or learning, or if your computer is doing important work.
2) Please remember before attempting a dual boot that Windows 10 Professional is a great OS. Don't attempt one just for the sake of using Linux (for example, because you don't understand or cannot perform a specific function on Windows 10). Please make every attempt to repair or learn the needed Windows 10 function, or ask Microsoft for support.
3) I ended up wasting approximately 22 GB of my hard disk space in an attempt to set up the dual boot (although it should be recoverable with a good hard-disk partition manager, since such tools are able to merge different kinds of partitions).
4) If you want to use Linux, my suggestion is to use a workstation or a server that only has Linux, i.e. when you need or wish to enjoy Linux, don't dual boot with Windows. For home users, Windows is usually the best choice.
5) When using Linux, it's usually hard to decide (if you're not very technically minded) which Linux variant would suit you best (e.g. Ubuntu, RedHat etc.). I was searching for a free Linux version. I tried Ubuntu, but failed to fully grasp the dual-boot fundamentals with it. I also wanted to try Fedora (since I trust RedHat very much), but didn't have the guts to proceed (i.e. to risk my Windows installation).

Please do these things very carefully, and if required please take the help of professionals who have done this work before.

09/17/2016 : I was able to recover the 22 GB of unpartitioned hard-disk space (that I had allocated to install Linux) and merge it into my Windows C:\ drive, using a tool called Minitool Partition Wizard.
I'm happy for the time being.

Sunday, May 19, 2013

thanks to OxygenXML folks

On behalf of the Xerces-J XML Schema team, I would like to thank the folks from the Oxygen XML team for highlighting many important bugs in the Xerces-J XSD 1.1 validator. We've been able to solve many of the reported bugs, and I feel this has made the implementation of the Xerces-J XSD 1.1 validator considerably better.

Here's the list of issues reported by the Oxygen folks over roughly the past 1-2 years, which are either resolved or closed:

https://issues.apache.org/jira/issues/?jql=project%20%3D%20XERCESJ%20AND%20issuetype%20%3D%20Bug%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20reporter%20in%20%28radu_coravu%2C%20%22octavian.nadolu%22%29

In the above report, you may ignore bugs dated as old as 2006, which must have been resolved in the current or an earlier Xerces-J version.

Other than the bugs reported by the Oxygen XML folks, we also received bug reports from other members of the XML community. Thanks to them as well.

I'm not sure when we're going to release the next version of Xerces-J, which should have many implementation improvements. Taking a very pessimistic view, I expect a new version of Xerces-J sometime later this year, or it might slip to next year.

Thursday, November 15, 2012

new thoughts about XSD 1.1 assertions

I've been thinking on these XSD topics for a while, and thought of summarizing my findings here.

Let me start this post by writing the following XML instance document (which will be the focus of all analysis in this post):

XML-1
<list attr="1 2 3 4 5 6">
    <item>a1</item>
    <item>a2</item>
    <item>a3</item>
    <item>a4</item>
    <item>a5</item>
    <item>a6</item>
</list>

We need to specify an XSD schema for the XML document above (XML-1), providing the following essential validation constraints:
1) The value of attribute "attr" is a sequence of single-digit numbers. A number here can be modeled as the XSD type xs:integer, or as a restriction of xs:string (as we'll see below).
2) Each string value within an element "item" is of the form a[0-9], i.e. the character "a" followed by a single digit. We'll simply specify this with the XSD type xs:string for now. We also want the digit after "a" in each "item" to be the same as the value at the corresponding index within the attribute value "attr". The sample XML instance document above (XML-1) is valid as per this requirement. Therefore, if we change any single numeric value in the sample (either a value within the attribute "attr" or the numeric suffix of an "item") without making the matching change on the other side, the XML instance document must be reported as 'invalid'.

Now, let me come to the XSD solutions for these XML validation requirements.

First of all, we need XSD 1.1 assertions to specify these validation constraints (since this is clearly a co-occurrence data constraint). The following is the first schema design that came to my mind:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   
    <xs:element name="list">
        <xs:complexType>
           <xs:sequence>
              <xs:element name="item" type="xs:string" maxOccurs="unbounded"/>
           </xs:sequence>
           <xs:attribute name="attr">
              <xs:simpleType>
                 <xs:list itemType="xs:integer"/>
              </xs:simpleType>
           </xs:attribute>
           <xs:assert test="deep-equal(item/substring-after(., 'a'), data(@attr))"/>
        </xs:complexType>
    </xs:element>
   
</xs:schema>

The above schema is almost correct, except for a little problem with the way the assertion is specified. As per the XPath 2.0 spec, when the "deep-equal" function compares two sequences for deep equality, atomic values at the same index in the two sequences must be equal as per the equality rules of their XSD atomic types. In the assertion in the above schema, the first argument of "deep-equal" has the type annotation xs:string* and the second argument has the type annotation xs:integer* (note that the XPath 2.0 "data" function returns the typed value of a node). Since an xs:string value and an xs:integer value cannot be compared for equality, the "deep-equal" function as used in this case returns 'false'.

Assuming that we would not change the schema specification of "item" elements and the attribute "attr", the following assertion would therefore be correct to realize the above requirement:

<xs:assert test="deep-equal(item/substring-after(., 'a'), for $att in data(@attr) return string($att))"/>

(in this case, we've converted the second argument of the "deep-equal" function to have the type annotation xs:string*, and did not modify the type annotation of the first argument)

An alternative correct modification to the assertion would be:

<xs:assert test="deep-equal(item/number(substring-after(., 'a')), data(@attr))"/>

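(in this case, we convert the first argument of the "deep-equal" function to numeric values and do not modify the type annotation of the second argument. Strictly speaking, the "number" function returns xs:double values rather than xs:integer ones, but the two compare as equal through numeric type promotion)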
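
To see the constraint reject a document, consider this hypothetical variant of XML-1 in which only the last value of "attr" has been changed. Assuming either of the two corrected assertions above, "deep-equal" should return false because the sixth pair (6 versus 7) does not match:

```xml
<!-- hypothetical variant of XML-1: the last value of "attr" is 7, but the last item is a6 -->
<list attr="1 2 3 4 5 7">
    <item>a1</item>
    <item>a2</item>
    <item>a3</item>
    <item>a4</item>
    <item>a5</item>
    <item>a6</item>
</list>
```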

I now propose a slightly different way to specify the schema for above requirements. Following is the modified schema document:

XS-2
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  
    <xs:element name="list">
        <xs:complexType>
           <xs:sequence>
              <xs:element name="item" type="xs:string" maxOccurs="unbounded"/>
           </xs:sequence>
           <xs:attribute name="attr">
              <xs:simpleType>
                 <xs:list itemType="NumericChar"/>
              </xs:simpleType>
           </xs:attribute>
           <xs:assert test="deep-equal(item/substring-after(., 'a'), data(@attr))"/>
        </xs:complexType>
    </xs:element>
  
    <xs:simpleType name="NumericChar">
       <xs:restriction base="xs:string">
          <xs:pattern value="[0-9]"/>
       </xs:restriction>
    </xs:simpleType>
  
</xs:schema>

This schema document is right in all respects, and successfully validates the XML document specified above (i.e. XML-1). In this schema we've made the following design decisions:
1) We've specified the itemType of the list (the value of attribute "attr" is an instance of this list) as "NumericChar" (a user-defined simpleType that uses the xs:pattern facet to constrain list items).
2) The "deep-equal" function as now written in the schema XS-2 has the type annotation xs:string* for both of its arguments, and therefore it works fine.

I'll now try to summarize the pros and cons of schema XS-2 with respect to the other correct solutions specified earlier:
1) If the simpleType definition of attribute "attr" is not used in another schema context (i.e. there is no need to reuse this type elsewhere), then the solution with schema XS-2 is acceptable.
2) If a schema author thinks that the list items of attribute "attr" need to be numeric (due to the semantic intent of the problem, or because the list's simpleType definition needs to be reused in another place that needs a list of integers), then schema solutions like the ones shown earlier would be needed.

Here's another caution about the schema solutions proposed above:
The above schemas allow values within "item" elements like "pqra5" to produce a valid outcome, because the "substring-after" function used in the assertions returns only the text after the first "a" and so ignores whatever precedes it. Therefore, the "item" element may be more correctly specified like this:

<xs:element name="item" maxOccurs="unbounded">
    <xs:simpleType>
         <xs:restriction base="xs:string">
              <xs:pattern value="a[0-9]"/>
         </xs:restriction>
    </xs:simpleType>
</xs:element>

It is also evident that the XPath 2.0 "data" function allows us to do some useful things with simpleType lists, like getting the list's typed value and specifying checks on individual list items (possibly different checks on different list items), or accessing list items by an index (or a range of indices). For example, data(@attr)[2] or data(@attr)[position() gt 3]. This was not possible with XSD 1.0.
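
A minimal sketch of such checks follows, assuming the first schema of this post (where the itemType of "attr" is xs:integer). The two conditions themselves are hypothetical and only illustrate the indexing syntax. In a schema they would be placed inside the complexType of element "list", after the xs:attribute declaration:

```xml
<!-- hypothetical checks on individual items of the "attr" list -->

<!-- the second list item must be the number 2 -->
<xs:assert test="data(@attr)[2] eq 2"/>

<!-- every list item after the third position must be greater than 3 -->
<xs:assert test="every $n in data(@attr)[position() gt 3] satisfies $n gt 3"/>
```

For XML-1 (attr="1 2 3 4 5 6") both expressions should evaluate to true: the second item is 2, and the items after position 3 are 4, 5 and 6.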

I hope that this post was useful, and I hope to come back with another post sometime soon.

Sunday, July 22, 2012

XSD 1.1 assertions with complexType extensions

I thought it would be good to write this post here and share it with XML Schema folks.

There was an interesting debate on the xmlschema-dev list recently, where we discussed the benefit of specifying an XSD 1.1 assertion within an XSD complexType that is derived from another complexType by extension. It was initially thought that an assertion within such a derived complexType would always produce an XML content model restriction effect (which is opposed to the actual intent of complexType extension). If that were the only effect of assertions in this case, then using them here would be counterintuitive. So, is there any benefit in specifying assertions within an XSD complexType derived by extension (a facility the XSD 1.1 language currently provides)?

After some thought, we found a benefit of using assertions for this scenario. Following is an example, illustrating one of the benefits of assertions for this case:

XSD Schema document (XS1):
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">
       <xs:complexType>
          <xs:complexContent>
             <xs:extension base="T1">
                <xs:sequence>
                   <xs:element name="c" type="xs:string"/>
                </xs:sequence>
                <xs:assert test="a = c">
                   <xs:annotation>
                      <xs:documentation>
                         The value of element "a" must be equal to value of element "c".
                      </xs:documentation>
                   </xs:annotation>
                </xs:assert>
             </xs:extension>
          </xs:complexContent>
       </xs:complexType>
    </xs:element>
    
    <xs:complexType name="T1">
       <xs:sequence>
          <xs:element name="a" type="xs:string"/>
          <xs:element name="b" type="xs:string"/>
       </xs:sequence>
    </xs:complexType>

</xs:schema>

XML instance document (XML1):
<X>
  <a>same</a>
  <b/>
  <c>same</c>
</X>

We want to validate the XML instance document XML1 above with the schema shown above (XS1). The content of element "X" is declared via an XSD complexType that is derived by extension from another complexType. The xs:assert element specified in the schema XS1 has the following semantic intent: "to specify a relational constraint between two sibling elements" (elements "a" and "c" in this case).
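
For contrast, here is a hypothetical instance that has the structure XS1 requires but whose "a" and "c" values differ. Assuming schema XS1, the content model is satisfied while the assertion a = c should evaluate to false, so the document should be reported as invalid:

```xml
<!-- hypothetical instance: structurally fine, but "a" and "c" hold different values -->
<X>
  <a>one</a>
  <b/>
  <c>two</c>
</X>
```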

Summarizing the design thoughts for the schema specified above (XS1):
1) An assertion within an XSD complexType extension derivation doesn't always produce a restriction effect. As illustrated in the example above, the assertion specifies a co-occurrence constraint that is orthogonal to the traditional xs:extension constraint. This is intuitive and useful.
2) We should be careful, though, and be aware that an xs:assert element within a complexType extension can easily inject a content model restriction effect. If this is not wanted, an assertion shouldn't be used for such derived XSD complex types. The following XML Schema example illustrates this scenario:

XSD Schema document (XS2):
(used to validate the XML document XML1 above; note that XML1 contains element "b", so the assertion in this schema rejects it)
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">
       <xs:complexType>
          <xs:complexContent>
             <xs:extension base="T1">
                <xs:sequence>
                   <xs:element name="c" type="xs:string"/>
                </xs:sequence>
                <xs:assert test="not(b)">
                   <xs:annotation>
                      <xs:documentation>
                         The element "b" is prohibited.
                      </xs:documentation>
                   </xs:annotation>
                </xs:assert>
             </xs:extension>
          </xs:complexContent>
       </xs:complexType>
    </xs:element>
    
    <xs:complexType name="T1">
       <xs:sequence>
          <xs:element name="a" type="xs:string"/>
          <xs:element name="b" type="xs:string" minOccurs="0"/>
       </xs:sequence>
    </xs:complexType>

</xs:schema>

The schema XS2 above illustrates the following design intents:
1) The xs:assert element within the complexType of element "X" prohibits element "b" from occurring within the XML instance element "X". An assertion like this restricts the "content model" of the base complex type. If we don't want a content model restricting effect like this, then we shouldn't use an xs:assert with complexType extension.
2) The schema document XS2 can still be considered a useful design. The complexType definition of element "X" in schema XS2 is effectively a mixture of both extension and restriction derivation. It is an extension derivation, because the element particles of the base type are made available within the derived type via an xs:extension element (element "a" for this example). It is also a restriction derivation, because the element "b" of the base type is prohibited from occurring in the derived type via an xs:assert element. The complexType definition of element "X" in this case is unlike any facility of the XSD 1.0 language, which allows a pure extension derivation or a pure restriction derivation but not both in a single derivation step. Assertions can sometimes be useful in a schema design like this, when we want the effects of both complexType extension and restriction derivation.
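
As an illustration, a hypothetical instance that schema XS2 should accept simply omits element "b". The base type T1 of XS2 declares "b" with minOccurs="0", so the content model is still satisfied, and not(b) evaluates to true:

```xml
<!-- hypothetical instance for XS2: "b" is absent, so the assertion not(b) holds -->
<X>
  <a>same</a>
  <c>same</c>
</X>
```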

Therefore, here's my final take on these design issues:
1) An assertion is very intuitive (and useful) for specifying co-occurrence constraints between sibling XML elements, including within an XSD xs:extension element (this is unlike any XSD 1.0 facility). Other content model co-occurrence scenarios are also useful here, like specifying co-constraints between an element and an attribute. XSD assertions are certainly recommended for this case.
2) An assertion is also very intuitive for specifying a mixture of complexType extension and restriction derivation operations (as illustrated in schema example XS2 above). XSD assertions are certainly also recommended for this case.
3) If an XSD schema author wishes to use the xs:extension element strictly for expressing pure content model extension, then using an assertion within xs:extension is counterintuitive (since it may inject a content model restriction effect) and is not recommended.

Therefore, if we have to do some new kinds of XML Schema modeling (for example, with xs:extension derivations), XSD 1.1 assertions are certainly a nice XML Schema construct.

I hope that this post was useful.