Sunday, December 31, 2023

XSLT 3.0 grouping use case

I've just been playing this evening, trying to improve the XalanJ prototype processor's implementation of the XSLT 3.0 xsl:for-each-group instruction. Following is an xsl:for-each-group use case that I've been trying to solve.

XML input document,

<?xml version="1.0" encoding="utf-8"?>

<root>

  <a>

    <itm1>hi</itm1>

    <itm2>hello</itm2>

    <itm3>there</itm3>

  </a>

  <b>

    <itm1>this</itm1>

    <itm2>is</itm2>

    <itm3>nice</itm3>

  </b>

  <c>

    <itm1>hello</itm1>

    <itm2>friends</itm2>

  </c>

  <d>

    <itm1>this is ok</itm1>

  </d>

</root>

XSLT 3.0 stylesheet, using the xsl:for-each-group instruction to group the child elements of "root" from the XML document cited above,

<?xml version="1.0" encoding="utf-8"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

                         version="3.0">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/root">

    <result>

      <xsl:for-each-group select="*" group-by="(count(*) eq 1) or (count(*) eq 3)">

        <group groupingCriteria="{if (current-grouping-key() eq true()) then '1,3' else 'not(1,3)'}">

          <xsl:copy-of select="current-group()"/>

        </group>

      </xsl:for-each-group>

    </result>

  </xsl:template>

</xsl:stylesheet>

The result of the above cited XSLT transformation, as produced by XalanJ, is following,

<?xml version="1.0" encoding="UTF-8"?><result>

  <group groupingCriteria="1,3">

    <a>

    <itm1>hi</itm1>

    <itm2>hello</itm2>

    <itm3>there</itm3>

  </a>

    <b>

    <itm1>this</itm1>

    <itm2>is</itm2>

    <itm3>nice</itm3>

  </b>

    <d>

    <itm1>this is ok</itm1>

  </d>

  </group>

  <group groupingCriteria="not(1,3)">

    <c>

    <itm1>hello</itm1>

    <itm2>friends</itm2>

  </c>

  </group>

</result>

Achieving such XML data grouping was very hard with the XSLT 1.0 language. Thank god, we have the XSLT 3.0 language available now.
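For comparison, following is a minimal XSLT 1.0 sketch of the same transformation. It is not part of the original experiment and has not been run with XalanJ here. It works only because the grouping key of this use case has two possible values (true and false), so each group can be selected with a plain predicate; the variable names "matching" and "others" are made up for this illustration. A grouping key with many distinct values would need something like the Muenchian method instead.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/root">
    <!-- children of "root" having exactly 1 or exactly 3 child elements -->
    <xsl:variable name="matching" select="*[count(*) = 1 or count(*) = 3]"/>
    <!-- all remaining children of "root" -->
    <xsl:variable name="others" select="*[not(count(*) = 1 or count(*) = 3)]"/>
    <result>
      <!-- an empty group is skipped, as xsl:for-each-group would do -->
      <xsl:if test="$matching">
        <group groupingCriteria="1,3">
          <xsl:copy-of select="$matching"/>
        </group>
      </xsl:if>
      <xsl:if test="$others">
        <group groupingCriteria="not(1,3)">
          <xsl:copy-of select="$others"/>
        </group>
      </xsl:if>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

One difference from xsl:for-each-group is that the order of the two group elements is fixed by the stylesheet here, rather than following the first appearance of each grouping key in the input.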


Thursday, December 28, 2023

Managing complexity of XPath 3.1 'if' expressions, in the context of XSLT 3.0

I've just been playing around with the following XSLT transformation example, and thought of sharing it as a blog post here.

Let's consider the following XSLT 3.0 stylesheet, which we'll use to transform the XML document shown after it,

<?xml version="1.0" encoding="utf-8"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

                         xmlns:xs="http://www.w3.org/2001/XMLSchema"

                         xmlns:fn0="http://fn0"

                         exclude-result-prefixes="xs fn0"

                         version="3.0">

  <xsl:output method="xml" indent="yes"/>

  <xsl:variable name="date1" select="xs:date('2005-10-12')" as="xs:date"/>

  <xsl:template match="/root">

      <root>

          <xsl:copy-of select="if (fn0:func1($date1)) then a else b"/>

     </root>

  </xsl:template>

  <!-- An XSLT stylesheet function that performs a specific boolean valued computation. Its result selects which branch of the XPath 'if' expression, used within the xsl:copy-of instruction above, is evaluated. -->

 <xsl:function name="fn0:func1" as="xs:boolean">

     <xsl:param name="date1" as="xs:date"/>

     <xsl:sequence select="if (current-date() lt $date1)

                           then true()

                           else false()"/>

   </xsl:function>

</xsl:stylesheet>

The corresponding XML instance document is following,

<?xml version="1.0" encoding="utf-8"?>

<root>

    <a/>

    <b/>

</root>

The two possible XSLT transformation results for the above mentioned XSLT transformation, depending upon the result of the XPath comparison current-date() lt $date1, are following (the first when the comparison is false, the second when it is true):

<?xml version="1.0" encoding="UTF-8"?><root>

  <b/>

</root>

and,

<?xml version="1.0" encoding="UTF-8"?><root>

  <a/>

</root>

Within the above mentioned XSLT transformation example, we may observe how the XPath 3.1 'if' expressions have been written to achieve the desired XSLT transformation results. We're able to write stylesheet functions, which may be significantly complex, that produce a boolean result, which may then act as the branching condition of an XPath 'if' expression.
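As a side note, the value comparison current-date() lt $date1 already evaluates to an xs:boolean, so the "if ... then true() else false()" wrapper inside fn0:func1 is not strictly needed. Following is a sketch of the same stylesheet with the shorter function body; only the xsl:sequence expression differs from the stylesheet above, and this variant has not been run with the XalanJ prototype here.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:fn0="http://fn0"
                exclude-result-prefixes="xs fn0"
                version="3.0">

  <xsl:output method="xml" indent="yes"/>

  <xsl:variable name="date1" select="xs:date('2005-10-12')" as="xs:date"/>

  <xsl:template match="/root">
    <root>
      <xsl:copy-of select="if (fn0:func1($date1)) then a else b"/>
    </root>
  </xsl:template>

  <xsl:function name="fn0:func1" as="xs:boolean">
    <xsl:param name="date1" as="xs:date"/>
    <!-- the comparison itself is the boolean result of the function -->
    <xsl:sequence select="current-date() lt $date1"/>
  </xsl:function>

</xsl:stylesheet>
```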

I hope that the above mentioned XSLT transformation example is useful.


Wednesday, December 27, 2023

XML data grouping with XSLT 3.0, illustrations

I've just been playing this morning, writing an XSLT 3.0 stylesheet that groups XML input data as follows (which I wish to share with the XML and XSLT community).

XML input document,

<root>

  <a>

    <m/>

  </a>

  <b>

    <n/>

  </b>

  <a>

    <o/>

  </a>

  <a>

    <p/>

  </a>

  <a>

    <q/>

  </a>

  <b>

    <r/>

  </b>

  <b>

    <s/>

  </b>

</root>


XSLT 3.0 stylesheet, that groups the data of the XML document mentioned above (i.e., grouping of the XML element children of element "root"),

<?xml version="1.0" encoding="utf-8"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"                

                         version="3.0">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/root">

     <xsl:for-each-group select="*" group-by="name()">

        <xsl:element name="{current-grouping-key()}">

           <xsl:copy-of select="current-group()/*"/>

        </xsl:element>

     </xsl:for-each-group>

  </xsl:template>

</xsl:stylesheet>


The XSLT transformation output of this XML document transform is following,

<?xml version="1.0" encoding="UTF-8"?><a>

  <m/>

  <o/>

  <p/>

  <q/>

</a><b>

  <n/>

  <r/>

  <s/>

</b>


The XML data grouping algorithm implemented by the XSLT stylesheet illustrated above is following,

The XML element children of element "root" are formed into groups on the basis of their XML element names. Two groups are possible for this stylesheet transformation example, one for the "a" elements and one for the "b" elements. For each group, the children of all the group's members are copied into a single output element named after the grouping key.
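To see what xsl:for-each-group saves us from writing, following is a sketch of the same grouping in XSLT 1.0 using the well known Muenchian method (xsl:key plus generate-id()). This is an illustration only and was not part of the XalanJ prototype test; the key name "by-name" is made up for this example.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="xml" indent="yes"/>

  <!-- index the children of "root" by their element name -->
  <xsl:key name="by-name" match="/root/*" use="name()"/>

  <xsl:template match="/root">
    <!-- visit only the first element of each distinct name -->
    <xsl:for-each select="*[generate-id() = generate-id(key('by-name', name())[1])]">
      <xsl:element name="{name()}">
        <!-- copy the children of every element sharing this name -->
        <xsl:copy-of select="key('by-name', name())/*"/>
      </xsl:element>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The xsl:key declaration plays the role of the group-by attribute, and key('by-name', name()) plays the role of current-group().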

I hope that this XSLT stylesheet example has been useful to study.

This XSLT stylesheet example has been tested with Apache XalanJ's XSLT 3.0 prototype processor.

Tuesday, September 12, 2023

XSLT 3.0, XPath 3.1 and XalanJ

It's been a while since I've written a blog post here. I have a few new updates about the work which the XalanJ team has been doing over the past few months, that I wish to share with the XML community.

The XalanJ project provides XSLT and XPath processors that are written in the Java language. An XSLT processor transforms an XML input document (or even only text files) into other formats like XML, HTML and text.

The XalanJ project released a new version (2.7.3) of XalanJ on 2023-04-01. This XalanJ release is essentially a bug fix release over the previous release. The XalanJ 2.7.3 release was extensively tested by the XalanJ team, and it has very good compliance with the XSLT 1.0 and XPath 1.0 specs.

Since Apr 2023, the XalanJ team has been working to develop implementations of the XSLT 3.0 and XPath 3.1 language specifications. These XalanJ codebase changes are not yet released by the XalanJ team, but are available on a XalanJ dev repos branch.

I further wish to write about XSLT 3.0 user-defined callable component implementation enhancements within XalanJ, which should be available within one of the future XalanJ releases. The callable components within a programming language are essentially functions and procedures. The XSLT 1.0 language has only one kind of user-defined callable component, which is written with the XML element xsl:template.

XSLT 3.0 provides another kind of user-defined callable component, defined with the XML element xsl:function. The xsl:function declaration was first made available within the XSLT 2.0 language. A user-defined function present within an XSLT stylesheet may be called within an XPath expression.

Following is an example of an XSLT 3.0 stylesheet that makes use of an xsl:function element,

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                         xmlns:ns0="http://ns0"
                         exclude-result-prefixes="ns0"
                         version="3.0">
    
    <xsl:output method="xml" indent="yes"/>
    
    <xsl:template match="/">       
         <result>
             <one>
                 <xsl:value-of select="ns0:func1(6, 5, true(), false())"/>
             </one>
             <two>
                 <xsl:value-of select="ns0:func1(2, 5, true(), false())"/>
             </two>
         </result>
    </xsl:template>
    
    <xsl:function name="ns0:func1">
         <xsl:param name="val1"/>
         <xsl:param name="val2"/>
         <xsl:param name="a"/>
         <xsl:param name="b"/>
       
         <xsl:value-of select="if ($val1 gt $val2) then ($a and $b) else ($a or $b)"/>
    </xsl:function>
    
</xsl:stylesheet>

The above cited XSLT stylesheet defines a user-defined function named "func1", bound to the specified non-null XML namespace. This function requires four arguments in a function call, and computes a boolean value from them: ($a and $b) when $val1 is greater than $val2, otherwise ($a or $b).

The above cited XSLT stylesheet produces the following output with XalanJ,

<?xml version="1.0" encoding="UTF-8"?><result>
  <one>false</one>
  <two>true</two>
</result>
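Because func1 above declares no types and builds its result with xsl:value-of, what it returns is a text node holding 'true' or 'false', rather than an xs:boolean value. Following is a sketch of a more strictly typed variant. The "as" attributes, the xs namespace declaration and the use of xsl:sequence are my additions for illustration, and this variant has not been run with the XalanJ prototype here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:ns0="http://ns0"
                exclude-result-prefixes="xs ns0"
                version="3.0">

    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/">
         <result>
             <one>
                 <xsl:value-of select="ns0:func1(6, 5, true(), false())"/>
             </one>
             <two>
                 <xsl:value-of select="ns0:func1(2, 5, true(), false())"/>
             </two>
         </result>
    </xsl:template>

    <!-- the declared types document the contract of the function -->
    <xsl:function name="ns0:func1" as="xs:boolean">
         <xsl:param name="val1" as="xs:integer"/>
         <xsl:param name="val2" as="xs:integer"/>
         <xsl:param name="a" as="xs:boolean"/>
         <xsl:param name="b" as="xs:boolean"/>

         <!-- xsl:sequence returns the boolean itself, not a text node -->
         <xsl:sequence select="if ($val1 gt $val2) then ($a and $b) else ($a or $b)"/>
    </xsl:function>

</xsl:stylesheet>
```

With the declared types, a call such as ns0:func1('x', 5, true(), false()) would be reported as a type error by a conforming processor, instead of silently computing something unexpected.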

XPath 3.1 provides a new kind of callable component (that wasn't available with XPath 1.0): the inline function expression, which when evaluated by an XPath processor produces an XPath data model (XDM) function item.

An XPath 3.1 function item may be called via an XPath dynamic function call expression.

Following is an XSLT 3.0 stylesheet that specifies an XPath inline function expression, and is an alternate solution to the above cited XSLT stylesheet,

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                         version="3.0">
    
    <xsl:output method="xml" indent="yes"/>
    
    <xsl:variable name="func1" select="function($val1, $val2, $a, $b) { if ($val1 gt $val2) then ($a and $b) else ($a or $b) }"/>
    
    <xsl:template match="/">       
         <result>
             <one>
                 <xsl:value-of select="$func1(6, 5, true(), false())"/>
             </one>
             <two>
                 <xsl:value-of select="$func1(2, 5, true(), false())"/>
             </two>
         </result>
    </xsl:template>
    
</xsl:stylesheet>

The above cited XSLT stylesheet specifies an XPath inline function expression assigned to an XSLT variable "func1". This makes XPath expressions like $func1(..) function calls (which are termed dynamic function calls by the XPath 3.1 language).

The above cited XSLT stylesheet produces an output with XalanJ which is the same as that of the earlier cited stylesheet.

It's perhaps also interesting to discuss which of the above mentioned callable component approaches an XSLT stylesheet author should choose.

An XPath 3.1 inline function expression is an *XPath expression*, therefore its function body is limited to XPath syntax only.

Whereas, an xsl:function is an XSLT declaration (which may be invoked as a function call from within XPath expressions). The body of an xsl:function may have significantly more complex logic (with any permissible XSLT instructions and XPath expressions) than an XPath inline function expression.
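To illustrate this difference, following is a small sketch of an xsl:function whose body uses XSLT instructions and a literal result element, which an XPath inline function expression cannot do, since XPath has no syntax for constructing new element nodes. The function name ns0:label and the element name "label" are hypothetical, and the stylesheet has not been run with the XalanJ prototype here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:ns0="http://ns0"
                exclude-result-prefixes="xs ns0"
                version="3.0">

    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/">
         <result>
             <!-- each call returns a newly constructed "label" element -->
             <xsl:sequence select="ns0:label(6), ns0:label(2)"/>
         </result>
    </xsl:template>

    <xsl:function name="ns0:label" as="element(label)">
         <xsl:param name="n" as="xs:integer"/>

         <label>
             <xsl:choose>
                 <xsl:when test="$n gt 5">large</xsl:when>
                 <xsl:otherwise>small</xsl:otherwise>
             </xsl:choose>
         </label>
    </xsl:function>

</xsl:stylesheet>
```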

To conclude, I believe that when using XSLT 3.0 and XPath 3.1, we have the following three main kinds of user-defined callable components which may be used by XSLT stylesheet authors,

1) xsl:template   (which is very important within an XSLT stylesheet, and is the core of an XSLT stylesheet)

2) xsl:function

3) XPath inline function expression

That's all I wished to say within this blog post.



Monday, April 10, 2023

XPath 2.0 quantified expressions. Implementation with XSLT 1.0

The XPath 2.0 language has introduced new syntax and semantics as compared to the XPath 1.0 language, for example the XPath 2.0 quantified expressions.

Following is the XPath 2.0 grammar for quantified expressions (quoted from the XPath 2.0 language specification),

QuantifiedExpr    ::=    ("some" | "every") "$" VarName "in" ExprSingle ("," "$" VarName "in" ExprSingle)* "satisfies" ExprSingle

An XPath 2.0 quantified expression, when evaluated over a sequence of XPath data model items, returns a boolean value, either 'true' or 'false'.
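As a reference point, following is a sketch of how the two kinds of quantified expressions look when written natively in an XSLT 2.0 stylesheet. It assumes an XSLT 2.0 (or later) processor and an input document whose root element "elem" has numeric "a" child elements; the two conditions (greater than 3, equal to 4) are chosen only for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

   <xsl:output method="text"/>

   <xsl:template match="/elem">
      <!-- universal quantification: true only if every "a" is greater than 3 -->
      <xsl:value-of select="every $x in a satisfies number($x) gt 3"/>
      <xsl:text>&#10;</xsl:text>
      <!-- existential quantification: true if at least one "a" equals 4 -->
      <xsl:value-of select="some $x in a satisfies number($x) eq 4"/>
   </xsl:template>

</xsl:stylesheet>
```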

I'm able to suggest an XSLT 1.0 code pattern (tested with Apache XalanJ) that can implement the logic of XPath 2.0 like quantified expressions. Following is an example illustrating these concepts,

XML input document:

<?xml version="1.0" encoding="UTF-8"?>

<elem>

  <a>5</a>

  <a>5</a>

  <a>4</a>

  <a>7</a>

  <a>5</a>

  <a>5</a>

  <a>7</a>

  <a>5</a>

</elem> 

XSLT 1.0 stylesheet, implementing the XPath 2.0 "every" like quantified expression (i.e, universal quantification):

<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

                         xmlns:exslt="http://exslt.org/common"

                         exclude-result-prefixes="exslt"

                         version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/elem">

      <xsl:variable name="temp">

         <xsl:for-each select="a">           

            <xsl:if test="number(.) &gt; 3">

              <yes/>

            </xsl:if>

         </xsl:for-each>

      </xsl:variable>

      <xsl:value-of select="count(exslt:node-set($temp)/yes) = count(a)"/>

   </xsl:template>

</xsl:stylesheet>

The above XSLT stylesheet produces a boolean 'true' result if all XML "a" input elements have a value greater than 3, otherwise a boolean 'false' result is produced.

XSLT 1.0 stylesheet, implementing the XPath 2.0 "some" like quantified expression (i.e, existential quantification):

<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

                         xmlns:exslt="http://exslt.org/common"

                         exclude-result-prefixes="exslt"

                         version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/elem">

      <xsl:variable name="temp">

         <xsl:for-each select="a">           

            <xsl:if test="number(.) = 4">

              <yes/>

            </xsl:if>

         </xsl:for-each>

      </xsl:variable>

      <xsl:value-of select="count(exslt:node-set($temp)/yes) &gt;= 1"/>

   </xsl:template>

</xsl:stylesheet>

The above XSLT stylesheet produces a boolean 'true' result if at least one XML "a" input element has a value equal to 4, otherwise a boolean 'false' result is produced.

Within the above cited XSLT 1.0 stylesheets, we've used the "node-set" extension function (which converts an XSLT 1.0 "result tree fragment" into a node set).

We can therefore conclude that, within an XSLT 1.0 environment, we can largely simulate the logic of many XPath 2.0 language constructs.
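For simple conditions like the ones above, the same two results can also be computed with plain XPath 1.0 predicates, without any extension function. Following is a sketch of this alternative (it is a different technique from the node-set based pattern shown above, not a replacement the original posts rely on): "every" is expressed as "no element fails the condition", and "some" as "the set of elements meeting the condition is non-empty".

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/elem">
      <!-- "every": there is no "a" element that is not greater than 3 -->
      <xsl:value-of select="not(a[not(number(.) &gt; 3)])"/>
      <xsl:text>&#10;</xsl:text>
      <!-- "some": at least one "a" element is equal to 4 -->
      <xsl:value-of select="boolean(a[number(.) = 4])"/>
   </xsl:template>

</xsl:stylesheet>
```

For the XML input document cited above, both expressions evaluate to true. The node-set based pattern remains useful when the condition needs intermediate results that can't be written as a single predicate.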

Thursday, April 6, 2023

XSLT 1.0 transformation : find distinct values

In continuation to my previous blog post on this site, this blog post describes how to use the XSLT 1.0 language (tested with Apache XalanJ 2.7.3 along with its JavaScript extension function bindings) to find distinct values (i.e., doing de-duplication of a data set) from a data set originating from an XML instance document.

Following is an XSLT transformation example illustrating these features.

XML instance document:

<?xml version="1.0" encoding="UTF-8"?>

<elem>

  <a>2</a>

  <a>3</a>

  <a>3</a>

  <a>5</a>

  <a>3</a>

  <a>1</a>

  <a>2</a>

  <a>5</a>

</elem>

Corresponding XSLT 1.0 transformation:

<?xml version="1.0"?>

<xsl:stylesheet  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

                          xmlns:xalan="http://xml.apache.org/xalan"

                          xmlns:js="http://js_functions"

                          extension-element-prefixes="js"

                          version="1.0">

   <xsl:output method="text"/>

   <xalan:component prefix="js" functions="reformString">

      <xalan:script lang="javascript">

        function reformString(str)

        {

           return str.substr(0, str.length - 1);

        }

      </xalan:script>

   </xalan:component>

   <xsl:template match="/elem">

      <xsl:if test="count(a) &gt; 0">

         <xsl:variable name="result">

            <xsl:call-template name="distinctValues">

               <xsl:with-param name="curr_node" select="a[1]"/>

               <xsl:with-param name="csv_result" select="concat(string(a[1]), ',')"/>

            </xsl:call-template>

         </xsl:variable>

         <xsl:value-of select="js:reformString(string($result))"/>

      </xsl:if>

   </xsl:template>

   <xsl:template name="distinctValues">

      <xsl:param name="curr_node"/>

      <xsl:param name="csv_result"/>

      <xsl:choose>

        <xsl:when test="$curr_node/following-sibling::*">

           <xsl:variable name="temp1">

              <xsl:choose>

                 <xsl:when test="not(contains($csv_result, concat(string($curr_node), ',')))">

                    <xsl:value-of select="concat($csv_result, string($curr_node), ',')"/>

                 </xsl:when>

                 <xsl:otherwise>

                    <xsl:value-of select="$csv_result"/>

                 </xsl:otherwise>

              </xsl:choose>

           </xsl:variable>

           <xsl:call-template name="distinctValues">

              <xsl:with-param name="curr_node" select="$curr_node/following-sibling::*[1]"/>

              <xsl:with-param name="csv_result" select="normalize-space($temp1)"/>

           </xsl:call-template>

        </xsl:when>

        <xsl:otherwise>

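           <!-- $curr_node is the last element here and hasn't been checked yet, so it's checked before emitting the result -->

           <xsl:choose>

              <xsl:when test="not(contains($csv_result, concat(string($curr_node), ',')))">

                 <xsl:value-of select="concat($csv_result, string($curr_node), ',')"/>

              </xsl:when>

              <xsl:otherwise>

                 <xsl:value-of select="$csv_result"/>

              </xsl:otherwise>

           </xsl:choose>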

        </xsl:otherwise>

      </xsl:choose>      

   </xsl:template>

</xsl:stylesheet>

The above mentioned XSLT transformation produces the following desired result,

2,3,5,1

XalanJ users can find the JavaScript language related jars (which need to be available within the JVM classpath at run-time during the XSLT transformation) within the XalanJ src distribution. The relevant jar files are: bsf.jar, commons-logging-1.2.jar and rhino-1.7.14.jar (Rhino is Mozilla's JavaScript engine implementation, bundled with the XalanJ 2.7.3 src distribution).
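When the values to de-duplicate are sibling elements, as in this example, a shorter XSLT 1.0 alternative is possible that needs neither recursion nor the JavaScript extension. Following is a sketch of it (a different technique from the one above): an "a" element is kept only if no preceding sibling "a" has the same string value. Since whole values are compared, rather than substrings of a CSV string, a value like 2 can't be mistaken for part of 12.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/elem">
      <!-- select the first occurrence of each distinct value -->
      <xsl:for-each select="a[not(. = preceding-sibling::a)]">
         <xsl:value-of select="."/>
         <!-- a comma after every value except the last one -->
         <xsl:if test="position() != last()">,</xsl:if>
      </xsl:for-each>
   </xsl:template>

</xsl:stylesheet>
```

For the XML instance document cited above, this sketch also produces 2,3,5,1. Each element is compared with all of its preceding siblings, so the number of comparisons grows quadratically with the number of "a" elements.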


Wednesday, April 5, 2023

XSLT 1.0 transformation : finding maximum from a list of numbers, from an XML input document

The Apache Xalan project released XalanJ 2.7.3 a few days ago, and I thought of writing a couple of blog posts here, to report on the basic sanity of XalanJ 2.7.3's functional quality.

Following is a simple XML transformation requirement.

XML input document :

<?xml version="1.0" encoding="UTF-8"?>

<elem>

    <a>2</a>

    <a>3</a>

    <a>5</a>

    <a>1</a>

    <a>7</a>

    <a>4</a>

</elem>

We need to write an XSLT 1.0 stylesheet that outputs the maximum value from the list of XML "a" elements within the above cited XML document.

Following are the three XSLT 1.0 stylesheets that I've come up with, that do this correctly,

1)

<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

                         xmlns:exslt="http://exslt.org/common"

                         version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/elem">

      <xsl:variable name="temp">

         <xsl:for-each select="a">

           <xsl:sort select="." data-type="number" order="descending"/>

           <e1><xsl:value-of select="."/></e1>

         </xsl:for-each>

      </xsl:variable>

      <xsl:value-of select="concat('Maximum : ', exslt:node-set($temp)/e1[1])"/>

   </xsl:template>

</xsl:stylesheet>

2)

<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

                         xmlns:exslt="http://exslt.org/common"

                         version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/elem">

      Maximum : <xsl:call-template name="findMax"/>

   </xsl:template>

   <xsl:template name="findMax">

      <xsl:variable name="temp">

         <xsl:for-each select="a">

            <xsl:sort select="." data-type="number" order="descending"/>

            <e1><xsl:value-of select="."/></e1>

         </xsl:for-each>

      </xsl:variable>

      <xsl:value-of select="exslt:node-set($temp)/e1[1]"/>

   </xsl:template>

</xsl:stylesheet>

3)

<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

                         version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/elem">

      <xsl:choose>

         <xsl:when test="count(a) = 0"/>

         <xsl:when test="count(a) = 1">

            Maximum : <xsl:value-of select="a[1]"/>

         </xsl:when>

         <xsl:otherwise>

            <xsl:variable name="result">

               <xsl:call-template name="findMax">

                  <xsl:with-param name="curr_max" select="a[1]"/>

                  <xsl:with-param name="next_node" select="a[2]"/>

               </xsl:call-template>

            </xsl:variable>

            Maximum :  <xsl:value-of select="$result"/> 

         </xsl:otherwise>

      </xsl:choose>

   </xsl:template>

   <xsl:template name="findMax">

      <xsl:param name="curr_max"/>

      <xsl:param name="next_node"/>

      <xsl:choose>

         <xsl:when test="$next_node/following-sibling::*">

            <xsl:choose>

               <xsl:when test="number($next_node) &gt; number($curr_max)">

                  <xsl:call-template name="findMax">

                     <xsl:with-param name="curr_max" select="$next_node"/>

                     <xsl:with-param name="next_node" select="$next_node/following-sibling::*[1]"/>

                  </xsl:call-template>

               </xsl:when>

               <xsl:otherwise>

                  <xsl:call-template name="findMax">

                     <xsl:with-param name="curr_max" select="$curr_max"/>

                     <xsl:with-param name="next_node" select="$next_node/following-sibling::*[1]"/>

                  </xsl:call-template>

               </xsl:otherwise>

            </xsl:choose>

         </xsl:when>

         <xsl:otherwise>

            <xsl:choose>

               <xsl:when test="number($next_node) &gt; number($curr_max)">

                  <xsl:value-of select="$next_node"/>

               </xsl:when>

               <xsl:otherwise>

                  <xsl:value-of select="$curr_max"/>

               </xsl:otherwise>

            </xsl:choose>

         </xsl:otherwise>

      </xsl:choose>

   </xsl:template>

</xsl:stylesheet>

I somehow personally like the XSLT solution 3) illustrated above for these requirements. This solution traverses the sequence of XML "a" elements till the end of the list, carrying the largest value seen so far, and outputs the maximum value at the end of the traversal. This solution seems to have an algorithmic time complexity of O(n), with possibly a little more overhead from recursive template calls than the other two XSLT solutions.

The XSLT solutions 1) and 2) illustrated above seem to have a higher algorithmic time complexity than solution 3), due to the use of the xsl:sort instruction (which probably has an algorithmic time complexity of O(n * log(n)) or O(n * n)). The XSLT solutions 1) and 2) also seem to have a higher algorithmic "space complexity" (this measures the memory used by the algorithm), due to storage of the intermediate sorted result.

The XalanJ command line to run the above cited XSLT transformations is following,
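For completeness, there is also a fourth, very compact XSLT 1.0 formulation, sketched below as an illustration (it is not one of the three solutions discussed above). It selects the "a" element for which no sibling "a" has a larger numeric value. It is the shortest to write, but a straightforward evaluation compares every element with every other element, so its time complexity is likely O(n * n).

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/elem">
      <!-- keep the "a" elements that no other "a" exceeds; [1] picks one if the maximum is repeated -->
      <xsl:value-of select="concat('Maximum : ', a[not(../a &gt; .)][1])"/>
   </xsl:template>

</xsl:stylesheet>
```

For the XML input document cited above, this outputs Maximum : 7.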

java org.apache.xalan.xslt.Process -in file.xml -xsl file.xsl