Grouping Data


The previous topic, Restructuring Data: FLWOR Expressions, described how to use XQuery FLWOR expressions with XML and relational data sources to restructure data. This topic describes how to group data using the XQuery FLWOR expression window clause.

The XQuery FLWOR expression window clause is supported in XQuery 1.1 only.

This section covers the following topics:

What Is Grouping

Grouping is a technique that allows you to group XML data and then perform some sort of query – a transformation, for example – on the data in that group.

In XQuery 1.1, you can achieve grouping using the window clause in a FLWOR expression. The window clause is powerful because it allows you to bind the for clause variable to a group of elements, instead of to a single element only. In XQuery 1.0, the for clause variable in FLWOR expressions could be bound to a single element only.

The window Clause

The window clause in the XQuery FLWOR expression allows you to group data in sequences of consecutive items; these sequences are called windows. The start and end of a window are based on user-defined criteria – the WindowStartCondition (start $var when ExprSingle) and WindowEndCondition (end $var when ExprSingle), respectively. To create the window, the window clause iterates over the sequence, referred to as the binding sequence. The resulting window contains the binding sequence’s start item, end item, and all the items in between.

In a window clause, the starting item of the window is determined by the window type.

Types of Windows

You can use the window clause to create two types of windows:

Examples of both types of windows appear in the following sections.

Example: Tumbling Windows

Consider the following XML document, which contains information customer orders. Note that the document structure is flat – customer and order elements are intermingled:

<?xml version="1.0"?> 
<orders> 
  <customer id="1"/> 
  <order type="book" id="1" price="10.0"/>  
  <order type="DVD" id="3" price="24.0"/>  
  <customer id="2"/> 
  <order type="game" id="5" price="50.0"/>  
</orders>   

We want to use XQuery to group all orders by customer, like this:

<?xml version="1.0"?> 
<orders> 
  <customer id="1"> 
    <order type="book" id="1" price="10.0"/>  
    <order type="DVD" id="3" price="24.0"/>  
  </customer> 
  <customer id="2"> 
    <order type="game" id="5" price="50.0"/>  
  </customer> 
</orders> 

Using the tumbling windows in XQuery 1.1, the code required to generate the same output is straightforward and concise. Here, the XQuery 1.1 code iterates through all of the elements in the XML document. The start of the binding sequence is an element customer; the end of the binding sequence occurs when the element immediately after the context is not an order (that is, in this example, it is another customer).

<orders>{ 
 for tumbling window $customer in $data/orders/* 
 start $start when $start/self::customer 
 end next $next when $next[self::customer] 
 return  
    <customer>{ 
        $start/@*, 
        subsequence($customer,  2) 
    }</customer> 
}</orders>  

While generating this output is possible in XQuery 1.0, the recursive function required to process one sibling at a time is somewhat complex and not especially efficient, as shown here:

declare function local:orders($c as element(*, xs:untyped) ) 
 as element()* 
{ 
    local:next($c/following-sibling::*[1]) 
}; 
 
declare function local:next($o as element(*, xs:untyped)* ) 
 as element()* 
{ 
 if($o/self::order) then 
    ($o, local:next($o/following-sibling::*[1][self::order])) 
 else () 
}; 
 
<orders>{ 
for $ele in $data/orders/customer 
return 
 <customer>{ 
  $ele/@*, 
  local:orders($ele)  
 }</customer>  
}</orders>  

Example: Positional Grouping

The XQuery 1.1 window clause also allows you to define grouping criteria based on position within the XML source. Imagine, for example, an XML document with the following structure containing thousands of book elements:

<?xml version="1.0"?> 
<books name="My books"> 
    <book bookid="1" pubdate="03/01/2002"> 
        <title>Java Web Services</title> 
        <authors> 
            <author>David A. Chappel</author> 
            <author>Tyler Jewell</author> 
        </authors> 
        <subject>Web Services</subject> 
    </book> 
     ... 
</books> 

The XQuery code required to split a single document into multiple documents of, say, 10 book elements each might look like this:

declare variable $window_size = 10; 
 
for tumbling window $customer_orders in $books/books/book 
start when true() 
end at $i when $i mod $window_size = 0 
return  
	serialize-to-url( 
	 <books>{$customer_orders}</books>							 
	 concat("books", $end_pos div $window_size,".xml", "")) 

Example: Sliding Windows

The previous examples showed how to group data using tumbling windows – that is, windows with adjacent items. Sliding windows contain items that can overlap.

Consider this example from the XQuery 1.1 W3C Working Draft 3 December 2008. We have a document that contains a list of colors, like this:

<?xml version="1.0"?> 
<doc> 
  <data>Green</data> 
  <data>Pink</data> 
  <data>Lilac</data> 
  <data>Turquoise</data> 
  <data>Peach</data> 
  <data>Opal</data> 
  <data>Champagne</data> 
</doc> 

Using a FLWOR expression with a sliding window clause:

declare option ddtek:serialize "indent=yes"; 
<root>{ 
for sliding window $item in doc("arrange_rows.xml")/*/data 
  start at $sp when true() 
  end at $ep when $ep - $sp = 2 
return  <window>{$item}</window> 
}</root>  

We can generate a sequence of items, grouped by three, each successive group containing two items overlapping those in the previous group:

<?xml version="1.0"?> 
<root> 
    <window> 
        <data>Green</data> 
        <data>Pink</data> 
        <data>Lilac</data> 
    </window> 
    <window> 
        <data>Pink</data> 
        <data>Lilac</data> 
        <data>Turquoise</data> 
    </window> 
    <window> 
        <data>Lilac</data> 
        <data>Turquoise</data> 
        <data>Peach</data> 
    </window> 
    <window> 
        <data>Turquoise</data> 
        <data>Peach</data> 
        <data>Opal</data> 
    </window> 
    <window> 
        <data>Peach</data> 
        <data>Opal</data> 
        <data>Champagne</data> 
    </window> 
    <window> 
        <data>Opal</data> 
        <data>Champagne</data> 
    </window> 
    <window> 
        <data>Champagne</data> 
    </window> 
</root>