Grouping Data
The previous topic, Restructuring Data: FLWOR Expressions, described how to use XQuery FLWOR expressions with XML and relational data sources to restructure data. This topic describes how to group data using the XQuery FLWOR expression window clause.
The XQuery FLWOR expression window clause is supported in XQuery 1.1 only.
This section covers the following topics:
What Is Grouping
Grouping is a technique that allows you to group XML data and then perform some sort of query – a transformation, for example – on the data in that group.
In XQuery 1.1, you can achieve grouping using the window clause in a FLWOR expression. The window clause is powerful because it allows you to bind the for clause variable to a group of elements, instead of to a single element only. In XQuery 1.0, the for clause variable in FLWOR expressions could be bound to a single element only.
The window Clause
The window clause in the XQuery FLWOR expression allows you to group data in sequences of consecutive items; these sequences are called windows. The start and end of a window are based on user-defined criteria – the WindowStartCondition (start $var when ExprSingle) and WindowEndCondition (end $var when ExprSingle), respectively. To create the window, the window clause iterates over the sequence, referred to as the binding sequence. The resulting window contains the binding sequence’s start item, end item, and all the items in between.
In a window clause, the starting item of the window is determined by the window type.
Types of Windows
You can use the window clause to create two types of windows:
- Tumbling – tumbling windows are defined as windows whose items never overlap. The item that starts one window always follows the last item of the previous window. Thus, no two windows drawn from the same binding sequence can contain the same items.
- Sliding – sliding windows, on the other hand, are defined as windows that can overlap. That is, one window might contain the same item as another window drawn from the same binding sequence. This can occur because every item in the binding sequence that makes the WindowStartCondition true is the starting item for each new window.
Examples of both types of windows appear in the following sections.
Example: Tumbling Windows
Consider the following XML document, which contains information customer orders. Note that the document structure is flat – customer and order elements are intermingled:
<?xml version="1.0"?> <orders> <customer id="1"/> <order type="book" id="1" price="10.0"/> <order type="DVD" id="3" price="24.0"/> <customer id="2"/> <order type="game" id="5" price="50.0"/> </orders>We want to use XQuery to group all orders by customer, like this:
<?xml version="1.0"?> <orders> <customer id="1"> <order type="book" id="1" price="10.0"/> <order type="DVD" id="3" price="24.0"/> </customer> <customer id="2"> <order type="game" id="5" price="50.0"/> </customer> </orders>Using the tumbling windows in XQuery 1.1, the code required to generate the same output is straightforward and concise. Here, the XQuery 1.1 code iterates through all of the elements in the XML document. The start of the binding sequence is an element customer; the end of the binding sequence occurs when the element immediately after the context is not an order (that is, in this example, it is another customer).
<orders>{ for tumbling window $customer in $data/orders/* start $start when $start/self::customer end next $next when $next[self::customer] return <customer>{ $start/@*, subsequence($customer, 2) }</customer> }</orders>While generating this output is possible in XQuery 1.0, the recursive function required to process one sibling at a time is somewhat complex and not especially efficient, as shown here:
declare function local:orders($c as element(*, xs:untyped) ) as element()* { local:next($c/following-sibling::*[1]) }; declare function local:next($o as element(*, xs:untyped)* ) as element()* { if($o/self::order) then ($o, local:next($o/following-sibling::*[1][self::order])) else () }; <orders>{ for $ele in $data/orders/customer return <customer>{ $ele/@*, local:orders($ele) }</customer> }</orders>Example: Positional Grouping
The XQuery 1.1 window clause also allows you to define grouping criteria based on position within the XML source. Imagine, for example, an XML document with the following structure containing thousands of book elements:
<?xml version="1.0"?> <books name="My books"> <book bookid="1" pubdate="03/01/2002"> <title>Java Web Services</title> <authors> <author>David A. Chappel</author> <author>Tyler Jewell</author> </authors> <subject>Web Services</subject> </book> ... </books>The XQuery code required to split a single document into multiple documents of, say, 10 book elements each might look like this:
declare variable $window_size = 10; for tumbling window $customer_orders in $books/books/book start when true() end at $i when $i mod $window_size = 0 return serialize-to-url( <books>{$customer_orders}</books> concat("books", $end_pos div $window_size,".xml", ""))Example: Sliding Windows
The previous examples showed how to group data using tumbling windows – that is, windows with adjacent items. Sliding windows contain items that can overlap.
Consider this example from the XQuery 1.1 W3C Working Draft 3 December 2008. We have a document that contains a list of colors, like this:
<?xml version="1.0"?> <doc> <data>Green</data> <data>Pink</data> <data>Lilac</data> <data>Turquoise</data> <data>Peach</data> <data>Opal</data> <data>Champagne</data> </doc>Using a FLWOR expression with a sliding window clause:
declare option ddtek:serialize "indent=yes"; <root>{ for sliding window $item in doc("arrange_rows.xml")/*/data start at $sp when true() end at $ep when $ep - $sp = 2 return <window>{$item}</window> }</root>We can generate a sequence of items, grouped by three, each successive group containing two items overlapping those in the previous group:
<?xml version="1.0"?> <root> <window> <data>Green</data> <data>Pink</data> <data>Lilac</data> </window> <window> <data>Pink</data> <data>Lilac</data> <data>Turquoise</data> </window> <window> <data>Lilac</data> <data>Turquoise</data> <data>Peach</data> </window> <window> <data>Turquoise</data> <data>Peach</data> <data>Opal</data> </window> <window> <data>Peach</data> <data>Opal</data> <data>Champagne</data> </window> <window> <data>Opal</data> <data>Champagne</data> </window> <window> <data>Champagne</data> </window> </root>