How DataDirect XQuery® Generates SQL


An XQuery cannot be passed directly to a relational database; therefore, DataDirect XQuery translates the query to one or multiple SQL Select statements. The Select statements that are generated can be complex. DataDirect XQuery supports options that create less complex Select statements, which can improve performance in some cases.

In addition, the output of a SQL statement is a table (a flat structure), but the typical XML value is a tree structure. To achieve the required transformation of the result from a flat structure to a tree structure, DataDirect XQuery translates the query into two parts: an XML construction part and a SQL part. The XML construction part adds XML tags to the results retrieved from the database to create the hierarchy requested in the query.

DataDirect XQuery uses four SQL generation algorithms, which result in different SQL statements. Each algorithm takes a different approach to construct the requested XML hierarchy from the results returned from the SQL statements. See "Using DataDirect XQuery® SQL Generation Algorithms".

Simplifying Generated SQL

The options discussed in this section affect only XQuery expressions that are executed by the SQL Adaptor.

String Comparisons and Trailing Spaces

Unlike XQuery string comparison, SQL character comparison is not sensitive to differences in trailing spaces. To accommodate this semantic difference, the SQL statements that DataDirect XQuery executes compare both the strings and the length of the strings.

For example, when comparing an Oracle fixed-width character column with a constant value, DataDirect XQuery executes a SQL statement that contains:

CHARCOL='constant' AND LENGTH(CHARCOL)=LENGTH('constant') 

The length comparison can be avoided by adding the following option declaration to the query prolog:

declare option ddtek:sql-ignore-trailing-spaces "yes"; 

Using this option declaration is also convenient when two fixed-width character columns with different lengths are used in a join condition. For example, assume the following two tables with a fixed-width character column that have different lengths and the following query:

Tables:

table1(col char(10)) 
table2(col char(20)) 

Query:

for $t1 in collection('table1')/table1 
for $t2 in collection('table2')/table2 
where $t1/col = $t2/col 
return 
... 

Even when table1 and table2 contain rows where the col column contains the same value, by default, the values never match because the lengths are different. Adding ignore-trailing-spaces to the query prolog, avoids this possible issue.

String Functions

While many XQuery functions that operate on strings have an equivalent SQL function, the semantics often differ slightly. The most important differences are how the following string conditions are handled in XQuery functions versus SQL functions:

The result of these differences is that the generated SQL for string functions is complex. If both empty sequence and trailing space behaviors are not relevant, complexity can be avoided by adding the following option declaration to the XQuery prolog:

declare option ddtek:sql-simple-string-functions "yes"; 

The result of using this option declaration is that DataDirect XQuery generates SQL that translates the XQuery function to the equivalent SQL function without taking into account trailing spaces, or empty sequence or empty string arguments.

Following is an overview of the XQuery string functions that are affected by the sql-simple-string-functions option declaration. The following examples assume that the expression is a part of a query that is being executed by the SQL Adaptor. Most of these examples apply to Microsoft SQL Server; however, similar considerations hold true for other databases.

Example: fn:string-length()

Example: fn:ends-with()

Example: fn:substring-after()

Example: fn:upper-case() and fn:lower-case()

Example: fn:substring()

Example: fn:concat()

Example: fn:contains()

DB2 Decimal to String Cast

The default casting of DB2 decimal values to string values can result in strings with leading 0s, which is not XQuery-compliant. DataDirect XQuery generates SQL that removes these 0s. The resulting SQL gets fairly complex. This can be avoided by adding the following option declaration to the XQuery prolog:

declare option ddtek:sql-simple-convert-functions "yes"; 

This allows some casts of decimal values to strings to return strings with leading 0s.

Using an Order By Clause

Due to limitations of some SQL databases, it is not always possible to order a SQL result set on an expression that is not part of the Select list. DataDirect XQuery supports an option declaration that allows you to choose whether Order By clauses in the generated SQL are explicitly added to the Select list.

The option can be set by adding the following option declaration to the XQuery prolog:

declare option ddtek:sql-order-by-on-values "no|yes|noSubquery"; 

When set to yes, expressions on which to sort are not explicitly added to the Select list of generated SQL Select statements. When set to no, values or expressions on which to sort are always added to the Select list, which typically decreases performance but is required by some databases. When set to noSubquery, the behavior is equivalent to yes except when the expression on which to sort is a subquery. In this case, the value noSubquery behaves as if no is specified.

The default value is database dependent:

Database
Default
DB2
no
Informix
noSubquery
MySQL
yes
Oracle
yes
PostgreSQL
yes
Microsoft SQL Server
yes
Sybase
noSubquery

There should be no reason to change the value for Microsoft SQL Server, Oracle, MySQL, or PostgreSQL; however, for DB2, Informix, and Sybase, many queries perform faster when the value is changed to yes.

NOTE: The noSubquery value is useful to optimize the performance of some queries with DB2. When using this value, make sure you set the rewrite-exists-into-count option declaration to inCase (see "Using a SQL EXISTS Subclause in DB2").

Example

For example, assume the XQuery expression below is executed against a Microsoft SQL Server database:

for $x in collection('users')/users  
order by $x/userid  
return $x/name 

When order-by-on-values is set to no, the SQL statement executed is:

SELECT ALL name AS racol1,userid AS racol2 
FROM users 
WHERE name IS NOT NULL 
ORDER BY racol2 ASC 

When order-by-on-values is set to yes, the SQL statement executed is:

SELECT ALL name AS racol1 
FROM users 
WHERE name IS NOT NULL 
ORDER BY userid ASC 

Using a SQL EXISTS Subclause in DB2

The different DB2 systems impose limitations on the usage of the SQL EXISTS subclause. This option specifies whether to change an EXISTS subclause into a count() > 0 subclause. When set to no, EXISTS subclauses are not rewritten. When set to yes (the default for DB2 for z/OS), EXISTS subclauses are always rewritten. When set to inCase (the default for DB2 for iSeries), EXISTS subclauses in conditional expressions are rewritten.

Typically, you should not change the default setting, but some XQuery expressions executed against DB2 for z/OS and DB2 for iSeries perform better when this option is set to no. In addition, if order-by-on-values is set to noSubquery for DB2 for Linux/UNIX/Windows, you will get the best performance for the broadest set of queries if you set rewrite-exists-into-count to inCase.

The option can be set by adding the following option declaration to the XQuery prolog:

declare option ddtek:sql-rewrite-exists-into-count "no|yes|inCase"; 

Using BINARY_DOUBLE and BINARY_FLOAT Data Types (Oracle 10g and higher)

By default, DataDirect XQuery uses the Oracle NUMBER data type when converting to or constructing XQuery floats or doubles. You can change this behavior for Oracle 10g and higher by adding the following option declaration to the XQuery prolog:

declare option ddtek:ora10-use-binary-float-double "yes|no"; 

When this option declaration is set to yes, DataDirect XQuery uses the BINARY_FLOAT and BINARY_DOUBLE data types.

Using DataDirect XQuery® SQL Generation Algorithms

DataDirect XQuery uses one of the following SQL generation algorithms when translating a query to one or multiple SQL Select statements:

By default, DataDirect XQuery uses the merge join algorithm, which typically gives the best performance. However, when the XML structure is not too deeply nested, the outer join or sorted outer union algorithm gives better performance. As a guideline, consider using either outer join or sorted outer union in cases where the XML nesting level is limited to four or less.

See "Improving Performance" for additional information about performance.

Nested Loop

The nested loop algorithm creates a first SQL statement that selects all values for the parent nodes and a second SQL statement that, for each parent node, selects the values for the associated child nodes.

Use this algorithm when the parent nodes are not uniquely identifiable. In this case, the nested loop algorithm is the only one that returns correct results.

Merge Join

To construct a correct parent/child relationship, the merge join algorithm creates a first SQL statement that selects all parent values and sorts them on the unique columns of the parents. Then, a second SQL statement joins parent with child values and sorts them again on the unique columns of the parents. The results are processed by moving forward through the results of both SQL statements, linking parent with child node values based on the values of the unique columns.

By default, DataDirect XQuery uses the merge join algorithm, which typically gives the best performance.

Outer Join

The outer join algorithm creates a single SQL statement that outer joins parent with child node values. The advantage of using this algorithm is that only a single SQL statement is created. The disadvantage is that this single SQL statement can be very complex, for example, when deeply nested XML structures must be created. In addition, this algorithm requires that for each set of child node values all parent node values are selected as well, which results in redundant information being communicated between the database server and the application.

As a guideline, consider using the outer join algorithm in cases where the XML nesting level is limited to four or less.

Sorted Outer Union

The sorted outer union algorithm creates a single SQL statement that is the union of multiple SQL statements (one for each level in the XML hierarchy). The first SQL statement in the union selects parent node values; the second selects the unique values of the parents joined with the child node values. This approach is recursively applied for each level in the XML hierarchy.

The advantage of using this algorithm is that only a single SQL statement is created. The disadvantage is that this single SQL statement can be very complex and that SQL engines are typically not well-tuned for complex union statements.

As a guideline, consider using the sorted outer union algorithm in cases where the XML nesting level is limited to four or less.

Selecting Which Algorithm to Use

You can select which algorithm to use for a given connection or for an individual query.

To specify an algorithm for a given connection, configure the JdbcOptions property of DDXQDataSource or the Options property of DDXQJDBCConnection. For example:

DDXQDataSource ds = new DDXQDataSource(); 
ds.setJdbcOptions("sql-rewrite-algorithm=nested-loop"); 

See "DDXQDataSource and DDXQJDBCConnection Properties".

To specify an algorithm for an individual query, add the sql-rewrite-algorithm option declaration to the Query prolog. For example:

declare option ddtek:sql-rewrite-algorithm "outer-join";