7 Querying Large XML Documents
The DataDirect XQuery Streaming XML feature of the XML Adaptor minimizes memory use when querying large XML documents by simultaneously parsing the XML document, executing the query, and sending the data to the application as needed. In contrast, without the Streaming XML feature, the XML Adaptor loads the complete XML document into memory and creates an optimized in-memory representation of it, which is then used during query execution. In general, the XML Adaptor therefore uses more memory without streaming than with the Streaming XML feature enabled. By default, the Streaming XML feature is enabled.
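The general idea behind streaming can be illustrated with a minimal Java sketch. It uses the standard StAX API from the JDK, not DataDirect XQuery itself, and is not a description of how the XML Adaptor is implemented. The class name StreamingCountSketch and the method countElements are hypothetical names chosen for this illustration. The sketch evaluates a simple "count the c elements" request while the parser is still reading, so no tree of the whole document is built.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustration only: generic StAX streaming, not DataDirect XQuery internals.
// The class and method names are hypothetical.
class StreamingCountSketch {

    // Counts elements with the given local name while the parser reads the
    // input. Only the current parse event is held at any time; the document
    // is never materialized as a tree.
    static int countElements(Reader xml, String localName) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(xml);
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse the XML input", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a><b><c/><c/></b><b><c/></b></a>";
        System.out.println(countElements(new StringReader(xml), "c"));
    }
}
```

A tree-based approach would first build the complete document in memory and only then walk it, which is the behavior the paragraph above contrasts with streaming.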
DataDirect XQuery supports streaming on XML documents accessed through:
- fn:doc().
- fn:collection() when using the extensions that allow querying of directories (see "Querying Multiple Files in a Directory").
- External variables and the initial context item. For the input values to be streamed, they must be defined as XQSequence, java.io.Reader, java.io.InputStream, or StAX (SAX values cannot be streamed). For streaming to work with external variables when not using prepared queries, DataDirect XQuery must use deferred binding (see "Support of Deferred Binding").
- External Java functions. For the return values to be streamed, they must be defined as javax.xml.transform.stream.StreamSource, javax.xml.transform.stax.StAXSource (for JVM 1.6 only), or com.ddtek.xquery.StAXSource.
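As a sketch of the last item, the following Java methods return two of the streamable types named above, javax.xml.transform.stream.StreamSource and javax.xml.transform.stax.StAXSource, using only standard JDK classes. The class name OrderFunctionsSketch and both method names are hypothetical, and how such a class is registered and called as an external function from a query is not shown here. The class is package-private only to keep the sketch in a single file.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class; names are assumptions made for illustration.
class OrderFunctionsSketch {

    // Wraps the character stream in a StreamSource. Nothing is parsed here;
    // the consumer pulls the data when it needs it.
    public static StreamSource ordersAsStreamSource(Reader xml) {
        return new StreamSource(xml);
    }

    // Wraps the character stream in a StAXSource. The XMLStreamReader is
    // freshly created, so it is positioned at START_DOCUMENT, which the
    // StAXSource constructor requires.
    public static StAXSource ordersAsStAXSource(Reader xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(xml);
            return new StAXSource(reader);
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not open the XML input", e);
        }
    }

    public static void main(String[] args) {
        StAXSource source = ordersAsStAXSource(new StringReader("<orders/>"));
        System.out.println(source.getXMLStreamReader() != null);
    }
}
```

Both methods hand back a source that has not been read yet, which is what leaves the consumer free to process the data incrementally.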
The Streaming XML feature is applied on a per-document basis. For example, in a single query, the Streaming XML feature can be used for XML document A and not for XML document B.
The Streaming XML feature is not used in the following circumstances:
- If the XML document might need to be parsed more than once. For example, the following two circumstances can require an XML document to be parsed more than once:
  - If the query includes fn:doc() without literal arguments. In this case, the documents to be queried are determined at runtime and, therefore, might be parsed more than once.
  - If fn:doc() is used in multiple for clauses in a FLWOR expression. For example:
      for $a in doc("A.XML")/A/B/C
      for $b in doc("X.XML")/X/Y/Z
      return
      ...
    In this case, streaming is used for A.XML, but not for X.XML.
- If nodes from the XML document are accessed with a reverse or optional axis or any function that is based indirectly on such an axis: fn:root(), fn:id(), fn:idref(), and fn:document-uri().
In addition, in the following circumstances, some expressions can cause the current node to be instantiated in memory instead of being streamed:
- A node is used in a function or operator, including effective boolean value calculations. For example:
...
if(doc("foo.xml")/a/b/c
...
In this example, the c nodes are instantiated.
- Multiple path expressions are evaluated on a node. For example:
...
doc("foo.xml")/a/b/(c|d)
...
In this example, the b nodes are instantiated.
- A node is referenced multiple times in the query.