Querying XML Files Archived in ZIP or JAR Files


DataDirect XQuery supports the use of fn:collection() to query XML files archived in a ZIP or JAR file. This feature is useful for querying many types of office documents (text documents created with a word processor, spreadsheets, charts, and graphical documents such as drawings and presentations) stored in an XML format, such as the OpenDocument or Office Open XML format.

In the following example, suppose you have multiple XML files archived in the ZIP file xml.zip. Each XML file contains information about one book, and you want to create a single XML document that lists all of your books.

<books> {
  for $book in collection("zip://c:/xml.zip//books") 
  return 
    <myBook>{$book/book/title}</myBook> 
}
</books> 

The result would look something like this:

<books> 
  <myBook> 
    <title>Emma</title> 
  </myBook> 
  <myBook> 
    <title>Pride and Prejudice</title> 
  </myBook> 
  . . . 
</books> 

The syntax of the function call for this feature is:

collection("zip_or_jar_url(?option(;option)*)?") 

where:

zip_or_jar_url is a URL referencing a ZIP or JAR file. The URL must use the zip:// or jar:// scheme.

option is {select="REGEX" | recurse={yes | no} | sort=[a,t,r]+ | xquery-regex={yes | no}}

where:

select contains a regular expression (REGEX) that determines which files in the archive are selected. If select is not specified, all files are selected.

sort determines how the retrieved files are sorted, as follows:

See also "Collection URI Resolvers".
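
As an illustration of the option syntax described above, the following sketch appends two options to the archive URL: the options follow a question mark and are separated by a semicolon. The archive path c:/xml.zip is reused from the earlier example, and the regular expression .*\.xml is a hypothetical pattern intended to select files whose names end in .xml. The sketch follows the syntax literally, including the quotation marks around the select value. Whether those quotation marks are required in practice, and whether the pattern must match the whole file name, are assumptions to verify against your installation.

```xquery
(: Sketch only: the select pattern and the quoting of its value are assumptions. :)
(: The URL is written as a single-quoted string literal so that the            :)
(: double-quoted select value can be embedded without escaping.                :)
<books> {
  for $book in collection('zip://c:/xml.zip?select=".*\.xml";recurse=yes')
  return
    <myBook>{$book/book/title}</myBook>
}
</books>
```

Writing the URL as a single-quoted literal is a readability choice. In a double-quoted XQuery string literal, each embedded quotation mark would need to be doubled ("").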