Querying XML Files Archived in ZIP or JAR Files
DataDirect XQuery supports the use of fn:collection() to query XML files archived in a ZIP or JAR file. This feature is useful for querying many types of office documents (text documents created with a word-processor, spreadsheets, charts, and graphical images such as drawings and presentations) stored in an XML format, such as the OpenDocument or Office Open XML format.
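As a rough illustration of why this applies to office documents: OpenDocument and Office Open XML files are themselves ZIP archives whose parts are XML files. The following Python sketch is not DataDirect XQuery code, and the archive contents and the helper name xml_entries are made up for illustration. It builds a small archive in memory and parses every XML entry in it, which is conceptually what querying an archive as a collection involves.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def xml_entries(archive_bytes):
    """Yield (entry name, parsed root element) for each .xml entry in a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith(".xml"):
                yield name, ET.fromstring(archive.read(name))

# Build a hypothetical archive in memory: one <book> document per entry.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("emma.xml", "<book><title>Emma</title></book>")
    archive.writestr("pride.xml", "<book><title>Pride and Prejudice</title></book>")

# Collect the title of every book document found in the archive.
titles = [root.findtext("title") for _, root in xml_entries(buffer.getvalue())]
print(titles)
```

With fn:collection(), the iteration and parsing are handled by the XQuery processor, so the query only has to describe what to extract from each document.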
In the following example, suppose you have multiple XML files archived in the ZIP file xml.zip. Each XML file contains information about one book, and you want to create a single XML document that contains a list of all your books.
<books>{
  for $book in collection("zip://c:/xml.zip//books")
  return
    <myBook>{$book/book/title}</myBook>
}</books>
The result would look something like this:
<books>
  <myBook>
    <title>Emma</title>
  </myBook>
  <myBook>
    <title>Pride and Prejudice</title>
  </myBook>
  . . .
</books>
The function's declaration for this feature is:
where:
zip_or_jar_url is a URL referencing a ZIP or JAR file. The URL must use the zip:// or jar:// scheme.
option is {(select="REGEX") | recurse={yes | no} | (sort=[a,t,r]+) | (xquery-regex=(yes|no))}
where:
select contains a regular expression (REGEX), which determines which files in the directory are selected. If select is not specified, all files are selected.
sort determines how the retrieved files are sorted, as follows:
- a sorts alphabetically (ascending).
- t sorts by modification time (beginning with the most recent).
- r, combined with a and t, reverses the sort order.
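The sort values can be pictured with a small Python sketch. It is only an interpretation of the description above, not the product's implementation: the Entry type and the sort_entries helper are hypothetical, and the sketch assumes that a sort value names exactly one of a or t, optionally combined with r. How a and t interact when both are given is not covered here.

```python
from typing import List, NamedTuple

class Entry(NamedTuple):
    name: str
    modified: float  # modification time, in seconds since the epoch

def sort_entries(entries: List[Entry], sort: str) -> List[Entry]:
    """Order entries for a sort value such as "a", "t", "ar", or "tr"."""
    if ("a" in sort) == ("t" in sort):
        raise ValueError("this sketch handles exactly one of 'a' or 't'")
    if "t" in sort:
        # t: most recently modified first
        ordered = sorted(entries, key=lambda e: e.modified, reverse=True)
    else:
        # a: alphabetical, ascending
        ordered = sorted(entries, key=lambda e: e.name)
    if "r" in sort:
        # r: reverse whichever order was chosen above
        ordered.reverse()
    return ordered

entries = [Entry("b.xml", 100.0), Entry("a.xml", 300.0), Entry("c.xml", 200.0)]
print([e.name for e in sort_entries(entries, "a")])
print([e.name for e in sort_entries(entries, "tr")])
```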
recurse determines whether subdirectories archived in the ZIP or JAR file are searched. The default is no. To search subdirectories, set recurse to yes, for example:
<books>{
  for $book in collection("zip://c:/xml.zip?select=*.xml;recurse=yes")
  return
    <myBook>{$book/book/title}</myBook>
}</books>
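Judging from the example URL above, options follow a question mark after the archive URL and are separated by semicolons. A minimal Python sketch of assembling such a URL follows. The helper name build_collection_url is hypothetical, and the sketch performs no escaping of special characters.

```python
def build_collection_url(archive_url, options):
    """Append options to an archive URL in the form ?name=value;name=value."""
    if not options:
        return archive_url
    query = ";".join(f"{name}={value}" for name, value in options.items())
    return f"{archive_url}?{query}"

url = build_collection_url("zip://c:/xml.zip", {"select": "*.xml", "recurse": "yes"})
print(url)  # zip://c:/xml.zip?select=*.xml;recurse=yes
```

The resulting string is what would be passed to collection() in the query.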
xquery-regex determines what type of regular expression syntax is specified in select.
- If set to no (the default), the select pattern takes the conventional form. For example, *.xml selects all files with an xml extension. More generally, the select pattern is converted to a regular expression by prepending "^", appending "$", replacing "." with "\.", and replacing "*" with ".*". The converted pattern is then matched against the file names in the directory using the XQuery regular expression rules. So, for example, you can specify *.(xml|xhtml) to match files with either of these two file extensions.
Note, however, that special characters used in the URL may need to be escaped using the %HH convention, which can be achieved using the iri-to-uri() function.
- If set to yes, the select pattern syntax as supported by XQuery is assumed. In this case, some characters may need to be escaped, such as the backslash character (\) in a file name. For example:
select=.*\.xml$ must be select=.*%5C.xml$
See also "Collection URI Resolvers".
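The conversion applied when xquery-regex is no, and the %HH escaping needed when it is yes, can be sketched in Python as follows. This is an approximation for illustration only: select_to_regex is a hypothetical helper that follows the steps described above, Python's re module stands in for the XQuery regular expression rules (the two dialects agree for these simple patterns), and urllib.parse.quote stands in for iri-to-uri().

```python
import re
from urllib.parse import quote

def select_to_regex(pattern):
    """Convert a conventional select pattern (xquery-regex=no) to a regular expression."""
    # Escape literal dots first so that the dots introduced by ".*" stay unescaped.
    escaped = pattern.replace(".", "\\.")
    return "^" + escaped.replace("*", ".*") + "$"

regex = select_to_regex("*.(xml|xhtml)")
print(regex)  # ^.*\.(xml|xhtml)$

names = ["a.xml", "b.xhtml", "c.txt"]
matching = [n for n in names if re.search(regex, n)]
print(matching)  # ['a.xml', 'b.xhtml']

# With xquery-regex=yes, a backslash in the pattern must be percent-encoded in the URL.
# "*" and "$" are left unescaped here to mirror the example in the text.
encoded = quote(".*\\.xml$", safe="*$")
print(encoded)  # .*%5C.xml$
```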