XSLT style sheets use path expressions to locate XML nodes. Composing a straight-line path from the root node down to a child node is sufficient when accessing and displaying all nodes with the same name and in the same position in the DOM hierarchy. There are occasions, though, when a path is needed that selects a particular node or a particular set of nodes based on a node value, a node type, or other forms of search criteria. The need is for a more powerful query capability to better isolate and select nodes for XSLT processing.
The XPath Query Language
The path expressions described previously are special instances of a more general form of expression defined by the XML Path Language (XPath). XPath is a query language for locating nodes of an XML document. It is based on the W3C XML Path Language specification for a general-purpose language for addressing and filtering the elements and text of XML documents. XPath expressions can address portions of an XML document, manipulate strings, numbers, and Booleans, and match a set of nodes on content values.
The following button, for example, opens an XML document in which the associated XSLT style sheet uses an XPath expression to locate a node based on its <isbn> value.
XPath Expressions
XPath expressions identify nodes based on their types, names, and values, as well as on the relationship of a node to other nodes in the document. An XPath expression yields a node list or a single node. For example, the search to
"find the <price> value contained in the <book> element of the root <books> node where the price is greater than 20.00"
is expressed in the XPath language as
/books/book[price > 20.00]/price
This expression can be used in the select attribute of an XSLT <xsl:apply-templates> element to return a list of nodes matching this query. Alternatively, a single-node search to
"find the <price> value contained in the <book> element of the root <books> node where the <isbn> node contains the value '0375412883'"
is expressed in XPath as
/books/book[isbn = '0375412883']/price
and returns a single node because of the unique ISBN value. The XPath language provides a powerful search capability for isolating DOM nodes and is the preferred method for traversing DOM trees.
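The two searches above can be tried outside of XSLT. The following is a minimal sketch in Python using the standard xml.etree.ElementTree module, which supports only a subset of XPath. The XML data is a hypothetical stand-in for Books.xml (the second title and both prices are invented). ElementTree evaluates paths relative to the <books> root element, so the leading /books step is dropped, and because it has no numeric comparisons the price test is done in ordinary Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for Books.xml; the prices and second title are invented.
BOOKS_XML = """
<books>
  <book>
    <isbn>0465044050</isbn>
    <title>Birth of the Mind</title>
    <price>17.50</price>
  </book>
  <book>
    <isbn>0375412883</isbn>
    <title>Placeholder Title</title>
    <price>24.95</price>
  </book>
</books>
"""

root = ET.fromstring(BOOKS_XML)  # root is the <books> element

# Counterpart of /books/book[isbn = '0375412883']/price
isbn_prices = [p.text for p in root.findall("book[isbn='0375412883']/price")]

# Counterpart of /books/book[price > 20.00]/price.
# ElementTree cannot compare numbers in a predicate, so the test is plain Python.
expensive = [p.text for p in root.findall("book/price") if float(p.text) > 20.00]

print(isbn_prices)
print(expensive)
```

A full XPath processor, such as the one built into an XSLT engine, evaluates both predicates directly without the Python filtering step.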
Example XPath Syntax
A basic XPath expression describes a path through the XML DOM hierarchy with a slash-separated list of node identifiers. For example, starting from the root of the Books.xml document, the following pattern traverses down the hierarchy to the <lastname> nodes.
books/book/author/lastname
As you can see, an XPath expression is very much like describing a path through the directory structure of a disk drive. The XPath pattern identifies the hierarchy of nodes to be traversed to locate one or more nodes of interest.
In addition to describing an exact path down a known hierarchy, XPath can include wildcards for describing unknown elements when selecting a set of nodes. For example, a node of any name can be represented by the "*" wildcard character as shown in the following pattern.
books/book/*
Additional patterns within square brackets specify branches on the path. For example, the following query describes a branch on the <book> element: only <book> elements with <author> children are considered, and the <lastname> nodes are then retrieved from those <author> children.
books/book[author]/author/lastname
This expression becomes more specific when relational comparisons are added. The following query returns all child nodes for books whose authors have the last name "Adams".
books/book[author/lastname='Adams']/*
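As a rough illustration of these patterns, the sketch below runs comparable searches with Python's standard xml.etree.ElementTree module. The data is a hypothetical stand-in for Books.xml (only the first book comes from the tutorial file; the others are invented, including one without an author). Paths are relative to the <books> root element. ElementTree does not accept a nested path such as author/lastname inside a predicate, so the last search is rewritten, as an assumption of this sketch, to test <lastname> on the <author> step and then climb back to the book with "..".

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for Books.xml; books two and three are invented.
BOOKS_XML = """
<books>
  <book>
    <isbn>0465044050</isbn>
    <title>Birth of the Mind</title>
    <author><firstname>Gary</firstname><lastname>Marcus</lastname></author>
  </book>
  <book>
    <isbn>0000000002</isbn>
    <title>Placeholder Title</title>
    <author><firstname>Pat</firstname><lastname>Adams</lastname></author>
  </book>
  <book>
    <isbn>0000000003</isbn>
    <title>Anonymous Placeholder</title>
  </book>
</books>
"""

root = ET.fromstring(BOOKS_XML)  # root is the <books> element

# books/book/author/lastname : a straight path down the hierarchy
lastnames = [e.text for e in root.findall("book/author/lastname")]

# books/book/* : every child element of every <book>
child_tags = [e.tag for e in root.findall("book/*")]

# books/book[author]/title : titles of books that have an <author> child
with_author = [e.text for e in root.findall("book[author]/title")]

# Equivalent of books/book[author/lastname='Adams']/* for ElementTree:
# match the <author>, step up to its <book>, then take all children.
adams_children = [e.tag for e in root.findall("book/author[lastname='Adams']/../*")]

print(lastnames, child_tags, with_author, adams_children, sep="\n")
```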
XPath Syntax Rules
An XPath expression consists of a series of one or more location steps, separated by "/" delimiters. Each step, from left to right, moves through the document tree relative to the nodes selected by the preceding step.
Absolute and Relative Paths
A path through the tree can be absolute or relative. An absolute path locates content starting from the document root. It always begins with a "/" delimiter. Thus, the XPath expression
/books/book/title
begins the search at the document root, continuing down the tree through the <books> root element and its <book> child nodes, retrieving all <title> nodes. A double slash (//) is shorthand for a search through all descendants, so intermediate node names can be omitted. In Books.xml the previous path can be written as
//title
to select every <title> node in the document, at whatever level it appears. Wildcards can also be used in place of node names, with one "*" for each level that is not named:
/*/*/title
Unless situations call for it, it is probably best to explicitly name the nodes in the hierarchy for clarity in readability of the path.
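The difference between naming each level, using wildcards, and searching all descendants can be seen in a small experiment. This is a minimal sketch using Python's standard xml.etree.ElementTree module, not part of the tutorial's style sheets. The data is hypothetical: the <series> element does not exist in Books.xml and is added only to place a <title> at a deeper level. Because ElementTree starts from the <books> root element, /books/book/title becomes book/title, /*/*/title becomes */title, and //title becomes .//title.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; <series> is invented to nest a <title> one level deeper.
BOOKS_XML = """
<books>
  <book>
    <title>Birth of the Mind</title>
  </book>
  <book>
    <title>Placeholder Title</title>
    <series>
      <title>Placeholder Series</title>
    </series>
  </book>
</books>
"""

root = ET.fromstring(BOOKS_XML)  # root is the <books> element

# Every level named explicitly.
explicit = [t.text for t in root.findall("book/title")]

# A wildcard in place of the <book> level.
wildcard = [t.text for t in root.findall("*/title")]

# Descendant search: <title> nodes at any depth below the root.
any_depth = [t.text for t in root.findall(".//title")]

print(explicit)
print(wildcard)
print(any_depth)
```

The explicit and wildcard paths return the same two titles, while the descendant search also picks up the nested series title.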
The starting point for a relative node step is always a previously established context node. The step begins with an axis:: reference (child::, parent::, and others) indicating the direction in which to search from the current node, which itself is usually established by a previous XPath expression. For example, if a previous path expression has selected a particular <title> node, then the expression
parent::book/author
first steps up to the <book> node that is the parent of the current <title> node and then selects that book's <author> child. Explicit axes are usually not necessary for simple searches since the child:: axis (in a downward direction) is the default.
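Stepping up to a parent and back down to a sibling can be sketched in Python with the standard xml.etree.ElementTree module. ElementTree does not understand the axis:: notation, so this example relies on the abbreviated ".." step, which stands for the parent of the current node. The data is a hypothetical stand-in for Books.xml (the second book is invented), and the [.='text'] test used to pick one particular <title> assumes Python 3.7 or later.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for Books.xml; the second book is invented.
BOOKS_XML = """
<books>
  <book>
    <title>Birth of the Mind</title>
    <author><firstname>Gary</firstname><lastname>Marcus</lastname></author>
  </book>
  <book>
    <title>Placeholder Title</title>
    <author><firstname>Pat</firstname><lastname>Adams</lastname></author>
  </book>
</books>
"""

root = ET.fromstring(BOOKS_XML)  # root is the <books> element

# Select one particular <title>, step up to its parent <book> with "..",
# then step down to that book's <author> and <lastname>.
path = "book/title[.='Birth of the Mind']/../author/lastname"
author_of_title = [e.text for e in root.findall(path)]

print(author_of_title)
```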
Selecting Node Types
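Rather than using specific node names, you can select nodes on the basis of their types. The available types include comment(), text(), and node(), the last of which selects nodes of any kind. The following example selects all text nodes that are direct children of the <book> node. In a file such as Books.xml, where the data values are held inside the child elements of <book>, those values are reached one level further down, for example with books/book/*/text().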
books/book/text()
Predicates
Path expressions can include predicates enclosed in brackets to filter and refine a search path. In a previous example the expression
/books/book[isbn='0375412883']/price
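selects the <price> node where the <isbn> child node of the <book> parent node contains the string '0375412883'. A predicate of this kind compares a node or attribute with a value, where the comparison is one of the operators =, !=, <, <=, >, or >=. Inside an XSLT select attribute, the < character must be written as the entity &lt; because a literal < is not allowed in XML attribute values.
The logical operators and and or can be employed to combine multiple comparison tests.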
The special function position() returns the sequence number of a node (beginning with 1) within the set of nodes currently being processed. The following expression, for example,
<xsl:value-of select = "position()"/>
displays the position of the current node. The position() function also can be used in a predicate to select a particular node by its sequence among the nodes matched by a step. The expression
books/book[position() = 1]/title
selects the <title> node from the first <book> node. The position can be abbreviated to just the location index: books/book[1]/title.
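Positional selection can be tried with Python's standard xml.etree.ElementTree module, as in the minimal sketch below. ElementTree accepts only the abbreviated index form and the last() function, not position() = 1, so the abbreviated form is used here. The data is hypothetical (only the first title comes from Books.xml), and paths are relative to the <books> root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for Books.xml; the second and third titles are invented.
BOOKS_XML = """
<books>
  <book><title>Birth of the Mind</title></book>
  <book><title>Placeholder Title Two</title></book>
  <book><title>Placeholder Title Three</title></book>
</books>
"""

root = ET.fromstring(BOOKS_XML)  # root is the <books> element

# Counterpart of books/book[1]/title : positions start at 1, not 0.
first_title = [t.text for t in root.findall("book[1]/title")]

# last() selects the final <book> without knowing how many there are.
last_title = [t.text for t in root.findall("book[last()]/title")]

print(first_title)
print(last_title)
```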
Example Searches
The button at the top of this page applies an XSLT style sheet to the Books.xml file to select for display the book with ISBN of "0375412883". The style sheet is coded identically to the previous example except for selection of <book> nodes for display. A portion of the style sheet is shown below with the selective search path highlighted. Templates are applied where the <book> node contains an <isbn> child node with the matching value.
...
<xsl:template match="/">
  <html>
  <body>
    <style>
      table {border-collapse:collapse}
      tr#head {background-color:#E6E6E6}
      td {padding:3px}
      td#title {font-style:italic}
    </style>
    <h3>Data from First Book Node</h3>
    <table border="1">
      <tr id="head">
        <th>ISBN</th>
        <th>Title</th>
        <th>Author</th>
        <th>Publisher</th>
        <th>Year</th>
        <th>Price</th>
      </tr>
      <xsl:apply-templates select="books/book[isbn='0375412883']"/>
    </table>
  </body>
  </html>
</xsl:template>
...
Another example search is given by the following button. The same style sheet contains the element to select all <book> nodes where the price of the book is greater than $15.00.
<xsl:apply-templates select="books/book[price > 15.00]"/>
Finally, the following button displays a page with multiple selection criteria. Nodes are displayed where the book price is greater than $15.00 and the publication date is later than 2000.
<xsl:apply-templates select="books/book[price > 15.00 and year > 2000]"/>
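To check which books such criteria should pick out, the same tests can be mirrored in plain Python. This is only a stand-in for the XPath predicates, not an XPath evaluation: the standard xml.etree.ElementTree module supports neither numeric comparisons nor the and operator, so the filtering is written as ordinary Python conditions. All titles, prices, and years below are invented for illustration, and the helper name matches_both is hypothetical.

```python
import xml.etree.ElementTree as ET

# Entirely hypothetical data; titles, prices, and years are invented.
BOOKS_XML = """
<books>
  <book><title>Book A</title><year>2004</year><price>17.50</price></book>
  <book><title>Book B</title><year>1999</year><price>24.95</price></book>
  <book><title>Book C</title><year>2005</year><price>12.00</price></book>
</books>
"""

root = ET.fromstring(BOOKS_XML)  # root is the <books> element


def matches_both(book):
    """Mirror of the predicate [price > 15.00 and year > 2000]."""
    price = float(book.findtext("price"))
    year = int(book.findtext("year"))
    return price > 15.00 and year > 2000


# Mirror of books/book[price > 15.00]
over_15 = [b.findtext("title") for b in root.findall("book")
           if float(b.findtext("price")) > 15.00]

# Mirror of books/book[price > 15.00 and year > 2000]
over_15_recent = [b.findtext("title") for b in root.findall("book")
                  if matches_both(b)]

print(over_15)
print(over_15_recent)
```

Adding the second condition narrows the result from two books to one, which is the behavior expected of the and operator in the XPath predicate.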
Using Attributes in XPath Searches
As discussed previously, XML nodes can be assigned attributes as an alternative to child nodes for identity or classification purposes. Although this is not preferred practice, the <book> nodes in the Books.xml file are assigned id attributes to supply the nodes with unique identifiers for illustrative purposes. The numbers are in the format 001, 002, 003, etc., and they are coded as shown below for the first node in the file.
<books>
  <book id="001">
    <isbn>0465044050</isbn>
    <title>Birth of the Mind</title>
    <author>
      <firstname>Gary</firstname>
      <lastname>Marcus</lastname>
    </author>
    ...
  </book>
Attribute values can be used in predicates of XPath searches. The format is shown below for a search that returns the <book> node with id="004".
<xsl:apply-templates select="books/book[@id = '004']"/>
The attribute name is preceded by an at sign (@). Comparison and logical operators are applied in the same way as for node searches.
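An attribute search of this kind can be sketched in Python with the standard xml.etree.ElementTree module, which supports the @name form in predicates. The data is a hypothetical stand-in for Books.xml: the id values follow the 001, 002 format described above, but every title other than the first is invented, and one book is deliberately left without an id. Paths are relative to the <books> root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for Books.xml; only the first title is from the file.
BOOKS_XML = """
<books>
  <book id="001"><title>Birth of the Mind</title></book>
  <book id="002"><title>Placeholder Title Two</title></book>
  <book id="004"><title>Placeholder Title Four</title></book>
  <book><title>Placeholder Without Id</title></book>
</books>
"""

root = ET.fromstring(BOOKS_XML)  # root is the <books> element

# Counterpart of books/book[@id = '004']
book_004 = [b.findtext("title") for b in root.findall("book[@id='004']")]

# [@id] with no value matches any <book> that carries an id attribute.
books_with_id = [b.get("id") for b in root.findall("book[@id]")]

print(book_004)
print(books_with_id)
```

Note that the id values are compared as strings, so '004' and '4' are different values in this search.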
These tutorials do not discuss all aspects of the XPath query language. Full coverage of XPath standards can be found at www.w3.org/TR/xpath.