XPath Expressions Explained

XPath is a language for selecting XML nodes. You can think of it as the CSS selectors of the XML world. It does some cool things that traditional CSS can't do (CSS 3 can do some of it), such as selecting elements based on their content and attributes, and selecting parents as well as children. There is a cool Zend Framework (ZF) library that will translate your CSS selectors into XPath, if you're interested.

Here's an example of an XPath expression. It's relatively complex and shows off a lot of useful XPath features:

//Item[ItemNumber='4111']//ExternalIdentifier[@Source='Alpha' and @Type='Beta']

Now, let me break it down. The leading // means that the node that follows can be located anywhere in the document (in CSS this is more or less assumed; a space in a selector does the same thing). Item means that we are looking for an Item element. The predicate [ItemNumber='4111'] filters those Item elements (the name just before the bracket), keeping only the ones that have a child ItemNumber element whose text value is equal to 4111. The second // means that the next node can be a descendant at any depth below the selected Item, not only a direct child. ExternalIdentifier means we are looking for an element with that name. @Source='Alpha' requires that ExternalIdentifier element (again, the name just before the bracket) to have an attribute named Source whose value is Alpha, and @Type='Beta' does the same thing for the Type attribute. The and means that the element must satisfy both conditions.
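To make the breakdown concrete, here is a minimal sketch that evaluates the expression one piece at a time and counts the matches after each step. The two-Item document and its Items root element are made up for illustration; only the element and attribute names come from the expression itself.

```php
<?php
// Hypothetical sample document: two Item nodes, only one has ItemNumber 4111.
$xml = <<<'XML'
<Items>
  <Item>
    <ItemNumber>4111</ItemNumber>
    <ExternalIdentifiers>
      <ExternalIdentifier Type="Beta" Source="Alpha">10</ExternalIdentifier>
      <ExternalIdentifier Type="Delta" Source="Alpha">30</ExternalIdentifier>
    </ExternalIdentifiers>
  </Item>
  <Item>
    <ItemNumber>5222</ItemNumber>
    <ExternalIdentifiers>
      <ExternalIdentifier Type="Beta" Source="Alpha">50</ExternalIdentifier>
    </ExternalIdentifiers>
  </Item>
</Items>
XML;

$doc = new SimpleXMLElement($xml);

// Each expression adds one more piece of the full expression.
$steps = [
    "//Item",
    "//Item[ItemNumber='4111']",
    "//Item[ItemNumber='4111']//ExternalIdentifier",
    "//Item[ItemNumber='4111']//ExternalIdentifier[@Source='Alpha']",
    "//Item[ItemNumber='4111']//ExternalIdentifier[@Source='Alpha' and @Type='Beta']",
];

$counts = [];
foreach ($steps as $expression) {
    // xpath() returns an array of matching SimpleXMLElement objects.
    $counts[$expression] = count($doc->xpath($expression));
    echo $counts[$expression], "  ", $expression, PHP_EOL;
}
```

With this made-up document the counts go 2, 1, 2, 2, 1: the ItemNumber predicate discards the second Item, and the and condition discards the Delta identifier.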

Here's an example chunk of XML (imagine that there are several of these Item nodes):

<Item>
  <ItemNumber>4111</ItemNumber>
  <ExternalIdentifiers>
    <ExternalIdentifier Type="Beta" Source="Alpha">10</ExternalIdentifier>
    <ExternalIdentifier Type="Beta" Source="Gamma">20</ExternalIdentifier>
    <ExternalIdentifier Type="Delta" Source="Alpha">30</ExternalIdentifier>
    <ExternalIdentifier Type="Delta" Source="Gamma">40</ExternalIdentifier>
  </ExternalIdentifiers>
</Item>
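One way to run the expression against this XML is PHP's SimpleXML extension. This is a sketch rather than the exact script behind this post: the Items root element is an assumption added so that several Item nodes could sit side by side, and the variable names are arbitrary.

```php
<?php
// The sample Item from above, wrapped in a hypothetical <Items> root element.
$xml = <<<'XML'
<Items>
  <Item>
    <ItemNumber>4111</ItemNumber>
    <ExternalIdentifiers>
      <ExternalIdentifier Type="Beta" Source="Alpha">10</ExternalIdentifier>
      <ExternalIdentifier Type="Beta" Source="Gamma">20</ExternalIdentifier>
      <ExternalIdentifier Type="Delta" Source="Alpha">30</ExternalIdentifier>
      <ExternalIdentifier Type="Delta" Source="Gamma">40</ExternalIdentifier>
    </ExternalIdentifiers>
  </Item>
</Items>
XML;

$doc = new SimpleXMLElement($xml);

// xpath() returns an array of SimpleXMLElement objects, one per match.
$matches = $doc->xpath(
    "//Item[ItemNumber='4111']//ExternalIdentifier[@Source='Alpha' and @Type='Beta']"
);

var_dump($matches);
```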

By running the XPath expression above against the provided XML document, we get a PHP array containing a single SimpleXMLElement object (shown here as var_dump() output):

array(1) {
  [0]=>
  object(SimpleXMLElement)#2 (2) {
    ["@attributes"]=>
    array(2) {
      ["Type"]=>
      string(4) "Beta"
      ["Source"]=>
      string(4) "Alpha"
    }
    [0]=>
    string(2) "10"
  }
}

If you cast the SimpleXMLElement object to a string, you get the text value of the node (in this case 10).
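As a small sketch of that cast, the snippet below pulls the text value and the two attributes out of the first match. It uses the single Item from the example as the whole document, and the variable names are just for illustration.

```php
<?php
// The example Item, used here as the entire document.
$xml = <<<'XML'
<Item>
  <ItemNumber>4111</ItemNumber>
  <ExternalIdentifiers>
    <ExternalIdentifier Type="Beta" Source="Alpha">10</ExternalIdentifier>
    <ExternalIdentifier Type="Beta" Source="Gamma">20</ExternalIdentifier>
    <ExternalIdentifier Type="Delta" Source="Alpha">30</ExternalIdentifier>
    <ExternalIdentifier Type="Delta" Source="Gamma">40</ExternalIdentifier>
  </ExternalIdentifiers>
</Item>
XML;

$item = new SimpleXMLElement($xml);
$matches = $item->xpath(
    "//Item[ItemNumber='4111']//ExternalIdentifier[@Source='Alpha' and @Type='Beta']"
);

// Take the first (and here, only) match.
$node = $matches[0];

// Casting the element gives its text; array access gives its attributes.
$value  = (string) $node;
$source = (string) $node['Source'];
$type   = (string) $node['Type'];

echo "value={$value} source={$source} type={$type}", PHP_EOL;
```

The attribute values also come back as SimpleXMLElement objects, so they need the same string cast before being compared or stored.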

Tags: #php #xml
Thomas Hunter II

Thomas has contributed to dozens of enterprise Node.js services and has worked for a company dedicated to securing Node.js. He has spoken at several conferences on Node.js and JavaScript and is an O'Reilly published author.