XPath is a language for selecting nodes in an XML document. You can think of it as the CSS selectors of the XML world. It does some cool things that traditional CSS can’t do (CSS 3 can do some of them), such as selecting elements based on their content and attributes, and selecting parents as well as children. There is a cool Zend Framework (ZF) library which will translate your CSS selectors into XPath, if you’re interested.
Here’s an example of an XPath expression. It’s relatively complex and shows off a lot of useful XPath features:
//Item[ItemNumber='4111']//ExternalIdentifier[@Source='Alpha' and @Type='Beta']
Now, let me break it down. The leading // means that the node can be located anywhere in the document (in CSS this is kinda just assumed; a space in a selector does much the same thing). The Item part means that we are looking for an Item element. The [ItemNumber='4111'] is a condition on Item (the name before it): we only want Item elements that have a child ItemNumber element whose text value is equal to 4111. The second // means that we are looking for a descendant at any depth below the selected Item, not just a direct child. The ExternalIdentifier means we are looking for an element with that name. The @Source='Alpha' means that the ExternalIdentifier element (the name before the brackets) must have an attribute named Source whose value is Alpha. The @Type='Beta' does the same thing for the Type attribute. The “and” means that the element must satisfy both conditions.
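To make the breakdown concrete, here is a minimal PHP sketch that builds the expression up one piece at a time and counts the matches at each step. It assumes PHP's SimpleXML extension; the miniature document, the Items root element, and the second Item (4112) are made up for illustration.

```php
<?php
// Hypothetical miniature document, for illustration only.
$xml = <<<XML
<Items>
  <Item>
    <ItemNumber>4111</ItemNumber>
    <ExternalIdentifiers>
      <ExternalIdentifier Type="Beta" Source="Alpha">10</ExternalIdentifier>
      <ExternalIdentifier Type="Delta" Source="Alpha">30</ExternalIdentifier>
    </ExternalIdentifiers>
  </Item>
  <Item>
    <ItemNumber>4112</ItemNumber>
    <ExternalIdentifiers>
      <ExternalIdentifier Type="Beta" Source="Alpha">50</ExternalIdentifier>
    </ExternalIdentifiers>
  </Item>
</Items>
XML;

$doc = simplexml_load_string($xml);

// Each step adds one piece of the expression and narrows the match.
$steps = [
    "//Item",
    "//Item[ItemNumber='4111']",
    "//Item[ItemNumber='4111']//ExternalIdentifier",
    "//Item[ItemNumber='4111']//ExternalIdentifier[@Source='Alpha' and @Type='Beta']",
];

$counts = [];
foreach ($steps as $expression) {
    // xpath() returns an array of matching elements.
    $counts[$expression] = count($doc->xpath($expression));
    echo $counts[$expression], "  ", $expression, PHP_EOL;
}
```

With this made-up document, the steps match 2, 1, 2 and 1 nodes respectively: the ItemNumber condition drops the second Item, and the attribute conditions drop the Delta identifier.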
Here’s an example chunk of XML (imagine that there are several of these Item nodes):
<Item>
  <ItemNumber>4111</ItemNumber>
  <ExternalIdentifiers>
    <ExternalIdentifier Type="Beta" Source="Alpha">10</ExternalIdentifier>
    <ExternalIdentifier Type="Beta" Source="Gamma">20</ExternalIdentifier>
    <ExternalIdentifier Type="Delta" Source="Alpha">30</ExternalIdentifier>
    <ExternalIdentifier Type="Delta" Source="Gamma">40</ExternalIdentifier>
  </ExternalIdentifiers>
</Item>
By running the XPath expression above against the provided XML document, we get the following PHP array, which holds one SimpleXMLElement for the single matching node:
array(1) {
  [0]=>
  object(SimpleXMLElement)#2 (2) {
    ["@attributes"]=>
    array(2) {
      ["Type"]=>
      string(4) "Beta"
      ["Source"]=>
      string(5) "Alpha"
    }
    [0]=>
    string(2) "10"
  }
}
If you were to cast the matched element (the first item of the array) to a string, you would get the text value of the node (in this case 10).
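Putting it together, here is a minimal sketch of how the query and the string cast might look in PHP. It assumes the SimpleXML extension and wraps the sample Item in a hypothetical Items root element so the snippet is self-contained; the variable names are illustrative.

```php
<?php
// The sample Item from this post, wrapped in a hypothetical <Items> root.
$xml = <<<XML
<Items>
  <Item>
    <ItemNumber>4111</ItemNumber>
    <ExternalIdentifiers>
      <ExternalIdentifier Type="Beta" Source="Alpha">10</ExternalIdentifier>
      <ExternalIdentifier Type="Beta" Source="Gamma">20</ExternalIdentifier>
      <ExternalIdentifier Type="Delta" Source="Alpha">30</ExternalIdentifier>
      <ExternalIdentifier Type="Delta" Source="Gamma">40</ExternalIdentifier>
    </ExternalIdentifiers>
  </Item>
</Items>
XML;

$doc = simplexml_load_string($xml);
$expression = "//Item[ItemNumber='4111']//ExternalIdentifier[@Source='Alpha' and @Type='Beta']";

// xpath() returns an array of SimpleXMLElement objects, one per matching node.
$matches = $doc->xpath($expression);

$values = [];
foreach ($matches as $node) {
    // Casting the element to a string yields its text content.
    $values[] = (string) $node;
    // Attributes are read with array syntax and are cast to strings as well.
    echo (string) $node, " (Source=", (string) $node['Source'], ", Type=", (string) $node['Type'], ")", PHP_EOL;
}
```

Because xpath() always hands back an array, looping over the result (or checking that it is not empty before reading index 0) keeps the code safe when nothing matches.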