Recording the Way

Xpath Syntax

2018/01/01

Path 句法(URL):

Syntax Meaning
tag Selects all child elements with the given tag. For example, spam selects all child elements named spam, and spam/egg selects all grandchildren named egg in all children named spam.
* Selects all child elements. For example, */egg selects all grandchildren named egg.
. Selects the current node. This is mostly useful at the beginning of the path, to indicate that it’s a relative path.
// Selects all subelements, on all levels beneath the current element. For example, .//egg selects all egg elements in the entire tree.
.. Selects the parent element.
[@attrib] Selects all elements that have the given attribute.
[@attrib='value'] Selects all elements for which the given attribute has the given value. The value cannot contain quotes.
[tag] Selects all elements that have a child named tag. Only immediate children are supported.
[tag='text'] Selects all elements that have a child named tag whose complete text content, including descendants, equals the given text.
[position] Selects all elements that are located at the given position. The position can be either an integer (1 is the first position), the expression last() (for the last position), or a position relative to the last position (e.g. last()-1).

Predicates (expressions within square brackets) must be preceded by a tag name, an asterisk, or another predicate. position predicates must be preceded by a tag name.

Xpath Python使用示例:

Here’s an example that demonstrates some of the XPath capabilities of the module. We’ll be using the countrydata XML document from the Parsing XML section:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
import xml.etree.ElementTree as ET
root = ET.fromstring(countrydata)
# Top-level elements
root.findall(".")
# All 'neighbor' grand-children of 'country' children of the top-level
# elements
root.findall("./country/neighbor")
# Nodes with name='Singapore' that have a 'year' child
root.findall(".//year/..[@name='Singapore']")
# 'year' nodes that are children of nodes with name='Singapore'
root.findall(".//*[@name='Singapore']/year")
# All 'neighbor' nodes that are the second child of their parent
root.findall(".//neighbor[2]")

以上示例用于XML文本,如果要作用于HTML文本,需要将 import xml.etree.ElementTree as ET 改为 import lxml.html ,以及将 root = ET.fromstring(countrydata) 改为 root = lxml.html.fromstring(HTMLData)

XPath Axes

An axis defines a node-set relative to the current node.

AxisName Result
ancestor Selects all ancestors (parent, grandparent, etc.) of the current node
ancestor-or-self Selects all ancestors (parent, grandparent, etc.) of the current node and the current node itself
attribute Selects all attributes of the current node
child Selects all children of the current node
descendant Selects all descendants (children, grandchildren, etc.) of the current node
descendant-or-self Selects all descendants (children, grandchildren, etc.) of the current node and the current node itself
following Selects everything in the document after the closing tag of the current node
following-sibling Selects all siblings after the current node
namespace Selects all namespace nodes of the current node
parent Selects the parent of the current node
preceding Selects all nodes that appear before the current node in the document, except ancestors, attribute nodes and namespace nodes
preceding-sibling Selects all siblings before the current node
self Selects the current node

Axe Examples

Example Result
child::book Selects all book nodes that are children of the current node
attribute::lang Selects the lang attribute of the current node
child::* Selects all element children of the current node
attribute::* Selects all attributes of the current node
child::text() Selects all text node children of the current node
child::node() Selects all children of the current node
descendant::book Selects all book descendants of the current node
ancestor::book Selects all book ancestors of the current node
ancestor-or-self::book Selects all book ancestors of the current node - and the current as well if it is a book node
child::*/child::price Selects all price grandchildren of the current node
CATALOG
  1. 1. Path 句法(URL):
  • XPath Axes
    1. 1. Axe Examples