2025-02-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how to use XPath together with the lxml library. The content is detailed yet easy to follow, and it should serve as a useful reference. If you are interested, read on to learn more about how to use XPath and lxml.
XPath is a content extraction tool that is generally used in conjunction with the lxml library.
1 XPath and lxml
XPath
XPath is the XML Path Language, a language for locating parts of an XML document (XML being a subset of the Standard Generalized Markup Language, SGML). XPath is based on the tree structure of XML and provides the ability to find nodes in that tree.
XPath was originally designed to select node information in XML documents. It became a W3C standard on November 16, 1999. Because it is simple and convenient, it has gradually become widely known.
lxml
lxml is a feature-rich, easy-to-use third-party Python library (it is not part of the Python standard library) that specializes in processing XML and HTML.
2 The syntax of XPath
Regular expressions are tedious and costly to learn. XPath takes only a small fraction of that effort, and its basics can be picked up in about 10 minutes. I sum up the XPath language and the way it extracts information from an HTML DOM tree as "trunk, branches, green leaves".
2.1 "Trunk" - selecting nodes
To grab information, we need to know where to start, so the first step is to find a starting node. XPath offers the following options for selecting a starting node:
The following examples illustrate their usage:
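As a minimal sketch of the most common path expressions ("/" for a step from the root, "//" for a match at any depth, "." for the current node, ".." for the parent), the snippet below runs them against a small page. The HTML and the variable names are made up for illustration and are not from the original article.

```python
from lxml import etree

# Hypothetical sample page, used only for illustration.
html = """
<html>
  <body>
    <div id="main">
      <ul>
        <li><a href="/a">First</a></li>
        <li><a href="/b">Second</a></li>
      </ul>
    </div>
  </body>
</html>
"""

tree = etree.HTML(html)

# "/" starts at the root and walks down one level per step.
root_divs = tree.xpath("/html/body/div")

# "//" matches nodes at any depth in the document.
all_links = tree.xpath("//a")

# "." is the current node: look for <li> only below this <ul>.
ul = tree.xpath("//ul")[0]
items = ul.xpath(".//li")

# ".." is the parent of the current node.
parent = ul.xpath("..")[0]

print(len(root_divs), len(all_links), len(items), parent.tag)  # 1 2 2 div
```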
If you are not sure which nodes to extract, you can use wildcards as temporary placeholders, then refine the expression once you have seen the output.
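A short sketch of the two wildcards that are probably most useful here: "*" matches any element and "@*" matches any attribute. The HTML fragment is a made-up example.

```python
from lxml import etree

# Hypothetical fragment; etree.HTML wraps it in <html><body> automatically.
html = '<div id="box"><p class="x">one</p><span>two</span></div>'
tree = etree.HTML(html)

# "*" matches any element: here, every element child of the <div>.
children = tree.xpath("//div/*")
child_tags = [el.tag for el in children]
print(child_tags)  # ['p', 'span']

# "@*" matches any attribute: here, all attribute values on the <p>.
p_attrs = tree.xpath("//p/@*")
print(p_attrs)  # ['x']
```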
2.2 "Branches" - relational nodes and predicates
In this step we move from the starting point, one step at a time, to the node that finally contains what we need. Along the way we sometimes have to rely on information from neighboring nodes, so we need to understand relational nodes and predicates.
Relational nodes
Generally speaking, an ordinary node in a DOM tree has a parent node, sibling nodes, and child nodes. There are exceptions: the root node has no parent, and the deepest (leaf) nodes have no children. XPath provides syntax for reaching these related nodes.
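A minimal sketch of the standard XPath axes for related nodes (parent::, preceding-sibling::, following-sibling::, child::), assuming a made-up three-item list as input:

```python
from lxml import etree

# Hypothetical list used only for illustration.
html = """
<ul id="menu">
  <li>one</li>
  <li>two</li>
  <li>three</li>
</ul>
"""
tree = etree.HTML(html)

# Take the middle <li> as the current node.
mid = tree.xpath("//li")[1]

# parent:: selects the parent of the current node.
parent = mid.xpath("parent::*")[0]

# preceding-sibling:: and following-sibling:: select siblings on either side.
before = mid.xpath("preceding-sibling::li")
after = mid.xpath("following-sibling::li")

# child:: selects direct children (writing just "li" is the short form).
children = parent.xpath("child::li")

print(parent.tag, before[0].text, after[0].text, len(children))  # ul one three 3
```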
Predicates
Predicates are used to find a specific node or a node that contains a specified value. They are written inside square brackets.
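The sketch below tries a few common predicate forms: position, attribute presence, exact attribute value, and the contains() function. The sample list and the texts() helper are hypothetical, introduced only to keep the example short.

```python
from lxml import etree

# Hypothetical list used only for illustration.
html = """
<ul>
  <li class="item">apple</li>
  <li class="item hot">banana</li>
  <li>cherry</li>
</ul>
"""
tree = etree.HTML(html)


def texts(path):
    """Hypothetical helper: return the text of every element matched by path."""
    return [el.text for el in tree.xpath(path)]


first = texts("//li[1]")                             # position, counted from 1
last = texts("//li[last()]")                         # last element in the set
with_class = texts("//li[@class]")                   # has a class attribute
exact = texts("//li[@class='item']")                 # attribute equals a value
partial = texts("//li[contains(@class, 'hot')]")     # attribute contains text

print(first, last, with_class, exact, partial)
```

Note that XPath positions start at 1, not 0, and that `@class='item'` compares the whole attribute value, which is why the "item hot" element only matches the contains() form.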
2.3 "Green leaves" - node content and attributes
At this point we have found the node that holds the content we need. The next step is to get the contents of that node. XPath syntax can extract both the text content of a node and the values of its attributes.
For specific usage, see the following examples:
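A minimal sketch, assuming a made-up link element: "@name" reads an attribute, "text()" reads the text nodes directly inside an element, "//text()" also collects text from nested elements, and string() joins everything into one string.

```python
from lxml import etree

# Hypothetical fragment used only for illustration.
html = '<div><a href="/docs" title="Docs">Read the <b>docs</b></a></div>'
tree = etree.HTML(html)

# "@href" returns the attribute value.
href = tree.xpath("//a/@href")

# "text()" returns only text nodes that are direct children of <a>.
direct_text = tree.xpath("//a/text()")

# "//text()" also descends into nested elements such as <b>.
all_text = tree.xpath("//a//text()")

# string() concatenates all descendant text into a single string.
joined = tree.xpath("string(//a)")

print(href)         # ['/docs']
print(direct_text)  # ['Read the ']
print(all_text)     # ['Read the ', 'docs']
print(joined)       # Read the docs
```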
3 Usage of lxml
3.1 Install lxml
pip is the easiest way to install the library, with the following command:

    pip install lxml

3.2 Using lxml
lxml is relatively easy to use. We first parse the HTML page with lxml's etree, and then match XPath expressions against the result. The specific usage is as follows:
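The original code sample is not available, so here is a minimal sketch of the workflow just described: parse with etree.HTML, then call xpath() on the result. The page content and the "post" class name are assumptions made up for the example.

```python
from lxml import etree

# Hypothetical page; in practice this string would come from an HTTP response.
html = """
<html><body>
  <div class="post"><h2>Title A</h2><a href="/a">more</a></div>
  <div class="post"><h2>Title B</h2><a href="/b">more</a></div>
</body></html>
"""

# Step 1: parse the HTML text into an element tree.
tree = etree.HTML(html)

# Step 2: select each post, then extract fields relative to it.
posts = []
for div in tree.xpath("//div[@class='post']"):
    title = div.xpath("./h2/text()")[0]
    link = div.xpath("./a/@href")[0]
    posts.append((title, link))

print(posts)  # [('Title A', '/a'), ('Title B', '/b')]
```

Selecting the container first and then using relative paths ("./...") keeps each title paired with its own link.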
Yes, the information can be extracted in just a few lines of code.
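It is worth noting that the type returned by an xpath lookup depends on how the path expression is written. An expression that selects nodes returns a list (possibly empty, possibly with several items), while an expression that ends in a function such as count() or string() returns a single value.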
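A short sketch that shows the different return types side by side, using a made-up two-item list:

```python
from lxml import etree

# Hypothetical fragment used only for illustration.
tree = etree.HTML("<ul><li>a</li><li>b</li></ul>")

elements = tree.xpath("//li")          # list of element objects
strings = tree.xpath("//li/text()")    # list of strings
missing = tree.xpath("//table")        # empty list when nothing matches
count = tree.xpath("count(//li)")      # single float
text = tree.xpath("string(//li)")      # single string (text of first match)
exists = tree.xpath("boolean(//li)")   # single bool

print(type(elements).__name__, len(elements))  # list 2
print(list(strings), missing)                  # ['a', 'b'] []
print(count, text, exists)                     # 2.0 a True
```

Because a node-selecting expression always returns a list, indexing with `[0]` raises IndexError when nothing matches, so it is safer to check for an empty list first.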
This concludes the introduction to using XPath and the lxml library.