Shulou (Shulou.com), 06/01 report. Updated 2025-01-18, from SLTechnology News&Howtos.
This article introduces how to parse XML and HTML in Python with the lxml library. The steps are simple, and each one is shown with a short example.
Introduction
lxml is a Python library for parsing XML and HTML, with support for XPath queries. It is built on the C libraries libxml2 and libxslt, and is generally fast compared with the XML parsers in the Python standard library.
XPath is a language for locating information in XML documents. Although it was designed for XML, it works equally well on HTML. It offers concise path expressions for selecting nodes, plus built-in functions for processing the selected data. The figure of more than 100 built-in functions applies to XPath 2.0; lxml implements XPath 1.0, which has a smaller core function set.
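As a rough illustration of path expressions and built-in functions, the sketch below runs a few XPath 1.0 queries through lxml. The markup and the variable names are hypothetical sample data, chosen only for this example.

```python
from lxml import etree

# Hypothetical sample markup, used only for illustration.
doc = etree.HTML('<ul><li class="a">one</li><li class="b">two</li></ul>')

# A path expression selects nodes: here, every <li> element in the document.
items = doc.xpath('//li')

# Built-in functions operate on the selected data.
li_count = doc.xpath('count(//li)')        # XPath numbers come back as float
first_text = doc.xpath('string(//li[1])')  # string value of the first <li>
starts = doc.xpath('//li[starts-with(@class, "b")]/text()')

print(len(items), li_count, first_text, starts)
```

Note that the return type depends on the expression: node selections give a list, while count() and string() give a single number or string.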
Installation
Install lxml with pip:

    pip install lxml
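One way to confirm that the installation worked is to import the package and print its version. This is a small sketch; the variable names are illustrative.

```python
from lxml import etree

# Version of lxml itself, as a tuple of integers.
lxml_version = etree.LXML_VERSION
# Version of the underlying libxml2 C library that lxml is using.
libxml_version = etree.LIBXML_VERSION

print('lxml:', lxml_version)
print('libxml2:', libxml_version)
```

If the import fails with ModuleNotFoundError, pip most likely installed the package into a different Python environment than the one running the script.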
Usage
1. Parse HTML from a string

    from lxml import etree

    text = '''...'''  # sample HTML fragment, elided

    html = etree.HTML(text)  # parse the string; returns the root Element, which supports XPath
    result = etree.tostring(html, encoding='utf-8')  # serialize the parsed tree to bytes
    print(type(html))
    print(type(result))
    print(result.decode('utf-8'))

etree.HTML repairs incomplete markup, so the printed result contains the html and body tags that the input fragment was missing.
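A minimal, self-contained sketch of this repair behaviour follows. The HTML fragment and its link names are hypothetical sample data, not the article's original markup; the fragment deliberately omits html and body and leaves the last li unclosed.

```python
from lxml import etree

# Hypothetical fragment: no <html>/<body>, and the last <li> is never closed.
text = '''
<div>
  <ul>
    <li class="item-0"><a href="link1.html">first item</a></li>
    <li class="item-1"><a href="link2.html">second item</a>
  </ul>
</div>
'''

html = etree.HTML(text)                          # root <html> Element
result = etree.tostring(html, encoding='utf-8')  # serialized tree, as bytes

print(type(html))
print(type(result))
print(result.decode('utf-8'))
```

The serialized output wraps the fragment in html and body elements and closes the dangling li.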
2. Parse an HTML file

    from lxml import etree

    # HTMLParser repairs missing information, such as the document type declaration
    html = etree.parse('test.html', etree.HTMLParser())
    result = etree.tostring(html)  # serialize to bytes
    # result = etree.tostringlist(html)  # serialize to a list instead
    print(type(html))
    print(type(result))
    print(result)
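To try file parsing without preparing test.html by hand, the sketch below writes a small file to a temporary directory first. The file contents are hypothetical sample data.

```python
import os
import tempfile

from lxml import etree

with tempfile.TemporaryDirectory() as tmp:
    # Write a small, hypothetical HTML file so the example is self-contained.
    path = os.path.join(tmp, 'test.html')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('<ul><li>first item</li><li>second item</li></ul>')

    tree = etree.parse(path, etree.HTMLParser())  # returns an _ElementTree
    root = tree.getroot()                         # the repaired <html> element
    result = etree.tostring(tree)                 # serialized document, as bytes

print(type(tree))
print(root.tag)
print(result)
```

Unlike etree.HTML, which returns the root element directly, etree.parse returns a tree object; call getroot() when the root element itself is needed.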
3. Get all nodes

    from lxml import etree

    html = etree.parse('test.html', etree.HTMLParser())
    result = html.xpath('//*')  # // selects descendant nodes; * matches every element
    print(type(html))
    print(type(result))
    print(result)

The call returns a list containing every node in the document; each item is an Element.
To select only the li nodes, put the node name after //, for example html.xpath('//li').
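The difference between selecting every element and selecting by name can be seen in a short sketch. The markup here is hypothetical and parsed from a string so that the example is self-contained.

```python
from lxml import etree

# Hypothetical sample markup.
html = etree.HTML('<div><ul><li>a</li><li>b</li></ul></div>')

all_nodes = html.xpath('//*')   # every element, in document order
li_nodes = html.xpath('//li')   # only the <li> elements

tags = [node.tag for node in all_nodes]
print(tags)
print(len(li_nodes))
```

The list for //* also includes the html and body elements that the parser added.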
4. Get text

    from lxml import etree

    text = '''...'''  # sample HTML fragment, elided

    html = etree.HTML(text, etree.HTMLParser())
    # text directly inside the a node
    result = html.xpath('//li[@class="item-1"]/a/text()')
    # text of all descendant nodes under li
    result1 = html.xpath('//li[@class="item-1"]//text()')
    print(result)
    print(result1)

XPath's text() node test returns the text inside a node.
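The two forms differ once the markup is nested. In the sketch below, the fragment is hypothetical sample data, and the span inside the second link is there only to make the difference visible.

```python
from lxml import etree

# Hypothetical fragment; the <span> shows the effect of / versus //.
text = '''
<ul>
  <li class="item-0"><a href="link1.html">first item</a></li>
  <li class="item-1"><a href="link2.html"><span>second</span> item</a></li>
</ul>
'''

html = etree.HTML(text)

# /a/text() returns only text nodes that are direct children of <a>.
direct = html.xpath('//li[@class="item-1"]/a/text()')
# //text() returns every text node anywhere below the <li>.
nested = html.xpath('//li[@class="item-1"]//text()')

print(direct)
print(nested)
```

The first query misses the word inside the span, while the second collects it along with the trailing text.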
5. Get attributes
Node attributes are selected with the @ symbol. The following code gets the href attribute of the a tags:

    # href attribute of a nodes that are direct children of li
    result = html.xpath('//li/a/@href')
    # href attribute of all descendant nodes of li
    result = html.xpath('//li//@href')
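A self-contained sketch of attribute selection, using hypothetical markup and link names. It also shows Element.get(), which reads an attribute from a node that has already been selected.

```python
from lxml import etree

# Hypothetical sample markup.
html = etree.HTML(
    '<ul>'
    '<li class="item-0"><a href="link1.html">first</a></li>'
    '<li class="item-1"><a href="link2.html">second</a></li>'
    '</ul>'
)

hrefs = html.xpath('//li/a/@href')   # attribute values come back as strings
classes = html.xpath('//li/@class')

# Alternative: select the element, then read the attribute from it.
first_link = html.xpath('//li/a')[0]
first_href = first_link.get('href')

print(hrefs)
print(classes)
print(first_href)
```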
6. Select by position
A query sometimes matches several nodes when only one of them is needed. Putting an index in square brackets selects a node by its position:

    from lxml import etree

    text1 = '''...'''  # sample HTML fragment, elided

    html = etree.HTML(text1, etree.HTMLParser())
    # content of the a node under every matching li node
    result = html.xpath('//li[contains(@class, "aaa")]/a/text()')
    # the first
    result1 = html.xpath('//li[1][contains(@class, "aaa")]/a/text()')
    # the last
    result2 = html.xpath('//li[last()][contains(@class, "aaa")]/a/text()')
    # result3 = html.xpath('//li[position()>2 and position() ...
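The positional predicates can be tried on a small list. The markup below is hypothetical sample data, and the position range in the last query is chosen for illustration; it is not a reconstruction of the truncated query above.

```python
from lxml import etree

# Hypothetical sample markup: four <li> nodes sharing the class "aaa".
text1 = '''
<ul>
  <li class="aaa"><a href="link1.html">first</a></li>
  <li class="aaa"><a href="link2.html">second</a></li>
  <li class="aaa"><a href="link3.html">third</a></li>
  <li class="aaa"><a href="link4.html">fourth</a></li>
</ul>
'''

html = etree.HTML(text1)

every = html.xpath('//li[contains(@class, "aaa")]/a/text()')
first = html.xpath('//li[1]/a/text()')        # XPath positions start at 1
last = html.xpath('//li[last()]/a/text()')
middle = html.xpath('//li[position() > 1 and position() < 4]/a/text()')

print(every)
print(first)
print(last)
print(middle)
```

Positions are counted among the sibling li nodes of each parent, so a document with several lists would return one match per list for //li[1].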