How does Python parse XML

2026-03-25 Update. From: SLTechnology News&Howtos (shulou)

Shulou(Shulou.com)06/01 Report--

This article introduces how to parse XML in Python. The material is detailed but easy to follow, and the examples are short enough to try out directly, so it should serve as a practical reference.

Introduction

lxml is a Python library for parsing XML and HTML, with support for XPath queries. It wraps the C libraries libxml2 and libxslt, which is why it is generally efficient compared with the XML modules in the standard library.

XPath is a language for locating information in XML documents. Although it was designed for XML, it works just as well on HTML once the page has been parsed into a tree. Its path expressions are concise and expressive, and the standard also defines built-in functions for strings, numbers, and node sets. XPath 2.0 lists more than 100 such functions; lxml implements XPath 1.0, which has a smaller core function library.
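As a minimal sketch of what a path expression and a built-in function look like in practice, the following queries a small document. The catalog markup and the variable names are hypothetical and used only for illustration.

```python
from lxml import etree

# Hypothetical sample document, used only for illustration.
xml = b"""<catalog>
  <book lang="en"><title>Learning XML</title></book>
  <book lang="de"><title>XML Grundlagen</title></book>
</catalog>"""

root = etree.fromstring(xml)

# Path expression: the text of every title element in the document.
titles = root.xpath("//title/text()")

# Built-in functions: count() returns a number (a Python float in lxml),
# contains() tests whether a string contains a substring.
book_count = root.xpath("count(//book)")
english = root.xpath('//book[contains(@lang, "en")]/title/text()')

print(titles)
print(book_count)
print(english)
```

Both count() and contains() belong to the XPath 1.0 core library, so they are available in lxml without any extension.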

Installation

You can install it using pip. The corresponding pip command is as follows:

pip install lxml

Usage

1. Parse HTML from a string

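from lxml import etree
# Sample markup (tags, class names, and link targets are illustrative); the last li is left unclosed on purpose.
text = '''
<div>
<ul>
<li class="item-0"><a href="link1.html">first</a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0"><a href="link3.html">an attribute</a>
</ul>
</div>
'''
html = etree.HTML(text)  # build an element tree that can be queried with XPath
result = etree.tostring(html, encoding='utf-8')  # serialize the tree back to bytes
print(type(html))
print(type(result))
print(result.decode('utf-8'))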

etree.HTML repairs incomplete markup: it closes tags that were left open and wraps the fragment in html and body elements, so the printed result is a complete HTML tree.
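The repair can be checked directly. The sketch below feeds a deliberately broken fragment to etree.HTML and inspects the result; the fragment itself is a made-up example.

```python
from lxml import etree

# Hypothetical fragment: no <html>/<body> wrapper and an unclosed <li>.
fragment = '<ul><li class="item-0"><a href="link1.html">first</a></ul>'

root = etree.HTML(fragment)
repaired = etree.tostring(root, encoding="unicode")  # "unicode" returns str, not bytes

print(root.tag)   # the root element added by the parser
print(repaired)   # the serialized tree, including the repaired tags
```

The root of the returned tree is the html element, not the ul that the input started with, so XPath expressions are evaluated against the repaired document.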

2. Parse an HTML file

from lxml import etree
html = etree.parse('test.html', etree.HTMLParser())  # HTMLParser repairs information missing from the file, such as the declaration
result = etree.tostring(html)  # serialize to bytes
# result = etree.tostringlist(html)  # serialize to a list
print(type(html))
print(type(result))
print(result)
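For a version that runs without an existing test.html, the sketch below first writes a small page to a temporary directory and then parses it. The file contents are an assumption made for illustration.

```python
import os
import tempfile
from lxml import etree

# Write a small hypothetical page to a temporary file so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write("<ul><li>first</li><li>second</li></ul>")

    # etree.parse reads the whole file and returns an _ElementTree.
    tree = etree.parse(path, etree.HTMLParser())

root = tree.getroot()               # the html element added by the parser
items = tree.xpath("//li/text()")   # XPath can be called on the tree itself

print(type(tree).__name__)
print(root.tag)
print(items)
```

Note that etree.parse returns a tree object, whereas etree.HTML returns the root element; both expose an xpath method.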

3. Get all nodes

from lxml import etree
html = etree.parse('test.html', etree.HTMLParser())
result = html.xpath('//*')  # // selects descendant nodes at any depth; * matches every element
print(type(html))
print(type(result))
print(result)

The call returns a list that contains every element node in the document; each item is an Element object.

To get only the li nodes, put the node name after // in the expression passed to the xpath method, for example html.xpath('//li').
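The difference between selecting everything and selecting by name can be sketched on an inline string, which avoids the need for a file. The markup here is a hypothetical example.

```python
from lxml import etree

html = etree.HTML("<div><ul><li>first</li><li>second</li></ul></div>")

all_nodes = html.xpath("//*")   # every element, in document order
li_nodes = html.xpath("//li")   # only the li elements

print([node.tag for node in all_nodes])
print([node.text for node in li_nodes])
```

The first list also contains the html and body elements that the parser added around the fragment.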

4. Get text

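from lxml import etree
# Sample markup (tags and link targets are illustrative; the class value "item-1" comes from the queries).
text = '''
<div>
<ul>
<li class="item-0"><a href="link1.html">first</a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
</ul>
</div>
'''
html = etree.HTML(text, etree.HTMLParser())
result = html.xpath('//li[@class="item-1"]/a/text()')  # text directly under the a node
result1 = html.xpath('//li[@class="item-1"]//text()')  # text of all descendant nodes under li
print(result)
print(result1)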

XPath's text() node test selects the text inside a node.
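How much text comes back depends on the axis in front of text(). The sketch below compares /text(), //text(), and the string() function on a node that mixes text and a child element; the markup is a made-up example.

```python
from lxml import etree

# Hypothetical markup: the li holds text before, inside, and after a child element.
html = etree.HTML(
    '<ul><li class="item-1">see <a href="link2.html">second item</a> here</li></ul>'
)

direct = html.xpath('//li[@class="item-1"]/text()')    # text nodes that are children of li
nested = html.xpath('//li[@class="item-1"]//text()')   # text nodes at any depth under li
joined = html.xpath('string(//li[@class="item-1"])')   # all descendant text as one string

print(direct)
print(nested)
print(joined)
```

string() is convenient when the pieces are going to be concatenated anyway, while //text() keeps them separate.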

5. Get attributes

The attributes of the node can be obtained through the @ symbol, such as the href attribute of the a tag in the following code:

result = html.xpath('//li/a/@href')  # href attribute of each a node directly under li
result = html.xpath('//li//@href')  # href attributes of all descendant nodes of li
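A self-contained sketch of attribute selection follows. It also shows Element.get as an alternative when the element itself is needed along with its attribute. The markup and link names are hypothetical.

```python
from lxml import etree

html = etree.HTML(
    '<ul>'
    '<li><a href="link1.html">first</a></li>'
    '<li><a href="link2.html">second item</a></li>'
    '</ul>'
)

# @href selects the attribute values directly.
hrefs = html.xpath("//li/a/@href")

# Alternatively, select the elements and read the attribute with get().
pairs = [(a.get("href"), a.text) for a in html.xpath("//li/a")]

print(hrefs)
print(pairs)
```

Selecting the elements first keeps each link target paired with its text, which the attribute-only query does not.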

6. Select by position

A selection sometimes matches several nodes when only one of them is needed. In that case, a positional index in square brackets picks nodes by their order (XPath positions start at 1, not 0):

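from lxml import etree
# Sample markup (tags and link targets are illustrative; the class value "aaa" comes from the queries).
text1 = '''
<div>
<ul>
<li class="aaa"><a href="link1.html">first</a></li>
<li class="aaa"><a href="link2.html">second</a></li>
<li class="aaa"><a href="link3.html">third</a></li>
<li class="aaa"><a href="link4.html">fourth</a></li>
</ul>
</div>
'''
html = etree.HTML(text1, etree.HTMLParser())
result = html.xpath('//li[contains(@class, "aaa")]/a/text()')  # text of the a node under every li
result1 = html.xpath('//li[1][contains(@class, "aaa")]/a/text()')  # the first
result2 = html.xpath('//li[last()][contains(@class, "aaa")]/a/text()')  # the last
# result3 = html.xpath('//li[position() > 2 and position() ...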
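The positional predicates can be sketched end to end as follows. This is an illustration under stated assumptions, not the article's original code: the four-item list is hypothetical, and the bound `< 4` in the range predicate and the `last() - 2` offset are values chosen here for the example.

```python
from lxml import etree

# Hypothetical four-item list, used only for illustration.
text1 = """<ul>
<li class="aaa"><a>first</a></li>
<li class="aaa"><a>second</a></li>
<li class="aaa"><a>third</a></li>
<li class="aaa"><a>fourth</a></li>
</ul>"""

html = etree.HTML(text1)

first = html.xpath("//li[1]/a/text()")              # position 1
last = html.xpath("//li[last()]/a/text()")          # last() is the number of siblings matched
# Assumed bounds: positions greater than 2 and less than 4, i.e. only position 3.
third = html.xpath("//li[position() > 2 and position() < 4]/a/text()")
# last() - 2 is position 2 in a list of four.
second = html.xpath("//li[last() - 2]/a/text()")

print(first, last, third, second)
```

When a positional predicate is chained with another predicate, as in `li[1][contains(@class, "aaa")]`, the predicates apply left to right: the first li is picked, and only then is its class checked.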
