What is the use of XPath in Python data parsing

2025-10-17 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article explains how XPath is used in Python data parsing. The content is short and clear, and I hope it answers your questions.

XPath

XPath (XML Path Language) is a language for locating parts of an XML document.

XPath is one of the most commonly used parsing methods, and it is convenient and efficient. It is also general-purpose: it can be used not only in Python but in other languages as well. If you are new to data parsing, XPath is a good place to start.

How to use XPath

How XPath parsing works:

Instantiate an etree object and load the page source to be parsed into it.

Call the xpath method of the etree object with an XPath expression to locate tags and extract their content.
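The two steps above can be sketched as follows. This is a minimal example that assumes lxml is already installed. The HTML string and the class name "song" are made up for illustration and do not come from the article.

```python
from lxml import etree

# Hypothetical page source, standing in for a real downloaded page.
page_source = """
<html>
  <body>
    <div class="song">
      <p>first paragraph</p>
      <p>second paragraph</p>
    </div>
  </body>
</html>
"""

# Step 1: instantiate an etree object and load the page source into it.
tree = etree.HTML(page_source)

# Step 2: call xpath() with an XPath expression to locate tags and extract content.
paragraphs = tree.xpath('//div[@class="song"]/p/text()')
print(paragraphs)  # ['first paragraph', 'second paragraph']
```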

Install lxml

pip install -i https://mirrors.aliyun.com/pypi/simple/ lxml

from lxml import etree

# Load the source from a local file and instantiate an etree object.
# parse() takes a file path, not a string of HTML.
# HTMLParser tolerates HTML that is not well-formed XML.
tree = etree.parse('./tree.html', etree.HTMLParser())
# Alternatively, load source downloaded from the Internet (an HTML string):
# tree = etree.HTML(page_source)

# A leading '/' starts from the root node; one '/' is one level, '//' spans multiple levels.
r = tree.xpath('//div//a')  # list of all <a> element objects under a <div>
r = tree.xpath('//div//a')[1]  # the second <a> element object under a <div>
r = tree.xpath('//div[@class="tang"]')  # list of <div> elements whose class is "tang"
r = tree.xpath('//div[@class="tang"]//a')  # list of all <a> elements under the "tang" <div>

# Get the text content of a tag
r = tree.xpath('//div[@class="tang"]//a/text()')  # list of the text in all those <a> tags

# Get an attribute value of a tag
r = tree.xpath('//div//a/@href')  # list of the href values of all <a> tags under a <div>
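The expressions above can be tried without a local file. The sketch below uses a made-up inline document; only the class name "tang" comes from the article, and the URLs and link text are placeholders.

```python
from lxml import etree

# Hypothetical document; the links and their text are placeholders.
html = """
<html><body>
  <div class="tang">
    <ul>
      <li><a href="http://example.com/1">first</a></li>
      <li><a href="http://example.com/2">second</a></li>
    </ul>
  </div>
  <div class="other"><a href="http://example.com/3">third</a></div>
</body></html>
"""
tree = etree.HTML(html)

# '/' steps down exactly one level: here <a> is not a direct child of the div.
direct = tree.xpath('//div[@class="tang"]/a')
# '//' matches descendants at any depth.
nested = tree.xpath('//div[@class="tang"]//a')

texts = tree.xpath('//div[@class="tang"]//a/text()')
hrefs = tree.xpath('//div//a/@href')

# xpath() returns a Python list, so [1] is ordinary zero-based list indexing.
second = tree.xpath('//div//a')[1]

print(len(direct), len(nested))  # 0 2
print(texts)                     # ['first', 'second']
print(hrefs)
print(second.text)               # second
```

Note that positions inside an XPath expression are 1-based, so `(//div//a)[2]` selects the same element as the Python index `[1]` on the result list.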

tree.html

XPath test

BaiLi ShouYue

Wish you a successful year

A bright future 2

# changed the name after

Take dreams as horses

Case study: 58.com second-hand houses

Goal: parse the house names on the page, that is, the title text of each listing.

Approach

Request the URL of the page that contains the house names and get its response data.

Parse the data: construct an XPath expression and extract the target data.

import requests
from lxml import etree

url = "https://bj.58.com/ershoufang/p1/"
headers = {
    'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Mobile Safari/537.36'
}
pag_response = requests.get(url, headers=headers, timeout=3).text

# Instantiate an etree object from the downloaded source
tree = etree.HTML(pag_response)
# Get the text of every <span> whose class is "content-title"
r = tree.xpath('//span[@class="content-title"]/text()')
print(r)
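Because a live page can change or reject automated requests, it may help to separate downloading from parsing so that the XPath expression can be checked offline. The following is a sketch under that assumption: the function name extract_titles and the sample HTML are hypothetical, and only the "content-title" class comes from the code above.

```python
from lxml import etree


def extract_titles(page_source):
    """Return the text of every <span class="content-title"> in the page."""
    tree = etree.HTML(page_source)
    titles = tree.xpath('//span[@class="content-title"]/text()')
    # Strip surrounding whitespace and drop strings that are empty afterwards.
    return [t.strip() for t in titles if t.strip()]


# Made-up sample that imitates the structure the expression expects.
sample = """
<html><body>
  <div><span class="content-title"> Listing A </span></div>
  <div><span class="content-title">Listing B</span></div>
  <div><span class="price">100</span></div>
</body></html>
"""

print(extract_titles(sample))  # ['Listing A', 'Listing B']
```

With a real page, the same function can be called on the text returned by requests.get(url, headers=headers, timeout=3).text.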

Tips: when using XPath for data parsing, do not construct XPath expressions directly from the elements shown in the browser's inspector, because in many cases the element structure seen in the browser differs from the structure of the crawled source. The correct way is to crawl the source first, inspect it, and then construct the XPath expression from it.

In the 58.com example above, the element structure shown in the browser differs from the crawled element structure. An XPath expression constructed from the elements shown in the browser will fail to match.
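One common instance of this mismatch, given here as an illustration and not taken from the 58.com page: browsers insert a tbody element into tables when they build the page, even if the downloaded source has none, while lxml parses the source as it was sent. The table below is made up.

```python
from lxml import etree

# Hypothetical downloaded source: the table has no <tbody>.
raw_source = """
<html><body>
  <table id="prices">
    <tr><td>A</td><td>100</td></tr>
    <tr><td>B</td><td>200</td></tr>
  </table>
</body></html>
"""
tree = etree.HTML(raw_source)

# Expression copied from what a browser's element inspector would show.
from_browser = tree.xpath('//table[@id="prices"]/tbody/tr/td[1]/text()')
# Expression written against the downloaded source.
from_source = tree.xpath('//table[@id="prices"]/tr/td[1]/text()')

print(from_browser)  # []
print(from_source)   # ['A', 'B']
```

When an expression unexpectedly returns an empty list, saving the downloaded text to a file and reading the structure there is usually the quickest check.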

