In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/03 Report--
This article mainly shows you "what is the use of XPath in python data parsing", the content is simple and clear, and I hope it can help you solve your doubts, so let me lead you to study and learn "what is the use of XPath in python data parsing" this article.
XPath
XPath is the XML path language (XML Path Language), which is a language used to locate a part of an XML document.
Xpath is the most commonly used, convenient and efficient parsing method. It is universal and strong. It can be used not only in Python language, but also in other languages. Data parsing is recommended to start with xpath.
How to use XPath
Principle of xpath parsing:
Instantiate an object of etree and need to load the parsed page source code data into the object
Call the xpath method in the etree object and combine the xpath expression to realize the location of the tag and the capture of the content
Install lxml
Pip install-I https://mirrors.aliyun.com/pypi/simple/ lxml
From lxml import etreetree = etree.parse ('. / tree.html') # loads the source code locally and instantiates an etree object. Must be a local file, not the string tree = etree.HTML (source code) # load the source code from the Internet, instantiate the etree object # / from the root node, a / represents a level / / indicates multiple levels r = tree.xpath ('/ / div//a') # returns the address of all a tag objects under div in the form of a list r = tree.xpath ('/ / div//a') [1] # returns the address of the second a tag object under div r = tree.xpath ('/ / div [@ class= "tang"]') # returns the address of the tang tag in the form of a list Address r = tree.xpath ('/ / div [@ class= "tang"] / / a') # returns all a tag addresses under the tang tag as a list # gets the text content in the tag r = tree.xpath ('/ / div [@ class= "tang"] / / a/text ()') # returns the text in all a tags as a list # gets the attribute value r = tree.xpath ('/ / div) in the tag / / a _ href _ href') # # returns the value of the attribute in all a tags as a list
Tree.html
Xpaht test
BaiLi ShouYue
Wish you a successful year
A bright future 2
# changed the name after
Take dreams as horses
Qingming Festival is still the bright moon and border case in Qin Dynasty-58 second-hand houses
Parse the house name on the page, that is, the title value.
Train of thought
Get the url where the house name is located, and get its response data
Data parsing to construct xpath expressions. Extract target data
Import requestsfrom lxml import etreeurl = "https://bj.58.com/ershoufang/p1/"headers={ 'User-Agent':'Mozilla/5.0 (Linux; Android 6.0) Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Mobile Safari/537.36'} pag_response = requests.get (url,headers=headers,timeout=3) .text # instantiate an etree object tree = etree.HTML (pag_response) r = tree.xpath ('/ / span [@ class= "content-title"] / text ()') # get all / / span labeled "content-title" text content print (r)
Tips: when we use xpath for data parsing, we cannot construct xpath expressions by looking at elements directly, thinking that in many cases, the structure of elements viewed from browsing is different from that of crawling down the source code. So the correct way is to climb down the source code and then observe it to construct xpath.
The element structure in the following browser is different from the crawled element structure. If the xpath expression is constructed according to the elements summarized by the browser, it will not be resolved successfully!
The above is all the contents of the article "what is the use of XPath in python data parsing". Thank you for reading! I believe we all have a certain understanding, hope to share the content to help you, if you want to learn more knowledge, welcome to follow the industry information channel!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.