2025-02-25 Update From: SLTechnology News&Howtos
This article shares a practical example of using XPath in Python to extract page elements. I hope you find something useful in it.
1. How to use xpath?
XPath uses path expressions to select nodes or node sets in an XML document. Nodes are selected by following paths or steps.
Meaning of commonly used path expressions
/   selects from the root node
//  selects nodes anywhere in the document that match the selection, regardless of their position
.   selects the current node
..  selects the parent node of the current node
@   selects attributes
*   wildcard, matches anything
|   union operator, lets one expression select multiple paths
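The path expressions above can be tried directly with lxml. A minimal sketch, using invented sample HTML (the element ids and hrefs below are illustrative, not from the article):

```python
from lxml import etree

html = etree.HTML("""
<html><body>
  <div id="main" class="box"><a href="/page1">first</a></div>
  <div id="side"><a href="/page2">second</a></div>
</body></html>
""")

# // selects matching nodes anywhere in the document
print(html.xpath('//a/text()'))                   # ['first', 'second']
# @ selects an attribute value
print(html.xpath('//div[@id="main"]/a/@href'))    # ['/page1']
# * is a wildcard matching any child element
print(html.xpath('//div[@id="side"]/*')[0].tag)   # a
# | unions the results of multiple paths
print(len(html.xpath('//div[@id="main"] | //div[@id="side"]')))  # 2
```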
Commonly used functions
starts-with()  xpath('//div[starts-with(@id, "ma")]')   # select div nodes whose id value begins with "ma"
contains()     xpath('//div[contains(@id, "ma")]')      # select div nodes whose id value contains "ma"
and            xpath('//div[contains(@id, "ma") and contains(@id, "in")]')   # select div nodes whose id value contains both "ma" and "in"
text()         _.xpath('./div/div[4]/a/em/text()')      # select the text content under the em tag
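These functions can be exercised the same way. A short sketch with invented sample HTML (the id values are made up for illustration):

```python
from lxml import etree

html = etree.HTML("""
<html><body>
  <div id="main-1">alpha</div>
  <div id="main-2">beta</div>
  <div id="other">gamma</div>
</body></html>
""")

# starts-with(): ids beginning with "ma"
print([d.get('id') for d in html.xpath('//div[starts-with(@id, "ma")]')])
# ['main-1', 'main-2']

# contains() combined with and: id contains both "ma" and "in"
print([d.get('id') for d in
       html.xpath('//div[contains(@id, "ma") and contains(@id, "in")]')])
# ['main-1', 'main-2']

# text(): the text content inside a node
print(html.xpath('//div[@id="other"]/text()'))  # ['gamma']
```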
Note:
1. In HTML, when there are multiple sibling tags of the same type (such as div), XPath positions start at 1, not 0.
2. You can quickly get a node's path information by using the developer tools in the browser.
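Note 1 is easy to verify: a tiny sketch (sample HTML invented for illustration) showing that XPath positions are 1-based:

```python
from lxml import etree

html = etree.HTML("<html><body><div>a</div><div>b</div><div>c</div></body></html>")

print(html.xpath('//body/div[1]/text()'))  # ['a']  -- [1] is the FIRST div
print(html.xpath('//body/div[3]/text()'))  # ['c']
print(html.xpath('//body/div[0]'))         # []     -- [0] matches nothing
```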
2. Example:

#!/usr/bin/python3
# -*- coding: utf-8 -*-
# @Time: 2021-9-7
# @Author: Sun
# @Email: 8009@163.com
# @File: sun_test.py
# @Software: PyCharm
import requests
from lxml import etree


def get_web_content():
    try:
        # URL masked in the original article
        url = ("https://***keyword=%E6%97%A0%E9%92%A2%E5%9C%88"
               "&wq=%E6%97%A0%E9%92%A2%E5%9C%88&ev=1_68131%5E"
               "&pvid=afbf41410b164c1b91dabdf18ae8ab5c&page=5&s=116&click=0")
        header = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) "
                                "AppleWebKit/537.36 (KHTML, like Gecko) "
                                "Chrome/75.0.3770.100 Safari/537.36"}
        response = requests.request(method="GET", url=url, headers=header)
        return response.text
    except TimeoutError:
        return None


def parsing():
    result = get_web_content()
    if result is not None:
        html = etree.HTML(result)
        # First grab a large node that contains all the information we want
        ii = html.xpath('//*[@id="J_goodsList"]/ul/li')
        for _ in ii:
            # Loop over the large nodes, pulling the small-node content
            # out of each in turn; ''.join() concatenates the contents
            # of the list into a single string
            infoResult = {
                # @href: get the content of the href attribute
                'href': "https:" + _.xpath('./div/div[1]/a/@href')[0],
                'title': ''.join(_.xpath('./div/div[2]/div/ul/li/a/@title')),
                # text() gets the text information inside the node
                'price': _.xpath('./div/div[3]/strong/i/text()')[0],
                'info': ''.join(_.xpath('./div/div[4]/a/em/text()')).strip(),
                'province': _.xpath('./div/div[9]/@data-province')[0],
            }
            print(infoResult)
    else:
        raise Exception("Failed to get page information, please check!")
    return None


if __name__ == '__main__':
    parsing()
The above is an example of how Python uses XPath to get page elements. Hopefully some of these points will come in handy in your daily work.