How to use CSS selectors to scrape product information from JD.com

2025-04-10 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article covers how to use CSS selectors to scrape product information from JD.com. Many people run into trouble when they try this on a real case, so the editor will walk through how to handle it. I hope you read carefully and come away with something useful.

CSS selector

At present, apart from the official documentation, there are few technical books or blog posts about BeautifulSoup, and even fewer of them cover CSS selectors. Yet in the page-parsing stage of a web crawler, the CSS selector is a very efficient tool. The official documentation is detailed, but it assumes some background knowledge and lacks short, focused demonstration examples.

JD.com commodity map

First, go to JD.com, enter the product you want to search for, and send a page request to the server. Here the editor again uses the keyword "dog food" as the search term, which yields the URL https://search.jd.com/Search?keyword=%E7%8B%97%E7%B2%AE&enc=utf-8. The keyword parameter carries the percent-encoded search term we entered; in this case it represents "dog food" (狗粮). For details, see the earlier article "Python God to teach you to deal with JD.com commodity information with regular expressions". So once you encode the search term and pass it as the keyword parameter, you have the target URL. Request that page, get the response, and then use CSS selectors for the data extraction step.

Part of the page source for the product listings on JD.com is as follows:

Part of the web source code

A closer look at the source shows that the target information sits below the part marked with the red box, so we need to drill down layer by layer to reach the information we want.

Python's urllib library provides the quote method (urllib.parse.quote in Python 3), which percent-encodes a string for use in a URL so that the request reaches the corresponding page.
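As a minimal sketch of this encoding step, the snippet below builds the search URL from a keyword. The helper name build_search_url and the constant BASE_URL are hypothetical names chosen for illustration; only urllib.parse.quote and the URL pattern come from the article.

```python
from urllib.parse import quote

# Search endpoint taken from the URL shown in the article.
BASE_URL = "https://search.jd.com/Search"


def build_search_url(keyword: str) -> str:
    """Return the search URL for a keyword (hypothetical helper)."""
    # quote() percent-encodes non-ASCII characters as UTF-8 bytes by default.
    return f"{BASE_URL}?keyword={quote(keyword)}&enc=utf-8"


# "狗粮" is the Chinese for "dog food", the keyword used in the article.
print(build_search_url("狗粮"))
# https://search.jd.com/Search?keyword=%E7%8B%97%E7%B2%AE&enc=utf-8
```

The printed URL matches the one given earlier in the article, which is a quick way to check that the keyword was encoded as expected.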

CSS selector online copy

Many people find CSS expressions hard to write, but the basic usage is not difficult to master. As shown in the figure above, you can copy a ready-made CSS expression for an element online. However, expressions obtained this way generally do not work well in a program, and they are too long to read comfortably. So you usually have to write CSS expressions yourself.

Now straight to the code: use CSS selectors to extract the target information, such as product name, link, image, and price. The code is shown in the following figure:

Code implementation
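Since the original code appears only as an image, here is a rough sketch of what such an extraction could look like. It is not the author's code: the sample HTML, the class names (gl-item, p-name, p-img, p-price), the example.com links, and the helper parse_items are all assumptions made for illustration, and the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a fetched search page; class names are assumed.
SAMPLE_HTML = """
<ul>
  <li class="gl-item">
    <div class="p-img"><a href="//item.example.com/1001.html"><img src="//img.example.com/1001.jpg"></a></div>
    <div class="p-price"><strong><i>59.90</i></strong></div>
    <div class="p-name"><a href="//item.example.com/1001.html"><em>Adult dog food 10kg</em></a></div>
  </li>
  <li class="gl-item">
    <div class="p-img"><a href="//item.example.com/1002.html"><img src="//img.example.com/1002.jpg"></a></div>
    <div class="p-price"><strong><i>128.00</i></strong></div>
    <div class="p-name"><a href="//item.example.com/1002.html"><em>Puppy dog food 5kg</em></a></div>
  </li>
</ul>
"""


def parse_items(html):
    """Extract name, link, image and price from each product block."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    # Each product sits in its own <li class="gl-item"> (assumed layout).
    for li in soup.select("li.gl-item"):
        name = li.select_one("div.p-name em")
        link = li.select_one("div.p-name a")
        img = li.select_one("div.p-img img")
        price = li.select_one("div.p-price i")
        items.append({
            "name": name.get_text(strip=True) if name else None,
            "link": link.get("href") if link else None,
            "image": img.get("src") if img else None,
            "price": price.get_text(strip=True) if price else None,
        })
    return items


for item in parse_items(SAMPLE_HTML):
    print(item)
```

Selecting the product block first and then querying inside it keeps each selector short, which is the "layer by layer" approach the article describes. The None checks are there because a real listing may omit a field.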

If you want to build a more capable web crawler quickly, BeautifulSoup's CSS selector support will be one of your essential tools. BeautifulSoup combines CSS selector syntax with its own easy-to-use API. For anyone already familiar with CSS selector syntax, using CSS selectors is a very convenient approach during crawler development.

The final effect is as follows:

Final effect picture

Fresh dog food is out of the oven again.

A brief introduction to CSS selectors:

BeautifulSoup supports most CSS selectors. The syntax is to pass a string argument to the .select() method of a Tag object or a BeautifulSoup object; the matches are returned as a list.

Tag.select("string")

BeautifulSoup.select("string")

Note: when selecting elements, a tag name is written without any prefix, a class name is prefixed with a dot (.), and an id is prefixed with a hash (#).
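The rules above can be tried on a tiny document. This is only an illustrative sketch: the HTML string and the variable names are made up for the example, while select() itself is the BeautifulSoup method described above.

```python
from bs4 import BeautifulSoup

# A made-up fragment used only to demonstrate the selector prefixes.
html = '<div id="main"><p class="title">Dog food</p><p>Plain paragraph</p></div>'
soup = BeautifulSoup(html, "html.parser")

by_tag = soup.select("p")         # tag name: no prefix
by_class = soup.select(".title")  # class name: dot prefix
by_id = soup.select("#main")      # id: hash prefix

# select() also works on a Tag, searching only inside that element.
nested = by_id[0].select("p.title")

print(len(by_tag), by_class[0].get_text(), by_id[0].name, len(nested))
```

Because the result behaves like a list, it can be indexed, iterated over, or checked for emptiness before use.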

That concludes "how to use CSS selectors to scrape product information from JD.com". Thank you for reading.
