How to use Xpath selector to collect target data from web pages in Scrapy 05/08 Update SLTechnology News&Howtos


2025-05-08 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31 Report

How do you use XPath selectors in Scrapy to collect target data from a web page? Many newcomers are unsure where to start, so this article walks through the process step by step on an example article page, extracting its title, release date, topic tags, and number of likes.

/ concrete implementation /

1. For the title, as mentioned in the previous article, several XPath expressions will work. Choose one, debug it in the Scrapy shell until it returns the title, and then write it into the spider file.

2. Next comes the release date. As before, locate it interactively by comparing the rendered web page with its source code, as shown in the following figure.

3. The class name "entry-meta-hide-on-mobile" is unique across the whole page, so it can be used to locate the element easily.

4. Based on the page structure, the XPath expression for the release date is easy to write. Test it in the Scrapy shell first, then write the selector expression into the spider file, as shown in the following figure.

The extracted text contains some extraneous characters, so use the strip() and replace() functions to remove them and return a "clean" date.
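The cleanup can be sketched as a small helper. The name `clean_date` is hypothetical, and the "·" separator it removes is an assumption about the page markup; adjust the `replace()` argument to whatever extra characters your page actually contains.

```python
from typing import Optional


def clean_date(raw: Optional[str]) -> str:
    """Strip whitespace and an assumed '·' separator from the raw date text."""
    if raw is None:
        return ""
    # strip() removes outer whitespace, replace() drops the separator,
    # and the second strip() removes the space that was next to it.
    return raw.strip().replace("·", "").strip()


print(clean_date("\r\n\n    2020/01/01 ·  "))  # 2020/01/01
```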

5. As for the XPath expression for the article's topic tags, the page structure shows that the tags sit below the date, as shown in the following figure.

So the topic tags can be obtained by modifying the XPath expression used for the release date.

6. The article's topic tags are inside <a> tags, as shown in the following figure.

After getting the whole list, use the join() function to concatenate its elements with commas into a new string called tags, and then write this into the Scrapy spider file.

7. The number of likes is analyzed the same way: the data can be located through the unique class "vote-post-up".

8. Careful readers may notice that "vote-post-up" is not the only value in the element's class attribute, so the initial XPath expression, which compares @class for exact equality, matches nothing.

Here is a tip: when an element's class attribute contains several values and the one you want is unique, use the contains() function, for example '//span[contains(@class, "vote-post-up")]'. Practice it a few times, otherwise it is easy to forget. Write the XPath expression according to the page structure; the debugging process is shown in the following figure.

The extracted number of likes is a string, so convert it to a number with int().

After reading the above, you should be able to use XPath selectors in Scrapy to collect target data from web pages. Thank you for reading!

