

How to classify and save all the articles and pictures with Python

2025-01-18 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article shows how to classify and save all the articles and pictures from a website with Python. It walks through a real example step by step and covers problems that many people run into in practice.

Project goal

Create a folder, save every article's pictures into it sorted by article, and print a success message on the console for each download.

Project analysis

1. How do we find the real request address and request multiple pages?

Open the website and press F12 (or right-click and choose Inspect) to open the developer tools. Then scroll down with the mouse wheel to load new content.

Click any of the new requests, look at its Request URL, and observe the pattern in the URLs:

https://bh.sb/page/1/
https://bh.sb/page/2/
https://bh.sb/page/3/
https://bh.sb/page/4/

Each additional page increments the number in page/{}/ by 1. Replace that number with a {} placeholder, then loop over the page numbers with for to request multiple URLs.
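A minimal sketch of that loop, assuming the page/{}/ pattern holds for every page; build_page_urls is a hypothetical helper name, not part of the original code:

```python
def build_page_urls(template, start, end):
    """Fill the {} placeholder with each page number from start to end (inclusive)."""
    return [template.format(page) for page in range(start, end + 1)]


for url in build_page_urls("https://bh.sb/page/{}/", 1, 4):
    print(url)
```

Passing 1 and 4 yields exactly the four URLs listed above; an end page smaller than the start page simply produces an empty list.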

2. Anti-crawling measures

1) Copy the ordinary HTTP request headers a browser sends and set them on every requests call.

2) Use fake_useragent to generate a random User-Agent for each visit.
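The two measures can be sketched with the standard library alone. Here random.choice over a small, hypothetical pool of User-Agent strings stands in for fake_useragent's ua.random, and the other header values are typical browser defaults assumed for illustration, not values copied from the target site:

```python
import random

# Hypothetical sample pool; fake_useragent draws from a much larger list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]

# Ordinary headers a browser would send; the values are illustrative assumptions.
BASE_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
}


def make_headers():
    """Return a fresh header dict with a randomly chosen User-Agent."""
    headers = dict(BASE_HEADERS)  # copy so the shared defaults are never modified
    headers["User-Agent"] = random.choice(USER_AGENTS)
    return headers
```

Calling make_headers() before each requests.get means consecutive requests do not all carry the same User-Agent.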

Libraries and websites involved

1. The website is as follows:

https://www.doutula.com/photo/list/?page={}

2. Libraries involved: requests, lxml, fake_useragent, time, os

3. Software: PyCharm

Project implementation

1. Define a class that inherits from object, with an __init__ method and a main method. Import the required libraries, define the URL, and create the save folder.

```python
import os
import time

import requests
from lxml import etree
from fake_useragent import UserAgent


class bnotiank(object):
    def __init__(self):
        # exist_ok=True lets the script be re-run without failing on an existing folder
        os.makedirs("Picture", exist_ok=True)

    def main(self):
        pass


if __name__ == '__main__':
    Siper = bnotiank()
    Siper.main()
```

2. Build the request headers with a random User-Agent to get past anti-crawling checks.

```python
# Build request headers with a random User-Agent (e.g. inside __init__)
ua = UserAgent()
self.headers = {'User-Agent': ua.random}
```

3. Send the request, get the response, and return the page HTML for the next step.

```python
def get_page(self, url):
    """Send a request and return the response body decoded as UTF-8."""
    res = requests.get(url=url, headers=self.headers)
    html = res.content.decode("utf-8")
    return html
```

4. Define the parse_page method: extract the secondary-page (article) links, then loop over them with for to get the required fields.

```python
def parse_page(self, html):
    parse_html = etree.HTML(html)
    image_src_list = parse_html.xpath('...')  # XPath selecting the secondary-page links
    # print(image_src_list)
```
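To see what this link-extraction step does without hitting the live site, here is a sketch on a small, hypothetical listing-page snippet. It swaps in the standard library's xml.etree.ElementTree, which understands a subset of XPath but needs well-formed markup; on real pages lxml's etree.HTML is more forgiving. The h2/article-title selector and the example.com links are assumptions, not the original expression:

```python
import xml.etree.ElementTree as ET

# Hypothetical listing-page markup; the real site's structure may differ.
SAMPLE_LIST_PAGE = """
<html><body>
  <h2 class="article-title"><a href="https://example.com/post/1/">First post</a></h2>
  <h2 class="article-title"><a href="https://example.com/post/2/">Second post</a></h2>
</body></html>
"""


def extract_article_links(page_source):
    """Collect the href of every link directly inside an <h2 class="article-title">."""
    root = ET.fromstring(page_source.strip())
    # Descendant search plus an attribute predicate: both are in ElementTree's XPath subset.
    return [a.get("href") for a in root.findall(".//h2[@class='article-title']/a")]


print(extract_article_links(SAMPLE_LIST_PAGE))
```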

5. Request each secondary page, parse it with XPath, and get the link to the large picture along with the article title.

```python
# parse_html1: the parsed secondary (article) page
reo = parse_html1.xpath('//div//div[@class="content"]')  # parent node
for j in reo:
    d = j.xpath('.//article[@class="article-content"]//p/img/@src')[0]
    text = parse_html1.xpath('//h2[@class="article-title"]//a/text()')[0].strip()
```
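The same two selections can be tried offline on a hypothetical article-page snippet that mirrors the class names in the XPath above. As before, this sketch uses the standard library's xml.etree.ElementTree instead of lxml, so it only works on well-formed markup; extract_image_and_title and the sample URL are illustrative assumptions:

```python
import xml.etree.ElementTree as ET

# Hypothetical article-page markup mirroring the content / article-content / article-title classes.
SAMPLE_ARTICLE = """
<html><body>
  <div><div class="content">
    <h2 class="article-title"><a href="#"> A sample title </a></h2>
    <article class="article-content">
      <p><img src="https://example.com/img/big-1.jpg"/></p>
    </article>
  </div></div>
</body></html>
"""


def extract_image_and_title(page_source):
    """Return (first large-image URL, stripped article title), or (None, None) if either is missing."""
    root = ET.fromstring(page_source.strip())
    img = root.find(".//article[@class='article-content']//p/img")
    link = root.find(".//h2[@class='article-title']//a")
    if img is None or link is None:
        return None, None
    return img.get("src"), (link.text or "").strip()


print(extract_image_and_title(SAMPLE_ARTICLE))
```

Returning (None, None) instead of indexing [0] on an empty result avoids an IndexError when a page has no picture.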

6. Request the picture's address and write the image to a file.

```python
html2 = requests.get(url=d, headers=self.headers).content
dirname = "./d/" ...
```
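Writing the downloaded bytes to a file named after the article title could look like the sketch below. It rests on assumptions: save_image and safe_filename are hypothetical helpers, the .jpg extension is assumed rather than read from the response, and the target folder is a parameter (for example the "Picture" folder from step 1). Characters that are illegal in file names are replaced because article titles may contain them:

```python
import os
import re


def safe_filename(title, max_len=100):
    """Replace characters that common file systems forbid in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", title).strip()
    return cleaned[:max_len] or "untitled"


def save_image(folder, title, data, ext=".jpg"):
    """Write raw image bytes to <folder>/<title><ext>, report success, and return the path."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, safe_filename(title) + ext)
    with open(path, "wb") as f:  # binary mode: data is the response's .content bytes
        f.write(data)
    print(f"{title} downloaded successfully")
    return path
```

In the class, the call would be something like save_image("Picture", text, html2).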

7. Call the methods in turn to tie the steps together.

```python
url = self.url.format(page)
print(url)
html = self.get_page(url)
self.parse_page(html)
```
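Together with the one-second pause from step 8, these calls form a loop over the page range the user enters. The sketch below is a hypothetical crawl_pages function: fetch and parse are passed in (for example the class's get_page and parse_page methods) so the loop can be tried without network access, and the pause is skipped after the last page:

```python
import time


def crawl_pages(template, start, end, fetch, parse, delay=1.0, sleep=time.sleep):
    """Format each page URL, fetch it, hand the HTML to parse, and pause between pages."""
    for page in range(start, end + 1):
        url = template.format(page)
        print(url)
        html = fetch(url)
        parse(html)
        if page < end:
            sleep(delay)  # wait before the next page to reduce the risk of an IP block
```

In the class, main could read the pages with int(input(...)) and call crawl_pages(self.url, start, end, self.get_page, self.parse_page). Injecting sleep also makes the delay easy to disable in tests.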

8. Add a delay between requests to keep the IP address from being blocked.

```python
time.sleep(1)  # time delay
```

Results

1. Click the small green triangle in PyCharm to run the script, then enter the start page and the end page.

2. The console shows a success message for each downloaded picture.

3. Each picture is saved under its article's title.

