How to Crawl the House of Pictures with Python

Updated: 2025-01-17. Source: SLTechnology News&Howtos (shulou)

This article explains how to crawl images from the House of Pictures website with Python. The method is simple, fast and practical, and the steps below walk through it from the first request to the saved files.

Preface

Simulate a browser

Send a request and fetch the page data

Extract and filter the data we want from the raw page

Save the filtered data

Tools you need to build the crawler

Python 3.6

PyCharm Professional Edition

Target website

The House of Pictures

https://www.tupianzj.com/

Crawler code

Import the tools

Standard-library module for SSL settings

import ssl

Standard-library module used to create the save folder

import os

Standard-library module used to download the files

import urllib.request

Third-party HTTP library

import requests

HTML parser and selector

from bs4 import BeautifulSoup

Turn off certificate verification for HTTPS requests made through urllib

ssl._create_default_https_context = ssl._create_unverified_context
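The line above changes the default for every urllib HTTPS request in the process. As a narrower alternative, the sketch below builds an opener whose own SSL context skips verification, so nothing global is patched. The function name build_unverified_opener is hypothetical and not part of the original code.

```python
import ssl
import urllib.request


def build_unverified_opener():
    """Return (opener, context) where the opener skips certificate checks."""
    context = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be CERT_NONE
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    handler = urllib.request.HTTPSHandler(context=context)
    return urllib.request.build_opener(handler), context


opener, context = build_unverified_opener()
```

Calling opener.open(some_url) then fetches a page without certificate verification, while other code in the same process keeps the normal checks.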

Simulate a browser

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'}
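Note that this dictionary only affects calls that receive it explicitly, such as requests.get. The download step uses urllib.request.urlretrieve, which sends urllib's own default User-Agent. A minimal sketch, assuming you want the downloads to carry the same browser header, is to install a urllib opener with that header:

```python
import urllib.request

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36"
)
headers = {"User-Agent": USER_AGENT}

# Build an opener that sends the browser-like header, then make it the
# default used by urllib.request.urlopen and urllib.request.urlretrieve.
opener = urllib.request.build_opener()
opener.addheaders = list(headers.items())
urllib.request.install_opener(opener)
```

Whether the site actually rejects the default urllib User-Agent is not stated in the article, so treat this as a precaution rather than a requirement.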

Create the save folder automatically

if not os.path.exists('./illustration material/'):
    os.mkdir('./illustration material/')
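The same check can be written in one call. The sketch below uses os.makedirs with exist_ok=True, which also creates missing parent folders; the helper name ensure_folder is hypothetical.

```python
import os


def ensure_folder(path):
    """Create the folder (and any missing parents) if it does not exist yet."""
    os.makedirs(path, exist_ok=True)
    return os.path.abspath(path)
```

For the folder used in this article the call would be ensure_folder('./illustration material/').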

Send the request

url = 'https://www.tupianzj.com/meinv/mm/meizitu/'
html = requests.get(url, headers=headers).text
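The .text attribute relies on the encoding that requests infers from the response. If the page is served in a legacy Chinese encoding such as GBK (an assumption, not something the article confirms for this site), the alt text used later as file names can come out garbled. A minimal sketch of a fallback decoder, with the hypothetical name decode_html, looks like this:

```python
def decode_html(raw, encodings=("utf-8", "gbk")):
    """Decode raw page bytes, trying each candidate encoding in order."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: fall back to the first encoding and
    # replace the undecodable bytes instead of raising.
    return raw.decode(encodings[0], errors="replace")
```

It would be used on the raw bytes of the response, for example html = decode_html(requests.get(url, headers=headers, timeout=10).content). Trying UTF-8 first is a heuristic and can misread very short byte strings.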

Extract the image data from the page

# The 'lxml' parser requires the lxml package to be installed
soup = BeautifulSoup(html, 'lxml')
images_data = soup.find('ul', class_='d1 ico3').find_all_next('li')
for image in images_data:
    image_url = image.find_all('img')
    for _ in image_url:
        print(_['src'], _['alt'])
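The extraction step can be tried offline before running it against the live site. The sketch below parses a small made-up HTML fragment that mimics the ul with class 'd1 ico3'. The sample markup, the example.com URLs and the function name extract_images are all assumptions for illustration. It uses the built-in 'html.parser' so that lxml is not needed, and it returns an empty list when the list element is missing instead of raising an error.

```python
from bs4 import BeautifulSoup

# Made-up fragment that imitates the structure the crawler looks for
SAMPLE_HTML = """
<ul class="d1 ico3">
  <li><a href="/a.html"><img src="https://example.com/a.jpg" alt="first"></a></li>
  <li><a href="/b.html"><img src="https://example.com/b.jpg" alt="second"></a></li>
  <li><a href="/c.html"><img alt="no source"></a></li>
</ul>
"""


def extract_images(html):
    """Return (src, alt) pairs for images inside the 'd1 ico3' list."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("ul", class_="d1 ico3")
    if container is None:
        return []
    pairs = []
    for img in container.find_all("img"):
        src = img.get("src")
        alt = img.get("alt")
        # Skip tags that lack either attribute
        if src and alt:
            pairs.append((src, alt))
    return pairs


images = extract_images(SAMPLE_HTML)
```

Unlike find_all_next('li'), this version only looks at images inside the list itself, which is a deliberate narrowing and may need adjusting to the real page.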

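Download

        # Still inside the inner loop above: save each image under its alt text
        try:
            urllib.request.urlretrieve(_['src'], './illustration material/' + _['alt'] + '.jpg')
        except Exception:
            # Skip images that fail to download
            pass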
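Using the alt text directly as a file name can fail when it contains characters such as '/' or '?', and silently passing on every error hides why an image was skipped. The sketch below is one possible hardening under those assumptions. The helper names safe_filename and download_image are hypothetical.

```python
import os
import re
import urllib.error
import urllib.request


def safe_filename(title, extension=".jpg"):
    """Replace characters that are unsafe in file names with underscores."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", title).strip("_")
    return (cleaned or "untitled") + extension


def download_image(src, title, folder):
    """Download src into folder; return the saved path, or None on failure."""
    target = os.path.join(folder, safe_filename(title))
    try:
        urllib.request.urlretrieve(src, target)
    except (urllib.error.URLError, OSError, ValueError) as error:
        # Report the failure instead of hiding it, then move on
        print(f"skipped {src}: {error}")
        return None
    return target
```

Inside the loop, the download line would then become download_image(_['src'], _['alt'], './illustration material/').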

At this point you should have a clearer understanding of how to crawl the House of Pictures with Python. The best way to consolidate it is to run the code yourself.
