
How to use Python to scrape rental listing images from a website

2025-02-28 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

This article shows how to use Python to scrape rental listing images from a website. It is short and easy to follow: it introduces the requests and BeautifulSoup libraries and then walks through a complete example.

Third-party libraries

Installation

I use PyCharm, so I won't cover installing from the command line.

Open Default Settings and select Project Interpreter, then double-click pip or click the plus sign to search for third-party libraries and install them. If you have created a project, make sure its Project Interpreter points to the interpreter where the libraries were installed; otherwise they cannot be imported.
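If an import still fails after installation, the usual cause is that the project is running on a different interpreter from the one the libraries were installed into. A minimal sketch for checking this from inside the project, using only the standard library; the helper name check_installed is hypothetical and not part of the original walkthrough:

```python
import importlib.util
import sys


def check_installed(module_names):
    """Map each top-level module name to True if the current interpreter can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in module_names}


if __name__ == "__main__":
    # sys.executable is the interpreter actually running this script;
    # it should match the one selected under Project Interpreter.
    print("Interpreter:", sys.executable)
    for name, found in check_installed(["requests", "bs4"]).items():
        print(name, "is installed" if found else "is MISSING")
```

If a library is reported as missing, compare the printed interpreter path with the one shown in the PyCharm settings.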

Requests library

The official description of the requests library: "Requests is the only Non-GMO HTTP library for Python, safe for human consumption." In practice, it sends HTTP requests and returns the web page data.

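    import requests

    # Send a browser-like User-Agent so the server treats the request as coming from a browser
    header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'}

    try:
        # The request itself is what can fail, so it belongs inside the try block
        res = requests.get('http://sh.58.com/zufang/', headers=header)
        print(res.text)
    except requests.exceptions.ConnectionError:
        print('access denied!')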

On success, this prints the HTML source of the listing page. The headers a browser sends with the same request can be viewed in its developer tools under Request Headers.

Some properties of headers:

Accept: specifies the content types (MIME types) the client can receive. Preference among the listed types is expressed with quality values (q).

Accept-Language: specifies the languages the client prefers for the returned content.

Accept-Encoding: specifies the compression encodings the client supports. It tells the server that it may compress the response before sending it, which saves bandwidth.

Accept-Charset: specifies the character encodings the client can accept.

User-Agent: identifies the client software. Some servers or proxies use this value to decide whether the request comes from a browser.

Content-Type: when calling a REST interface, the server checks this value to determine how the content of the HTTP body should be parsed.

application/xml: used when sending XML, as in XML-RPC, SOAP, or XML-based RESTful calls

application/json: used when sending JSON, as in JSON-RPC

application/x-www-form-urlencoded: used by browsers when submitting web forms

When using RESTful or SOAP services provided by a server, an incorrect Content-Type can cause the server to reject the request.
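The header fields above can be inspected without sending anything over the network. The sketch below builds two requests with the standard library's urllib.request but never sends them. The URL http://example.com/api, the payload, and the helper names are placeholders chosen for illustration. It shows that Content-Type has to match how the body is actually encoded.

```python
import json
import urllib.parse
import urllib.request

# Placeholder endpoint and payload, used only for illustration.
API_URL = "http://example.com/api"
PAYLOAD = {"city": "sh", "page": 6}
USER_AGENT = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36")


def build_json_request(url, payload):
    """Build (but do not send) a POST request whose body is JSON."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "User-Agent": USER_AGENT,
        "Accept": "application/json",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers)


def build_form_request(url, payload):
    """Build (but do not send) a POST request encoded like a submitted web form."""
    body = urllib.parse.urlencode(payload).encode("ascii")
    headers = {
        "User-Agent": USER_AGENT,
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return urllib.request.Request(url, data=body, headers=headers)


json_req = build_json_request(API_URL, PAYLOAD)
form_req = build_form_request(API_URL, PAYLOAD)

# urllib.request stores header names capitalized, e.g. "Content-type".
print(json_req.get_method(), json_req.get_header("Content-type"), json_req.data)
print(form_req.get_method(), form_req.get_header("Content-type"), form_req.data)
```

With the requests library, the same headers would be passed as the headers argument, as in the requests.get call earlier.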

BeautifulSoup library

BeautifulSoup parses the page fetched by the requests library, turning the page source into a Soup document from which data can be filtered and extracted. See the Beautiful Soup 4.2 documentation for details.

Beautiful Soup supports the HTML parser in the Python standard library as well as several third-party parsers. If no third-party parser is installed, it uses the standard library parser (html.parser); among the third-party parsers, lxml is generally considered the more powerful. The code here uses the standard library parser.

Selector select
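select takes a CSS selector and returns a list of the matching tags. A minimal sketch on an inline HTML fragment, assuming bs4 is installed; the markup, the class name, and the image URLs are made up for illustration and only mimic the kind of structure a listing page might have:

```python
from bs4 import BeautifulSoup

# Made-up fragment: a list container holding lazily loaded images,
# plus one image outside the container.
HTML = """
<ul class="img_list">
  <li><img lazy_src="http://example.com/a.jpg" alt="room a"></li>
  <li><img lazy_src="http://example.com/b.jpg" alt="room b"></li>
</ul>
<img lazy_src="http://example.com/banner.jpg" alt="not in the list">
"""

# Parse with the standard library parser.
soup = BeautifulSoup(HTML, "html.parser")

# ".img_list img" matches only <img> tags nested inside an element with class img_list.
images = soup.select(".img_list img")

# Attributes are read with dictionary-style access on each tag.
image_urls = [img["lazy_src"] for img in images]
print(image_urls)
```

The banner image is skipped because it is not a descendant of the img_list element.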

Case: scrape Shanghai rental listing images

    import requests
    import urllib.request
    import os
    import time
    from bs4 import BeautifulSoup

    header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.62 Safari/537.36'}

    # Listing pages to crawl
    url = ['http://sh.58.com/zufang/pn{}/?ClickID=2'.format(number) for number in range(6, 51)]

    adminCout = 6
    for arurl in url:
        adminCout = adminCout + 1
        res = requests.get(arurl, headers=header)
        soup = BeautifulSoup(res.text, 'html.parser')
        arryImg = soup.select('.img_list img')
        print(arryImg)
        count = 0
        for img in arryImg:
            print(img['lazy_src'])
            _url = img['lazy_src']
            # Set the path and file name
            pathName = "E:\\2333\\" + str(adminCout) + "_" + str(count) + ".jpg"
            # Open the link (unlike Python 2.x, urlopen lives in urllib.request)
            result = urllib.request.urlopen(_url)
            # Read the image data, then save it locally
            data = result.read()
            with open(pathName, "wb") as code:
                code.write(data)
            count = count + 1
            print("downloading number:", count)
        # Pause before requesting the next page
        time.sleep(30)

That is how to use Python to scrape rental listing images from a website.
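One detail the case leaves implicit: opening the output file in "wb" mode fails if the target folder does not exist yet. A hedged sketch of building the file path with os.path.join and creating the folder first; the function name image_path and the temporary demo folder are assumptions for illustration, while the page_index.jpg naming follows the case:

```python
import os
import tempfile


def image_path(out_dir, page, index):
    """Create out_dir if needed and return the path '<out_dir>/<page>_<index>.jpg'."""
    os.makedirs(out_dir, exist_ok=True)
    return os.path.join(out_dir, "{}_{}.jpg".format(page, index))


# Demonstration in a temporary folder instead of a fixed drive letter.
demo_dir = os.path.join(tempfile.mkdtemp(), "2333")
path = image_path(demo_dir, 7, 0)

# Stand-in for the downloaded image data.
with open(path, "wb") as f:
    f.write(b"placeholder bytes, not a real image")
print(path)
```

In the download loop, a call like this could take the place of the hand-built path string, which also avoids hard-coding Windows path separators.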
