Shulou
2025-02-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article shows how to use Python to crawl rental-listing pictures from a website. The content is concise and easy to follow, and I hope you get something useful out of the detailed walkthrough.
Third-party libraries
Install them first
I use PyCharm, so I won't cover installing them from the command line.
As shown above, open the default settings and select Project Interpreter, then double-click pip or click the plus sign to search for and install third-party libraries. If you create a new project, make sure its Project Interpreter points to the correct installation location; otherwise the libraries cannot be imported.
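If imports fail, a quick way to confirm which interpreter is running and whether it can see the libraries is a short check like the one below. It is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical. Note that the beautifulsoup4 package is imported under the name bs4.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Compare this path with the Project Interpreter configured in PyCharm
print("interpreter:", sys.executable)
# An empty list means both libraries are importable from this interpreter
print("missing:", missing_packages(["requests", "bs4"]))
```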
Requests library
The official description of the Requests library reads: "Requests is the only Non-GMO HTTP library for Python, safe for human consumption." In practice, it sends HTTP requests over the network and retrieves web page data.
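Before the author's own script, here is a minimal sketch of how a Requests call is commonly guarded. It assumes you want a timeout and want 4xx/5xx responses treated as failures; the helper name fetch_page and the 10-second default are illustrative choices, not part of the original article.

```python
import requests

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"
}


def fetch_page(url, headers=HEADERS, timeout=10):
    """Return the page text, or None if the request fails (hypothetical helper)."""
    try:
        res = requests.get(url, headers=headers, timeout=timeout)
        res.raise_for_status()  # turn 4xx/5xx status codes into an HTTPError
    except requests.exceptions.RequestException as exc:
        # RequestException is the base class of ConnectionError, Timeout, HTTPError, ...
        print("request failed:", exc)
        return None
    return res.text
```

Catching requests.exceptions.RequestException rather than the built-in ConnectionError matters: the exceptions Requests raises are its own classes, so a bare ConnectionError handler would not catch them.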
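    import requests

    header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'}
    try:
        # Send a browser User-Agent so the site treats this as a normal browser visit
        res = requests.get('http://sh.58.com/zufang/', headers=header)
        print(res.text)
    except requests.exceptions.ConnectionError:  # Requests raises its own ConnectionError class
        print('access denied!')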
The results are as follows:
The parameters of Request Headers are as follows:
Some common header fields:
Accept: specifies the content types the client can accept. The client can rank them by preference with optional q-values, e.g. text/html,application/xml;q=0.9.
Accept-Language: specifies the languages the client browser prefers for the response.
Accept-Encoding: specifies the content-coding (compression) formats the client browser supports, such as gzip. This lets the server compress the response before sending it, which saves bandwidth.
Accept-Charset: specifies the character sets the HTTP client browser can accept.
User-Agent: identifies the client software. Some servers and proxies use this value to decide whether a request comes from a browser, which is why the code above sends a browser User-Agent string.
Content-Type: when calling a REST interface, the server checks this value to decide how to parse the content of the HTTP body. Common values:
application/xml: used for XML payloads, such as XML-RPC or XML-based RESTful/SOAP services
application/json: used for JSON payloads, such as JSON-RPC or JSON-based REST APIs
application/x-www-form-urlencoded: used by browsers when submitting web forms
When calling RESTful or SOAP services, an incorrect Content-Type setting can cause the server to reject the request.
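To see which Content-Type Requests picks for different kinds of body, you can build a request without sending it. This is a sketch that relies on requests.Request(...).prepare(); the helper name prepared_headers and the example.com URLs are placeholders, and nothing goes over the network.

```python
import requests


def prepared_headers(method, url, **kwargs):
    """Build (but do not send) a request and return the headers it would carry."""
    return requests.Request(method, url, **kwargs).prepare().headers


# A dict passed as data= is form-encoded, like a browser form submission
form = prepared_headers("POST", "http://example.com/form", data={"city": "sh"})
# json= serialises the object and marks the body as JSON
api = prepared_headers("POST", "http://example.com/api", json={"city": "sh"})
# A raw string body gets no automatic Content-Type, so set it yourself
custom = prepared_headers("POST", "http://example.com/rpc", data="<q/>",
                          headers={"Content-Type": "application/xml"})

print(form["Content-Type"])    # application/x-www-form-urlencoded
print(api["Content-Type"])     # application/json
print(custom["Content-Type"])  # application/xml
```

An explicitly supplied Content-Type header always wins; Requests only fills one in when none is present.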
BeautifulSoup library
BeautifulSoup makes it easy to parse a page fetched with the Requests library: it turns the page source into a Soup document (a parse tree) from which you can filter and extract data. The following is from the Beautiful Soup 4.2 (bs4) documentation.
Beautiful Soup supports the HTML parser in Python's standard library as well as several third-party parsers, of which lxml is said to be relatively powerful. If no third-party parser is installed, Beautiful Soup uses Python's standard-library parser. The tip below refers to the Python standard-library parser.
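The parser choice is just the second argument to BeautifulSoup. The sketch below, with a hypothetical helper make_soup and made-up sample HTML, tries lxml first and falls back to the standard-library html.parser when lxml is not installed, then extracts some text from the resulting tree.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html):
    """Parse with lxml if it is installed, otherwise fall back to html.parser."""
    try:
        return BeautifulSoup(html, "lxml")
    except FeatureNotFound:  # raised when the requested parser is unavailable
        return BeautifulSoup(html, "html.parser")


# Illustrative markup only; real listing pages are far larger
html = ("<html><head><title>Rentals</title></head><body>"
        "<p class='price'>3500</p><p class='price'>4200</p></body></html>")
soup = make_soup(html)
print(soup.title.string)  # Rentals
# find_all with class_= filters tags by CSS class
prices = [int(p.get_text()) for p in soup.find_all("p", class_="price")]
print(prices)  # [3500, 4200]
```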
CSS selectors: select()
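select() takes a CSS selector and returns every matching tag as a list. The sketch below runs the selector used in the case study, .img_list img, against a small hypothetical HTML snippet and reads the lazy_src attribute. The markup is an assumption modelled on that selector; the real page structure may differ or may have changed since the article was written.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: images inside a container with class "img_list",
# where the real URL sits in a lazy-loading attribute called lazy_src
SAMPLE_HTML = """
<ul class="img_list">
  <li><img src="placeholder.gif" lazy_src="http://example.com/a.jpg"></li>
  <li><img src="placeholder.gif" lazy_src="http://example.com/b.jpg"></li>
  <li><img src="placeholder.gif"></li>
</ul>
<img class="logo" src="logo.png">
"""


def lazy_image_urls(html):
    soup = BeautifulSoup(html, "html.parser")
    # Descendant selector: <img> tags anywhere inside an element with class img_list
    imgs = soup.select(".img_list img")
    # .get() returns None when the attribute is missing, so those tags are skipped
    return [img.get("lazy_src") for img in imgs if img.get("lazy_src")]


print(lazy_image_urls(SAMPLE_HTML))
# ['http://example.com/a.jpg', 'http://example.com/b.jpg']
```

Using img.get() instead of img['lazy_src'] avoids a KeyError on images that lack the attribute, such as the logo outside the list.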
Case: crawl rental pictures for Shanghai

    import os
    import time
    import urllib.request

    import requests
    from bs4 import BeautifulSoup

    header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.62 Safari/537.36'}
    # Listing pages pn6 ... pn50 to crawl
    url = ['http://sh.58.com/zufang/pn{}/?ClickID=2'.format(number) for number in range(6, 51)]
    save_dir = 'E:\\2333'                  # local folder for the downloaded images
    os.makedirs(save_dir, exist_ok=True)
    adminCout = 6
    for arurl in url:
        adminCout = adminCout + 1
        res = requests.get(arurl, headers=header)
        soup = BeautifulSoup(res.text, 'html.parser')
        arryImg = soup.select('.img_list img')  # <img> tags inside the listing-image container
        print(arryImg)
        count = 0
        for img in arryImg:
            _url = img.get('lazy_src')          # the real image URL is in the lazy-load attribute
            if not _url:
                continue
            print(_url)
            # Set the path and file name, e.g. E:\2333\7_0.jpg
            pathName = os.path.join(save_dir, str(adminCout) + '_' + str(count) + '.jpg')
            # Python 3: urlopen lives in urllib.request (urllib2.urlopen in Python 2.x)
            result = urllib.request.urlopen(_url)
            data = result.read()                # read the image bytes
            with open(pathName, 'wb') as code:  # the with block closes the file automatically
                code.write(data)
            count = count + 1
            print('downloading number:', count)
        time.sleep(30)                          # pause between pages to avoid being blocked

That is how to use Python to crawl rental pictures from a website.
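The download step in the loop calls urlopen() without the User-Agent header used for the page requests. If image requests are also being refused, one option is a small helper like the one below. It is a sketch only: the name download_image, the timeout, and the directory handling are assumptions, not part of the original script. It wraps the URL in urllib.request.Request so headers can be attached, then writes the bytes to disk.

```python
import os
import urllib.request


def download_image(url, path, headers=None, timeout=10):
    """Fetch url and write the raw bytes to path; return the number of bytes written.

    Hypothetical helper: attaches optional headers (e.g. a browser User-Agent)
    and creates the target folder if it does not exist yet.
    """
    req = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = resp.read()
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return len(data)
```

Inside the loop it would replace the urlopen/read/open lines, for example download_image(_url, pathName, headers=header).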
© 2024 shulou.com SLNews company. All rights reserved.