This article introduces how a Python crawler can use requests to build a proxy pool. Many people run into this problem in real projects, so let's walk through how to handle it. I hope you read it carefully and get something out of it!
The idea: crawl a proxy listing site, validate each proxy, and write the working ones to a txt file.
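The core validation trick is that requests can route a request through a proxy via its proxies argument. Here is that single step in isolation as a minimal sketch; the check_proxy name, the test URL, and the 3-second timeout are illustrative choices, not part of the original script:

import requests

def check_proxy(ip, port):
    # Route a test request through the candidate proxy; any network
    # error or non-200 status means the proxy is treated as dead.
    proxy = {'http': 'http://{}:{}'.format(ip, port)}
    try:
        resp = requests.get('http://www.baidu.com/', proxies=proxy, timeout=3)
        return resp.status_code == 200
    except requests.RequestException:
        return False

The full script below applies the same check to every proxy it scrapes.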
import requests
from scrapy import Selector

start_url = 'http://www.89ip.cn/index_1.html'
url = 'http://www.89ip.cn/index_{}.html'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}

class MyProxy(object):
    def GetPage(self, url):  # fetch the page source
        response = requests.get(url=url, headers=headers)
        return response.text

    def GetInfo(self, text):  # extract ip/port pairs from the page
        selector = Selector(text=text)
        FindTable = selector.xpath('//div[@class="layui-form"]/table/tbody/tr')
        for proxy in FindTable:
            ip = "".join(proxy.xpath('.//td[1]/text()').get()).replace('\t', '').replace('\n', '')
            port = "".join(proxy.xpath('.//td[2]/text()').get()).replace('\t', '').replace('\n', '')
            print(ip, port)
            self.TestIP(ip, port)

    def TabPage(self, text):  # read the "next page" number and build its URL
        selector = Selector(text=text)
        page = selector.xpath('//*[@id="layui-laypage-1"]/a[8]/@data-page').get()
        self.new_url = url.format(page)

    def TestIP(self, ip, port):  # validate a proxy by requesting a page through it
        try:
            response = requests.get(url='https://www.baidu.com/', headers=headers,
                                    proxies={"http": "{}:{}".format(ip, port)})
            print(response.status_code)
            if response.status_code != 200:
                print("access failed")
            else:  # working proxy: append it to the pool file
                self.file = open('proxy.txt', 'a')
                self.file.write('{}:{}\n'.format(ip, port))
                self.file.close()
        except Exception as e:
            print("access failed")

    def close(self):
        self.file.close()

myproxy = MyProxy()
text = myproxy.GetPage(start_url)
while True:
    try:
        myproxy.GetInfo(text)    # collect and test every proxy on the page
        myproxy.TabPage(text)    # work out the next page's URL
        text = myproxy.GetPage(myproxy.new_url)
    except Exception as e:
        print('*' * 10)
# myproxy.close()
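Once proxy.txt has some entries, the pool can be consumed by picking a line at random and handing it back to requests. A hedged sketch under the one-ip:port-per-line format the script writes; the get_with_random_proxy name, the 5-second timeout, and the example target URL are illustrative assumptions:

import random
import requests

def get_with_random_proxy(target_url):
    # Load the pool written by the crawler, one "ip:port" per line.
    with open('proxy.txt') as f:
        proxies = [line.strip() for line in f if line.strip()]
    # Pick one proxy at random and route the request through it.
    choice = random.choice(proxies)
    return requests.get(target_url,
                        proxies={'http': 'http://' + choice},
                        timeout=5)

# Example usage: resp = get_with_random_proxy('http://httpbin.org/ip')

Random selection spreads requests across the pool; a production version would also drop proxies that start failing.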
That's all for "how a Python crawler uses requests to build a proxy pool". Thank you for reading. If you want to learn more about the industry, you can follow the website, where the editor will keep publishing practical articles!