2025-02-14 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 05/31 Report
Most readers are not familiar with the topic of this article, "How to create your own IP pool with Python", so the following summary lays everything out with detailed content and clear steps; it should have some reference value. I hope you get something out of it after reading. Let's take a look.
Development environment
Python 3.8
Pycharm
Modules used
requests >>> pip install requests
parsel >>> pip install parsel
How to install a third-party Python module
Press Win + R, type cmd and click OK, then enter the installation command pip install <module name> (for example pip install requests) and press Enter.
Alternatively, click Terminal in PyCharm and enter the same installation command there.
How to configure the Python interpreter in PyCharm
Select File > Settings > Project > Python Interpreter.
Click the gear icon and select Add.
Add the path to your Python installation.
How to install plug-ins in PyCharm
Select File > Settings > Plugins.
Click Marketplace and enter the name of the plug-in you want to install, for example Translation for a translation plug-in or Chinese for the Chinese language pack.
Select the appropriate plug-in and click Install.
After a successful installation, a prompt to restart PyCharm will pop up; click OK, and the plug-in takes effect after the restart.
Proxy IP structure

proxies_dict = {
    "http": "http://" + ip + ":" + port,
    "https": "http://" + ip + ":" + port,
}

Idea

1. Data source analysis
Find out where the data content we want comes from.
2. Code implementation steps
Send a request: send a request to the target URL.
Get the data: get the response data (web page source code) returned by the server.
Parse the data: extract the data content we want.
Save the data: save it to a local csv file or a database.
Detect the IPs: check whether each proxy IP is usable and keep only the ones that are.
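Before writing the full script, the proxy dictionary structure shown earlier can be sanity-checked on its own. This is a minimal sketch; the IP and port below are placeholders, not a live proxy:

```python
# Build the proxies dictionary the way the article describes.
# The address is a placeholder for illustration only.
ip = "110.189.152.86"
port = "40698"

proxy = ip + ":" + port
proxies_dict = {
    "http": "http://" + proxy,
    "https": "http://" + proxy,
}

# requests routes traffic through the proxy once proxies= is supplied, e.g.:
# requests.get(url, proxies=proxies_dict, timeout=1)
print(proxies_dict["http"])  # http://110.189.152.86:40698
```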
Import syntax: from <module> import <method> imports a specific method from a module.
from xxx import *  # import all methods
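The import forms mentioned above, demonstrated with the built-in math module:

```python
import math                  # import the whole module
from math import sqrt        # import a single method from a module
from math import *           # import all methods (use sparingly: it pollutes the namespace)

print(math.sqrt(16))  # 4.0
print(sqrt(9))        # 3.0
```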
Code

# Import the data request module (third-party: pip install requests)
import requests
# Import the data parsing module (third-party: pip install parsel)
# parsel is also the core parsing component of the scrapy framework
import parsel

lis = []    # all proxies collected
lis_1 = []  # proxies that passed the availability check

# 1. Send the request to the target URL
for page in range(11, 21):
    url = f'https://www.kuaidaili.com/free/inha/{page}/'  # determine the request url address
    # a headers dict can be added here to disguise the python code as a browser
    response = requests.get(url)  # a 200 status code means the request succeeded

    # 2. Get the data: response.text holds the response text (web page source code)
    # print(response.text)

    # 3. Parse the data and extract the content we want. Three methods work here:
    #    regular expressions (re.findall, extract straight from the string),
    #    css selectors (by tag attribute), or xpath (by tag node);
    #    use whichever you prefer. This example uses xpath.
    selector = parsel.Selector(response.text)  # convert the html string into a Selector object

    # css selector equivalent:
    # ip_list = selector.css('#list tbody tr td:nth-child(1)::text').getall()
    # port_list = selector.css('#list tbody tr td:nth-child(2)::text').getall()

    ip_list = selector.xpath('//*[@id="list"]/table/tbody/tr/td[1]/text()').getall()
    port_list = selector.xpath('//*[@id="list"]/table/tbody/tr/td[2]/text()').getall()

    for ip, port in zip(ip_list, port_list):
        proxy = ip + ':' + port
        proxies_dict = {
            "http": "http://" + proxy,
            "https": "http://" + proxy,
        }
        lis.append(proxies_dict)

        # 4. Check IP quality: keep a proxy only if a request through it succeeds
        try:
            response = requests.get(url=url, proxies=proxies_dict, timeout=1)
            if response.status_code == 200:
                print('Current proxy IP:', proxies_dict, 'is usable')
                lis_1.append(proxies_dict)
        except Exception:
            print('Current proxy IP:', proxies_dict, 'timed out, check failed')

print('Number of IPs obtained:', len(lis))
print('Number of usable proxy IPs:', len(lis_1))
print('Usable proxy IPs:', lis_1)

# Example of one resulting entry:
# dit = {'http': 'http://110.189.152.86:40698', 'https': 'http://110.189.152.86:40698'}
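Once lis_1 holds the proxies that passed the check, a simple way to use the pool is to pick a random entry for each request. A sketch, with a hypothetical pool (the addresses are placeholders):

```python
import random

# A hypothetical pool; in the script above, lis_1 is filled with every
# proxy that survived the availability check.
lis_1 = [
    {'http': 'http://110.189.152.86:40698', 'https': 'http://110.189.152.86:40698'},
    {'http': 'http://10.0.0.1:8080', 'https': 'http://10.0.0.1:8080'},
]

# Rotate: pick a random proxy from the pool for each outgoing request
proxies_dict = random.choice(lis_1)
# requests.get(url, proxies=proxies_dict, timeout=1)
print(proxies_dict['http'])
```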
That is all for this article on how to create your own IP pool with Python. I believe you now have some understanding of it, and I hope what has been shared here is helpful to you. If you want to learn more, please follow the industry information channel.