
How to Build Your Own Proxy IP Pool in Python

Updated: 2025-02-14

Source: SLTechnology News&Howtos, Shulou (Shulou.com), 05/31

This article walks through how to build your own proxy IP pool in Python: scrape free proxy addresses from a listing page, test each one, and keep those that work. The steps are laid out in order and should serve as a useful reference.

Development environment

Python 3.8

PyCharm

Modules used

requests: pip install requests

parsel: pip install parsel

How to install a third-party Python module

Press Win + R, type cmd, and click OK. In the command prompt, type the install command pip install followed by the module name (for example, pip install requests) and press Enter.

Alternatively, click Terminal in PyCharm and type the install command there.
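
To confirm that the modules installed correctly, one option is to ask Python whether it can find them. The helper below is only an illustrative sketch; `missing_modules` is a hypothetical name, not part of any library.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The two third-party modules this article relies on.
    missing = missing_modules(["requests", "parsel"])
    if missing:
        print("Not installed yet:", ", ".join(missing))
    else:
        print("All required modules are installed")
```

If a module is reported as missing, run the pip install command again in the same environment that PyCharm uses as its interpreter.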

How to configure the Python interpreter in PyCharm

Select File > Settings > Project > Python Interpreter.

Click the gear icon and select Add.

Add the path of your Python installation.

How to install plugins in PyCharm

Select File > Settings > Plugins.

Click Marketplace and type the name of the plugin you want to install. For example, type "translation" for a translation plugin or "Chinese" for the Chinese language plugin.

Select the appropriate plugin and click Install.

After a successful installation, PyCharm offers to restart. Click OK; the plugin takes effect after the restart.

Proxy IP structure

    proxies_dict = {
        "http": "http://" + ip + ":" + port,
        "https": "http://" + ip + ":" + port,
    }

Approach

1. Data source analysis

Work out where the data we want lives and which request returns it.
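
One simple way to do this in code is to take a value you can see in the browser and look for it in the raw page source. If it is there, the data arrives with the HTML; if not, it is probably loaded by a separate request. This is a minimal sketch: `found_in_source` is a hypothetical helper, and the sample HTML is made up to stand in for `response.text`.

```python
def found_in_source(html, visible_value):
    """Return True if a value seen in the browser also appears in the raw HTML."""
    return visible_value in html


# Made-up page source standing in for response.text.
sample_html = "<table><tr><td>110.189.152.86</td><td>40698</td></tr></table>"

print(found_in_source(sample_html, "110.189.152.86"))  # value is part of the static HTML
print(found_in_source(sample_html, "8.8.8.8"))         # value is absent from this source
```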

2. Code implementation steps

Send a request: send a request to the target URL.

Get the data: read the response data returned by the server (the page source).

Parse the data: extract the content we want.

Save the data: scraped content is usually saved locally, for example as audio or video files, a CSV file, or a database. For a proxy pool, this step is IP checking: test whether each proxy IP works and keep the ones that do.
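
The parsing and checking steps above can be sketched offline. Everything here is an assumption made for illustration: the HTML fragment and its `data-title` attributes are invented (the real page layout may differ), and `parse_proxies`, `build_proxies`, and `is_usable` are hypothetical helper names. The `fetch` parameter is any callable that accepts the same arguments as `requests.get`, which keeps the sketch runnable without a network connection.

```python
import re

# Made-up HTML standing in for response.text; the real page may be structured differently.
SAMPLE_HTML = """
<table>
  <tr><td data-title="IP">110.189.152.86</td><td data-title="PORT">40698</td></tr>
  <tr><td data-title="IP">10.0.0.1</td><td data-title="PORT">8080</td></tr>
</table>
"""


def parse_proxies(html):
    """Step 3: extract (ip, port) pairs from the HTML text with regular expressions."""
    ips = re.findall(r'<td data-title="IP">(.*?)</td>', html, re.S)
    ports = re.findall(r'<td data-title="PORT">(.*?)</td>', html, re.S)
    return list(zip(ips, ports))


def build_proxies(ip, port):
    """Build a proxies dict in the shape that requests expects."""
    proxy = f"http://{ip}:{port}"
    return {"http": proxy, "https": proxy}


def is_usable(proxies, test_url, fetch, timeout=3):
    """Step 4: treat a proxy as usable if a request through it returns status 200."""
    try:
        response = fetch(test_url, proxies=proxies, timeout=timeout)
    except OSError:
        # The exceptions raised by requests (timeouts, refused connections, ...)
        # derive from OSError, so this also covers requests.RequestException.
        return False
    return response.status_code == 200


candidates = [build_proxies(ip, port) for ip, port in parse_proxies(SAMPLE_HTML)]
print(candidates)
```

With requests installed, a real check would pass `requests.get` as `fetch`, for example `is_usable(candidates[0], "https://www.kuaidaili.com/free/", requests.get)`. Free proxies are often slow, so the timeout value decides how many of them pass.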

from ... import ...

from: the module to import from

import: the name to import

That is, import a given method from a given module.

from xxx import *  # import all methods
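
The two import styles can be compared side by side. This small example uses the built-in `re` module; the sample string is made up.

```python
import re               # import the whole module, then call re.findall(...)
from re import findall  # import a single name, then call findall(...) directly

text = "110.189.152.86:40698"

# Both calls run the same function; only the way it is referenced differs.
via_module = re.findall(r"\d+", text)
via_name = findall(r"\d+", text)

print(via_module)
print(via_name == via_module)
```

Importing the module itself keeps it clear where each function comes from, which is why the code in this article writes `requests.get` and `re.findall` rather than using `from xxx import *`.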

Code

    # Import the data request module
    import requests  # third-party module: pip install requests
    # Import the regular expression module
    import re  # built-in module
    # Import the data parsing module
    import parsel  # third-party module: pip install parsel; the core parsing component of the scrapy framework

    lis = []    # every proxy scraped
    lis_1 = []  # proxies that passed the check

    # 1. Send a request to the target URL https://www.kuaidaili.com/free/
    for page in range(11, 21):
        # The request URL for this page
        url = f'https://www.kuaidaili.com/free/inha/{page}/'
        # headers (request headers) can be added to disguise the Python code as a browser
        # Send a GET request with requests.get and store the returned data in response
        response = requests.get(url)
        # <Response [200]>: a response object; status code 200 means the request succeeded
        # 2. Get the data: response.text is the response body as text (the page source)
        # print(response.text)
        # 3. Parse the data and extract the content we want. Ways to parse:
        #    regular expressions: extract directly from the string data
        #    xpath: extract by tag node (the HTML string must be converted first)
        #    css selectors: extract by tag attributes (the HTML string must be converted first)
        #    Use whichever fits the page and whichever you prefer.

        # Regular expression extraction (disabled)
        # re.findall() calls the findall method of the re module
        # . matches any character except the newline \n; with re.S it matches newlines too
        # Wrap (.*?) in the HTML tags that surround the value on the page
        # ip_list = re.findall('(.*?)', response.text, re.S)
        # port_list = re.findall('(.*?)', response.text, re.S)
        # print(ip_list)
        # print(port_list)

        # css selector extraction (disabled)
        # If you don't know how to write css or xpath, start from paths like these:
        #   #list > table > tbody > tr > td:nth-child(1)
        #   //*[@id="list"]/table/tbody/tr/td[1]
        # selector = parsel.Selector(response.text)
        # ip_list = selector.css('#list tbody tr td:nth-child(1)::text').getall()
        # port_list = selector.css('#list tbody tr td:nth-child(2)::text').getall()
        # print(ip_list)
        # print(port_list)

        # xpath extraction
        selector = parsel.Selector(response.text)  # convert the HTML string into a Selector object
        ip_list = selector.xpath('//*[@id="list"]/table/tbody/tr/td[1]/text()').getall()
        port_list = selector.xpath('//*[@id="list"]/table/tbody/tr/td[2]/text()').getall()
        # print(ip_list)
        # print(port_list)
        for ip, port in zip(ip_list, port_list):
            # print(ip, port)
            proxy = ip + ':' + port
            proxies_dict = {
                "http": "http://" + proxy,
                "https": "http://" + proxy,
            }
            # print(proxies_dict)
            lis.append(proxies_dict)
            # 4. Check IP quality
            try:
                response = requests.get(url=url, proxies=proxies_dict, timeout=1)
                if response.status_code == 200:
                    print('Current proxy IP:', proxies_dict, 'is usable')
                    lis_1.append(proxies_dict)
            except requests.RequestException:
                print('Current proxy IP:', proxies_dict, 'request timed out, check failed')

    print('Number of IPs obtained:', len(lis))
    print('Number of usable proxy IPs:', len(lis_1))
    print('Usable proxy IPs:', lis_1)

    # Example of one proxies dict
    dit = {
        'http': 'http://110.189.152.86:40698',
        'https': 'http://110.189.152.86:40698',
    }
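
The script only prints the usable proxies (`lis_1`), so they are lost when it exits. One possible follow-up is to write them to a file and pick one at random on later runs. This is a sketch under assumptions: the JSON format, the file name, and the helpers `save_pool`, `load_pool`, and `pick_proxy` are all hypothetical choices, not part of the original code.

```python
import json
import os
import random
import tempfile


def save_pool(proxies_list, path):
    """Write a list of proxies dicts to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(proxies_list, f, indent=2)


def load_pool(path):
    """Read the list of proxies dicts back from the JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def pick_proxy(pool):
    """Return one proxies dict chosen at random, or None if the pool is empty."""
    return random.choice(pool) if pool else None


# Stand-in for lis_1, using the example address from the article.
pool = [
    {"http": "http://110.189.152.86:40698", "https": "http://110.189.152.86:40698"},
]

# A temporary location is used here; a real project would pick its own path.
path = os.path.join(tempfile.gettempdir(), "proxy_pool_demo.json")
save_pool(pool, path)
restored = load_pool(path)
print(restored == pool)
print(pick_proxy(restored))
```

The dict returned by `pick_proxy` can be passed straight to `requests.get(url, proxies=...)`. Free proxies tend to stop working quickly, so re-checking the saved pool before each use is a reasonable habit.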

