How to Set a Proxy IP in the ForeSpider Data Collector

2025-01-20 Update From: SLTechnology News&Howtos

This article explains how to set a proxy IP in the ForeSpider data collector. Many users are unsure how to do this in day-to-day operation, so the editor has gathered the relevant material into a simple, easy-to-follow procedure.

-01- Create a proxy IP

Open the ForeSpider data collection engine, go to the IP proxy settings, and create a proxy IP as shown in the following figure.

1. Name: any custom name; it may contain characters, letters, and other symbols.

2. Type: static proxy or dynamic proxy. ForeSpider's definitions of dynamic and static proxies differ from those commonly used in the market: in the ForeSpider crawler software, a proxy is classified as static or dynamic by the way the proxy IP is accessed.

① Dynamic proxy: the proxy IP used by the crawler changes over time. If the proxy IPs are obtained through an API link provided by the proxy IP service provider, it is a dynamic proxy. In the system, fill in the API link at the location marked in the figure below.

② Static proxy: the proxy IP used by the crawler is fixed to one or more IP addresses. If the proxy IP service provides an IP address, port, user name, and password, it is a static proxy. Some providers supply only an IP address and port; these are also static proxies, and the user name and password are left empty.

3. Request frequency: how often each proxy IP is called while ForeSpider is running. When the crawler runs, number of threads = request frequency × number of proxy IPs.

For example, suppose the proxy service supplies 10 IPs per second and the request frequency is set to 5. When the crawler runs, the proxy IPs are called 50 times per second in total, so the thread count that gives the best collection speed is 50.
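The thread arithmetic above can be checked with a small calculation. This is only an illustrative sketch in Python, not part of ForeSpider; the function name `recommended_threads` is hypothetical.

```python
def recommended_threads(request_frequency: int, proxy_ip_count: int) -> int:
    """Thread count = request frequency x number of proxy IPs (rule from the article)."""
    if request_frequency < 1 or proxy_ip_count < 1:
        raise ValueError("both values must be at least 1")
    return request_frequency * proxy_ip_count


# The article's example: 10 proxy IPs, request frequency 5.
print(recommended_threads(5, 10))
```

With a single proxy IP and a request frequency of 1, the same rule gives one thread.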

-02- Proxy IP settings

1. Dynamic proxy IP settings

In the ForeSpider crawler, after creating a new dynamic proxy IP, enter the following basic parameters:

① Protocol type: http/https by default; both protocols are supported.

② Return format: the format in which the IPs are returned, either TXT or unknown. With the TXT format, the IPs can be obtained without writing a script.

③ Refresh cycle: the interval between API calls, in milliseconds; set it according to the plan you purchased.

④ Request address: fill in the API link.

After filling in the above, click the "Test" button; the test results are displayed at the bottom of the screen. Once the test succeeds, make sure the proxy IP is checked and click the "Save" button, as shown below:

Once it is saved, you can start data collection.
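To make the TXT return format and the refresh cycle concrete, here is a minimal Python sketch. It assumes the API returns one `ip:port` pair per line, which the article does not specify, and the helper names `parse_txt_proxies` and `refresh_due` are hypothetical; ForeSpider does this internally.

```python
from typing import List, Tuple


def parse_txt_proxies(body: str) -> List[Tuple[str, int]]:
    """Parse a plain-text response, assumed to hold one "ip:port" per line."""
    proxies = []
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        host, sep, port = line.rpartition(":")
        if not sep or not host or not port.isdigit():
            continue  # skip lines that are not "ip:port"
        proxies.append((host, int(port)))
    return proxies


def refresh_due(last_fetch_ms: int, now_ms: int, refresh_cycle_ms: int) -> bool:
    """True once a full refresh cycle (in milliseconds) has elapsed."""
    return now_ms - last_fetch_ms >= refresh_cycle_ms


sample = "203.0.113.5:8080\n198.51.100.7:3128\n"
print(parse_txt_proxies(sample))
```

A response in any other shape (JSON, extra fields, credentials supplied separately) falls under the unknown format and needs the script settings described in section 03.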

2. Static proxy IP settings

In the ForeSpider crawler, after creating a new static proxy IP, enter the following parameters:

① IP address: provided by the proxy IP service provider.

② Port: provided by the proxy IP service provider.

③ Type: http or https; choose according to the type of website to be collected.

④ Validity period: the time at which the proxy IP expires. This field is optional.

When several proxy IPs are used at the same time and they expire at different times, fill in each one's actual expiration time; ForeSpider automatically stops using a proxy IP once it reaches its expiration date.

When the validity period is less than 1 day, filling it in is recommended, to prevent collection failures caused by the proxy IP expiring while the ForeSpider crawler is still running.

If the field is left empty, the proxy IP must be disabled manually before it expires; otherwise collection will fail. As shown in the following figure:

⑤ User name: provided by the proxy IP service provider.

⑥ Password: provided by the proxy IP service provider.

After filling in these fields, check the proxy IPs you want to use, turn on the proxy IP switch, and save. As shown in the following figure:
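The static proxy fields and the validity-period behavior can be sketched as a small record type. This is an illustration in Python under stated assumptions, not ForeSpider's implementation: the class `StaticProxy` and its field names are hypothetical and simply mirror the six parameters listed above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class StaticProxy:
    ip: str
    port: int
    proxy_type: str = "http"               # "http" or "https"
    expires_at: Optional[datetime] = None  # optional validity period
    username: Optional[str] = None         # empty when the provider gives none
    password: Optional[str] = None

    def has_credentials(self) -> bool:
        return bool(self.username) and bool(self.password)

    def is_usable(self, now: datetime) -> bool:
        """A proxy with no validity period never expires automatically;
        it has to be disabled by hand before the provider retires it."""
        return self.expires_at is None or now < self.expires_at


now = datetime(2025, 1, 20, 12, 0, 0)
short_lived = StaticProxy("203.0.113.5", 8080, expires_at=now + timedelta(hours=6))
print(short_lived.is_usable(now), short_lived.is_usable(now + timedelta(hours=7)))
```

The sketch shows why filling in the validity period matters for short-lived IPs: the expiry check retires the proxy on its own, whereas an empty field leaves it marked usable indefinitely.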

-03- Script settings

In the ForeSpider data collection system, when the way the proxy IP is accessed fits neither the static nor the dynamic mode described above, a script is used to set the proxy IP.

Script settings are similar to the dynamic settings. Set the following items according to your situation:

① Protocol type: http/https by default; both protocols are supported.

② Return format: select the unknown format; the script then parses the returned content.

③ Refresh cycle: the interval between API calls, in milliseconds; set it according to the plan you purchased.

④ Request address: fill in the API link provided by the proxy provider.

⑤ POST DATA: fill this in if the API requires a POST request; the content depends on the proxy provider.

⑥ Code editing area: enter a script that reads the returned data and registers the proxy IPs. The code is as follows:

Ips = DOWNDATA.Split ('\ n'); vart;for (item0)

After pasting the above code into the edit box, you usually only need to modify the last statement of the script, filling in the parentheses with: IP address and port, user name, password, validity duration, and http/https.
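The logic such a script carries out can be sketched in Python. This is not ForeSpider's script language and does not reproduce the script above; it only illustrates the idea of splitting the downloaded text on newlines and attaching the five values the article lists. The function name, the tuple layout, and the sample credentials are all hypothetical.

```python
from typing import List, Tuple

ProxyRecord = Tuple[str, str, str, int, str]


def build_proxy_records(down_data: str, username: str, password: str,
                        valid_seconds: int, protocol: str) -> List[ProxyRecord]:
    """Split the API response on newlines and pair each "ip:port" line with
    the fixed user name, password, validity duration, and protocol."""
    records = []
    for line in down_data.split("\n"):
        address = line.strip()
        if not address:
            continue  # ignore empty lines, including a trailing newline
        records.append((address, username, password, valid_seconds, protocol))
    return records


for record in build_proxy_records("203.0.113.5:8080\n198.51.100.7:3128\n",
                                  "demo_user", "demo_pass", 300, "http"):
    print(record)
```

Supplying the user name and password inside the script covers providers whose API response contains only addresses but whose proxies still require authentication.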

-04- Other settings

In the ForeSpider crawler system, there are three strategies for mixing proxy IPs with the local IP:

1. Disable local IP for collection

With this setting, only proxy IPs are used for collection. If the proxy IP fails or is not enabled, collection fails.

2. Use local IP when the proxy fails

If the proxy IP fails or becomes invalid, the local IP is used directly for collection.

3. Use only local IP

With this setting, only the local IP is used for collection.
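The three strategies amount to a small decision rule, sketched below in Python. The enum `MixStrategy`, the function `choose_route`, and the `"local"` marker are hypothetical names used only for illustration; ForeSpider exposes these as settings, not as code.

```python
from enum import Enum
from typing import Optional


class MixStrategy(Enum):
    PROXY_ONLY = 1         # collection disables the local IP
    FALLBACK_TO_LOCAL = 2  # use the local IP when the proxy fails
    LOCAL_ONLY = 3         # use only the local IP


LOCAL = "local"


def choose_route(strategy: MixStrategy, working_proxy: Optional[str]) -> str:
    """Return the proxy address to use, or LOCAL for a direct connection.
    working_proxy is None when no proxy IP is enabled or all have failed."""
    if strategy is MixStrategy.LOCAL_ONLY:
        return LOCAL
    if working_proxy is not None:
        return working_proxy
    if strategy is MixStrategy.FALLBACK_TO_LOCAL:
        return LOCAL
    raise RuntimeError("no usable proxy IP and the local IP is disabled")


print(choose_route(MixStrategy.FALLBACK_TO_LOCAL, None))
```

Only the first strategy can end in a failed collection; the second trades that failure for exposing the local IP, which may or may not be acceptable for a given target site.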

Notes

① To modify a proxy IP during collection, pause or stop the collection first, then make the change.

② When the API address used to request proxy IPs changes between requests, for example when the link contains a timestamp, ForeSpider's proxy IP settings do not support it.

③ When the returned content does not include a password but the proxy requires one, use the script settings to supply it.

④ When you do not know how many requests the proxy IP supports, set the request frequency to 1, or to any number from 1 to 10.

⑤ When a new batch of proxy IPs is requested, old proxy IPs that are still within their validity period can continue to be used.
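Note ⑤ describes a pool in which batches overlap: new IPs are added while earlier ones stay available until they expire. A minimal Python sketch of that behavior follows; the class `ProxyPool` and its methods are hypothetical and only model the rule stated in the note.

```python
from typing import Dict, List


class ProxyPool:
    """Keeps every proxy IP until its own validity period runs out."""

    def __init__(self) -> None:
        self._expiry: Dict[str, float] = {}  # address -> expiry time in seconds

    def add_batch(self, addresses: List[str], now: float, valid_seconds: float) -> None:
        # A new batch extends the pool; it does not replace earlier batches.
        for address in addresses:
            self._expiry[address] = now + valid_seconds

    def active(self, now: float) -> List[str]:
        # Drop expired entries, then return what is left in insertion order.
        self._expiry = {a: t for a, t in self._expiry.items() if t > now}
        return list(self._expiry)


pool = ProxyPool()
pool.add_batch(["203.0.113.5:8080"], now=0, valid_seconds=60)
pool.add_batch(["198.51.100.7:3128"], now=30, valid_seconds=60)
print(pool.active(now=45))  # both batches are still within their validity period
print(pool.active(now=70))  # only the second batch remains
```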

That concludes this guide to setting up a proxy IP in the ForeSpider data collector. Combining the explanation with hands-on practice is the best way to learn, so go ahead and try it.
