2025-01-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains which kinds of proxy IP a Python crawler needs. The approach described here is simple, fast, and practical.
1 Proxy types
Proxy IPs fall into four types: transparent, anonymous, obfuscating, and high-anonymity (elite). Ranked by how well they hide the client, from strongest to weakest, the order is: high-anonymity > obfuscating > anonymous > transparent.
2 How proxies work
The proxy type depends primarily on how the proxy server is configured; different configurations produce different proxy types. Three variables are decisive: REMOTE_ADDR, HTTP_VIA, and HTTP_X_FORWARDED_FOR.
REMOTE_ADDR
REMOTE_ADDR represents the IP of the client, but its value is not supplied by the client; the server sets it from the address of the connection it accepted.
If you access a website directly using a browser, the website's web server (Nginx, Apache, etc.) sets REMOTE_ADDR to the client's IP address.
If the browser is configured to use a proxy, the request goes to the proxy server first, which then forwards it to the target site. The website's web server therefore sets REMOTE_ADDR to the IP of the proxy server.
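To see what a server actually receives, here is a minimal sketch of a WSGI application that reports the three variables named in section 2. It assumes a standard WSGI server, where REMOTE_ADDR is filled in by the server and request headers appear with an HTTP_ prefix; the names `app` and `serve` are illustrative only.

```python
import json
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Report the proxy-related values the web server sees for this request."""
    seen = {
        # Set by the server from the accepted connection, not by the client.
        "REMOTE_ADDR": environ.get("REMOTE_ADDR"),
        # Request headers; None when the header was not sent.
        "HTTP_VIA": environ.get("HTTP_VIA"),
        "HTTP_X_FORWARDED_FOR": environ.get("HTTP_X_FORWARDED_FOR"),
    }
    body = json.dumps(seen).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(port=8000):
    """Run the app locally; the port number is an arbitrary choice."""
    with make_server("", port, app) as httpd:
        httpd.serve_forever()
```

Calling `serve()` and requesting the page once directly and once through a proxy should show REMOTE_ADDR change from the client's address to the proxy's address.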
X-Forwarded-For(XFF)
X-Forwarded-For is an HTTP extension header used to indicate the real IP of the original HTTP requester. When a client uses a proxy, the target web server sees only the proxy's address and does not know the client's real IP. To make that address available, proxy servers usually add an X-Forwarded-For header containing the client IP.
X-Forwarded-For request header format is as follows:
X-Forwarded-For: client, proxy1, proxy2
client is the IP address of the original client; proxy1 is the IP of the proxy farthest from the server (the one closest to the client); proxy2 is the IP of the next proxy in the chain. As the format shows, a request can pass through multiple layers of proxies between client and server.
If an HTTP request passes through three proxies, Proxy1, Proxy2, and Proxy3, whose IPs are IP1, IP2, and IP3 respectively, and the user's real IP is IP0, then following the XFF convention the server finally receives:
X-Forwarded-For: IP0, IP1, IP2
Proxy3 connects directly to the server and appends IP2 to XFF, indicating that it is forwarding the request on behalf of Proxy2. IP3 is not in the list, but the server can obtain it from the Remote Address field. HTTP runs on top of TCP and has no concept of IP itself; Remote Address comes from the TCP connection and is the IP of the device that established that connection with the server, in this case IP3.
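The chain described above can be sketched in a few lines of Python. This is a simulation only, assuming every proxy follows the append convention; the helper names `forward` and `send_through` are made up for illustration.

```python
def forward(headers, peer_ip):
    """Return the headers a proxy would send on after receiving a request from peer_ip."""
    out = dict(headers)
    existing = out.get("X-Forwarded-For")
    # Each proxy appends the address of whoever connected to it.
    out["X-Forwarded-For"] = f"{existing}, {peer_ip}" if existing else peer_ip
    return out


def send_through(client_ip, proxy_ips):
    """Pass a request through the proxies in order.

    Returns (headers, remote_addr) as the final server would see them.
    """
    headers = {}
    peer = client_ip
    for proxy_ip in proxy_ips:
        headers = forward(headers, peer)
        peer = proxy_ip  # the next hop sees this proxy as its TCP peer
    return headers, peer


headers, remote_addr = send_through("IP0", ["IP1", "IP2", "IP3"])
print(headers["X-Forwarded-For"])  # IP0, IP1, IP2
print(remote_addr)                 # IP3
```

The last proxy's own address never enters the header; it only shows up as the remote address of the TCP connection.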
HTTP_VIA
Via is an HTTP header that records the proxies and gateways an HTTP request has passed through. Each proxy adds its own entry: after one proxy the header holds one entry, after two proxies it holds two, and so on.
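As a rough illustration, a server could count the hops recorded in Via like this. The sketch assumes the common form `Via: 1.1 proxy1, 1.1 proxy2` (protocol version followed by the proxy's name) and ignores optional comments, so it is a simplification rather than a full parser; `via_hops` is a hypothetical helper.

```python
def via_hops(via_header):
    """Return the proxy names listed in a Via header value, in order."""
    if not via_header:
        return []
    hops = []
    for entry in via_header.split(","):
        parts = entry.split()
        # parts[0] is the protocol version, parts[1] the proxy that received the request.
        if len(parts) >= 2:
            hops.append(parts[1])
    return hops


print(via_hops("1.1 proxy1, 1.1 proxy2"))  # ['proxy1', 'proxy2']
```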
3 Distinguishing proxy types
Transparent Proxy
The proxy server is configured as follows:
REMOTE_ADDR = Proxy IP
HTTP_VIA = Proxy IP
HTTP_X_FORWARDED_FOR = Your IP
Although a transparent proxy replaces the client's IP address in REMOTE_ADDR, the server can still read the client's real IP address from HTTP_X_FORWARDED_FOR.
Anonymous proxy
The proxy server is configured as follows:
REMOTE_ADDR = Proxy IP
HTTP_VIA = Proxy IP
HTTP_X_FORWARDED_FOR = Proxy IP
An anonymous proxy hides the client's IP address. The server cannot learn the client's real IP address, but it can still tell that the client is using a proxy.
Obfuscating proxy
The proxy server is configured as follows:
REMOTE_ADDR = Proxy IP
HTTP_VIA = Proxy IP
HTTP_X_FORWARDED_FOR = Random IP address
Similar to an anonymous proxy, but with a more convincing disguise. With an obfuscating proxy, the server still knows that the client is using a proxy, but it receives a fake client IP address.
High-anonymity proxy (elite proxy)
The proxy server is configured as follows:
REMOTE_ADDR = Proxy IP
HTTP_VIA = not set
HTTP_X_FORWARDED_FOR = not set
A high-anonymity proxy not only keeps the server from obtaining the client's real IP address, but also leaves the server unable to tell whether a proxy is in use at all.
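The four configurations above can be turned into a small classifier, for example to test a proxy against a page you control that echoes these values. This is a sketch under the assumption that you know your own real IP; `classify_proxy` and the sample addresses (taken from documentation-only ranges) are illustrative.

```python
def classify_proxy(real_ip, remote_addr, via=None, xff=None):
    """Guess the proxy type from what the target server saw.

    real_ip:     the client's true address, known to the tester
    remote_addr: REMOTE_ADDR as seen by the server
    via, xff:    HTTP_VIA and HTTP_X_FORWARDED_FOR, or None if not set
    """
    if remote_addr == real_ip:
        return "no proxy"
    if via is None and xff is None:
        return "high anonymity"
    forwarded = [ip.strip() for ip in (xff or "").split(",") if ip.strip()]
    if real_ip in forwarded:
        return "transparent"
    if not forwarded or all(ip == remote_addr for ip in forwarded):
        return "anonymous"
    return "obfuscating"


me, proxy = "203.0.113.7", "198.51.100.1"
print(classify_proxy(me, proxy, via=proxy, xff=me))           # transparent
print(classify_proxy(me, proxy, via=proxy, xff=proxy))        # anonymous
print(classify_proxy(me, proxy, via=proxy, xff="192.0.2.55")) # obfuscating
print(classify_proxy(me, proxy))                              # high anonymity
```

Real servers may use further signals to detect proxies, so a "high anonymity" result here only means that these three variables reveal nothing.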
4 Choosing a proxy
An ordinary anonymous proxy hides the client's real IP, but it also modifies the request, so the server can tell that a proxy is in use. In other words, the visited website cannot read the client's IP address from the request, yet it still knows a proxy is involved, and some IP-detection pages may still be able to discover the client's address.
A high-anonymity proxy does not alter the client's request, so to the server it looks as if a real client browser is visiting directly: the client's real IP is hidden and the server does not suspect a proxy.
Therefore, when a crawler needs proxy IPs, prefer ordinary anonymous or high-anonymity proxies. In addition, if you do not want the proxy server to be able to read the data, use a proxy that supports HTTPS.
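In a Python crawler, a chosen proxy can be applied with the standard library alone. The following is a minimal sketch: the proxy address is a placeholder from a documentation-only range, the User-Agent string is arbitrary, and `build_proxy_opener` and `fetch` are hypothetical helper names.

```python
import urllib.request

# Placeholder address; replace with a proxy you are allowed to use.
PROXY_URL = "http://203.0.113.10:8080"


def build_proxy_opener(proxy_url):
    """Create an opener that routes http and https requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch(url, proxy_url=PROXY_URL, timeout=10):
    """Download url through the proxy and return the raw response body."""
    opener = build_proxy_opener(proxy_url)
    request = urllib.request.Request(url, headers={"User-Agent": "example-crawler/0.1"})
    with opener.open(request, timeout=timeout) as response:
        return response.read()
```

For an https:// URL, urllib asks the proxy to open a tunnel and then negotiates TLS with the target site itself, so the proxy relays encrypted bytes rather than readable content, which is the point of the HTTPS recommendation above.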
At this point you should have a better understanding of the proxy IPs a Python crawler needs; the best next step is to try them out in practice.