2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains how to use a User Agent and proxy IPs to hide a crawler's identity. The walkthrough is fairly detailed and can serve as a reference; I hope you find it helpful.
I. Why set a User Agent
Some websites do not want to be accessed by crawlers, so they inspect each incoming connection. If the visitor turns out to be a crawler, that is, a program rather than a person clicking through pages, the site refuses further access. For your program to work properly, you therefore need to hide the fact that it is a crawler, and one way to do that is to set the User Agent (UA for short).
The User Agent is sent in the request headers, and the server decides who is visiting by reading it. If you do not set a User Agent in Python, the program uses a default value that contains the word Python. A server that checks the User Agent can then refuse a Python program that has not set one.
Python lets us change the User Agent so that the request looks like it comes from a browser.
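As a minimal sketch of overriding the header, the following uses only the standard library's urllib.request. The helper name build_request and the example URL are assumptions made for illustration, and no request is actually sent.

```python
import urllib.request

# A browser User-Agent string from the Google Chrome examples in this article.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request whose User-Agent header replaces Python's default."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("http://httpbin.org/ip")
# urllib stores header names in capitalized form, hence "User-agent".
print(req.get_header("User-agent"))
```

Passing req to urllib.request.urlopen would then send the request with this header instead of the default one.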
II. Common User Agents
1.Android
Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.166 Safari/535.19
Mozilla/5.0 (Linux; U; Android 4.0.4; en-gb; GT-I9300 Build/IMM76D) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30
Mozilla/5.0 (Linux; U; Android 2.2; en-gb; GT-P1000 Build/FROYO) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1
2.Firefox
Mozilla/5.0 (Windows NT 6.2; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0
Mozilla/5.0 (Android; Mobile; rv:14.0) Gecko/14.0 Firefox/14.0
3.Google Chrome
Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36
Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.133 Mobile Safari/535.19
4.iOS
Mozilla/5.0 (iPad; CPU OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3
Mozilla/5.0 (iPod; U; CPU like Mac OS X; en) AppleWebKit/420.1 (KHTML, like Gecko) Version/3.0 Mobile/3A101a Safari/419.3
The list above gives some User Agent strings for Android, Firefox, Google Chrome and iOS; they can be copied and used directly.
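One possible way to use these strings is to keep a few of them in a list and pick one for each request. Choosing at random is an assumption here, not something the article prescribes, and the names USER_AGENTS and random_headers are made up for this sketch.

```python
import random

# A few of the User-Agent strings listed above.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0",
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36",
    "Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D) AppleWebKit/535.19 "
    "(KHTML, like Gecko) Chrome/18.0.1025.166 Safari/535.19",
]


def random_headers():
    """Return a headers dict with a randomly chosen User-Agent."""
    return {"User-Agent": random.choice(USER_AGENTS)}


headers = random_headers()
print(headers["User-Agent"])
```

The resulting dict can be passed as the headers argument of an HTTP client call.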
III. Using proxy IPs
1. Why use a proxy IP
With the UA set, there is another problem to consider: programs run very fast. When a crawler fetches pages from a website, a single fixed IP makes requests at a very high rate, which does not look like human behavior, since a person cannot make that many visits within a few milliseconds. Some websites therefore set a threshold on per-IP access frequency; if an IP exceeds it, the site concludes that the visitor is a crawler rather than a person.
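To make the threshold idea concrete, here is a rough sketch of how a server might count requests per IP within a time window. The class name, the limit of 3 requests per second and the IP address are all hypothetical; real sites apply their own rules, which are usually not published.

```python
from collections import defaultdict, deque


class FrequencyLimiter:
    """Allow at most max_requests per IP within a sliding time window."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # IP -> timestamps of accepted requests

    def allow(self, ip, now):
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the threshold: treated as a crawler
        hits.append(now)
        return True


limiter = FrequencyLimiter(max_requests=3, window_seconds=1.0)
# Five requests from one IP, 10 ms apart: only the first three pass.
results = [limiter.allow("203.0.113.5", now=i * 0.01) for i in range(5)]
print(results)  # [True, True, True, False, False]
```

A crawler that always comes from the same IP hits such a limit quickly, which is why spreading requests across proxy IPs helps.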
2. Proxy IP selection
Before writing the code, choose a proxy on a proxy IP provider's website. Yiniuyun is recommended here: their product range is fairly complete, both API calls and dynamic forwarding are supported, and the proxies run on self-operated telecom lines. Their stability, availability, speed and latency are all very good. The example here uses their dynamic forwarding proxy. Dynamic forwarding means they give you one fixed address that can be configured directly in the program, so you do not need to fetch IPs or manage an IP pool yourself. It is very convenient and a good choice if you want minimal setup.
3. Code example, using Python and the requests library
#! -*- encoding:utf-8 -*-
import random

import requests

# Target page to visit
targetUrl = "http://httpbin.org/ip"
# The target HTTPS page to visit
# targetUrl = "https://httpbin.org/ip"

# Proxy server
proxyHost = "t.16yun.cn"
proxyPort = "31111"

# Proxy tunnel authentication information
proxyUser = "username"
proxyPass = "password"

proxyMeta = "http://%(user)s:%(pass)s@%(host)s:%(port)s" % {
    "host": proxyHost,
    "port": proxyPort,
    "user": proxyUser,
    "pass": proxyPass,
}

# Both HTTP and HTTPS requests go through the HTTP proxy
proxies = {
    "http": proxyMeta,
    "https": proxyMeta,
}

# Set the IP-switching header
tunnel = random.randint(1, 10000)
headers = {"Proxy-Tunnel": str(tunnel)}

resp = requests.get(targetUrl, proxies=proxies, headers=headers)

print(resp.status_code)
print(resp.text)
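The example above sends only the Proxy-Tunnel header. To also send the User Agent discussed in the first section, the two can go into one headers dict. The sketch below assumes hypothetical helpers build_proxies and build_headers. Percent-encoding the credentials is an addition of this sketch, so that characters such as "@" or ":" in a password do not break the proxy URL.

```python
import random
from urllib.parse import quote


def build_proxies(host, port, user, password):
    """Return a proxies dict for requests, with credentials percent-encoded."""
    proxy_url = "http://%s:%s@%s:%s" % (
        quote(user, safe=""),
        quote(password, safe=""),
        host,
        port,
    )
    return {"http": proxy_url, "https": proxy_url}


def build_headers(user_agent):
    """Combine a browser User-Agent with the IP-switching Proxy-Tunnel header."""
    return {
        "User-Agent": user_agent,
        "Proxy-Tunnel": str(random.randint(1, 10000)),
    }


proxies = build_proxies("t.16yun.cn", "31111", "username", "p@ss:word")
headers = build_headers(
    "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0"
)
print(proxies["http"])
print(headers)
```

Both dicts can then be passed to requests.get(targetUrl, proxies=proxies, headers=headers) exactly as in the example above.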
That concludes this introduction to hiding a crawler's identity with a User Agent and proxy IPs. I hope it has been of some help. If you found the article useful, feel free to share it so that more people can see it.