In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-10-25 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >
Share
Shulou(Shulou.com)06/03 Report--
Some websites won't allow the program to access them directly in the above way. If there is a problem with recognition, the site won't respond at all, so in order to fully simulate the browser's work, we need to set some Headers properties.
First of all, open our browser, debug browser F12, I use Chrome, open network monitoring, as follows, for example, Zhihu, after logging in, we will find that the interface has changed after logging in, a new interface appears, in essence this page contains a lot of content, these contents are not loaded at one time, in essence, many requests are performed, generally first request HTML files, and then load JS, CSS, etc., After many requests, the skeleton and muscle of the web page are complete, and the effect of the entire web page comes out.
The copyright of the book belongs to the author. Please contact the author for authorization and indicate the source of any form of reproduction.
Split these requests, we only look at the first request, you can see, there is a Request URL, there are headers, the following is the response, the picture is not fully displayed, friends can experiment with it. Well, this header contains a lot of information, such as file encoding, compression method, agent requested, etc.
Among them, agent is the identity of the request, if the request identity is not written, then the server may not respond, so you can set agent in the headers, for example, the following example, this example only illustrates how to set the headers, friends look at the setting format is good.
import urllib
import urllib2
url = 'http://www.server.com/login'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {'username' : 'cqc', 'password' : 'XXXX' }
headers = { 'User-Agent' : user_agent }
data = urllib.urlencode(values)
request = urllib2.Request(url, data, headers)
response = urllib2.urlopen(request)
page = response.read()
In this way, we set up a header, passed in when constructing the request, added headers to the request, and the server will get a response if it recognizes that it is a request from the browser.
In addition, we also have a way to deal with "anti-theft chain," to deal with anti-theft chain, the server will identify whether the referer in the headers is itself, if not, some servers will not respond, so we can also add referer in the headers.
For example, we can construct the following headers
headers = { 'User-Agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)' ,
'Referer':'http://www.zhihu.com/articles' }
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
The market share of Chrome browser on the desktop has exceeded 70%, and users are complaining about
The world's first 2nm mobile chip: Samsung Exynos 2600 is ready for mass production.According to a r
A US federal judge has ruled that Google can keep its Chrome browser, but it will be prohibited from
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
About us Contact us Product review car news thenatureplanet
More Form oMedia: AutoTimes. Bestcoffee. SL News. Jarebook. Coffee Hunters. Sundaily. Modezone. NNB. Coffee. Game News. FrontStreet. GGAMEN
© 2024 shulou.com SLNews company. All rights reserved.