Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Header of reptiles

2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/03 Report--

Some websites won't allow the program to access them directly in the above way. If there is a problem with recognition, the site won't respond at all, so in order to fully simulate the browser's work, we need to set some Headers properties.

First of all, open our browser, debug browser F12, I use Chrome, open network monitoring, as follows, for example, Zhihu, after logging in, we will find that the interface has changed after logging in, a new interface appears, in essence this page contains a lot of content, these contents are not loaded at one time, in essence, many requests are performed, generally first request HTML files, and then load JS, CSS, etc., After many requests, the skeleton and muscle of the web page are complete, and the effect of the entire web page comes out.

The copyright of the book belongs to the author. Please contact the author for authorization and indicate the source of any form of reproduction.

Split these requests, we only look at the first request, you can see, there is a Request URL, there are headers, the following is the response, the picture is not fully displayed, friends can experiment with it. Well, this header contains a lot of information, such as file encoding, compression method, agent requested, etc.

Among them, agent is the identity of the request, if the request identity is not written, then the server may not respond, so you can set agent in the headers, for example, the following example, this example only illustrates how to set the headers, friends look at the setting format is good.

import urllib

import urllib2

url = 'http://www.server.com/login'

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'

values = {'username' : 'cqc', 'password' : 'XXXX' }

headers = { 'User-Agent' : user_agent }

data = urllib.urlencode(values)

request = urllib2.Request(url, data, headers)

response = urllib2.urlopen(request)

page = response.read()

In this way, we set up a header, passed in when constructing the request, added headers to the request, and the server will get a response if it recognizes that it is a request from the browser.

In addition, we also have a way to deal with "anti-theft chain," to deal with anti-theft chain, the server will identify whether the referer in the headers is itself, if not, some servers will not respond, so we can also add referer in the headers.

For example, we can construct the following headers

headers = { 'User-Agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)' ,

'Referer':'http://www.zhihu.com/articles' }

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report