How to use Python crawler to send Chinese HTTP request header 04/17 Update SLTechnology News&Howtos

How to use Python crawler to send Chinese HTTP request header

2025-04-17 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Shulou(Shulou.com)06/03 Report--

This article mainly explains "how to use Python crawler to send Chinese HTTP request headers". Interested friends may wish to have a look. The method introduced in this paper is simple, fast and practical. Now let the editor take you to learn "how to use Python crawler to send Chinese HTTP request headers"!

Sometimes you need to set the value of the HTTP request header to Chinese, but if you set it directly to Chinese, an exception will be thrown. For example, the following code sets the Chinese value for the Chinese request header.

From urllib import request url = 'http://httpbin.org/post' headers= {' User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10'14'3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36', 'Host':'httpbin.org',' Chinese':' Li Ning',} req = request.Request (url = url,headers=headers,method= "POST") request.urlopen (req)

Executing this code throws the following exception.

UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: ordinal not in range

This exception indicates that the HTTP request header can only be English characters and symbols, not double-byte text, such as Chinese. In order to solve this problem, we need to encode the Chinese when setting the HTTP request header, and then send it to the server, and then decode it with the same rules on the server. You can use a variety of encoding methods, such as url coding, base64 coding, url coding is in the browser address bar, if you enter Chinese, it will be converted to% xx form. If you enter "China", it will become E4%B8%AD%E5%9B%BD.

To encode the string url, you need to use the urlencode function of the urllib.parse module, and the unquote function is used for decoding. The code is as follows:

From urllib.parse import unquote,urlencode # Encoding Chinese value = urlencode ({'name':' Li Ning'}) print (value) # Decoding Chinese print (unquote (value))

If you execute this code, you will output the following result:

Name=%E6%9D%8E%E5%AE%81 name= Li Ning

When you use the urlencode function to encode, you need to specify the dictionary type, and you cannot encode the string directly. Because the urlencode function can only encode the url parameter.

Base64 encoding needs to use the b64encode function in the base64 module, and decoding uses the b64decode function. The code is as follows:

Import base64 # Encoding Chinese base64Value = base64.b64encode (bytes ('Python from rookie to master', encoding='utf-8') print (str (base64Value,'utf-8')) # decodes the Chinese language and converts the decoded result into a string print (str (base64.b64decode (base64Value), 'utf-8') according to utf-8 encoding format)

The b64encode function returns the bytes type after encoding, which needs to be converted to a string type using the str function. The b64decode function needs to specify a value of type bytes when decoding, and the return value of the b64decode function is also of type bytes, so the str function is also required to convert the return value of the function to a string.

The following example demonstrates the complete process of setting a Chinese HTTP request header and decoding it.

From urllib import request from urllib.parse import unquote,urlencode import base64 url = 'http://httpbin.org/post' headers = {' User-Agent':'Mozilla/5.0 (Macintosh Intel Mac OS X 10: 14: 3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36', 'Host':'httpbin.org',' Chinese1':urlencode ({'name':' Li Ning'}), # set Chinese HTTP request header, set Chinese HTTP request header with url encoding format # Use the base64 encoding format 'MyChinese':base64.b64encode (bytes (' this is the Chinese HTTP request header, encoding='utf-8')), 'who':'Python Scrapy'} dict = {' name':'Bill', 'age':30} data= bytes (urlencode (dict), encoding='utf-8') req = request.Request (url = url,data=data,headers=headers,method= "POST") # add the Chinese HTTP request header through the add_header method Url encoding format req.add_header ('Chinese2' Urlencode ({"nationality": "China"}) response=request.urlopen (req) # gets the response information from the server value = response.read (). Decode ('utf-8') print (value) import json # converts the return value to json object responseObj = json.loads (value) # Decoding HTTP request header print (unquote (responseObj [' headers'] ['Chinese1'])) # Decoding HTTP in url format Request header print (unquote (responseObj ['headers'] [' Chinese2'])) # Decoding HTTP request header print in base64 encoding format (str (base64.b64decode ['headers'] [' Mychinese'])) 'utf-8')) so far I believe you have a deeper understanding of "how to use Python crawler to send Chinese HTTP request headers". You might as well do it in practice. Here is the website, more related content can enter the relevant channels to inquire, follow us, continue to learn!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.