
How to Analyze URL Encoding and Decoding

2025-02-28 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

This article walks through URL encoding and decoding in detail, using Python's urllib.parse module. Interested readers can use it as a reference; I hope it is helpful.

1. URL encoding and decoding

1.1 Special characters

Only characters from the ASCII (7-bit) character set may appear in a URL. Any other character, such as Chinese and other East Asian text, must be specially encoded: the character is converted to bytes (UTF-8 by default in urllib.parse), and each byte is written as a percent sign followed by two hexadecimal digits giving the byte's value. For an ASCII character, that value is the same as its position in the ASCII table.
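The rule above can be sketched in a few lines of Python. This is only an illustration, assuming UTF-8 as the byte encoding; the helper name percent_encode and the UNRESERVED constant are hypothetical and not part of urllib.parse.

```python
# Characters that RFC 3986 allows to appear unescaped ("unreserved").
UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

def percent_encode(text: str) -> str:
    """Hypothetical helper: percent-encode text byte by byte (UTF-8 assumed)."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)              # safe ASCII character, keep as-is
        else:
            out.append(f"%{byte:02X}")  # "%" plus two hexadecimal digits
    return "".join(out)

print(percent_encode("中"))   # %E4%B8%AD
print(percent_encode("a b"))  # a%20b
```

In practice urllib.parse.quote does this work; the sketch only makes the byte-level rule visible.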

Special cases:

1. Space characters cannot appear in a URL. Modern browsers encode them automatically: visiting http://192.168.90.162/advanced search.html sends the request as http://192.168.90.162/advanced%20search.html.

2. Some characters are reserved, and using them literally changes the meaning of the URL or makes it invalid. The slash, for example, separates path segments; a slash that is meant as data rather than as a path separator must be escaped as %2F. (On Windows this matters less for file names, because a file name cannot contain characters such as \ / : ? |. Linux is more permissive, and touch can create file names containing such characters.) Other escapes include: > is %3E, # is %23, % is %25, ~ is %7E, and \ is %5C.
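As a hedged illustration of escaping reserved characters, the sketch below encodes a file name as a single path segment. The file name is made up for the example; passing safe="" tells quote that no character is exempt, so the slash is escaped as well.

```python
from urllib.parse import quote, unquote

# Hypothetical file name containing characters with special meaning in URLs.
file_name = "a/b#50%>.txt"

# safe="" means no character is exempt, so "/" becomes %2F too.
segment = quote(file_name, safe="")
print(segment)  # a%2Fb%2350%25%3E.txt

# The encoded segment can be appended to a base URL without being
# mistaken for two path segments or for a fragment.
url = "http://192.168.90.162/" + segment
print(url)

# Decoding restores the original name.
print(unquote(segment) == file_name)  # True
```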

1.2 URL encoding and decoding with the urllib.parse module

quote: encodes a string for use in a URL.

unquote: decodes a percent-encoded string.

quote_plus: like quote, but additionally represents spaces as the + symbol.

unquote_plus: like unquote, but additionally turns the + symbol back into a space.
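The handling of the + symbol is where the two pairs of functions differ most visibly. A minimal sketch, using a made-up sample string:

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "a b+c"  # sample string with a space and a literal plus sign

# Both functions escape a literal "+" as %2B; only the space differs.
encoded = quote(text)
encoded_plus = quote_plus(text)
print(encoded)       # a%20b%2Bc
print(encoded_plus)  # a+b%2Bc

# unquote leaves "+" untouched, while unquote_plus reads it as a space.
print(unquote(encoded_plus))       # a+b+c
print(unquote_plus(encoded_plus))  # a b+c
```

Decoding with the function that matches the encoder avoids silently turning spaces into plus signs.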

>>> from urllib.parse import quote, unquote
>>> r2 = quote("/~test aggin & /")
>>> r2  # spaces and "&" are encoded, while "/" is not converted
'/%7Etest%20aggin%20%26%20/'
>>> # On Python 3.7 and later "~" is left unencoded: '/~test%20aggin%20%26%20/'
>>> un_r2 = unquote(r2)
>>> un_r2
'/~test aggin & /'
>>> quote("~/ test", "~/")  # second parameter (safe): characters listed here are not encoded
'~/%20test'

By default, quote_plus also encodes the "/" character as %2F and replaces the space character with the + symbol. These are the main differences from quote, as follows:

>>> quote_plus("/~test aggin & /")  # Python 3.7+ gives '%2F~test+aggin+%26+%2F'
'%2F%7Etest+aggin+%26+%2F'
>>> unquote_plus('%2F%7Etest+aggin+%26+%2F')
'/~test aggin & /'

1.3 Chinese encoding and decoding

A URL with Chinese characters in the path, such as http://192.168.90.162/中文.html, is now recognized by mainstream browsers such as Chrome, IE, and Edge. Still, this should not be taken for granted everywhere; percent-encoding such characters explicitly is the safer choice.
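A minimal sketch of encoding such a URL explicitly, assuming the file is named 中文.html (the name is an assumption for illustration). Only the path is passed through quote, so the scheme and host stay untouched.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Assumed example URL with a Chinese file name in the path.
url = "http://192.168.90.162/中文.html"

parts = urlsplit(url)
# quote keeps "/" by default, so the path separator survives.
encoded_url = urlunsplit(parts._replace(path=quote(parts.path)))
print(encoded_url)  # http://192.168.90.162/%E4%B8%AD%E6%96%87.html
```

Encoding the whole URL with quote instead would also escape the ":" after the scheme, which is why the path is handled separately here.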

>>> quote("中文编码")  # "Chinese encoding"
'%E4%B8%AD%E6%96%87%E7%BC%96%E7%A0%81'
>>> unquote('%E4%B8%AD%E6%96%87%E7%BC%96%E7%A0%81')
'中文编码'

1.4 Encoding query parameters

HTTP has two common ways to submit data: GET and POST. In both, the query parameters and their values must be submitted. With GET, the values also appear in the address bar, for example:

https://www.baidu.com/s?wd=url%E7%BC%96%E7%A0%81&rsv_spt=1&rsv_iqid=0xb6a3f6f50003c26a&issp=1&f=8&rsv_bp=1&rsv_idx=2&ie=utf8&rqlang=cn&tn=baiduhome_pg&rsv_enter=0&oq=url%25E7%25BC%2596%25E7%25A0%2581&rsv_t=a3a7nKjz1kdhBSmGXcoRFeCicwee05LR5H9nq5W5GZWVKwR × × WAjMl98d3Rfvhq%2FCJb&rsv_pq=8532218200044be3

Unquote ("url%E7%BC%96%E7%A0%81")? Wd=url%E7%BC%96%E7%A0%81 will display the value in the address bar 'url code'

The urllib.parse module provides the urlencode function for encoding query parameters.

Function: converts the query's parameter-value pairs into a URL-encoded string. The pairs can be given as a dictionary or as a sequence of two-element tuples.

>>> urlencode([('key1', 'value1'), ('key2', 'value2')])
'key1=value1&key2=value2'
>>> help(urlencode)
Help on function urlencode in module urllib.parse:

Notes:

1. The query argument is a list of (keyword, value) tuples, each of length 2. Note that the elements of the list must be tuples in this format.

2. urlencode encodes the list of parameter-value pairs into a URL query string, and the parameters appear in the same order as in the list.

> urlencode ([("key1", "urlencode Encoding")]) 'key1=urlencode%E7%BC%96%E7%A0%81'

urlencode also accepts an optional second parameter, doseq, which controls how sequence values in the query data are handled. The default is False: when the value in a (keyword, value) pair is itself a sequence, the whole sequence is converted to a string and encoded with quote_plus.
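One way to see why doseq=True is useful is a round trip with parse_qs, which always returns lists of values. A minimal sketch, with made-up parameter names:

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical parameters in the shape that parse_qs produces:
# every key maps to a list of values.
params = {"keyword": ["value1", "value2"], "page": ["1"]}

# doseq=True emits one key=value pair per list element.
query = urlencode(params, doseq=True)
print(query)  # keyword=value1&keyword=value2&page=1

# Parsing the string gives back the original dictionary.
print(parse_qs(query) == params)  # True
```

With the default doseq=False the lists would be encoded as their string representations, and the round trip would not restore the original values.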

>>> urlencode([("keyword", ("value1", "value2", "value3"))])
'keyword=%28%27value1%27%2C+%27value2%27%2C+%27value3%27%29'

When the parameter is True, each value in the sequence is paired with keyword as a separate query parameter:

>>> urlencode([("keyword", ("value1", "value2", "value3"))], True)
'keyword=value1&keyword=value2&keyword=value3'

That concludes this walkthrough of URL encoding and decoding. I hope the content above is of some help.
