
How to use struct and format characters in Python

Updated 2025-01-16. From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/01

This article explains how to use the struct module and its format characters in Python. It walks through the module's functions, the format string syntax, and the pitfalls people commonly run into when packing and unpacking binary data.

Brief introduction

There are two ways to store the contents of a file: as binary data or as text. If the data is stored as text, reading it back means converting the text to Python data types. If it is stored as binary, the data has a fixed structure. Because that structure usually follows the layout of a C struct, and because Python itself is written in C, we call it a C structure here.

Lib/struct.py is the module responsible for converting between Python values and these C structures, which are represented as Python bytes objects.

Methods in struct

Let's take a look at the definition of struct:

    __all__ = [
        # Functions
        'calcsize', 'pack', 'pack_into', 'unpack', 'unpack_from',
        'iter_unpack',

        # Classes
        'Struct',

        # Exceptions
        'error'
        ]

There are six functions, one class (Struct), and one exception (error).

Let's look mainly at how these six functions are used:

struct.pack(format, v1, v2, ...): Returns a bytes object containing the values v1, v2, ... packed according to the format string format. The number of arguments must exactly match the number of values the format string requires.

struct.pack_into(format, buffer, offset, v1, v2, ...): Packs v1, v2, ... according to the format string format and writes the packed bytes into the writable buffer buffer, starting at position offset. Note that offset is a required argument.

struct.unpack(format, buffer): Unpacks from the buffer buffer (presumably packed by pack(format, ...)) according to the format string format. The result is a tuple, even if it contains only one item. The size of the buffer in bytes must match the size required by the format.

struct.unpack_from(format, /, buffer, offset=0): Unpacks from buffer according to the format string format, starting at position offset. The result is a tuple, even if it contains only one item.

struct.iter_unpack(format, buffer): Iteratively unpacks from the buffer buffer according to the format string format. This function returns an iterator that reads equally sized chunks from the buffer until its contents are exhausted.

struct.calcsize(format): Returns the size of the structure corresponding to the format string format, that is, the size of the bytes object produced by pack(format, ...).
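As a minimal sketch of the six functions side by side, the snippet below packs two unsigned shorts. The '<' prefix (little-endian, standard sizes) is used here only so that the results are the same on every platform; the variable names are illustrative.

```python
import struct

# Pack two little-endian unsigned shorts into 4 bytes.
packed = struct.pack('<2H', 1, 2)

# pack_into writes into an existing writable buffer at a byte offset.
buf = bytearray(8)
struct.pack_into('<2H', buf, 4, 3, 4)

# unpack requires a buffer whose length equals calcsize(format).
values = struct.unpack('<2H', packed)

# unpack_from starts reading at the given offset.
tail = struct.unpack_from('<2H', buf, 4)

# iter_unpack yields one tuple per fixed-size chunk of the buffer.
pairs = list(struct.iter_unpack('<H', packed))

# calcsize reports how many bytes the format occupies.
size = struct.calcsize('<2H')

print(packed, bytes(buf), values, tail, pairs, size)
```

Note that every unpacking function returns tuples, so a single value still has to be taken out with an index or tuple unpacking.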

These functions mainly pack and unpack data. One very important parameter is format, also called the format string, which specifies how each value is laid out when it is packed.

Format string

A format string specifies the layout of the data when packing and unpacking. It is built from format characters, which specify the type of each value being packed or unpacked. In addition, there are special characters that control the byte order, size, and alignment.

Byte order, size and alignment

By default, C types are represented in the machine's native format and byte order, and are properly aligned by inserting pad bytes where necessary (according to the rules used by the C compiler).

We can also specify the byte order, size, and alignment explicitly with the first character of the format string:

Character   Byte order               Size       Alignment
@           native                   native     native
=           native                   standard   none
<           little-endian            standard   none
>           big-endian               standard   none
!           network (= big-endian)   standard   none

Big-endian and little-endian are two ways of ordering the bytes of a multi-byte value in memory.

Big-endian stores the high-order (most significant) byte at the starting address.

Little-endian stores the low-order (least significant) byte at the starting address.

Big-endian matches the way people read and write numbers, with the most significant digits first, while little-endian puts the least significant byte at the lowest address.

Of two well-known CPU families, the PowerPC series traditionally stores data in big-endian order, while the x86 series stores data in little-endian order.

If machines with different CPU architectures exchange binary data directly, problems can arise because each side reads the bytes in a different order.
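To make the difference concrete, here is a small sketch that packs the same 32-bit value with each explicit byte-order prefix. The value 0x12345678 is an arbitrary choice for illustration.

```python
import struct
import sys

value = 0x12345678

little = struct.pack('<I', value)   # low-order byte first
big = struct.pack('>I', value)      # high-order byte first
network = struct.pack('!I', value)  # network order is big-endian

# '=' keeps the byte order of the machine running the code.
native = struct.pack('=I', value)
expected_native = little if sys.byteorder == 'little' else big

print(little.hex(), big.hex(), network.hex(), native.hex())
```

Choosing '<', '>' or '!' explicitly is the usual way to keep data readable across machines with different architectures.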

Padding is only automatically added between contiguous structure members. Padding is not added to the beginning and end of the encoded structure.

When non-native size and alignment are used, that is, with '<', '>', '=', and '!', no padding is added at all.
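A short sketch of this rule: with a standard-mode prefix the size is simply the sum of the member sizes, and any padding has to be written explicitly with the 'x' pad byte. The formats below are illustrative.

```python
import struct

# Standard mode: 1 byte for 'c' plus 4 bytes for 'i', with no padding.
standard_size = struct.calcsize('=ci')

# Padding in standard mode must be requested explicitly with 'x'.
explicit = struct.pack('<c3xi', b'*', 1)
explicit_size = struct.calcsize('<c3xi')

print(standard_size, explicit, explicit_size)
```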

Format characters

Let's take a look at the format characters:

Format   C type               Python type         Standard size (bytes)
x        pad byte             no value
c        char                 bytes of length 1   1
b        signed char          integer             1
B        unsigned char        integer             1
?        _Bool                bool                1
h        short                integer             2
H        unsigned short       integer             2
i        int                  integer             4
I        unsigned int         integer             4
l        long                 integer             4
L        unsigned long        integer             4
q        long long            integer             8
Q        unsigned long long   integer             8
n        ssize_t              integer
N        size_t               integer
e        (half precision)     float               2
f        float                float               4
d        double               float               8
s        char[]               bytes
p        char[]               bytes
P        void *               integer
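The standard sizes in the table can be checked with calcsize. This sketch uses the '<' prefix so that standard sizes apply; in native mode the sizes of some types, such as long, depend on the platform.

```python
import struct

# Standard sizes for every format character that has one.
standard_sizes = {code: struct.calcsize('<' + code)
                  for code in 'cbB?hHiIlLqQefd'}

# 'n', 'N' and 'P' exist only in native mode, so a standard-mode
# prefix makes the format invalid.
try:
    struct.calcsize('<n')
    n_allowed_in_standard_mode = True
except struct.error:
    n_allowed_in_standard_mode = False

print(standard_sizes, n_allowed_in_standard_mode)
```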

For example, to pack an int value, we can write:

    In: from struct import *
    In: pack('i', 10)
    Out: b'\n\x00\x00\x00'
    In: unpack('i', b'\n\x00\x00\x00')
    Out: (10,)
    In: calcsize('i')
    Out: 4

In the example above, we packed the int value 10 and then unpacked it. We also calculated that the format 'i' has a size of 4 bytes.

The output is b'\n\x00\x00\x00'. The leading b marks a bytes literal, and what follows are the byte values: \n is the byte 0x0a (decimal 10), followed by three zero bytes. This is the little-endian encoding of 10.
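If the escaped form is hard to read, the packed bytes can be inspected as integers or as hex. This is a small sketch; the '<' prefix is an assumption added here so that the byte order is little-endian on every machine.

```python
import struct

data = struct.pack('<i', 10)

as_ints = list(data)   # each byte as an integer from 0 to 255
as_hex = data.hex()    # two hex digits per byte

print(data, as_ints, as_hex)
```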

A format character can be preceded by an integer repeat count. For example, the format string '4h' means exactly the same as 'hhhh'.

Here is how to pack four short values:

    In: pack('4h', 2, 3, 4, 5)
    Out: b'\x02\x00\x03\x00\x04\x00\x05\x00'
    In: unpack('4h', b'\x02\x00\x03\x00\x04\x00\x05\x00')
    Out: (2, 3, 4, 5)

Whitespace characters between formats are ignored, but a repeat count and its format character must not be separated by whitespace.
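A brief sketch of the whitespace rule, with illustrative format strings: a space between two format characters is harmless, while a space between a count and its format character is rejected.

```python
import struct

spaced = struct.pack('<h h', 1, 2)   # whitespace between formats is ignored
compact = struct.pack('<hh', 1, 2)

try:
    struct.calcsize('<4 h')          # count separated from its format character
    count_with_space_ok = True
except struct.error:
    count_with_space_ok = False

print(spaced == compact, count_with_space_ok)
```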

When a value x is packed using one of the integer formats ('b', 'B', 'h', 'H', 'i', 'I', 'l', 'L', 'q', 'Q'), a struct.error is raised if x is outside the valid range for that format.
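The range check can be seen with a one-byte format. In this sketch, try_pack is a hypothetical helper written for illustration, not part of the struct module.

```python
import struct

def try_pack(fmt, value):
    """Return the packed bytes, or None if value does not fit fmt."""
    try:
        return struct.pack(fmt, value)
    except struct.error:
        return None

fits = try_pack('<b', 127)       # signed char holds -128..127
too_big = try_pack('<b', 128)    # out of range for 'b'
negative = try_pack('<B', -1)    # unsigned char cannot be negative

print(fits, too_big, negative)
```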

Character format

Apart from numbers, characters and strings are the most commonly used types.

Let's first look at the character format 'c'. Because a character is 1 byte long, each value has to be passed separately:

    In: pack('4c', b'a', b'b', b'c', b'd')
    Out: b'abcd'
    In: calcsize('4c')
    Out: 4

The b prefix makes each argument a bytes object of length 1, which is what 'c' requires. Without it the argument is a str, and pack raises struct.error.

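String format

Now take a look at the string format 's':

    In: pack('4s', b'abcd')
    Out: b'abcd'
    In: unpack('4s', b'abcd')
    Out: (b'abcd',)
    In: calcsize('4s')
    Out: 4
    In: calcsize('s')
    Out: 1

For 's', the count is the length of the string in bytes rather than a repeat count, so calcsize('4s') returns 4 and a bare 's' means a 1-byte string.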
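The difference between a count on 's' and a count on 'c' is easiest to see when unpacking. The sketch below also shows how pack handles input that is shorter or longer than the declared length; the sample values are illustrative.

```python
import struct

# '4s' is a single 4-byte string; '4c' is four separate 1-byte values.
one_item = struct.unpack('4s', b'abcd')
four_items = struct.unpack('4c', b'abcd')

# When packing, 's' pads short input with null bytes and truncates long input.
padded = struct.pack('6s', b'abcd')
truncated = struct.pack('2s', b'abcd')

print(one_item, four_items, padded, truncated)
```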

Effect of padding

The order of format characters can affect the size in native mode, because the padding required to satisfy alignment requirements differs:

    >>> pack('ci', b'*', 0x12131415)
    b'*\x00\x00\x00\x12\x13\x14\x15'
    >>> pack('ic', 0x12131415, b'*')
    b'\x12\x13\x14\x15*'
    >>> calcsize('ci')
    8
    >>> calcsize('ic')
    5

These outputs come from a big-endian machine. On a little-endian machine the four bytes of the integer appear in reverse order, but the sizes are the same.
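The same comparison can be written without hard-coding platform-specific sizes. This is a sketch under the assumption that the native alignment of int is greater than 1 byte, which holds on common platforms.

```python
import struct

int_size = struct.calcsize('@i')

# 'ic': the int comes first, so the char follows it with no padding.
int_then_char = struct.calcsize('@ic')

# 'ci': pad bytes are inserted after the char so the int is aligned.
char_then_int = struct.calcsize('@ci')

print(int_size, int_then_char, char_then_int)
```

No padding is added after the final member, which is why 'ic' is smaller than 'ci'.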

The following example shows how to control the padding manually. The outputs assume a little-endian platform where a native long is 4 bytes; where long is 8 bytes, the sizes differ.

    In: pack('llh', 1, 2, 3)
    Out: b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00'

In the example above, we packed three numbers, 1, 2, and 3, using the formats long, long, and short.

Because long is 4 bytes and short is 2 bytes, the result is 10 bytes long and does not end on a long boundary.

To align the end of the structure, we can pad it manually by appending '0l', a long with a repeat count of zero:

    In: pack('llh0l', 1, 2, 3)
    Out: b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00'
    In: unpack('llh0l', b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00')
    Out: (1, 2, 3)

Complex applications

Finally, let's look at a more complex application, in which the tuple returned by unpack is assigned directly to named variables:

    >>> record = b'raymond\x32\x12\x08\x01\x08'
    >>> name, serialnum, school, gradelevel = unpack('
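The listing above breaks off in the source, so its format string is unknown. As a separate sketch, the snippet below unpacks a record into named fields using the Struct class from the module's __all__ list. The layout '<10sHHb' (a 10-byte name, two unsigned shorts, and a signed byte), the Student tuple, and the sample values are all assumptions made for illustration.

```python
import struct
from collections import namedtuple

# Hypothetical layout: 10-byte name, two unsigned shorts, one signed byte.
RECORD = struct.Struct('<10sHHb')
Student = namedtuple('Student', 'name serialnum school gradelevel')

# Build a sample record; the name is null-padded to 10 bytes by '10s'.
record = RECORD.pack(b'raymond', 4658, 264, 8)

# Unpack the tuple into named fields.
student = Student._make(RECORD.unpack(record))

# Strip the padding before decoding the name.
name = student.name.rstrip(b'\x00').decode('ascii')

print(RECORD.size, student, name)
```

A Struct object parses its format string once, which is convenient when the same layout is used for many records.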
