What is the asm format

2025-07-16 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article explains what the asm format is, that is, how to embed assembly instructions in C code with gcc. The method introduced here is simple, fast and practical, so let's take a look.

First, basic asm format

1. Syntax rules

    asm [volatile] ("assembly instructions")

All instructions must be enclosed in double quotation marks

If there is more than one instruction, the instructions must be separated with \n. To keep the generated assembly neatly indented, \t is usually added as well.

Multiple assembly instructions can be written on one line or on multiple lines

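The keyword asm can be replaced with __asm__

volatile is optional. Without it, the compiler may optimize the inline assembly code; the volatile keyword tells the compiler not to optimize the handwritten inline assembly. (According to the GCC documentation, basic asm statements are always treated as volatile, so the keyword matters mainly in the extended format described later.)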
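
The rules above can be sketched as follows. This is only an illustration, assuming gcc or a compatible compiler; the function name emit_nops is made up for this example. It uses the __asm__ spelling, which gcc also accepts when a strict -std option disables the plain asm keyword, and it spreads the instructions over adjacent string literals.

```c
/* Hypothetical helper: executes three no-op instructions and
 * returns how many were written in the asm statement. */
int emit_nops(void)
{
    /* Each instruction ends with "\n\t" so the generated assembly
     * has one neatly indented instruction per line. */
    __asm__ __volatile__ ("nop\n\t"
                          "nop\n\t"
                          "nop");
    return 3;
}
```

Calling emit_nops() has no visible effect apart from its return value, which is exactly why nop is a convenient first test.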

2. test1.c: inserting empty instructions

    #include <stdio.h>

    int main()
    {
        asm ("nop");
        printf("hello\n");
        asm ("nop\n\tnop\n\t"
             "nop");
        return 0;
    }

Note: in C, two adjacent string literals are automatically concatenated into one, so "nop\n\tnop\n\t" and "nop" become a single string.

Generate assembly code instructions:

    gcc -m32 -S -o test1.s test1.c

The content in test1.s is as follows (only the code related to the inline assembly code is posted):

    #APP
    # 5 "test1.c" 1
        nop
    # 0 "" 2
    #NO_APP
        // here is the code generated by the printf statement.
    #APP
    # 7 "test1.c" 1
        nop
    # 0 "" 2
    #NO_APP

As you can see, the inline assembly code is wrapped between two comments (#APP ... #NO_APP). Two pieces of assembly code are embedded in the source file, so the assembly generated by gcc contains these two parts.

Both embedded parts consist of the empty instruction nop, so they do nothing useful.

3. test2.c: operating on global variables

Embed assembly instructions in C code to calculate or perform certain functions. Let's take a look at how to manipulate global variables in inline assembly instructions.

    #include <stdio.h>

    int a = 1;
    int b = 2;
    int c;

    int main()
    {
        asm volatile ("movl a, %eax\n\t"
                      "addl b, %eax\n\t"
                      "movl %eax, c");
        printf("c = %d\n", c);
        return 0;
    }

Some background on the registers used in the assembly instructions:

eax and ebx are 32-bit registers on the x86 platform. In the basic asm format, a register name must be preceded by one percent sign (%).

The 32-bit register eax can also be accessed as a 16-bit register (ax) or as 8-bit registers (ah, al). This article only uses the 32-bit form.
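
How ax, ah and al relate to eax can be sketched in plain C with masks and shifts. The helper names view_ax, view_ah and view_al are made up for this illustration; they only model which bits of a 32-bit value each register name refers to.

```c
#include <stdint.h>

/* ax is the low 16 bits of eax. */
uint16_t view_ax(uint32_t eax)
{
    return (uint16_t)(eax & 0xFFFFu);
}

/* ah is the upper half of ax, i.e. bits 8 to 15 of eax. */
uint8_t view_ah(uint32_t eax)
{
    return (uint8_t)((eax >> 8) & 0xFFu);
}

/* al is the low 8 bits of eax. */
uint8_t view_al(uint32_t eax)
{
    return (uint8_t)(eax & 0xFFu);
}
```

For the value 0x12345678, these helpers give 0x5678 for ax, 0x56 for ah and 0x78 for al.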

Code description:

movl a, %eax // copies the value of variable a to the %eax register

addl b, %eax // adds the value of variable b to the value (a) in the %eax register and puts the result in the %eax register

movl %eax, c // copies the value in the %eax register to the variable c

Generate assembly code instructions:

    gcc -m32 -S -o test2.s test2.c

The content of test2.s is as follows (only the parts related to the inline assembly code are posted):

    #APP
    # 9 "test2.c" 1
        movl a, %eax
        addl b, %eax
        movl %eax, c
    # 0 "" 2
    #NO_APP

As you can see, inline assembly code can operate directly on the global variables a and b by name. Running test2 gives the correct result.

Consider a question: why can variables a, b, c be used in assembly code?

Looking at the section before the inline assembly code in test2.s, you can see:

        .file   "test2.c"
        .globl  a
        .data
        .align 4
        .type   a, @object
        .size   a, 4
    a:
        .long   1
        .globl  b
        .align 4
        .type   b, @object
        .size   b, 4
    b:
        .long   2
        .comm   c,4,4

Variables a and b are declared with .globl and c with .comm, which exports them as global symbols, so they can be used by name in assembly code.

So the question arises: if it is a local variable, it will not be exported with .globl in assembly code, so can it be used directly in inline assembly instructions?

Seeing is believing, let's put these three variables inside the main function and try it as local variables.

4. test3.c: trying to operate on local variables

    #include <stdio.h>

    int main()
    {
        int a = 1;
        int b = 2;
        int c;

        asm ("movl a, %eax\n\t"
             "addl b, %eax\n\t"
             "movl %eax, c");
        printf("c = %d\n", c);
        return 0;
    }

Generate assembly code instructions:

    gcc -m32 -S -o test3.s test3.c

You can see in test3.s that no symbols are exported for a, b and c. Since a and b are not used anywhere else, their values are simply stored in stack space:

    movl $1, -20(%ebp)
    movl $2, -16(%ebp)

Let's try to compile it into an executable program:

    $ gcc -m32 -o test3 test3.c
    /tmp/ccuY0TOB.o: In function `main':
    test3.c:(.text+0x20): undefined reference to `a'
    test3.c:(.text+0x26): undefined reference to `b'
    test3.c:(.text+0x2b): undefined reference to `c'
    collect2: error: ld returned 1 exit status

The build fails at the link step: the references to a, b and c cannot be resolved. So what should we do to use local variables? Use the extended asm format!

Second, extended asm format

1. Instruction format

    asm [volatile] ("assembly instructions" : output operand list : input operand list : changed registers)

Format description

Assembly instructions: same as in the basic asm format

Output operand list: how the assembly code passes its results back to the C code

Input operand list: how the C code passes data to the inline assembly code

Changed registers: tells the compiler which registers the inline assembly code uses

The "changed registers" part can be omitted, and its colon can be omitted with it. The colons in front of it, however, must be kept whenever a later part is present, even if the output or input operand list itself is empty.
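
The colon rule can be illustrated with two tiny functions. This is a sketch only; the function names are made up, and the empty template "" is used so that no instruction is emitted and the code does not depend on a particular processor.

```c
/* Hypothetical helper: an asm statement with an input operand only. */
int pass_through_input(int value)
{
    /* The output list is empty, but its colon must stay so that
     * "r"(value) is parsed as an input. The changed-registers part
     * and its colon are left off entirely. */
    __asm__ __volatile__ ("" : : "r"(value));
    return value;
}

/* Hypothetical helper: an asm statement with an output operand only. */
int pass_through_output(int value)
{
    /* Nothing follows the output list, so the colons for the input
     * list and the changed registers are simply omitted. */
    __asm__ ("" : "+r"(value));
    return value;
}
```

Both functions return their argument unchanged; the point is only which colons have to be written.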

A further word on the "changed registers": gcc uses a number of registers of its own when compiling C code, and our handwritten inline assembly code uses some registers too.

To let the compiler know which registers the inline assembly code changes, we list them here. gcc then avoids assigning the listed registers to the operands of this asm statement and does not rely on their contents surviving it.

2. Format of the output and input operand lists

In the system there are only two places where a variable can be stored: registers and memory. So describing the output and input operands to the inline assembly code means telling it:

To which registers or memory addresses to output the results

Which registers or memory addresses to read input data from

This process also has to meet a certain format:

    "[output modifier]constraint" (C variable bound to the register or memory address)

(1) constraint

A constraint is a character that tells the compiler which register, or which memory location, to use. The common ones are:

a: use the eax/ax/al registers

b: use the ebx/bx/bl registers

c: use the ecx/cx/cl registers

d: use the edx/dx/dl registers

r: use any available general-purpose register

m: use the memory location of the variable

Keeping these in mind is enough for now. Other constraints include D, S, q, A, f, t, u and so on; check the documentation when you need them.

(2) output modifier

As the name implies, it modifies the output operand by giving the compiler extra information about the output register or memory address. There are four modifiers:

+: the modified operand can be both read and written

=: the modified operand can only be written

%: the modified operand can be interchanged with the next operand

&: the modified operand may be written before the inline assembly has finished reading its inputs (an "early clobber"), so the compiler must not let it share a register with an input

Descriptions in words are rather abstract, so let's go straight to examples!
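
As a first small illustration of the "=" and "+" modifiers, here is a sketch with two made-up functions, copy_value and accumulate. It assumes gcc on x86; on any other architecture the guarded plain C fallback is compiled instead, so the behaviour is the same.

```c
/* Hypothetical helper: "=r" marks dest as write-only. */
int copy_value(int source)
{
    int dest;
#if defined(__i386__) || defined(__x86_64__)
    /* dest is only written, never read, by the instruction. */
    __asm__ ("movl %1, %0"
             : "=r"(dest)
             : "r"(source));
#else
    dest = source;            /* plain C fallback on other architectures */
#endif
    return dest;
}

/* Hypothetical helper: "+r" marks total as read and written. */
int accumulate(int total, int amount)
{
#if defined(__i386__) || defined(__x86_64__)
    /* addl reads the old total and stores the sum back into it.
     * "cc" tells the compiler that the flags register changes. */
    __asm__ ("addl %1, %0"
             : "+r"(total)
             : "r"(amount)
             : "cc");
#else
    total += amount;          /* plain C fallback on other architectures */
#endif
    return total;
}
```

With "=" the compiler does not bother to load the old value of dest into the register; with "+" it loads total first, because the instruction needs it.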

3. test4.c: operating on local variables

    #include <stdio.h>

    int main()
    {
        int data1 = 1;
        int data2 = 2;
        int data3;

        asm ("movl %%ebx, %%eax\n\t"
             "addl %%ecx, %%eax"
             : "=a"(data3)
             : "b"(data1), "c"(data2));
        printf("data3 = %d\n", data3);
        return 0;
    }

There are two things to pay attention to:

The inline assembly code does not declare a list of "changed registers", which shows that it can be omitted (and its leading colon with it)

In the extended asm format, a register name must be preceded by two percent signs (%%), because a single % introduces an operand placeholder

Code interpretation:

"b"(data1), "c"(data2) ==> copy the variable data1 to register %ebx and the variable data2 to register %ecx. The inline assembly code can then work on these two numbers through these two registers.

"=a"(data3) ==> put the result in register %eax and then copy it to the variable data3. The equal sign in front means that data is written to %eax and not read from it.

With this format, inline assembly code can use the specified registers to operate on local variables. Below you will see how the local variables are copied from stack space into the registers.

Generate assembly code instructions:

    gcc -m32 -S -o test4.s test4.c

The assembly code test4.s is as follows:

        movl $1, -20(%ebp)
        movl $2, -16(%ebp)
        movl -20(%ebp), %eax
        movl -16(%ebp), %edx
        movl %eax, %ebx
        movl %edx, %ecx
    #APP
    # 10 "test4.c" 1
        movl %ebx, %eax
        addl %ecx, %eax
    # 0 "" 2
    #NO_APP
        movl %eax, -12(%ebp)

As you can see, before entering the handwritten inline assembly code:

Copy the number 1 from stack space (-20(%ebp)) to register %eax and then to register %ebx

Copy the number 2 from stack space (-16(%ebp)) to register %edx and then to register %ecx

These two operations correspond to the "input operand list" part of the inline assembly code: "b"(data1), "c"(data2).

After the inline assembly code (after #NO_APP), the value in the %eax register is copied to -12(%ebp) on the stack, which is where the local variable data3 lives. This completes the output operation.

4. test5.c: declaring the changed registers

In test4.c we did not declare any changed registers, so the compiler was free to choose which registers to use. As the generated assembly code test4.s shows, gcc used the %edx register.

So let's test it: tell gcc not to use the %edx register.

    #include <stdio.h>

    int main()
    {
        int data1 = 1;
        int data2 = 2;
        int data3;

        asm ("movl %%ebx, %%eax\n\t"
             "addl %%ecx, %%eax"
             : "=a"(data3)
             : "b"(data1), "c"(data2)
             : "%edx");
        printf("data3 = %d\n", data3);
        return 0;
    }

The last part of the asm statement, "%edx", tells the gcc compiler: the inline assembly code will use the %edx register, so do not use it yourself.

Generate assembly code instructions:

    gcc -m32 -S -o test5.s test5.c

Take a look at the generated assembly code test5.s:

        movl $1, -20(%ebp)
        movl $2, -16(%ebp)
        movl -20(%ebp), %eax
        movl -16(%ebp), %ecx
        movl %eax, %ebx
    #APP
    # 10 "test5.c" 1
        movl %ebx, %eax
        addl %ecx, %eax
    # 0 "" 2
    #NO_APP
        movl %eax, -12(%ebp)

As you can see, gcc no longer uses the register %edx before the inline assembly code.
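
In test5.c the inline assembly does not actually touch %edx. A variant in which the declared register really is used as scratch space might look like the sketch below. The function name add_via_scratch is made up, gcc on x86 is assumed, and the clobber is written as "edx", a spelling gcc accepts alongside "%edx". On other architectures the plain C fallback is compiled.

```c
/* Hypothetical helper: adds two ints using %edx as a scratch register. */
int add_via_scratch(int x, int y)
{
    int sum;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl %1, %%edx\n\t"   /* edx = x          */
             "addl %2, %%edx\n\t"   /* edx = edx + y    */
             "movl %%edx, %0"       /* sum = edx        */
             : "=r"(sum)
             : "r"(x), "r"(y)
             : "edx", "cc");        /* edx and the flags are changed */
#else
    sum = x + y;                    /* plain C fallback on other architectures */
#endif
    return sum;
}
```

Because edx is listed as changed, the compiler will not pick it for x, y or sum, so the scratch use cannot overwrite an operand.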

Third, use placeholders instead of register names

The examples above use only two registers to operate on two local variables. With many operands, writing out the name of each register in the inline assembly code becomes inconvenient.

For this reason the extended asm format offers a lazier way to refer to the registers in the output and input operand lists: placeholders!

Placeholders are a bit like the numbered parameters that a batch script uses to reference its input arguments. In inline assembly code the numbering starts at 0 with the registers in the output operand list and continues through all the registers in the input operand list.

Again, an example is more direct!

1. test6.c: using placeholders instead of register names

    #include <stdio.h>

    int main()
    {
        int data1 = 1;
        int data2 = 2;
        int data3;

        asm ("addl %1, %2\n\t"
             "movl %2, %0"
             : "=r"(data3)
             : "r"(data1), "r"(data2));
        printf("data3 = %d\n", data3);
        return 0;
    }

Code description:

Output operand list "=r"(data3): the constraint character r means that no particular register is specified. The compiler chooses which register holds the result, and the result is finally copied to the local variable data3

Input operand list "r"(data1), "r"(data2): with the constraint character r, no registers are specified, and the compiler chooses which two registers receive the local variables data1 and data2

The output operand list needs only one register, so %0 in the inline assembly code stands for that register (counting starts from 0)

The input operand list has two registers, so %1 and %2 in the inline assembly code stand for these two registers (counting continues after the last register of the output operand list)
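
One caveat: test6.c writes to %2, which is an input operand, and the GCC documentation says that input-only operands must not be modified. A variant that keeps the same numbering but only writes to the output could be sketched as follows. The function name subtract is made up, gcc on x86 is assumed, and a plain C fallback is used elsewhere.

```c
/* Hypothetical helper: computes minuend - subtrahend with numbered placeholders. */
int subtract(int minuend, int subtrahend)
{
    int difference;
#if defined(__i386__) || defined(__x86_64__)
    /* %0 = difference (output), %1 = minuend, %2 = subtrahend (inputs).
     * "=&r": %0 is written by the first instruction, before %2 has been
     * read, so it must not share a register with an input. */
    __asm__ ("movl %1, %0\n\t"      /* %0 = minuend           */
             "subl %2, %0"          /* %0 = %0 - subtrahend   */
             : "=&r"(difference)
             : "r"(minuend), "r"(subtrahend)
             : "cc");
#else
    difference = minuend - subtrahend;   /* plain C fallback */
#endif
    return difference;
}
```

In AT&T syntax the destination is the last operand, so "subl %2, %0" subtracts %2 from %0.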

Generate assembly code instructions:

    gcc -m32 -S -o test6.s test6.c

The assembly code test6.s is as follows:

        movl $1, -20(%ebp)
        movl $2, -16(%ebp)
        movl -20(%ebp), %eax
        movl -16(%ebp), %edx
    #APP
    # 10 "test6.c" 1
        addl %eax, %edx
        movl %edx, %eax
    # 0 "" 2
    #NO_APP
        movl %eax, -12(%ebp)

As you can see, the gcc compiler selects %eax to hold the local variable data1 and %edx to hold the local variable data2, and the result of the operation is also stored in the %eax register.

Isn't this much more convenient? We no longer have to specify which registers to use; the choice is left to the compiler.

In the inline assembly code, the registers are referred to through placeholders such as %0, %1 and %2.

If numbered placeholders still seem troublesome and error-prone, there is an even more convenient option: the extended asm format lets you name these placeholders, that is, give each register an alias and then use the aliases in the inline assembly code.

Again, look at the code!

2. test7.c: giving the registers aliases

    #include <stdio.h>

    int main()
    {
        int data1 = 1;
        int data2 = 2;
        int data3;

        asm ("addl %[v1], %[v2]\n\t"
             "movl %[v2], %[v3]"
             : [v3] "=r"(data3)
             : [v1] "r"(data1), [v2] "r"(data2));
        printf("data3 = %d\n", data3);
        return 0;
    }

Code description:

Output operand list: gives the register (selected by the gcc compiler) the alias v3

Input operand list: gives the registers (selected by the gcc compiler) the aliases v1 and v2

Once the aliases are defined, the inline assembly code can use them (%[v1], %[v2], %[v3]) to operate on the data directly.
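
Aliases are especially helpful when an operand is both read and written. The sketch below swaps two integers with a single xchgl instruction. The function name swap_ints and the aliases first and second are made up, gcc on x86 is assumed, and a plain C fallback is used on other architectures.

```c
/* Hypothetical helper: swaps the two integers that left and right point to. */
void swap_ints(int *left, int *right)
{
    int a = *left;
    int b = *right;
#if defined(__i386__) || defined(__x86_64__)
    /* Both operands are read and written, hence "+r" on each.
     * The aliases make it obvious which register is which. */
    __asm__ ("xchgl %[first], %[second]"
             : [first] "+r"(a), [second] "+r"(b));
#else
    int tmp = a;              /* plain C fallback on other architectures */
    a = b;
    b = tmp;
#endif
    *left = a;
    *right = b;
}
```

After swap_ints(&x, &y) with x = 1 and y = 2, x holds 2 and y holds 1.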

Generate assembly code instructions:

    gcc -m32 -S -o test7.s test7.c

Let's take a look at the generated assembly code test7.s:

        movl $1, -20(%ebp)
        movl $2, -16(%ebp)
        movl -20(%ebp), %eax
        movl -16(%ebp), %edx
    #APP
    # 10 "test7.c" 1
        addl %eax, %edx
        movl %edx, %eax
    # 0 "" 2
    #NO_APP
        movl %eax, -12(%ebp)

The assembly code for this part is exactly the same as in test6.s!

Fourth, use memory location

In the examples above, both the output operand list and the input operand list use registers (constraint characters a, b, c, d, r and so on).

We can specify which registers to use, or leave the choice to the compiler. Operating on data through registers is faster.

If we like, we can also operate on a variable directly through its memory address, which requires the constraint character m.

1. test8.c: using memory addresses to operate on data

    #include <stdio.h>

    int main()
    {
        int data1 = 1;
        int data2 = 2;
        int data3;

        asm ("movl %1, %%eax\n\t"
             "addl %2, %%eax\n\t"
             "movl %%eax, %0"
             : "=m"(data3)
             : "m"(data1), "m"(data2));
        printf("data3 = %d\n", data3);
        return 0;
    }

Code description:

Output operand list "=m"(data3): directly uses the memory address of the variable data3

Input operand list "m"(data1), "m"(data2): directly uses the memory addresses of the variables data1 and data2

The addition itself still needs a register, so the inline assembly code uses %eax as a scratch register for the calculation.

The data at those memory addresses is still referred to through sequentially numbered placeholders.
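
Note that test8.c changes %eax without telling the compiler. Following the "changed registers" rule explained earlier, a more defensive variant would list it, as in this sketch. The function name add_in_memory is made up, gcc on x86 is assumed, and a plain C fallback is compiled on other architectures.

```c
/* Hypothetical helper: adds two ints through their memory locations. */
int add_in_memory(int x, int y)
{
    int sum;
#if defined(__i386__) || defined(__x86_64__)
    /* %0, %1 and %2 are memory operands; eax is only scratch space,
     * so it is declared as changed together with the flags ("cc"). */
    __asm__ ("movl %1, %%eax\n\t"   /* eax = x          */
             "addl %2, %%eax\n\t"   /* eax = eax + y    */
             "movl %%eax, %0"       /* sum = eax        */
             : "=m"(sum)
             : "m"(x), "m"(y)
             : "eax", "cc");
#else
    sum = x + y;                    /* plain C fallback on other architectures */
#endif
    return sum;
}
```

Declaring eax keeps the compiler from holding a value it still needs in that register across the asm statement.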

Generate assembly code instructions:

    gcc -m32 -S -o test8.s test8.c

The generated assembly code is as follows:

        movl $1, -24(%ebp)
        movl $2, -20(%ebp)
    #APP
    # 10 "test8.c" 1
        movl -24(%ebp), %eax
        addl -20(%ebp), %eax
        movl %eax, -16(%ebp)
    # 0 "" 2
    #NO_APP
        movl -16(%ebp), %eax

As you can see, before entering the inline assembly code the values of data1 and data2 are put on the stack. The inline assembly then operates on the data in the stack directly through the register %eax, and finally copies the result (%eax) to the location of data3 on the stack (-16(%ebp)).

At this point you should have a deeper understanding of "what is asm format". You might as well try it out in practice!
