What is the method of compiling basic assembler programs?

2025-07-15 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article introduces the method of compiling a basic assembly program: how the source program is composed, how it is compiled, linked and executed, and how data, stacks and subroutines are used in it. The editor consulted various materials and sorted out simple, easy-to-use methods, in the hope of answering the common doubts about this topic. Please follow along.

1 Source program

1.1 Composition

Association between registers and segments

assume means "to assume".

It assumes that a segment register is associated with a segment of the program defined by segment ... ends.

The association is declared with assume so that the compiler can associate the segment register with a specific segment where necessary.

Label

A label refers to an address.

A label such as codesg, placed before segment, serves as the name of a segment. The name of the segment is eventually processed by the compiler and the linker into the segment address of that segment.

Define a segment

The function of segment and ends is to define a segment: segment marks the beginning of a segment and ends marks its end.

segment and ends are pseudo instructions that are used in pairs.

A segment must be identified by a name, in the format:

segment-name segment

segment-name ends

An assembly program consists of one or more segments, which are used to store code or data, or to serve as stack space.

A meaningful assembly program must contain at least one segment, which is used to store code.

Program end tag

end is the end mark of an assembly program. When the compiler encounters the pseudo instruction end while compiling, it stops compiling the source program.

Therefore, when a program is complete, the pseudo instruction end must be added at its end. Otherwise the compiler cannot know where the program ends.

Note: do not confuse end with ends.

Program return

When a program finishes, it returns control of the CPU to the program that made it run. This process is called program return.

How to return

The following return sequence should be added at the end of the program:

mov ax,4c00H

int 21H
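Putting the pieces above together, a minimal source program might look like the sketch below. The segment name codesg follows the text; the three instructions before the return sequence are placeholders chosen only for illustration.

```asm
assume cs:codesg            ; associate cs with the segment named codesg

codesg segment              ; beginning of the segment

        mov ax,0123H        ; placeholder work, for illustration only
        mov bx,0456H
        add ax,bx

        mov ax,4c00H        ; program return: hand control back to the
        int 21H             ; program that made this one run

codesg ends                 ; end of the segment

end                         ; end of the source program
```

Note that ends closes the segment while end closes the whole source program.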

Program running

DOS is a single-tasking operating system.

For a program P2 in an executable file to run, there must be a running program P1 that loads P2 from the executable file into memory and hands control of the CPU to P2; only then can P2 run. Once P2 starts running, P1 is suspended.

When P2 finishes, it should return control of the CPU to P1, the program that made it run, after which P1 continues to run.

1.2 The "program" in a source program

An assembly source program contains:

Pseudo instructions (processed by the compiler)

Assembly instructions (compiled into machine code)

Program: the instructions and data in a source program that are eventually executed or processed by the computer.

Note

We call the entire content of the source program file the source program, and we call the instructions and data in it that are finally executed or processed by the computer the program.

The program first exists in the source program as assembly instructions; it is then compiled and linked into machine code and stored in an executable file.

1.3 End of segment, end of program, program return

1.4 Syntax errors and logic errors

Syntax error

An error that the compiler finds when the program is compiled.

Logic error

An error that the compiler cannot detect at compile time and that only shows up at run time.

2 The process of program execution

2.1 A brief process of an assembly language program from writing to final execution:

Write the source program, compile it into an object file, link the object file into an executable file, and then execute the program in the executable file.

2.2 Linking

Purpose

When a source program is very large, it can be split into several source files that are compiled separately. After each source file has been compiled into an object file, the object files are linked together by the linker to generate one executable file.

When the program calls a subroutine in a library file, the library file must be linked with the object file generated from the program to produce an executable file.

After a source program is compiled, the resulting object file contains machine code, but some of its content cannot be used directly in an executable file. The linker processes this content into the final executable information.

Therefore, even when there is only one source file and no library subroutine is called, the linker must still be used to process the object file and generate the executable file.

Note that the executable file is the end result of linking.

The assembly language compiler compiles the source program in the source file into an object file, and the linker then links the object file into an executable file that can be run directly in the operating system.

2.3 Executable file

The executable file contains two parts:

The program (machine code translated from the assembly instructions in the source program) and data (data defined in the source program)

Descriptive information (for example, how big the program is and how much memory it occupies)

Executing the program in an executable file

In an operating system, what is executed is the program in an executable file.

According to the descriptive information in the executable file, the operating system loads the machine code and data into memory and performs the related initialization (for example, setting CS:IP to point to the first instruction to be executed); the CPU then executes the program.

How a program in an executable file is loaded into memory and run

In DOS, for a program P1 in an executable file to run, there must be a running program P2 that loads P1 from the executable file into memory and hands it control of the CPU; only then can P1 run.

When P1 finishes, it should return control of the CPU to P2, the program that made it run.

The execution process of an exe file

What we observe

(1) At the prompt "C:\masm" we type the name of the executable file, "1", and press Enter.

(2) The program in 1.exe runs.

(3) The run ends, the program returns, and the prompt "C:\masm" is displayed again.

What actually happens

An operating system is a large and complex software system composed of many functional modules. Any general-purpose operating system provides a program called the shell, which users (operators) use to operate the computer system.

DOS has a program named command.com, called the command interpreter, which is the shell of the DOS system.

(1) When we execute 1.exe directly in DOS, it is the running command program that loads the program in 1.exe into memory.

(2) command sets the CPU's CS:IP to point to the first instruction of the program (that is, the entry of the program), so that the program can run.

(3) After the program finishes, it returns to command, and the CPU continues to run command.

2.4 tracking the execution process of the program

Debug can load a program into memory and set CS:IP to point to the entry of the program, but Debug does not give up control of the CPU, so we can use Debug commands to step through the program and see the result of each instruction.

When we track 1.exe in DOS by running "Debug 1.exe", the loading order is: command loads Debug, and Debug loads 1.exe.

The order of return is: from the program in 1.exe to Debug, and from Debug to command.

The loading process of programs in EXE files

Summary

After the program is loaded, ds holds the segment address of the memory area where the program is located. The offset address of this memory area is 0, so the address of the memory area is ds:0.

The first 256 bytes of this memory area hold the PSP, which DOS uses to communicate with the program.

The program itself is stored in the space after these 256 bytes.

Therefore, we can get the segment address SA of the PSP from ds. The offset address of the PSP is 0, so its physical address is SA × 16 + 0.

Because the PSP occupies 256 (100H) bytes, the physical address of the program is:

SA × 16 + 0 + 256 = SA × 16 + 16 × 16 + 0 = (SA + 16) × 16 + 0

Expressed as a segment address and an offset address, this is (SA+10H):0.
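As an illustration of this layout, the following sketch computes SA+10H from ds when the program starts. It assumes a program whose only segment is the code segment and that ds still holds the value set at load time; under that assumption the result is expected to equal cs, which can be checked by stepping through the program in Debug.

```asm
assume cs:code

code segment

        mov ax,ds           ; (ax) = SA, the segment address of the PSP
        add ax,10H          ; (ax) = SA+10H, where the program itself begins
        mov bx,cs           ; (bx) = (cs), to compare with (ax) in Debug

        mov ax,4c00H        ; program return
        int 21H

code ends

end
```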

3 Programming

3.1 Two basic problems

A computer is a machine that processes and calculates data, so two basic problems are involved:

(1) Where is the data to be processed?

(2) How long is the data to be processed?

Both must be stated explicitly or implicitly in a machine instruction; otherwise the computer cannot work.

For brevity, in what follows we use two descriptive symbols: reg represents a register and sreg represents a segment register.

reg includes: ax, bx, cx, dx, ah, al, bh, bl, ch, cl, dh, dl, sp, bp, si, di

sreg includes: ds, ss, cs, es

3.2 Where is the data?

The location of the data processed by machine instructions

Most machine instructions process data, and the processing can be roughly divided into three categories: reading, writing, and operating.

A machine instruction is not concerned with the value of the data; it is concerned with where the data to be processed is located just before the instruction is executed.

Just before an instruction is executed, the data to be processed can be in one of three places: inside the CPU, in memory, or in a port.

Instruction example
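The three instructions in the sketch below illustrate where the data to be processed can be just before execution. They are chosen for illustration and are not necessarily the instructions of the original example; port access is not shown.

```asm
assume cs:code

code segment

        mov bx,ds:[0]       ; data is in memory, in the unit at ds:0
        mov bx,ax           ; data is inside the CPU, in the register ax
        mov bx,1            ; data is inside the CPU, in the instruction buffer

        mov ax,4c00H        ; program return
        int 21H

code ends

end
```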

Expressing the location of data in assembly language

Assembly language uses three concepts to express the location of data.

Immediate (idata)

Data contained directly in a machine instruction (it is in the CPU's instruction buffer before execution) is called an immediate (idata) in assembly language and is given directly in the assembly instruction. For example:

mov ax,1

add bx,2000h

or bx,00010000b

mov al,'a'

Register

The data to be processed by the instruction is in a register, and the corresponding register name is given in the assembly instruction. For example:

mov ax,bx

mov ds,ax

push bx

mov ds:[0],bx

push ds

mov ss,ax

mov sp,ax

For example, for the instruction:

mov ax,bx

Corresponding machine code: 89D8

Execution result: (ax) = (bx)

Segment address (SA) and offset address (EA)

The data to be processed by the instruction is in memory. The EA is given in the assembly instruction in the form [X], and the SA is in a segment register.

The register that stores the segment address can be the default one.

mov ax,[0]

mov ax,[bx]

mov ax,[bx+8]

mov ax,[bx+si]

mov ax,[bx+si+8]

In these instructions the segment address is in ds by default.

mov ax,[bp]

mov ax,[bp+8]

mov ax,[bp+si]

mov ax,[bp+si+8]

In these instructions the segment address is in ss by default.

The register that stores the segment address can also be given explicitly.
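A few examples of giving the segment register explicitly, by writing a segment prefix in front of the [X] part. The registers and offsets used here are arbitrary and serve only as an illustration.

```asm
assume cs:code

code segment

        mov bx,0
        mov si,0
        mov bp,0

        mov ax,ds:[bp]          ; segment address in ds instead of the default ss
        mov ax,es:[bx]          ; segment address in es instead of the default ds
        mov ax,ss:[bx+si]       ; segment address in ss instead of the default ds
        mov ax,cs:[bx+si+8]     ; segment address in cs instead of the default ds

        mov ax,4c00H            ; program return
        int 21H

code ends

end
```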

Addressing mode

When data is stored in memory, we can give the offset address of the memory unit in several ways. Each such way of locating a memory unit is generally called an addressing mode.

3.3 How long is the data processed by the instruction?

8086 CPU instructions can process data of two sizes, byte and word, so a machine instruction must indicate whether it performs a word operation or a byte operation.

Assembly language handles this in the following ways.

(1) The register name indicates the size of the data to be processed.

(2) When no register name is present, the operator X ptr indicates the length of the memory unit, where X is word or byte.

(3) Other methods

In the following instructions, the register indicates that the instruction performs a byte operation:

mov al,1

mov al,bl

mov al,ds:[0]

mov ds:[0],al

inc al

add al,100

In the following instructions, the register indicates that the instruction performs a word operation:

mov ax,1

mov bx,ds:[0]

mov ds,ax

mov ds:[0],ax

inc ax

add ax,1000

In instructions that access a memory unit without any register operand, word ptr or byte ptr must be used to state explicitly the length of the memory unit to be accessed.

Otherwise, it cannot be determined whether the unit to be accessed is a word unit or a byte unit.

In the following instructions, word ptr indicates that the memory unit accessed is a word unit:

mov word ptr ds:[0],1

inc word ptr [bx]

inc word ptr ds:[0]

add word ptr [bx],2

In the following instructions, byte ptr indicates that the memory unit accessed is a byte unit:

mov byte ptr ds:[0],1

inc byte ptr [bx]

inc byte ptr ds:[0]

add byte ptr [bx],2

Some instructions determine by default whether a word unit or a byte unit is accessed.

For example, push [1000H] does not need to state whether a word unit or a byte unit is accessed, because the push instruction only operates on words.

3.4 Data processing

Using data in the code segment

Consider the following problem: write a program that calculates the sum of the following 8 data items and stores the result in the ax register:

0123H,0456H,0789H,0abcH,0defH,0fedH,0cbaH,0987H

Previously, we accumulated the data in certain memory units without caring about the data itself.

Now, however, the data to be accumulated are given values.

The "dw" in the first line of the program defines word data; dw stands for define word.

Here we use dw to define eight words of data (separated by commas), which occupy 16 bytes of memory.

The instructions in the program have to accumulate these eight data items, but where are they?

Because they are in the code segment, and CS holds the segment address of the code segment while the program runs, we can get their segment address from CS.

What are the offset addresses of these 8 data items?

Because the data defined with dw is at the very beginning of the code segment, its offset address starts at 0, and the eight data items are at offsets 0, 2, 4, 6, 8, A, C and E of the code segment.

When the program runs, their addresses are CS:0, CS:2, CS:4, CS:6, CS:8, CS:A, CS:C and CS:E.

In the program, we use bx to hold the offset address, which increases by 2 each time, and use a loop to accumulate.

Before the loop starts, we set (bx) = 0, so that CS:BX points to the word unit holding the first data item.

In each iteration, (bx) = (bx) + 2, so that CS:BX points to the word unit holding the next data item.

How can this program be run directly in the system after it is compiled and linked? If execution simply began at the start of the code segment, the CPU would treat the dw data as instructions. We can instead specify the entry of the program in the source program.

Exploring the role of end:

In addition to telling the compiler that the program has ended, end can also tell the compiler where the entry of the program is.

With this approach, we can arrange the framework of the program as follows:
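A sketch of a complete program arranged this way, using the eight data items from the text. The segment name code and the labels start and s are illustrative names, not fixed keywords.

```asm
assume cs:code

code segment

        dw 0123H,0456H,0789H,0abcH,0defH,0fedH,0cbaH,0987H  ; data at cs:0 to cs:15

start:  mov bx,0            ; cs:bx points to the first word
        mov ax,0            ; running sum
        mov cx,8            ; eight words to add

s:      add ax,cs:[bx]      ; add the current word to the sum
        add bx,2            ; advance to the next word
        loop s              ; when the loop ends, (ax) = 4BDBH

        mov ax,4c00H        ; program return (this overwrites the sum in ax)
        int 21H

code ends

end start                   ; end of program; the entry is at the label start
```

Because the entry is given by "end start", execution begins at the first instruction after the data rather than at offset 0.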

Using the stack in the code segment

Complete the following program so that it uses the stack to store the data defined in the program in reverse order.

assume cs:codesg

codesg segment

dw 0123h,0456h,0789h,0abch,0defh,0fedh,0cbah,0987h

codesg ends

end

The idea of the program is roughly as follows:

When the program runs, the defined data is stored in cs:0 to cs:15, eight word units in total. The data in the eight word units is pushed onto the stack in turn and then popped back into the eight word units in turn, which stores the data in reverse order.

The problem is that we first need a memory space that can be used as a stack. As mentioned earlier, such a space should be allocated by the system. We can obtain a piece of space by defining data in the program and then use that space as the stack.

mov ax,cs

mov ss,ax

mov sp,32

We want the memory space cs:16 to cs:31 to be used as the stack. The stack is empty in the initial state, so ss:sp must point to the bottom of the stack, that is, to cs:32.
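One possible completion of the program, as a sketch: a second dw line reserves cs:16 to cs:31 as the stack, and two loops push and then pop the eight words. The labels start, s and s0 are illustrative names.

```asm
assume cs:codesg

codesg segment

        dw 0123h,0456h,0789h,0abch,0defh,0fedh,0cbah,0987h  ; data, cs:0 to cs:15
        dw 0,0,0,0,0,0,0,0                                  ; stack, cs:16 to cs:31

start:  mov ax,cs
        mov ss,ax
        mov sp,32           ; ss:sp points to the bottom of the empty stack

        mov bx,0
        mov cx,8
s:      push cs:[bx]        ; push the words at cs:0, cs:2, ... in turn
        add bx,2
        loop s

        mov bx,0
        mov cx,8
s0:     pop cs:[bx]         ; pop them back in turn, which reverses the order
        add bx,2
        loop s0

        mov ax,4c00h        ; program return
        int 21h

codesg ends

end start
```

The stack here is exactly eight words, as in the text. In practice a larger reserve is safer, because anything else that uses the current stack while it is full would overwrite the data below it.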

For example, for:

dw 0123H,0456H,0789H,0abcH,0defH,0fedH,0cbaH,0987H

we can say that we have defined eight words of data, or that we have set aside a memory space of eight words whose word units hold, in turn:

0123H,0456H,0789H,0abcH,0defH,0fedH,0cbaH,0987H

The two descriptions have the same effect.

Putting data, code, and stack into different segments

So far we have used data and a stack in the program and put the data, stack and code into one segment. When programming, we then have to keep track of where the data is, where the stack is and where the code is. This clearly has two problems:

(1) Putting them all in one segment makes the program look confused.

(2) The programs so far process very little data, use little stack space and contain little code, so putting everything into one segment causes no trouble.

But if the data, stack and code need more than 64KB in total, they cannot be put in one segment (a segment cannot be larger than 64KB; this is a limitation of the 8086 mode we use in our study, not of all processors).

Therefore, we should consider using multiple segments to store data, code and the stack.

We define multiple segments in the same way as we define a code segment, and then define the required data in those segments, or obtain stack space by defining data.
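A sketch of such a program, which reverses the eight words using separate data, stack and code segments. The segment names data, stack and code match the names used in the discussion that follows; the labels start, s and s0 are illustrative.

```asm
assume cs:code,ds:data,ss:stack

data segment
        dw 0123h,0456h,0789h,0abch,0defh,0fedh,0cbah,0987h  ; data:0 to data:15
data ends

stack segment
        dw 0,0,0,0,0,0,0,0      ; 16 bytes of stack space
stack ends

code segment

start:  mov ax,stack
        mov ss,ax
        mov sp,16               ; ss:sp points to stack:16, the bottom of the stack

        mov ax,data
        mov ds,ax               ; ds points to the data segment

        mov bx,0
        mov cx,8
s:      push [bx]               ; push the words at data:0, data:2, ... in turn
        add bx,2
        loop s

        mov bx,0
        mov cx,8
s0:     pop [bx]                ; pop them back in turn, which reverses the order
        add bx,2
        loop s0

        mov ax,4c00h            ; program return
        int 21h

code ends

end start
```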

The address of the data "0abch" in the "data" segment of the program is data:6.

To load it into bx, we use the following code:

mov ax,data

mov ds,ax

mov bx,ds:[6]

We cannot use the following instructions:

mov ds,data

mov ax,ds:[6]

The instruction "mov ds,data" is incorrect because the 8086 CPU does not allow an immediate value to be moved directly into a segment register.

A reference to a segment name in the program, such as "data" in the instruction "mov ds,data", is processed by the compiler into a numeric value that represents the segment address.

"Code segment", "data segment" and "stack segment" are entirely our own arrangement.

In the source program we use the pseudo instruction

"assume cs:code,ds:data,ss:stack" to associate cs, ds and ss with the code, data and stack segments respectively.

After doing so, will the CPU point cs to code, ds to data and ss to stack, and thus treat these segments as we intend?

Of course not. assume is a pseudo instruction: it is processed by the compiler, exists only in the source program, and the CPU knows nothing about it.

If we want the CPU to act according to our arrangement, we must control it with machine instructions; the assembly instructions in the source program are what the CPU will execute.

How does the CPU know to execute them?

At the end of the source program, we use "end start" to specify the entry of the program; this entry is written into the descriptive information of the executable file. After the program in the executable file is loaded into memory, the CPU's CS:IP is set to point to this entry, and execution begins with the first instruction of the program.

The label "start" is in the "code" segment, so the CPU executes the content of the code segment as instructions.

In the code segment, we use the instructions:

mov ax,stack

mov ss,ax

mov sp,16

to set ss to point to stack and ss:sp to point to stack:16. After the CPU executes these instructions, the stack segment is used as stack space. If the CPU is to access the data in the data segment, ds can be made to point to the data segment and another register (such as bx) can hold the offset address of the data within the segment.

In short, how the CPU treats the content of the segments we define, whether as instructions to execute, as data to access, or as stack space, depends entirely on the specific assembly instructions in the program and on how those instructions set CS:IP, SS:SP and DS.

3.5 Modular implementation: call and ret instructions

Function: call and ret are both transfer instructions; they modify IP, or both CS and IP.

ret

The ret instruction uses data in the stack to modify the content of IP, thereby performing a near transfer.

When the CPU executes the ret instruction, it performs the following two steps:

(1) (IP) = ((ss) * 16 + (sp))

(2) (sp) = (sp) + 2

retf

The retf instruction uses data in the stack to modify the content of CS and IP, thereby performing a far transfer.

When the CPU executes the retf instruction, it performs the following four steps:

(1) (IP) = ((ss) * 16 + (sp))

(2) (sp) = (sp) + 2

(3) (CS) = ((ss) * 16 + (sp))

(4) (sp) = (sp) + 2

Expressed in assembly syntax (pop IP and pop CS are used here only as a description):

When the CPU executes the ret instruction, it is equivalent to:

pop IP

When the CPU executes the retf instruction, it is equivalent to:

pop IP

pop CS

Example

ret instruction

After the ret instruction in the program is executed, (IP) = 0 and CS:IP points to the first instruction in the code segment.

retf instruction

After the retf instruction in the program is executed, CS:IP points to the first instruction in the code segment.
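The example programs referred to above are not reproduced here. The following is a sketch of a program with the described behaviour for ret; the segment names and the 16-byte stack are assumptions made for illustration.

```asm
assume cs:code

stack segment
        db 16 dup (0)       ; 16 bytes of stack space
stack ends

code segment

        mov ax,4c00h        ; first instruction in the code segment (offset 0)
        int 21h

start:  mov ax,stack
        mov ss,ax
        mov sp,16           ; ss:sp points to the bottom of the empty stack
        mov ax,0
        push ax             ; the word 0 is now on top of the stack
        mov bx,0
        ret                 ; (IP) = 0, so execution continues at offset 0

code ends

end start
```

For the retf case, pushing cs before ax (push cs, then push ax) and replacing ret with retf gives the far version: retf pops 0 into IP and then the saved segment address into CS.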

call instruction

When the CPU executes the call instruction, it performs two steps:

(1) Push the current IP, or CS and IP, onto the stack

(2) Transfer

Main formats

The call instruction cannot perform a short transfer; apart from that, it transfers control in the same ways as the jmp instruction.

call instruction that transfers according to a displacement

call label (push the current IP onto the stack, then transfer to the label and execute from there)

When the CPU executes a call instruction in this format, it performs the following operations:

(1) (sp) = (sp) - 2

((ss) * 16 + (sp)) = (IP)

(2) (IP) = (IP) + 16-bit displacement

For call label:

16-bit displacement = address of the label - address of the first byte after the call instruction

The 16-bit displacement ranges from -32768 to 32767 and is expressed in two's complement.

The 16-bit displacement is calculated by the compiler at compile time.

Expressed in assembly syntax, when the CPU executes "call label" it is equivalent to:

push IP

jmp near ptr label

call instruction whose destination address is in the instruction

In the call instruction described above, the machine instruction does not contain the destination address but the displacement relative to the current IP.

The instruction "call far ptr label" performs a transfer between segments.

When the CPU executes a call instruction in the format "call far ptr label", it performs the following operations:

(1) (sp) = (sp) - 2

((ss) * 16 + (sp)) = (CS)

(sp) = (sp) - 2

((ss) * 16 + (sp)) = (IP)

(2) (CS) = segment address of the label

(IP) = offset address of the label

Expressed in assembly syntax, when the CPU executes "call far ptr label" it is equivalent to:

push CS

push IP

jmp far ptr label

call instruction whose transfer address is in a register

Instruction format: call 16-bit reg

Function:

(sp) = (sp) - 2

((ss) * 16 + (sp)) = (IP)

(IP) = (16-bit reg)

Expressed in assembly syntax, when the CPU executes "call 16-bit reg" it is equivalent to:

push IP

jmp 16-bit reg

call instruction whose transfer address is in memory

There are two formats for call instructions whose transfer address is in memory:

(1) call word ptr memory unit address

Expressed in assembly syntax, it is equivalent to:

push IP

jmp word ptr memory unit address

For example, for the following instructions:

mov sp,10h

mov ax,0123h

mov ds:[0],ax

call word ptr ds:[0]

After execution, (IP) = 0123H and (sp) = 0EH.

(2) call dword ptr memory unit address

Expressed in assembly syntax, it is equivalent to:

push CS

push IP

jmp dword ptr memory unit address

For example, for the following instructions:

mov sp,10h

mov ax,0123h

mov ds:[0],ax

mov word ptr ds:[2],0

call dword ptr ds:[0]

After execution, (CS) = 0, (IP) = 0123H and (sp) = 0CH.

The combined use of call and ret

Let's take a look at the main execution process of the program:

(1) After the first three instructions are executed, the stack is as follows:

(2) After the call instruction is read, (IP) = 000EH and the code in the CPU's instruction buffer is E8 05 00. The CPU executes E8 05 00. First, the current IP (000EH) is pushed, so the stack changes to:

Then (IP) = (IP) + 0005 = 0013H.

(3) The CPU starts executing from cs:0013H (that is, from the label s).

(4) After the ret instruction is read, (IP) = 0016H and the code in the CPU's instruction buffer is C3. The CPU executes C3, which is equivalent to pop IP. After execution, the stack is as follows:

(IP) = 000EH

(5) The CPU goes back to cs:000EH (that is, the instruction after the call instruction) and continues execution.
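The program being traced is not reproduced above. The sketch below is one program that is consistent with the offsets in the walk-through; the stack size, the value 1000 and the body of the subroutine s are assumptions made for illustration. The offsets in the comments follow from the usual instruction lengths.

```asm
assume cs:code

stack segment
        db 16 dup (0)       ; 16 bytes of stack space
stack ends

code segment

start:  mov ax,stack        ; offset 0000H, 3 bytes
        mov ss,ax           ; offset 0003H, 2 bytes
        mov sp,16           ; offset 0005H, 3 bytes
        mov ax,1000         ; offset 0008H, 3 bytes
        call s              ; offset 000BH, 3 bytes, displacement 0005H
        mov ax,4c00h        ; offset 000EH, ret comes back here
        int 21h             ; offset 0011H, 2 bytes
s:      add ax,ax           ; offset 0013H, 2 bytes
        ret                 ; offset 0015H, 1 byte

code ends

end start
```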

We find that we can write a program fragment with a certain function, call it a subroutine, and execute it with the call instruction whenever it is needed.

Before the call instruction transfers to the subroutine, it stores the address of the instruction that follows it in the stack. The subroutine can therefore end with a ret instruction, which sets IP from the data in the stack and so returns to the code after the call instruction.

In this way, call and ret together implement the subroutine mechanism.

Framework of a subroutine

label:

instructions

ret

The framework of a source program that has subroutines is as follows:
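A sketch of such a framework, with comments standing in for the actual instructions. The labels main, sub1 and sub2 are illustrative names, and stack setup is omitted for brevity.

```asm
assume cs:code

code segment

main:   ; ... instructions of the main program ...
        call sub1           ; call the subroutine sub1
        ; ... more instructions of the main program ...
        mov ax,4c00h        ; program return
        int 21h

sub1:   ; ... instructions of the subroutine sub1 ...
        call sub2           ; a subroutine may call another subroutine
        ; ... more instructions of sub1 ...
        ret                 ; return from sub1

sub2:   ; ... instructions of the subroutine sub2 ...
        ret                 ; return from sub2

code ends

end main
```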

The problem of passing parameters and results

A subroutine generally processes something according to the parameters it is given and, when done, provides the result (return value) to the caller.

When we talk about passing parameters and return values, we are really talking about where to store the parameters the subroutine needs and the return value it produces.

Suppose we design a subroutine that calculates the third power of N for a given N.

Two questions arise:

(1) Where do we store the parameter N?

(2) Where do we store the calculated result?

Clearly, both can be stored in registers; for example, we can put the parameter in bx.

Because the subroutine has to calculate N × N × N, it can use mul instructions. For convenience, the result can be put in dx and ax.

Subroutine

Description: calculates the third power of N

Parameter: (bx) = N

Result: (dx:ax) = N^3

cube: mov ax,bx

mul bx ; (dx:ax) = (ax) * (bx)

mul bx

ret

Using registers to store parameters and results is the most common method. For the registers that hold parameters and results, the caller and the subroutine perform opposite read and write operations:

The caller writes the parameter into the parameter register and reads the return value from the result register.

The subroutine reads the parameter from the parameter register and writes the return value into the result register.
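A sketch of a caller that uses this subroutine to compute the cubes of eight values and store each result as a double word. The data values, segment names and labels are assumptions made for illustration. The second mul uses only the low word of N * N, so the result is correct only while N * N fits in 16 bits.

```asm
assume cs:code,ds:data,ss:stack

data segment
        dw 1,2,3,4,5,6,7,8          ; the eight values of N, data:0 to data:15
        dd 0,0,0,0,0,0,0,0          ; room for the eight results, from data:16
data ends

stack segment
        dw 16 dup (0)               ; 32 bytes of stack space
stack ends

code segment

start:  mov ax,stack
        mov ss,ax
        mov sp,32
        mov ax,data
        mov ds,ax
        mov si,0                    ; ds:si points to the current N
        mov di,16                   ; ds:di points to the current result
        mov cx,8

s:      mov bx,[si]                 ; caller writes the parameter into bx
        call cube
        mov [di],ax                 ; caller reads the result from dx:ax
        mov [di+2],dx
        add si,2
        add di,4
        loop s

        mov ax,4c00h                ; program return
        int 21h

cube:   mov ax,bx                   ; subroutine reads the parameter from bx
        mul bx                      ; (dx:ax) = (ax) * (bx) = N * N
        mul bx                      ; (dx:ax) = (ax) * (bx) = N * N * N
        ret                         ; result is left in dx:ax

code ends

end start
```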

This concludes the study of the method of compiling a basic assembly program. Theory is best learned together with practice, so go and try it.
