How to understand registers in assembly language

2025-01-19 Update From: SLTechnology News&Howtos


Today I will talk about how to understand registers in assembly language, a topic many people find hard to grasp. I have summarized the key points below, and I hope you get something out of this article.

Let's start with some background on registers. A register is a storage element inside the CPU, used mainly to hold information. The CPU also contains an arithmetic unit, which processes data, and a controller, which directs the other components. The external bus connects the CPU to the other components for data transfer, while the internal bus carries data between the components inside the CPU.

So when we write assembly language, the part of the CPU we mainly deal with is the registers.

Why do registers exist? Programs are loaded into memory and run by the CPU, whose main job is to process data. Processing data inevitably means reading from and writing to memory, and every such access has to send a request over the buses to a memory storage unit and bring the data back the same way. That round trip is slow compared with the CPU itself, and repeating it for frequently used data is wasteful, so registers were introduced to hold that data inside the CPU.

Getting to know registers

Registers go by several names. Wikipedia calls them processor registers, and they are also called CPU registers. In computing one thing often has more than one name; just remember that they all refer to registers.

Before getting to know the registers, let's first look at the internal structure of the CPU.

Logically, the CPU can be divided into three modules: the control unit, the arithmetic unit and the storage unit. They are connected by the CPU's internal bus.

In almost all von Neumann computers, the CPU's work on each instruction can be divided into five stages: instruction fetch, instruction decode, execute, memory access and result write-back.

In the instruction fetch stage, the CPU reads an instruction from memory into a register inside the CPU. The program counter holds the address of the next instruction.

The instruction decode stage begins as soon as the instruction has been fetched. The instruction decoder splits and interprets the fetched instruction according to the predefined instruction format, identifying the instruction category and the method for obtaining its operands.

The execute stage follows decoding. Its task is to carry out the operations specified by the instruction, which is where the instruction actually does its work.

In the memory access stage, the CPU may need to fetch data from memory, depending on the instruction. The task of this stage is to obtain the operand's address in main memory from the address field of the instruction and to read the operand from main memory for the operation.

The result write-back stage is the last one. It writes the result produced by the execute stage back to a register inside the CPU so that subsequent instructions can access it quickly.

Registers in computer architecture

A register is a very fast kind of computer memory. Compared with the other storage components in a modern computer, registers are the fastest and also the most expensive.

Let's take the Intel 8086 processor as an example. The 8086 is the origin of the x86 architecture, and the 8088 was derived from it.

The 8086 CPU has 20 address lines, so it can address at most 2^20 bytes, which is 1 MB. The same is true of the 8088.

In the 8086, all internal registers and the internal and external data buses are 16 bits wide, so each register can store two bytes; the 8086 is a full 16-bit microprocessor. It has 14 registers, each with a unique name:

AX,BX,CX,DX,SP,BP,SI,DI,IP,FLAG,CS,DS,SS,ES

These 14 registers can be divided into three categories according to their functions.

General-purpose registers

Control registers

Segment registers

Let's look at each of these in turn.

General-purpose registers

There are four main general-purpose registers: AX, BX, CX and DX. Each is 16 bits wide and can store two bytes. They are generally used to store data, so they are also known as data registers.

The predecessor of the 8086 was the 8080, an 8-bit CPU. To stay compatible with it, each of the 8086's general-purpose registers AX, BX, CX and DX can also be used as two independent 8-bit registers.

In more detail, AX, BX, CX and DX each have their own role.

AX (Accumulator Register): the accumulator, mainly used for input/output and for most arithmetic operations.

BX (Base Register): the base register, used to hold a base address for memory access.

CX (Count Register): the count register, which counts iterations in loop operations.

DX (Data Register): the data register, also used for input/output operations. It is paired with AX for multiplication and division involving large values.

Each of these four registers can be split into a high half and a low half, giving eight 8-bit data registers.

The AX register can be divided into two independent 8-bit registers, AH and AL

The BX register can be divided into two independent 8-bit registers, BH and BL

The CX register can be divided into two independent 8-bit registers, CH and CL

The DX register can be divided into two independent 8-bit registers, DH and DL

Apart from AX, BX, CX and DX, none of the other registers can be divided into two separate 8-bit registers.

[Figure: the AX, BX, CX and DX registers, each split into a high and a low 8-bit half]

Bits 0-7 of AX form the AL register, and bits 8-15 form the AH register. AH and AL can each be used as an 8-bit register in its own right, and the same goes for the other three registers.

After knowing the registers, let's take a look at how the data is stored through an example.

Take the value 19 as an example. In 16 bits it is stored as 0000 0000 0001 0011.

A register fills its low bits first and uses the high bits only when the value does not fit in the low bits. If the low byte is enough to hold the value, the high byte is filled with 0, and any unused bits within the low byte are filled with 0 as well.
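The layout described above can be sketched in a few lines of Python. This is only an illustration that models a 16-bit register as a plain integer; the helper name `split16` is hypothetical and not part of any 8086 tool.

```python
def split16(value):
    """Split a 16-bit value into its high and low bytes (like AH and AL)."""
    value &= 0xFFFF              # keep only 16 bits, as a 16-bit register would
    high = (value >> 8) & 0xFF   # bits 8-15
    low = value & 0xFF           # bits 0-7
    return high, low

ax = 19
print(format(ax, "016b"))        # 16-bit binary view of the register
ah, al = split16(ax)
print(ah, al)                    # high byte is 0, low byte holds 19
```

The value fits in the low byte, so the high byte stays at zero, which matches the rule that low bits are used first.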

The 8086 CPU can handle two sizes of data at a time:

Byte: a byte consists of 8 bits, and its size is fixed.

Word: a word is a fixed-size unit of data that an instruction set or the processor hardware handles as a unit. For Intel, a word is two bytes. Word length is an important characteristic of a computer: different instruction set architectures process different amounts of data at a time. In other words, machines with different instruction sets handle different lengths at once, such as a word, a double word (32 bits) or a quad word (64 bits).

AX register

As discussed above, AX is also called the accumulator, and it can be divided into two independent 8-bit registers, AH and AL. In assembly programs, AX is probably the most frequently used register.

Here are a few lines of assembly code:

    mov ax,20    ; put 20 into register AX
    mov ah,80    ; put 80 into register AH
    add ax,10    ; add 10 to the value in register AX

Note that the code above writes ax and ah in lowercase while the comments write AX and AH in uppercase. They mean the same thing, because register names are not case-sensitive.
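To see what those three instructions leave in AX, here is a small Python model. It assumes the constants 20, 80 and 10 are decimal (they carry no H suffix) and treats AX as a 16-bit integer whose high byte is AH.

```python
ax = 20                            # mov ax,20 -> AX = 0014H
ax = (80 << 8) | (ax & 0x00FF)     # mov ah,80 -> only the high byte changes
ax = (ax + 10) & 0xFFFF            # add ax,10 -> 16-bit addition
print(hex(ax), ax)                 # AH = 50H, AL = 1EH
```

Writing AH replaces only bits 8-15, so the 20 already stored in AL survives and the final addition works on the combined 16-bit value.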

Compared with the other general-purpose registers, AX has a special role: the DIV and MUL instructions use it implicitly.

DIV is the division instruction in 8086 CPU.

MUL is the multiplication instruction in 8086 CPU.

BX register

BX is a data register, which means it can temporarily hold general data. For compatibility with the earlier 8-bit CPU, BX can also be used as two separate 8-bit registers, BH and BL. Besides holding data, BX is used for addressing, that is, for locating physical memory addresses. The value in BX is generally used as an offset address, and an offset is always relative to a base address. The base address comes from a segment register, which we will discuss later.

CX register

CX is also a data register that can temporarily hold general data. For compatibility with the earlier 8-bit CPU, CX can also be used as two separate 8-bit registers, CH and CL. CX also has a special use of its own: the C stands for Count, so it serves as a counter. When you use the LOOP instruction, CX specifies how many times to loop. Each time LOOP executes, the CPU does two things:

First, it decrements the counter CX by 1.

Second, it checks the value in CX. If CX is 0, it leaves the loop and continues with the instruction after the loop.

If CX is not 0, it jumps back and executes the loop body again.
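The two steps of LOOP can be modelled in Python as follows. The loop body (doubling AX) and the starting values are assumptions chosen only for illustration.

```python
ax = 2
cx = 3                         # mov cx,3: number of iterations
while True:
    ax = (ax + ax) & 0xFFFF    # loop body: add ax,ax
    cx = (cx - 1) & 0xFFFF     # LOOP step 1: CX = CX - 1
    if cx == 0:                # LOOP step 2: CX == 0 -> leave the loop
        break
print(ax, cx)
```

The body runs once before the first check, just as the instructions above a LOOP run before LOOP itself is reached, so CX = 3 gives exactly three passes.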

DX register

DX is also a data register that can temporarily hold general data, and for compatibility with the earlier 8-bit CPU it can be used as two separate 8-bit registers, DH and DL. Its special use was already mentioned in the AX section: it supports the MUL and DIV instructions, where it holds the part of the value that does not fit in AX.
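As a rough sketch of how AX and DX cooperate, the Python below models the 16-bit forms of MUL and DIV: MUL leaves a 32-bit product in DX:AX, and DIV divides DX:AX, leaving the quotient in AX and the remainder in DX. The function names `mul16` and `div16` are hypothetical, and the divide-error cases (division by zero, quotient larger than 16 bits) are not modelled.

```python
def mul16(ax, operand):
    """Model of 16-bit MUL: DX:AX = AX * operand (unsigned)."""
    product = (ax & 0xFFFF) * (operand & 0xFFFF)
    return (product >> 16) & 0xFFFF, product & 0xFFFF    # (DX, AX)

def div16(dx, ax, operand):
    """Model of 16-bit DIV: AX = quotient, DX = remainder of DX:AX / operand."""
    dividend = ((dx & 0xFFFF) << 16) | (ax & 0xFFFF)
    quotient, remainder = divmod(dividend, operand & 0xFFFF)
    return remainder, quotient                           # (DX, AX)

dx, ax = mul16(0x1234, 0x100)
print(hex(dx), hex(ax))          # high word in DX, low word in AX
dx, ax = div16(dx, ax, 0x100)
print(hex(dx), hex(ax))          # remainder in DX, quotient in AX
```

The product 123400H does not fit in 16 bits, so its upper part spills into DX; dividing by the same factor brings the original value back into AX.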

Segment register

The CPU contains four segment registers, which provide the base locations of program instructions, data and the stack. In fact, every memory reference on the IBM PC uses a segment register as its base location.

The segment registers are:

CS (Code Segment): the code segment register, the base location of the program code

DS (Data Segment): the data segment register, the base location of variables

SS (Stack Segment): the stack segment register, the base location of the stack

ES (Extra Segment): the extra segment register, an additional base location for variables in memory

Index register

The index registers mainly hold offsets relative to a segment address. They are:

BP (Base Pointer): the base pointer, an offset within the stack segment, used to locate variables on the stack

SP (Stack Pointer): the stack pointer, an offset within the stack segment, used to locate the top of the stack

SI (Source Index): the source index register, used as the source when copying strings

DI (Destination Index): the destination index register, used as the destination when copying strings

Status and control register

There are only two registers left, the instruction pointer register and the flag register:

IP (Instruction Pointer): the instruction pointer register, which holds the offset within the code segment (CS) of the next instruction to execute

FLAG: the flag register, which stores the current status of the processor. Its flags are:

Direction: the direction in which data block operations move, toward higher or lower addresses

Interrupt: whether interrupts are allowed; 1 = allowed, 0 = prohibited

Trap: whether the CPU stops after each instruction completes; 1 = on, 0 = off

Carry: set if the last unsigned arithmetic operation produced a carry

Overflow: set if the last signed arithmetic operation overflowed

Sign: set if the result of the last arithmetic operation is negative; 1 = negative, 0 = positive

Zero: set if the result of the last arithmetic operation is zero; 1 = zero

Auxiliary carry (Aux Carry): set on a carry from bit 3 into bit 4

Parity: set if the low byte of the result contains an even number of 1 bits
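To make the arithmetic flags concrete, the following Python sketch derives them for a 16-bit addition. The function name `add16_flags` and the dictionary layout are hypothetical; the sketch covers only the six arithmetic flags and leaves out Direction, Interrupt and Trap, which are control flags rather than results of an operation.

```python
def add16_flags(a, b):
    """Add two 16-bit values and derive the arithmetic flags as ADD would."""
    a &= 0xFFFF
    b &= 0xFFFF
    full = a + b
    result = full & 0xFFFF
    flags = {
        "CF": int(full > 0xFFFF),                                # carry out of bit 15
        "ZF": int(result == 0),                                  # result is zero
        "SF": (result >> 15) & 1,                                # copy of the sign bit
        "OF": int(((a ^ result) & (b ^ result) & 0x8000) != 0),  # signed overflow
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),                  # carry from bit 3 into bit 4
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),       # even parity of low byte
    }
    return result, flags

print(add16_flags(0x7FFF, 1))    # signed overflow, no unsigned carry
print(add16_flags(0xFFFF, 1))    # unsigned carry, result wraps to zero
```

The two calls show why Carry and Overflow are separate flags: 7FFFH + 1 overflows as a signed value without carrying, while FFFFH + 1 carries without a signed overflow.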

Physical address

As we all know, when the CPU accesses memory it needs the exact address to access. The memory unit is the basic unit of memory, and each memory unit has a unique address, which is its physical address. The CPU and memory interact over three buses: the data bus, the control bus and the address bus.

The CPU sends the physical address to memory over the address bus. So how does the CPU form the physical address? That is the focus of the discussion below.

First, let's look at the structure of the 8086 CPU.

Cxuan has been talking for a while now, so you should know that the 8086 is a 16-bit CPU. But what is a 16-bit CPU?

You have probably heard a rough answer: a 16-bit CPU is one that can process 16 bits of data at a time. Being able to give that answer shows a good foundation, but it is not the whole story. A 16-bit CPU actually means the following:

The arithmetic unit inside the CPU can process at most 16 bits of data at a time.

The arithmetic unit is the ALU, the arithmetic logic unit. It is one of the three core parts of the CPU and is mainly responsible for operating on data.

The maximum width of a register is 16 bits.

The maximum register width is the largest number of binary digits that a general-purpose register can hold.

The path between the registers and the arithmetic unit is 16 bits wide.

This path is the bus between the registers and the arithmetic unit, which can transfer 16 bits of data at a time.

Well, now you know why it's called a 16-bit CPU.

After you know the answer to the above question, let's talk about how to calculate the physical address.

The 8086 CPU has a 20-bit address bus, and each address line carries one bit of the address, so the 8086 can send out 20-bit addresses. That gives it an addressing capability of 2^20 bytes, or 1 MB. But the 8086 is a 16-bit design: internally it can only handle 16-bit addresses, which reach 2^16 bytes, or 64 KB. So how does it achieve 1 MB of addressing?

It turns out that the 8086 CPU internally combines two 16-bit addresses to form a 20-bit physical address.

The process works as follows.

The relevant components in the CPU provide two 16-bit addresses, a segment address and an offset address. The address adder combines them into a 20-bit physical address, and the input/output control circuit passes that physical address to memory. This completes the formation of the physical address.

The address adder synthesizes the physical address from the segment address and the offset address using the formula physical address = segment address * 16 + offset address.

Multiplying the segment address by 16 is the same as shifting it left by 4 bits. The formula physical address = segment address * 16 + offset address is a concrete case of the general addressing scheme base address + offset address = physical address, where the base address equals segment address * 16.
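The formula is easy to check with a few lines of Python. The function name `physical_address` is hypothetical; the final mask models the fact that the 8086 has only 20 address lines.

```python
def physical_address(segment, offset):
    """physical address = segment * 16 + offset, limited to 20 address lines."""
    return ((segment << 4) + offset) & 0xFFFFF

print(hex(physical_address(0x2000, 0x0000)))
# Two different segment:offset pairs can name the same memory unit:
print(hex(physical_address(0x1000, 0x0010)), hex(physical_address(0x1001, 0x0000)))
```

Because the segment is only shifted by 4 bits, segments overlap: raising the segment by 1 and lowering the offset by 16 lands on the same physical address.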

You may not be clear about the concept of a segment yet, so let's discuss it.

What is a segment?

The concept of a segment appears often in operating systems. In memory management, for example, the operating system stores different kinds of data in different segments, such as the code segment, data segment, bss segment and rodata segment.

But this division is not done by memory. Let cxuan tell you who does it: the CPU is the boss working behind the scenes, and memory merely takes the blame.

Memory itself is not segmented; the segmentation comes entirely from the CPU. As mentioned above, the physical address of a memory unit is given by base address + offset address = physical address, and this is what lets us manage memory in segments.

As shown in the figure

This is a schematic diagram of two 16-KB programs being loaded into memory, and you can see that the segment addresses of both programs are 16380.

Note that on the 8086 the base of a segment is segment address * 16, so a segment always starts at an address that is a multiple of 16. The offset address is 16 bits, so it can reach 2^16 locations, which means a segment can be at most 64 KB long.

Segment register

Cxuan introduced the segment registers only briefly above, without showing what a segment really is. Now let's look at them in detail. After the discussion of segments above, the segment registers should be easy to follow.

In the description of physical address synthesis we mentioned the relevant components. These are in fact the segment registers CS, DS, SS and ES. When the 8086 CPU accesses memory, these four registers provide the segment address of the memory unit.

CS register

You cannot talk about the CS register without also talking about the IP register. CS and IP are both very important registers in the 8086 CPU: together they give the address of the instruction the CPU needs to read next.

CS stands for Code Segment, the code segment register, and IP stands for Instruction Pointer. Now you know why the two appear together.

In the 8086 CPU, whatever CS:IP points to is executed as an instruction.

Let's walk through the process.

In the CPU, CS provides the segment address and IP provides the offset address. The address adder converts them into a physical address, the input/output control circuit transfers addresses and data, the instruction buffer holds fetched instructions, and the instruction executor executes them. Memory contains a contiguous area that stores the machine code; the addresses and assembly instructions shown beside it in the figure are annotations.

In this example, the segment address and offset address are 2000H and 0000H. Both are fed into the address adder, which combines them into the physical address 20000H.

The address adder then passes the physical address to the input/output control circuit.

The input/output control circuit sends the 20-bit physical address to memory over the address bus.

Memory returns the corresponding bytes, B8 23 01. In the figure, B8 and BB are opcodes, not operands.

The input/output control circuit sends B8 23 01 into the instruction buffer.

At this point the instruction is ready to execute, and the instruction pointer IP is automatically increased. As mentioned above, IP is the offset from the code segment CS, so after the increase it holds the offset of the next instruction to read.

The instruction executor then executes the fetched instruction B8 23 01.

Next, 2000H and 0003H are fed into the address adder to read the following instruction. The rest of the process is the same as described above, so cxuan will not repeat it here.

From the description above, we can summarize how the 8086 CPU works:

CS and IP provide the segment address and the offset address to the address adder.

The address adder computes the physical address, which is sent to memory through the input/output control circuit.

The instruction at that physical address is read out and delivered through the control circuit to the instruction buffer.

IP is advanced to point to the next instruction, while the instruction executor executes the instruction in the instruction buffer.
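The working process summarized above can be sketched as a tiny fetch-and-execute loop in Python. This is a deliberately minimal model under stated assumptions: memory is a dictionary, only the two opcodes B8 (mov ax, 16-bit constant) and BB (mov bx, 16-bit constant) are understood, and the six program bytes are assumed to sit at 2000:0000.

```python
memory = {}
program = [0xB8, 0x23, 0x01, 0xBB, 0x03, 0x00]   # mov ax,0123H ; mov bx,0003H
cs, ip = 0x2000, 0x0000
for i, byte in enumerate(program):
    memory[(cs << 4) + i] = byte                 # place the bytes at 2000:0000

regs = {"ax": 0, "bx": 0}
opcodes = {0xB8: "ax", 0xBB: "bx"}               # only the two opcodes used here

for _ in range(2):
    addr = (cs << 4) + ip                        # address adder: CS * 16 + IP
    opcode = memory[addr]                        # fetch the opcode byte
    imm = memory[addr + 1] | (memory[addr + 2] << 8)   # little-endian 16-bit constant
    ip = (ip + 3) & 0xFFFF                       # IP now points at the next instruction
    regs[opcodes[opcode]] = imm                  # execute: mov register, constant
    print(hex(addr), regs, hex(ip))
```

The first pass fetches from 20000H and leaves IP at 0003H, so the second pass fetches from 20003H, which is the step where 2000H and 0003H enter the address adder.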

What is Code Segment?

A code segment is a region of memory that holds code. Its base address, the segment address, is stored in the CS register. In essence it is a group of memory units holding instructions such as the mov ax,0123H and mov bx,0003H above. We can store a piece of code of length N in a group of memory units with contiguous addresses whose starting address is a multiple of 16, and treat that piece of memory as dedicated to storing code.

DS register

When the CPU reads or writes a memory unit, it needs the address of that unit. The 8086 CPU has a DS register, which usually holds the segment address of the data being accessed. To read the data at address 10000H, you might use the following code:

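    mov bx,1000H
    mov ds,bx
    mov al,[0]

These three instructions read the byte at 10000H into AL: DS holds the segment address 1000H and [0] is the offset, so the physical address is 1000H * 16 + 0 = 10000H. The value goes through BX because the 8086 cannot load a constant directly into a segment register.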
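A short Python model of that read, assuming DS ends up holding the segment address 1000H so that DS * 16 + 0 gives the physical address 10000H. The byte 5AH stored there is a made-up test value.

```python
memory = bytearray(1 << 20)        # 1 MB of byte-addressable memory
memory[0x10000] = 0x5A             # hypothetical value stored at 10000H

bx = 0x1000                        # load the segment address into BX
ds = bx                            # copy it into DS (mov ds,bx)
al = memory[(ds << 4) + 0]         # read offset 0 in the data segment: DS * 16 + 0
print(hex(al))
```

The offset inside the brackets is combined with DS automatically, which is why the code never mentions the full 20-bit address.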

So far the mov instruction has been used to transfer data in two ways:

One is to move a constant directly into a register.

The other is to move the contents of one register into another.

But that is not all. The mov instruction has the following forms:

Form                              Example
mov register, data                mov ax,8
mov register, register            mov ax,bx
mov register, memory unit         mov ax,[0]
mov memory unit, register         mov [0],ax
mov segment register, register    mov ds,ax

Stack

The stack, which I believe most of you already know well, is a storage space with a special access rule: the element that goes in first comes out last, which is what we usually call last in, first out.

It is like a big storage box in which you can put things of the same type, such as books. The first book put into the box ends up at the bottom and the last one ends up on top. To take books out you must start from the top; otherwise you cannot reach the book at the bottom.

The stack data structure works the same way. Putting a book into the box is called push, and taking a book out of the box is called pop.

Pushing adds an element and popping removes one; only the names are different. Unlike ordinary memory access, a stack operation does not need to specify the address of the element. Its use looks roughly like this:

    // push data
    Push(123);
    Push(456);
    Push(789);
    // pop data
    j = Pop();    // j = 789
    k = Pop();    // k = 456
    l = Pop();    // l = 123

In a stack, LIFO means that the data saved last (Last In) is read out first (First Out).
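The same push and pop sequence can be run in Python, using a list as the stack. This is only an illustration of the LIFO order, not of how the 8086 implements its stack.

```python
stack = []
for value in (123, 456, 789):
    stack.append(value)      # push
j = stack.pop()              # the last value pushed comes out first
k = stack.pop()
l = stack.pop()
print(j, k, l)
```

The values come back in the reverse of the order in which they were pushed.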

Stack and SS register

Let's describe the push-in and pop-up process of the stack through a piece of assembly code.

The 8086 CPU provides instructions for pushing onto and popping off the stack, the most basic being PUSH and POP. For example, push ax pushes the data in the ax register onto the stack, and pop ax takes the data from the top of the stack and puts it into the ax register.

One thing to note: in the 8086 CPU, both push and pop operate on whole words.

We start with an empty stack that holds no instructions or data.

Then we push a value onto the stack.

The instructions involved are

    mov ax,2345H
    push ax

Note that the data occupies two memory units: the unit at the higher address stores the high 8 bits of the data, and the unit at the lower address stores the low 8 bits.

Then we push another value onto the stack.

The instructions involved are

    mov bx,0132H
    push bx

Now that there are two values on the stack, let's pop them off.

The instructions involved are

    pop ax    ; ax = 0132H

And then continue to take out the data.

The instructions involved are

    pop bx    ; bx = 2345H, the first value pushed

The complete push and pop process is as follows

Now cxuan has a question for you. In the example above, the space from 10000H to 1000FH served as the stack for the push and pop instructions. But how does the CPU know that the stack is at 10000H to 1000FH? In other words, how is the stack unit to access selected?

The 8086 CPU has a pair of stack registers, SS and SP. SS is a segment register that holds the segment address of the stack, and SP is the stack pointer that holds the offset of the top of the stack. At any time, SS:SP points to the top element of the stack. When the push and pop instructions execute, the CPU gets the address of the top of the stack from SS and SP.

Now we can describe the push and pop process completely, and cxuan will walk through it for you.

The key changes in this process are as follows.

When PUSH pushes a 2-byte word onto the stack, SP = SP - 2 happens first and the word is then stored at SS:SP, so the top of the stack moves to a lower address.

When POP pops a 2-byte word off the stack, the word at SS:SP is read first and then SP = SP + 2, so the top of the stack moves to a higher address.

Because the 8086 pushes and pops only whole words, SP always changes by 2, never by 1.
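The way SS and SP track the top of the stack can be modelled with a small Python class. This is a sketch under stated assumptions: the class name `Stack8086` is hypothetical, pushes and pops are word-sized (as the article notes for the 8086), and the stack is assumed to start empty at SS = 1000H, SP = 0010H so that it occupies 10000H to 1000FH.

```python
class Stack8086:
    """Tiny model of the 8086 stack: memory plus the SS and SP registers."""

    def __init__(self, ss, sp):
        self.memory = bytearray(1 << 20)            # 1 MB address space
        self.ss, self.sp = ss, sp

    def top(self):
        return ((self.ss << 4) + self.sp) & 0xFFFFF  # physical address of SS:SP

    def push(self, value):
        self.sp = (self.sp - 2) & 0xFFFF             # SP = SP - 2 first
        addr = self.top()
        self.memory[addr] = value & 0xFF             # low byte at the lower address
        self.memory[addr + 1] = (value >> 8) & 0xFF  # high byte at the higher address

    def pop(self):
        addr = self.top()
        value = self.memory[addr] | (self.memory[addr + 1] << 8)
        self.sp = (self.sp + 2) & 0xFFFF             # SP = SP + 2 afterwards
        return value

stack = Stack8086(ss=0x1000, sp=0x0010)
stack.push(0x2345)             # mov ax,2345H / push ax
stack.push(0x0132)             # mov bx,0132H / push bx
ax = stack.pop()
bx = stack.pop()
print(hex(ax), hex(bx), hex(stack.sp))
```

After two pushes and two pops SP is back at 0010H, and the values come out in reverse order, so AX receives what BX pushed and the other way round.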

The stack-top out-of-bounds problem

We now know that the 8086 CPU uses SS and SP to indicate the address of the top of the stack and provides the PUSH and POP instructions to push and pop. So you know how to find the top of the stack, but how can you make sure it does not go out of bounds? And what happens if it does?

Consider the top of the stack going out of bounds.

At first SS:SP points to the top of the stack. After enough elements have been pushed, SS:SP reaches the edge of the stack space, and pushing one more element moves the top of the stack outside that space.

An out-of-bounds stack top is dangerous. We set aside one region as the stack, but the space outside it may hold other instructions and data, possibly belonging to other programs, and overwriting them can make the computer misbehave.

We might hope the 8086 CPU would solve this problem by itself. After all, it is a mature CPU, so it ought to have learned to solve its own problems.

Unfortunately, that remains wishful thinking. The truth is that the 8086 CPU does not guarantee that the top of the stack stays within bounds. It only knows where the top of the stack is, not how big the stack space is, so the programmer has to guarantee this manually.
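A quick Python sketch shows what pushing one word too many looks like. It assumes the stack was meant to occupy 10000H to 1000FH (room for eight words) with SS = 1000H and SP starting at 0010H; the bounds check is something the program does here, since the CPU itself performs none.

```python
ss, sp = 0x1000, 0x0010            # stack meant to occupy 10000H..1000FH
low, high = 0x10000, 0x1000F

for n in range(1, 10):             # push nine words into room for eight
    sp = (sp - 2) & 0xFFFF         # PUSH: SP = SP - 2, wrapping at 16 bits
    addr = (ss << 4) + sp          # physical address the word is written to
    inside = low <= addr <= high
    print(n, hex(sp), hex(addr), "ok" if inside else "OUT OF BOUNDS")
```

The eighth push fills the stack with SP = 0000H. The ninth wraps SP around to FFFEH, so the word is written at 1FFFEH, far outside the intended stack space, and nothing in the CPU objects.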
