In-depth understanding of computer system-- from the Perspective of CAEer

2025-02-06 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)11/24 Report--

As noted in the previous article, the purpose of this series stems from the development of "computational electromagnetics". Electromagnetic CAE technology has profoundly changed the hardware design flow, but relying only on simulation in commercial CAE software, plus the designer's manual parameter tuning or relatively low-level automatic parameter sweeps, is increasingly insufficient to support excellent hardware design. If we learn and master the most cutting-edge design concepts and optimization strategies, rely on a solid mathematical foundation to turn the problem into a mathematical model, and rely on solid programming skills to digitize, program, and deeply embed qualitative design concepts into the hardware design stage, will that bring more possibilities to hardware design? This is the "origin" of this article.

As a computer layman, my process of learning "computer systems" fell roughly into three stages:

Introduction: systematic learning began in early November 2021 with bilibili's "Crash Course Computer Science" and "Designing a Computer from 0 to 1". Combined with two popular-science books by the Japanese computer science writer Yukio Yazawa, "How Computers Work" and "How Programs Work", this completed my introduction to the computer hardware system.

In depth: relying mainly on bilibili's "Wang Dao postgraduate entrance examination" video course "Principles of Computer Organization" (the source of a large number of pictures in this article), combined with a close reading of the classic computer "black book" Computer Systems: A Programmer's Perspective (by Randal E. Bryant), I completed an in-depth study of "computer systems", digested and absorbed the material, and finally built a preliminary knowledge system.

In practice: after establishing the theoretical system, disassembling and re-examining a notebook motherboard provided a simple hands-on exercise of the knowledge system.

Based on the principles of "complete system, clear structure, progressive hierarchy", this article tries to introduce the "computer system" systematically and accessibly, in the hope of helping electromagnetic CAE designers get started. Limited by the author's professional background and level of understanding, the article inevitably contains biases; corrections are welcome.

The content of this article is organized around three aspects:

1. Underlying cognition: a computer system is roughly composed of computer hardware (principles of computer organization), an operating system, and various software applications (algorithms and data structures). Together these form a fully functional computer, and the connection between computers is realized through computer networks. As an abridged tour of the "computer system", this article will focus on how the bottom-level "software/hardware" of the computer executes "code" written in a high-level language.

2. In-depth understanding: after covering the underlying principles of computer organization, readers who want a deeper understanding need to open up the encapsulation of the computer's components and explore more concretely the structure, function, and implementation mechanisms of the subsystems that make up the computer system. This part always revolves around the "hard" and "soft" aspects. The "hard" part goes deep into the computer, introducing in detail the hardware architecture and the structure, composition, and operating mechanism of each hardware subsystem (the storage system, the central processing system, the bus system, and the I/O system). The "soft" part introduces in detail the representation and processing of information based on the binary "0/1" system, the construction and operating mechanism of the "instruction system" that coordinates the computer hardware, and the "rules" and "strategies" within each hardware subsystem that keep its work orderly.

If a computer system is compared to a company, then the "hard" part is like the functional departments (the hardware subsystems) built from the basic element "people" (the transistors), together with the "organizational structure" that frames those functions; the "soft" part is like the various "rules and regulations" based on the company's "core ideas". Hard and soft work together to keep the company running in an orderly way.

3. Knowing the motherboard: the best concrete carrier of a "computer system" is probably the motherboard of a personal computer. On first acquaintance, the PCB traces like capillaries, the dense circuit components, and the large number of pin-ringed integrated circuit chips may cause you physical discomfort, and analysis seems impossible to even begin. But after building a relatively complete understanding of the "computer system" and then looking at the motherboard again, will there be a different understanding? This process is itself a useful exercise for deepening one's grasp of the "computer system".

First, what is a computer system? The computer is one of the most complex systems built so far. Its function is to execute preset instructions in a determined order, and those preset instructions are the programs we are familiar with.

It is a complex system, on the one hand, because the number of its constituent transistors is enormous: a CPU core the size of a fingernail contains hundreds of millions of transistors. On the other hand, it is extremely powerful, and simply piling up components cannot make a powerful system. The strength of the computer system comes from interconnecting a huge number of transistors and making them divide labor and cooperate under strict organizational rules, thereby achieving complex functions far beyond any individual element's ability. This is consistent with the logic of cell-organ-intelligent life. The study of "computer systems" is concerned with the construction and working mechanism of this complex system: 1) at the hardware level, transistors - computer components - a fully functional computer; 2) at the software level, binary 0/1 - information representation and processing - the instruction system.

Second, the basic composition of the computer. Let us now take a brief top-down look at the composition of the computer hardware system. It has two basic parts: 1) the host (the core of the computer) and 2) I/O devices (keyboard, mouse, monitor, optical drive, etc.). The host is composed of the CPU and main memory; the CPU mainly includes the arithmetic unit (performing logic and arithmetic operations) and the controller (directing the execution of programs), while main memory stores programs and data.

Open the sales page of any notebook, and the configuration parameters table describes the main performance parameters of the computer, among which the most important parameters are: 1) CPU model; 2) memory (that is, main memory) capacity; 3) hard disk capacity (that is, secondary storage capacity).

The main function of CPU is to execute instructions, and its structure mainly includes an arithmetic unit and a controller. The functions of each component are:

Arithmetic unit: like a bricklayer, its main work is to carry out all kinds of arithmetic and logic operations. It is mainly composed of: 1) the arithmetic logic unit (ALU), which performs the arithmetic and logic operations; 2) general registers, which store the operands to be operated on; 3) the accumulator (ACC), which stores operands and results; 4) the quotient register, which assists division.

Controller: like a foreman, its main job is to direct the arithmetic unit to execute instructions. It is mainly composed of: 1) the control unit (CU), which decodes instructions and issues control signals; 2) the instruction register (IR), which stores the instruction currently being executed; 3) the program counter (PC), which stores the address of the next instruction.

Main memory is a temporary container for the instructions and data of a running program. It consists of three parts: 1) the storage bank, which holds large amounts of programs or data, like a parcel locker; 2) MAR, the memory address register, which temporarily stores the address of the program or data to be accessed (like a pickup code); 3) MDR, the memory data register, which temporarily stores the program or data being fetched (like the package to be picked up).

Third, the organizational form of the computer. The computer is complex, and for such a huge system to run in an orderly way it must be supported by a scientific and reasonable organizational structure, just like the organizational structure of a large company: the division of labor and cooperation among the computer's components is guaranteed by the computer hardware architecture. Computer hardware organization is mainly divided into two types: 1) the von Neumann structure; 2) the modern computer structure.

Von Neumann structure

The computer scientist von Neumann established the basic composition and organizational structure of modern computers, namely the "von Neumann structure". The structure consists of five parts: the arithmetic unit, the controller, memory, input devices, and output devices; the organization is shown in the following figure. Its characteristic is a CPU-centered organization: all data flow, program execution, and result output are executed and coordinated by the CPU.

If you compare the computer to a company, the five components map roughly as follows: the arithmetic unit is like the production department, and memory is like the warehousing department. All purchased raw materials and processed products must pass through the production department before being sent to the warehousing department and other relevant departments, which obviously makes the production department do a great deal of work outside its own responsibility and reduces production efficiency. The modern computer structure therefore optimizes the organization of the five components, giving the "modern computer structure".

Modern computer structure

Modern computer structure takes memory as the core, and all input data / programs and output calculation results are first stored in memory, and then sent to CPU for execution or to output devices. Modern computer structure effectively lightens the burden of CPU, makes CPU more focused on instruction execution, and greatly improves efficiency.

Fourth, the running process of the computer. Above, we have basically completed the construction of a computer hardware system. So how does this system work, and how do the components in each "department" cooperate to ensure that the preset instructions run smoothly?

The process goes something like this: when you open an App on your phone, the background is actually running code written in a high-level language, which is translated into a machine language (lines of binary code) recognized by the computer hardware through something called a compiler, and then loaded into the main memory via I / O.

When the instructions and data expressed in machine language have been loaded into main memory, the CPU accesses main memory to fetch instructions and starts executing, as follows:

Step 1: the program counter PC points to the current instruction address #01 and sends the address to main memory's address register MAR.

Step 2: the storage bank sends the contents of the corresponding location to the data register MDR according to the "pickup code" in MAR.

Step 3: MDR sends the instruction to the controller's instruction register IR for analysis.

Step 4: IR sends the instruction's operation code to the control unit CU and its address code to MAR; according to the operation code, CU tells the arithmetic unit what to do.

Step 5: the storage bank sends the data at address #04 to the arithmetic unit's accumulator ACC via MDR, according to the pickup code in MAR.

In this way, you can get a general idea of how the computer's hardware works together to ensure that the program runs smoothly.
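The five steps above can be sketched as a tiny simulator. This is purely illustrative, assuming a made-up accumulator machine whose instructions are (opcode, address) pairs, not any real instruction set:

```python
def run(memory, start=0):
    """Simulate the PC / MAR / MDR / IR / ACC cooperation described above."""
    pc = start                       # program counter: address of next instruction
    acc = 0                          # accumulator in the arithmetic unit
    while True:
        mar = pc                     # Step 1: PC -> MAR
        mdr = memory[mar]            # Step 2: storage bank -> MDR
        ir = mdr                     # Step 3: MDR -> IR
        opcode, addr = ir            # Step 4: CU decodes operation code and address
        pc += 1
        if opcode == "LOAD":         # Step 5: memory data -> ACC
            acc = memory[addr]
        elif opcode == "ADD":
            acc = acc + memory[addr]
        elif opcode == "HALT":
            return acc

# A two-instruction "program" plus its data, all living in one memory:
program = {
    0: ("LOAD", 4),   # ACC <- M[4]
    1: ("ADD", 5),    # ACC <- ACC + M[5]
    2: ("HALT", 0),
    4: 7,             # data
    5: 35,            # data
}
result = run(program)
```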

With the introduction complete, this part formally goes deep into the computer's subsystems from the two sides of "software" and "hardware", introducing their structure and working mechanisms.

First, information representation and processing. The binary system is the foundation of information science, just as transistors are the foundation of the computer hardware system. Modern computers store and process information represented by binary signals. These unassuming binary digits, or bits, form the basis of the digital revolution.

The mathematical theory we are familiar with represents and processes information in decimal, but each decimal digit has ten possible states (0-9), which is too many states to implement conveniently in engineering. A binary digit has only two states (0 and 1), and binary values work much better on machines that store and process information: binary signals can be easily represented, stored, and transmitted, for example as a hole or no hole in a punched card, a high or low voltage on a wire, or a magnetic field oriented clockwise or counterclockwise.

This chapter is divided into three parts: 1) information storage, introducing basic concepts underlying the binary system; 2) representation and computation of integers, introducing the theoretical basis for representing and operating on integers as unsigned numbers and two's complements; 3) representation and computation of floating-point numbers, introducing the binary version of scientific notation used to represent real numbers and its operational properties.

1. Information storage

1.1 carry counting system

The real world we live in runs on the decimal system, while computer hardware understands only the binary 0/1 language. The close relationship between the two worlds depends on conversion between number bases. The most common conversions are among decimal, binary, octal, and hexadecimal. Conversion between binary and each of decimal, octal, and hexadecimal is shown below; conversions among decimal, octal, and hexadecimal can be realized by going through binary as an intermediate.
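As a quick sketch of these conversions using Python's built-in facilities (the value 45 is an arbitrary example), note that grouping binary digits three at a time gives octal and four at a time gives hexadecimal:

```python
n = 0b101101                         # binary literal for decimal 45

# decimal -> binary / octal / hexadecimal string forms
print(bin(45), oct(45), hex(45))     # 0b101101 0o55 0x2d

# string -> integer conversion with an explicit base; all three name 45:
assert int("101101", 2) == int("55", 8) == int("2d", 16) == 45

# binary as the bridge: 101 101 -> octal 5 5, 0010 1101 -> hex 2 d
assert int("101", 2) == 5 and int("1101", 2) == 0xD
```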

1.2 words

Most computers use 8-bit blocks, or bytes, as the smallest addressable unit of storage, rather than accessing individual bits in memory. Memory can be regarded as a very large array of bytes called virtual memory, and each byte has a "house number", or address. The set of all addresses is the virtual address space, and its size is determined by the computer's word length: on a 32-bit computer, for example, the virtual address space is limited to 4 GB.
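The 4 GB figure follows directly from the word length, as this small check illustrates:

```python
word_length = 32                       # bits in an address on a 32-bit machine
address_space = 2 ** word_length       # number of addressable bytes
GiB = 2 ** 30                          # bytes per gibibyte

# 2**32 bytes is exactly 4 GiB of virtual address space
print(address_space // GiB, "GiB")
```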

1.3 data size

There are several commonly used data types: character, integer, and floating point. The character type generally stores a single character of a string, the integer type stores integers of various lengths, and the floating-point type stores floating-point numbers of different precisions.

1.4 addressing and byte order

If a datum spans multiple bytes of storage, its storage order must be specified; almost all machines store a multi-byte object as a contiguous sequence of bytes. The bytes representing an object are ordered by one of two general rules: 1) big-endian mode and 2) little-endian mode.

As shown in the figure, the hexadecimal value 01234567H is stored starting at address #1, where "01" is the most significant byte (8 bits) of the datum and "67" is the least significant byte. In big-endian mode the most significant byte is stored first (at the lowest address); in little-endian mode the least significant byte is stored first.
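The two byte orders for this same value can be demonstrated with the standard-library struct module:

```python
import struct

value = 0x01234567

big    = struct.pack(">I", value)   # ">" = big-endian: most significant byte first
little = struct.pack("<I", value)   # "<" = little-endian: least significant byte first

assert big    == b"\x01\x23\x45\x67"
assert little == b"\x67\x45\x23\x01"

# unpacking with the matching byte order recovers the original value
assert struct.unpack("<I", little)[0] == value
```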

1.5 string

Strings are made up of characters, and in a computer the mapping between each character and binary 0/1 is achieved by ASCII encoding (8 bits per character), with ASCII values conventionally written in hexadecimal.
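A small illustration of this character-to-code mapping (the string "CAE" is an arbitrary example):

```python
s = "CAE"

# each character maps to one 8-bit ASCII code, shown here in hexadecimal
codes = [hex(ord(c)) for c in s]
print(codes)                          # ['0x43', '0x41', '0x45']

# the encoded byte string is exactly those code values
assert s.encode("ascii") == b"\x43\x41\x45"
```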

1.6 Common operations

1. Bit-level operation

A bit-level operation means that each bit of a datum expressed in binary participates individually in the corresponding Boolean operation. The main operations are AND (&), OR (|), NOT (~), and XOR (^), and the data types that can participate are any "integral" types (such as char, int, short int, long int, and unsigned int).
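Each of the four operations applied bit by bit, on two 4-bit example values:

```python
x, y = 0b1100, 0b1010

assert x & y == 0b1000        # AND: 1 only where both operand bits are 1
assert x | y == 0b1110        # OR:  1 where either operand bit is 1
assert x ^ y == 0b0110        # XOR: 1 where the operand bits differ
assert ~x & 0xF == 0b0011     # NOT: flip every bit (masked to 4 bits here,
                              # since Python integers are unbounded)
```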

2. Logical operation

Logical operations (OR ||, AND &&, NOT !) are significantly different in nature from bit-level operations, and their function is completely different. A logical operation treats any non-zero value as TRUE and zero as FALSE, and the result of the operation is 1 or 0, representing TRUE or FALSE.
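The contrast can be made concrete. The operand values below are arbitrary examples; the point is that the logical AND of two non-zero values is 1, while the bit-level AND of the same values is a different number entirely:

```python
def logical(a):
    """Logical interpretation: any non-zero value is TRUE (1), zero is FALSE (0)."""
    return 1 if a else 0

# logical AND of two non-zero values -> TRUE (1)
assert (logical(0x69) and logical(0x55)) == 1
# bit-level AND of the very same values -> a different, bit-by-bit result
assert 0x69 & 0x55 == 0x41
# logical AND with zero -> FALSE (0)
assert (logical(0x69) and logical(0x00)) == 0
```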

3. Shift operation

A shift operation moves the bit pattern to the left or right. The left shift is written x << k and fills the low bits with 0. The right shift x >> k comes in two variants: 1) logical right shift, which fills the high bits with 0; 2) arithmetic right shift, which fills the high bits with copies of the most significant bit, as shown in the following figure. For unsigned data the right shift must be logical; for signed data almost all machines default to the arithmetic shift.
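Python integers are unbounded, so a minimal sketch of the two right-shift variants at a fixed 8-bit width (the helper names and width are illustrative assumptions):

```python
def logical_right_shift(x, k, w=8):
    """Right shift filling the high bits with 0, on a w-bit pattern."""
    return (x % (1 << w)) >> k

def arithmetic_right_shift(x, k, w=8):
    """Right shift replicating the most significant (sign) bit."""
    x %= 1 << w
    if x >> (w - 1):              # sign bit set: treat as negative
        x -= 1 << w
    return (x >> k) % (1 << w)    # Python's >> on negatives is arithmetic

assert logical_right_shift(0b10010000, 2)    == 0b00100100
assert arithmetic_right_shift(0b10010000, 2) == 0b11100100
assert (0b0011 << 2) == 0b1100               # left shift fills low bits with 0
```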

2. Representation and operation of integers

This section describes two representations of integers, one that can only represent non-negative numbers, and the other that can represent negative, zero, and positive numbers. Its mathematical properties are strongly related to subsequent machine-level implementations.

2.1 Integer data type

2.2 unsigned numbers and two's complement

The so-called unsigned number is a number without a "+ / -" sign, which can only represent a non-negative number. The mapping relationship between the binary coded representation and the true value is:

That is, the true value of a w-bit unsigned binary encoding can be computed by the formula above, which establishes a one-to-one mapping between the binary encoding (like a w-dimensional vector) and the true value (like the magnitude of that vector). In fact this is just the positional weighted sum used earlier for binary-to-decimal conversion.
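The B2U mapping is a one-line weighted sum; here it is spelled out on a bit string (the function name follows the B2U notation in the text):

```python
def b2u(bits):
    """B2U: unsigned true value of a w-bit pattern, the sum of b_i * 2**i."""
    w = len(bits)
    return sum(int(b) << (w - 1 - i) for i, b in enumerate(bits))

assert b2u("1011") == 8 + 0 + 2 + 1 == 11   # weights 2^3, 2^2, 2^1, 2^0
assert b2u("1111") == 15                    # maximum 4-bit unsigned value
assert b2u("0000") == 0
```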

For signed numbers (that is, numbers carrying a "+/-" sign), the sign must be encoded to distinguish positive from negative. There are two options for the concrete implementation:

The sign-magnitude ("original code", S) representation takes the most significant bit as the sign bit, and its true value can be expressed as B2S. As the following figure shows, the most significant (sign) bit determines whether the true value is positive or negative, and the remaining bits determine only the absolute value.

The two's-complement (T) representation gives the most significant bit a negative weight, and the computation of its true value can be expressed as B2T. As the following figure shows, the true value is the sum of a negative part and a positive part: the negative part depends on the most significant (negative-weight) bit, while the size of the positive part depends on the remaining bits.

It should be noted that sign-magnitude has inherent defects when representing signed numbers, as the following figure shows: under the sign-magnitude definition, adding the encodings of +5 and -5 yields -10, which obviously contradicts reality, while under the two's-complement definition the computed result is 0, which matches reality. Therefore, in most cases signed numbers use the two's-complement scheme.

The binary representations of the different data types and the methods of computing their true values are shown in the following figure, in which the ones'-complement ("inverse code") representation of signed numbers is defined as a transitional form between sign-magnitude and two's complement and has little practical use.
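The +5 / -5 defect described above can be checked directly. This sketch assumes an 8-bit width and a simple modulo-256 adder (as the hardware effectively provides):

```python
W = 8

def b2t(x):
    """True value of a W-bit pattern under two's complement (B2T)."""
    return x - (1 << W) if x >> (W - 1) else x

def b2s(x):
    """True value of a W-bit pattern under sign-magnitude (B2S)."""
    sign = -1 if x >> (W - 1) else 1
    return sign * (x & ((1 << (W - 1)) - 1))

# two's complement: the adder's wrapped result decodes to 0, as expected
t_sum = (0b00000101 + 0b11111011) % (1 << W)     # +5 plus -5
assert b2t(t_sum) == 0

# sign-magnitude: the same adder produces a pattern that decodes to -10
s_sum = (0b00000101 + 0b10000101) % (1 << W)     # +5 plus -5
assert b2s(s_sum) == -10
```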

2.3 Transformation between signed and unsigned numbers

Conversion between signed and unsigned numbers does not change the bit pattern of the binary number. Only the interpretation of each bit changes, because the signed and unsigned definitions differ, and this changes the true value that the binary number represents.

The conversion from two's complement to an unsigned number is shown by formula and illustration in the following figure.

The conversion from an unsigned number to a two's complement is likewise shown by formula and illustration in the figure.
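Both conversions keep the bit pattern fixed and only reinterpret it, which at width w amounts to adding or subtracting 2^w. A minimal sketch at 8 bits (T2U/U2T names follow the text's notation):

```python
W = 8

def t2u(x):
    """T2U: reinterpret a two's-complement value as unsigned (add 2**W if negative)."""
    return x % (1 << W)

def u2t(x):
    """U2T: reinterpret an unsigned value as two's complement (subtract 2**W if large)."""
    return x - (1 << W) if x >= 1 << (W - 1) else x

# 0xFF is the same bit pattern under both views: -1 signed, 255 unsigned
assert t2u(-1) == 255
assert u2t(255) == -1
assert t2u(-128) == 128    # the most negative value maps to 2**(W-1)
```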

2.4 expansion and truncation of numbers

Conversion between integers of different word lengths is a common operation: converting a number to a longer word length requires extension, and converting it to a shorter word length requires truncation.

Extension differs between unsigned numbers and two's complements: 1) an unsigned number is extended by filling the high bits with 0 (zero extension); 2) a two's complement is extended by filling the high bits with copies of the most significant bit (sign extension). This rule is formulated to ensure that the true value represented before and after extension does not change.
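A small sketch of both extension rules, showing that the true value is preserved (widths and example values are arbitrary):

```python
def zero_extend(x, w_from):
    """Unsigned extension: the new high bits are simply 0."""
    return x % (1 << w_from)

def sign_extend(x, w_from, w_to):
    """Two's-complement extension: replicate the sign bit into the high bits."""
    x %= 1 << w_from
    if x >> (w_from - 1):
        x |= ((1 << (w_to - w_from)) - 1) << w_from
    return x

assert zero_extend(0b1010, 4) == 0b00001010
assert sign_extend(0b1010, 4, 8) == 0b11111010
# value preserved: 0b1010 is -6 at 4 bits, and 0b11111010 is -6 at 8 bits
assert 0b1010 - 16 == 0b11111010 - 256 == -6
```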

Truncation, by contrast, changes the true value a binary number represents. For an unsigned number x, truncating it to k bits is equivalent to computing x mod 2^k (taking the true value modulo 2^k). In short, the truncation results for unsigned numbers and two's complements can be expressed in the following forms:
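A sketch of both truncation rules: keep the low k bits (i.e. take the value mod 2^k), then reinterpret under the chosen encoding. Example values are arbitrary:

```python
def truncate_unsigned(x, k):
    """Unsigned truncation to k bits: x mod 2**k."""
    return x % (1 << k)

def truncate_twos_complement(x, k):
    """Two's-complement truncation: keep low k bits, then reinterpret."""
    r = x % (1 << k)
    return r - (1 << k) if r >> (k - 1) else r

assert truncate_unsigned(0x12345, 16) == 0x2345     # high bits discarded
assert truncate_twos_complement(128, 8) == -128     # the value changes sign!
assert truncate_twos_complement(100, 8) == 100      # small values survive intact
```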

2.5 Integer Operation

Integer arithmetic revolves mainly around unsigned numbers and two's complements. The common operations are: 1) addition; 2) negation; 3) multiplication; 4) multiplication by powers of 2; 5) division by powers of 2.

For addition, the usual concern is whether the result overflows. For unsigned numbers and two's complements the addition results are as follows:

For multiplication, whether operating on unsigned numbers or two's complements, the result can be obtained by computing the product and truncating it to the word length, and both cases are served by the same bit-level operation without a separate multiplier for each, which is a great convenience. The multiplication results for unsigned numbers and two's complements are as follows:

For multiplying or dividing by a power of 2, the calculation can be implemented with shift operations, which greatly improves convenience: 1) multiplying by 2^k, whether for unsigned numbers or two's complements, is equivalent to shifting left by k bits; 2) dividing by 2^k is equivalent to a right shift by k bits (logical, filling the high bits with 0, for unsigned numbers; arithmetic, replicating the most significant bit, for two's complements).
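A quick check of the shift equivalences (note that Python's >> on negative integers is an arithmetic shift, and like the hardware it rounds toward minus infinity):

```python
x, k = 12816, 4

assert x << k == x * 2**k        # left shift == multiply by 2**k
assert x >> k == x // 2**k       # right shift == divide by 2**k (non-negative case)

# For a negative two's-complement value the arithmetic shift rounds
# toward minus infinity: -77 / 4 = -19.25, but the shifted result is -20.
assert -77 >> 2 == -20
```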

3. Representation and Operation of floating Point numbers

3.1 floating point representation

A floating-point number encodes a rational number of the form V = (-1)^s x M x 2^E, where s is the sign bit (1 bit, with two states 0/1 indicating +/-), M is the significand (encoded in n fraction bits), and E is the exponent (encoded in k bits). To represent a binary floating-point number, only these three fields need to be encoded. For the single-precision format (float, 32 bits), k = 8 and n = 23; for the double-precision format (double, 64 bits), k = 11 and n = 52. Given the binary encoding of a floating-point number, its true value can be computed as follows.

It is important to note that the true value of the encoding is computed differently depending on whether the exponent field is all 0s, all 1s, or neither.

When the exponent field is neither all 0s nor all 1s, the floating-point number is normalized: the exponent is E = e - Bias, where e is the unsigned value of the k exponent bits and Bias = 2^(k-1) - 1; the significand is M = 1 + f, where f is the value of the n-bit fraction field interpreted as a binary fraction in [0, 1), so the leading 1 is implied rather than stored.

When the exponent field is all 0s, the floating-point number is a denormalized value: E = 1 - Bias and M = f, with no implied leading 1, which provides a representation of 0 and a gradual approach to it.

When the exponent field is all 1s, the floating-point number is a special value: if the fraction field is all 0s it represents infinity, +∞ when s = 0 and -∞ when s = 1; if the fraction field is non-zero it is NaN.
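The three cases can be exercised by pulling apart a 32-bit float's bits and rebuilding its value by the formulas above. This is a sketch of the decoding rules, not production float handling; the helper name is made up:

```python
import struct

def decode_float(x):
    """Split a 32-bit float into s, e, f and rebuild V by the three cases."""
    bits, = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31                    # sign bit
    e = (bits >> 23) & 0xFF           # k = 8 exponent bits
    f = bits & ((1 << 23) - 1)        # n = 23 fraction bits
    bias = (1 << 7) - 1               # 2**(k-1) - 1 = 127
    if 0 < e < 0xFF:                  # normalized: M = 1 + f, E = e - bias
        return (-1) ** s * (1 + f / 2**23) * 2.0 ** (e - bias)
    if e == 0:                        # denormalized: M = f, E = 1 - bias
        return (-1) ** s * (f / 2**23) * 2.0 ** (1 - bias)
    # exponent all 1s: infinity (fraction 0) or NaN (fraction non-zero)
    return float("nan") if f else (-1) ** s * float("inf")

assert decode_float(6.5) == 6.5             # normalized case round-trips
assert decode_float(-0.15625) == -0.15625
assert decode_float(float("inf")) == float("inf")
```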

3.2 floating point operation

Floating-point addition differs from integer addition in that it lacks many familiar properties (for example, it satisfies neither associativity nor distributivity), so it is not elaborated here.

4. Summary

"Decimal" is the foundation of all modern theory about "numbers" and the most familiar "way" we represent the world, while "binary", as the foundation of the information world, provides another "way". Establishing the relationship between the two (base conversion) is therefore essential, as is building, on top of binary, the representation of all kinds of numbers (unsigned numbers, signed numbers, fixed-point numbers, floating-point numbers, etc.) and clarifying the properties of the mathematical operations on them. All of this was introduced in this first chapter.

Second, the storage system is like the "warehouse" of the computer system, storing programs, instructions, data, and other information. This chapter is divided into three parts: 1) what is a "storage system"? — introducing the abstract model of memory and the hierarchy based on a gradient of access speeds; 2) why is it called a "system"? — introducing the various storage technologies and the pyramid structure of the storage system based on the "principle of locality"; 3) how does the "storage system" run? — focusing on the composition of the storage hierarchy and its most important members, "main memory" and the "cache", and explaining how they cooperate with the CPU to complete data access.

1. What is "storage system"?

1.1 understanding at the abstract level

As shown in the figure above, our study of computer systems has so far relied on a simple model in which the CPU executes instructions while memory stores instructions and data for the CPU. In this simple model, the memory system is a linear array of bytes, and the CPU can access each memory location in constant time. More concretely, as the following figure shows, it contains a storage bank (holding the data) together with two interfaces (an address interface and a data interface). Although this has been a valid working model so far, it does not reflect the way modern systems actually work.

1.2 hierarchical structure

In fact, a memory system is a hierarchy of storage devices with different capacities, costs, and access times. CPU registers hold the most frequently used data. Small, fast cache memories near the CPU act as buffers for data held in main memory; main memory in turn buffers data stored on larger, slower disks, and those disks often serve as buffers for data stored on the disks or tapes of other machines connected over the network.

2. Why call it a "system"?

2.1 Diversity of storage technology

With the development of information technology, storage technology is changing with each passing day, and how to store binary information is also very rich.

According to the type of storage medium, memory can be divided into: 1) semiconductor memory (mainly used for main memory and Cache); 2) magnetic surface memory (mainly magnetic disk and magnetic tape); 3) optical memory (mainly optical disc, etc.).

Among them, semiconductor memory is developing rapidly, and there are many categories:

SRAM (static random access memory): its storage cell is a bistable circuit, implemented with six transistors per cell. As long as power is maintained, it holds one of its two stable voltage states indefinitely. The circuit complexity is relatively high, so the cost is relatively high; SRAM is generally used for the caches and registers in the CPU.

DRAM (dynamic random access memory): each bit is represented by the charge stored on a capacitor, written and read by charging and discharging it. DRAM cells are very sensitive to interference, and the cost is much lower than SRAM; DRAM is generally used for main memory (the memory modules).

ROM (read-only memory) can only be read, not written. Unlike RAM (the information stored in DRAM and SRAM is lost after a power outage), ROM is non-volatile: its stored information survives even when the power is turned off. Programs stored in ROM devices are often called firmware, and when the computer is powered on it runs the firmware stored in ROM. Some systems provide a small set of basic input/output functions in firmware (such as the BIOS routines), and complex devices such as graphics cards and disk drives also rely on firmware to translate I/O requests from the CPU.

Disk: the structure consists of platters (which store the data), a spindle (which spins the platters), and read-write heads (which read the information on the platters). The platter, the core component of a disk, is divided into many concentric circles (tracks); many sectors are distributed along each track, separated by gaps, and the disk stores data in units of sectors.

2.2 principles for building memory

Above, we saw that there are many ways to store binary 0/1 information. When building a computer system, how do we choose among these memories?

As described above, although all of the above devices can store information, their performance and cost are very different.

For this reason, in order to balance the cost, capacity (number of words x word length), and speed (data width / storage cycle) of the computer system, computer storage does not simply commit to one technology. Instead, storage devices are arranged by their distance from the computer's brain (the CPU), from fastest to slowest transmission speed, forming the "pyramid structure" layout of the storage system.

As shown in the figure, taking the storage system of my laptop as an example, its basic composition is: 1) the CPU is equipped with three levels of cache (L1/L2/L3), with capacities of 256 KB, 1.0 MB, and 6.0 MB, increasing in turn; 2) main memory is 7.9 GB of DRAM; 3) the secondary-storage disk is a 239 GB SSD (solid-state drive). As you can see, the farther a storage device is from the CPU, the greater its capacity.

2.3 locality principle (feasibility basis)

As mentioned above, in order to balance capacity, cost, and transmission speed, the storage hierarchy is designed as a "pyramid". But can this structure actually move data smoothly "up and down" among the storage devices and keep the computer system running well?

The principle of locality provides the theoretical basis for the soundness of this structure. The principle says that a well-written computer program tends to reference data items that are near other recently referenced data items, or that were recently referenced themselves. It has a great influence on the design of both software and hardware systems. Locality usually takes two forms: 1) temporal locality — a memory location that has been referenced once is likely to be referenced again many times in the near future; 2) spatial locality — once a memory location has been referenced, the program is likely to reference a nearby memory location in the near future.

The locality principle has direct engineering consequences. At the hardware level, for example, the computer copies contiguous data from main memory into the cache ahead of time, exploiting the cache's speed advantage to improve data transfer speed.
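A small sketch (invented for illustration, not from the original text) makes the two forms of locality concrete: the same sum computed with good and with poor spatial locality.

```python
# Illustrative sketch: the same computation, written with good and poor
# spatial locality over a row-major 2D structure.
def sum_row_major(matrix):
    # Visits elements in the order they are laid out in memory:
    # consecutive accesses tend to hit the same cache line (good spatial locality).
    total = 0
    for row in matrix:
        for value in row:
            total += value
    return total

def sum_column_major(matrix):
    # Jumps a whole row-length between consecutive accesses, so each access
    # may touch a different cache line (poor spatial locality).
    total = 0
    for col in range(len(matrix[0])):
        for row in range(len(matrix)):
            total += matrix[row][col]
    return total

m = [[1, 2, 3], [4, 5, 6]]
print(sum_row_major(m), sum_column_major(m))  # both 21 — same result, different access pattern
```

In Python the difference is masked by interpreter overhead, but in C or Fortran the row-major version can be many times faster for large matrices, purely because of cache behavior.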

3. How the storage system runs

3.1 system structure

Above, we saw that the devices in the storage system form a "pyramid" layout according to their proximity to the CPU. In this section, we take a deeper look at how these storage devices connect to each other and to the CPU.

As shown in the figure above, data moves through the storage system roughly as follows: 1) applications (App) are installed on the computer's C/D drive (non-volatile secondary storage), so even if the power is cut, the installed App is not lost; 2) when you turn on the computer and launch WeChat, the relevant program is copied into main memory and starts running; 3) the cache copies a subset of the data/program from main memory so the CPU can call on it promptly; 4) the CPU fetches instructions and data from the cache, processes them, and outputs the expected results to the user.

3.2 main memory

In the above description, we still analyze the main memory as an abstract black box, and now we will go deep into the main memory to explore the composition and operating mechanism of the main memory.

The basic composition of main memory

As can be seen above, main memory consists of three parts: 1) the memory bank, which stores the data — it is like a parcel locker, and the data are like packages; 2) the memory address register (MAR), which holds the address of the data the CPU wants to fetch, like the pickup code for a package; and 3) the memory data register (MDR), which temporarily holds the data the CPU is fetching, like the package waiting to be picked up. The orderly cooperation of the three depends on the timing control logic.

Delving further into the structure of the memory bank: it is built from dense integrated circuits. The bank is divided by green vertical lines and red horizontal lines into many small squares (forming a storage matrix). Each square is one storage "bit", made of a MOS transistor and a capacitor. The capacitor acts like a small reservoir that can store or release water: water stored means "1", no water means "0". The MOS transistor acts like a valve on the pipe, controlling whether water is stored or released — that is, whether data is "written" to or "read" from the capacitor. The "green line" connected to the pipe's outlet therefore writes or reads the data of a storage cell, and eight storage "bits" make up one storage "word". The "red line" connected to the gate G of the transistors decides whether the corresponding word may be written or read; the red lines connect to the address register, so which location's data is accessed is controlled by the contents of the address register.

In fact, connecting the "red lines" directly to the address register (MAR) would be wasteful: at any moment only one word line is on (corresponding to "1") while all the others are off (corresponding to "0"), so n address lines would select only n states. By inserting a decoder, n address lines can select among 2^n word lines, making full use of the address bits. A separate read/write control line determines whether the current access is a read or a write.

Ignoring the circuit details, main memory is a packaged chip with a set of address-line pins, data-line pins, read/write control pins, and a chip-select pin (a main memory module is built from multiple chips in parallel, and the chip-select line controls which memory chip is in use).

Total memory capacity = number of storage units × memory word length (the number of bits in each storage unit). For example, for an 8KB chip where 1 B (word) = 8 bit: 8KB = 8K × 1B = 8K × 8 bit, so the chip has 8K = 2^13 storage units — hence 13 address lines — and 8 data lines.
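The capacity arithmetic above can be checked in a few lines (the 8 KB chip and 8-bit word are the example values from the text):

```python
# Quick check of the capacity arithmetic: an 8 KB chip with 8-bit words
# needs 13 address lines (2^13 = 8192 storage units) and 8 data lines.
import math

capacity_bytes = 8 * 1024                      # 8 KB total capacity
word_bits = 8                                  # each memory word is 1 B = 8 bits
num_words = capacity_bytes * 8 // word_bits    # number of storage units
address_lines = int(math.log2(num_words))      # bits needed to address all units
data_lines = word_bits                         # one data line per bit of the word

print(num_words, address_lines, data_lines)    # 8192 13 8
```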

The connection between main memory and CPU

By now we have analyzed main memory fairly thoroughly. So how does main memory connect to the CPU so that the CPU can freely read the data stored in it? It is actually very simple: data lines connect to data lines, and address lines connect to address lines.

If the capacity of main memory can not meet the requirements of CPU, it can be solved by memory expansion, which can be extended in two ways:

If the word length of main memory is too short (the express locker's compartments are too small to hold a large package), this can be solved by bit expansion (enlarging the compartments).

If the number of words in main memory is too small (too few storage units — too few locker compartments to hold all the packages), this can be solved by word expansion (adding more compartments).

3.3 caching

The background of cache is that with the rapid development of computer technology, the data transmission speed of main memory can not match the operation speed of CPU, which seriously affects the running speed of computer.

The solution is to add a faster cache between the CPU and main memory, which copies the program about to be executed from main memory in advance, ready for the CPU to call on. The strategy is like the difference between JD.com's logistics and ordinary couriers: JD.com builds its own warehouse in each city and stocks goods in advance, so after a buyer places an order the goods usually ship directly from the local warehouse rather than from the place of origin, achieving "same-day" delivery.

Mapping relationship between cache and main memory

The main memory of the computer is equivalent to a large reservoir, and the cache is equivalent to a small reservoir. The main memory copies the data to the cache beforehand. If the main memory is regarded as a data set, the cache is a subset of the collection. Therefore, it is necessary to make clear the mapping relationship between the subset and the original set, so that CPU will not be confused when getting data from the cache.

Since cache Cache is a subset of main memory, we must first clarify the mapping relationship between the subset and the original set. The mapping relationship between cache and main memory can be divided into three main types:

Fully associative mapping: 1) Mapping mode — data moves between main memory and the cache in blocks, each block containing several words. In fully associative mapping, as shown in the figure below, any block of main memory may be stored in any line of the Cache, without restriction. 2) Memory access — as shown in the figure below, suppose main memory holds 64 B (16 blocks of 4 B each), so a main-memory address consists of a 4-bit block number and a 2-bit in-block address. The CPU asks the Cache for data using a main-memory address (say 001110). The Cache compares that address against the tags and valid bits of every line, finds that line #2's tag equals the main-memory block number and its valid bit is 1 (meaning the line holds data), so it is a hit, and the Cache tells the CPU "I have what you want".

Direct mapping: 1) Mapping mode — a block of main memory can go into only one specific line of the Cache, determined by: Cache line number = main-memory block number % (mod) total number of Cache lines. 2) Memory access — the CPU asks the Cache for data at address 001110; the Cache takes the main-memory block number mod the 8 lines, getting line #3, then checks line #3's tag and valid bit. Although the valid bit is 1, the tag is 1011, which does not match the main-memory block number, so the Cache tells the CPU "I don't have what you want" — a miss.

Set associative mapping: 1) Mapping mode — a compromise between the previous two. The Cache lines are grouped in advance; a main-memory block must go into one specific group (determined by its block number), but may occupy any line within that group. 2) Memory access — the CPU asks the Cache for data at address 001110; the Cache takes the main-memory block number mod the 4 groups (i.e., the last two bits of the block number), getting group #3, then checks the tags and valid bits of group #3 (lines #6 and #7). Line #7's tag is 0011 and its valid bit is 1, so the Cache tells the CPU "I have what you want" — a hit.
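The direct-mapped lookup described above can be sketched in a few lines. This is a toy model (field widths follow the text's example; for simplicity the full block number is stored as the tag, whereas real hardware stores only the bits above the line index):

```python
# Toy direct-mapped cache: line = block number % number of lines.
NUM_LINES = 8
OFFSET_BITS = 2   # 2-bit in-block address, as in the text's example

# Each cache line holds a valid bit and a tag; everything starts invalid.
cache = [{"valid": False, "tag": None} for _ in range(NUM_LINES)]

def access(address):
    block = address >> OFFSET_BITS       # strip the in-block offset
    line = block % NUM_LINES             # the direct-mapping rule
    entry = cache[line]
    if entry["valid"] and entry["tag"] == block:
        return "hit"
    # Miss: bring the block in, replacing whatever occupied the line.
    entry["valid"], entry["tag"] = True, block
    return "miss"

print(access(0b001110))  # first touch of block 0b0011 -> miss
print(access(0b001110))  # same block again -> hit
```

Fully associative and set associative lookups differ only in how many lines the block number is compared against (all of them, or just the lines of one group).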

Summary and comparison of the advantages and disadvantages of the three mapping methods are as follows:

Cache replacement strategy

Earlier we introduced the mapping methods between cache and main memory, and how the CPU accesses memory under each. Depending on whether the CPU gets the data it wants, an access has two outcomes: hit or miss. A hit makes everyone happy; here we look at what happens on a miss. A miss leads to one of two situations:

If you have a free location to store the data, you can Copy the data corresponding to the memory block address directly at this time.

There is no empty place to store the data (the target position is occupied by an existing block). In this case a replacement must occur to keep the computer running. The replacement preconditions differ across the three mapping methods: 1) fully associative mapping is not picky — any line may be used, so replacement is needed only when the Cache is full; 2) direct mapping is the most restrictive — replacement happens whenever the single corresponding line is occupied; 3) set associative mapping sits in between — replacement happens only when the corresponding group is full.

In actual replacement there are four main strategies: 1) the random algorithm (RAND), which randomly picks one of the eligible blocks — poor results; 2) first-in, first-out (FIFO), which replaces the block that entered the Cache earliest; 3) least recently used (LRU), which replaces the block that has gone unused the longest — based on the locality principle, it achieves a high hit rate; 4) least frequently used (LFU), which replaces the block with the fewest accesses.
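The LRU policy is easy to model with an ordered dictionary that tracks recency (block numbers and the two-line capacity here are made up for illustration):

```python
# Sketch of the LRU replacement policy: on a hit, refresh the block's
# recency; on a miss with a full cache, evict the least-recently-used block.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # keys kept in least- to most-recent order

    def access(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)    # hit: mark as most recent
            return "hit"
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)   # evict the least-recent block
        self.blocks[block] = True
        return "miss"

c = LRUCache(2)
print([c.access(b) for b in [1, 2, 1, 3, 2]])
# ['miss', 'miss', 'hit', 'miss', 'miss'] — accessing 3 evicts 2,
# because 1 was touched more recently than 2.
```

FIFO would differ only in never calling `move_to_end` on a hit: eviction order then depends on arrival, not on use.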

Write strategy for caching

The above describes some strategies for CPU to read data from the cache, but at the same time, the operation results of CPU should also be written back to the cache and main memory for later use. At this time, there are different strategies. For the two cases of write hit and write miss, the processing strategy is also different.

On a write hit: 1) the write-through (full write) method — after a write hit, the result is written to both the cache and main memory; 2) the write-back method — after a write hit, the result is written only to the cache, and is written back to main memory only when the corresponding cache line is about to be replaced.

On a write miss: 1) the write-allocate method — when the CPU's write misses the cache, the block is brought from main memory into the cache and modified there; usually paired with write-back; 2) the no-write-allocate method — when the CPU's write misses the cache, the data is written only to main memory and the block is not brought into the cache; usually paired with write-through.
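The contrast between the two write-hit policies can be sketched with a toy model (the cache line structure, addresses and values here are invented for illustration):

```python
# Toy model of the two write-hit policies: write-through updates main memory
# on every write; write-back only marks the line dirty and defers the memory
# update until the line is evicted.
class Line:
    def __init__(self, block, value):
        self.block, self.value, self.dirty = block, value, False

memory = {0: 10, 1: 20}   # two main-memory blocks

def write_through(line, value):
    line.value = value
    memory[line.block] = value   # memory updated immediately

def write_back(line, value):
    line.value = value
    line.dirty = True            # memory update deferred

def evict(line):
    if line.dirty:               # write-back: flush dirty data on eviction
        memory[line.block] = line.value

lt = Line(0, 10); write_through(lt, 99)
lb = Line(1, 20); write_back(lb, 77)
print(memory[0], memory[1])   # 99 20 — the write-back result is not yet in memory
evict(lb)
print(memory[1])              # 77 — visible only after eviction
```

This is why write-back needs a dirty bit per line, and why it pairs naturally with write-allocate: the block must be in the cache for deferred writes to accumulate there.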

4. Summary

This chapter gave a deeper introduction to the storage system: it surveyed a variety of storage technologies; showed how, because of differences in speed and cost, different technologies serve different parts of the computer, forming the "pyramid" hierarchy; and, within that hierarchy, examined in detail the structure, operating mechanism and coordination mechanism of main memory and the cache.

Third, the instruction system

As shown in the figure below, we know that the computer system is layered as follows: transistors are the basic elements of the computer system, realizing their value by switching between high and low voltage levels. Huge numbers of transistors, nested layer upon layer, form very large-scale integrated circuits; depending on their function these circuits play different roles (memory, CPU, input/output, etc.), and combined they form the computer hardware system. But this hardware system understands only machine language made of 0s and 1s, which is very different from the (high-level language) programs that programmers write — a compiler must translate between the two. Those strings of 0s and 1s are instructions, the smallest functional units the computer runs, and the set of instructions that performs the various functions is the instruction system.

The instruction sequence compiled by the compiler is put into the memory bank of the main memory. Under the control of the program counter PC, CPU fetches the instructions from the main memory one by one, analyzes the instructions by the controller of CPU, and directs the CPU arithmetic unit to complete the corresponding operation and processing according to the instruction requirements. The specific running process can be referred to above.

The instruction system chapter proceeds from three angles: 1) instruction format — the basic composition of an instruction and its classification by different criteria; 2) instruction/data addressing — before an instruction runs, it must be fetched from main memory, which requires addressing; this part introduces the various ways instructions and data (an instruction's operands) are addressed; 3) CISC and RISC — the two mainstream instruction systems (complex and reduced instruction sets), briefly clarifying their essential differences, advantages and disadvantages, and typical applications.

1. Instruction format

The format of the instruction is shown in the figure below, which consists of an opcode and an address code, where the opcode specifies the type of operation on the operating object (summation, shift, etc.), while the address indicates the location of the operating object.

Because operation tasks differ, the number of addresses in an instruction varies. Instructions are mainly divided into: 1) zero-address instructions; 2) one-address instructions; 3) two-address instructions; 4) three-address instructions; 5) four-address instructions.

Zero address instruction

One case needs no operands at all, such as no-op, halt, and disable-interrupt instructions; the other is the stack computer, where the two operands are implicitly stored at the top and second-from-top of the stack, and the result is left on the top of the stack.

One address instruction

One case requires only a single Operand, such as plus 1, minus 1, inverse, complement, etc.; the other requires two operands, but one Operand is implied in a register (such as ACC).

Two-address instruction

It is often used for arithmetic operations and logic operations related instructions that require two operands.

Three-address instruction

It is often used for arithmetic operations and logic operations related instructions that require two operands, and the calculation results are written into A3.

Four-address instruction

It is often used for arithmetic and logic instructions that need two operands; the result is written to A3, and A4 additionally gives the address of the next instruction to execute.

The classification of instructions can also be divided according to length and type: 1) fixed length instruction word structure, that is, the length of all instructions in the instruction system is the same; 2) variable length instruction word structure, that is, the length of various instructions in the instruction system is different.

Classified by operation type: 1) data transfer — moving data between main memory and the CPU (e.g. LOAD: put data from memory into a register; STORE: put data from a register into memory); 2) arithmetic/logic operations — arithmetic includes addition, subtraction, multiplication, division, increment, decrement, complement, floating-point and decimal operations; logic includes AND, OR, NOT, XOR, bit operations, bit test, bit clear, bit invert, etc.; 3) shift operations — arithmetic shift, logical shift, circular shift, etc.; 4) transfer operations — unconditional jump (JMP), conditional jumps (JZ: jump if the result is 0; JO: jump if the result overflows; JC: jump if the result carries), call and return (CALL and RETURN), and the trap instruction (Trap); 5) input/output operations — data transfer between CPU registers and I/O ports (a port is a register in an I/O interface).

2. Instruction / data addressing

The purpose of instruction addressing is to determine the storage location of the next instruction. There are mainly two types: 1) sequential addressing, which continuously executes instructions in memory by adding 1 to the program counter PC; 2) jump addressing, indicated by transfer instruction (JMP). The operation of the two instruction addressing is shown in the following figure.
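The two styles can be sketched with a tiny program-counter loop (the "program" and its addresses are invented for illustration):

```python
# Sketch of instruction addressing: the PC normally advances by 1
# (sequential addressing); a JMP loads the PC with its target address
# (jump addressing).
program = {0: ("NOP",), 1: ("JMP", 4), 2: ("NOP",), 4: ("HALT",)}

pc, trace = 0, []
while True:
    op = program[pc]
    trace.append(pc)                           # record each address fetched
    if op[0] == "HALT":
        break
    pc = op[1] if op[0] == "JMP" else pc + 1   # jump vs. sequential

print(trace)   # [0, 1, 4] — the instruction at address 2 is skipped by the jump
```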

Compared with instruction addressing, the variety of data addressing is much richer. The main task of data addressing is to determine the real address indicated by the address code of this instruction.

Taking an address instruction as an example, the composition of the address code consists of two parts (addressing feature + formal address). The effective address of the Operand EA (real address) can only be obtained by processing the formal address in accordance with the operation specified by the addressing feature. Addressing characteristics define the ways in which data is addressed, and there are a wide variety of them, as shown in the following figure.

Immediate addressing

No addressing is required, and the formal address is the Operand (usually in the form of a complement).

The advantage is that there is no need to access memory, the speed is fast, and the disadvantage is that the number of bits of the formal address limits the range of operands.

Direct addressing

The formal address An in the instruction is the real address of the Operand, namely EA=A.

The advantage is that the instruction structure is simple, and the instruction execution only accesses memory once. The disadvantage is that the number of bits of A determines the range of addressing, and the Operand address is not easy to modify.

Indirect addressing

The formal address given in the address field of the instruction is not the real address of the Operand, but the address of the memory unit where the valid address of the Operand is located, namely EA= (A).

The advantage is that the addressing range is expanded (the effective address EA has more bits than the formal address A); the disadvantage is that instruction execution requires multiple memory accesses.

Implicit addressing

The operand's address is not given explicitly; instead it is implied by the instruction.

The advantage is that it is advantageous to shorten the instruction word length, while the disadvantage is that it needs to increase the hardware that stores operands or implicit addresses.

Register addressing

Directly give the register number of the Operand in the instruction word, that is, EA=Ri, whose Operand is in the register referred to by Ri.

The advantage is that the instruction does not access the main memory but only the register in the execution phase, the instruction word is short and the execution speed is fast, and the vector / matrix operation is supported. The disadvantage is that the register is expensive and the number of registers in the calculator is limited.

Register indirect addressing

What is given in the register Ri is not an Operand, but the address of the main memory unit where the Operand is located, EA= (Ri).

The characteristic is that it is faster than normal indirect addressing, but the execution phase of the instruction requires access to main memory (because the operands are in main memory).

Base address addressing

Add the contents of the base register (BR) in the CPU to the formal address A in the instruction to form the operand's effective address, that is, EA = (BR) + A.

The advantage is that it is convenient for the program to "float" and to realize the concurrent operation of multi-channel programs.

Indexed addressing

The effective address EA equals the sum of the formal address A in the instruction word and the contents of the index register IX, that is, EA = (IX) + A, where the index register may be a dedicated special-purpose register or a general-purpose register. The difference from base addressing is that IX can be modified by the user.

Advantages: in the array processing process, you can set An as the first address of the array, constantly change the contents of the register IX, you can easily form the address of any data in the array, especially suitable for the compilation of loop programs.
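The array-traversal pattern just described can be sketched directly (the memory contents and the starting address 100 are invented for illustration):

```python
# Sketch of indexed addressing for array traversal: EA = (IX) + A, where A is
# the array's first address and the index register IX is bumped each iteration.
memory = {100: 5, 101: 7, 102: 9}   # a 3-element "array" starting at A = 100

A = 100            # formal address: first element of the array
total = 0
for IX in range(3):        # the index register takes 0, 1, 2
    EA = IX + A            # effective address = (IX) + A
    total += memory[EA]

print(total)   # 21 — every element reached without changing the instruction's A field
```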

Relative addressing

Add the contents of the program counter (PC) to the formal address A in the instruction to form the operand's effective address, that is, EA = (PC) + A, where A is the displacement relative to the address held in the PC; it may be negative and is represented in two's complement.

Advantages: the address of the Operand is not fixed, it changes with the change of the PC value, and there is always a fixed value difference between the instruction addresses, which makes it easy for the program to float.

Stack addressing

Operands are stored in the stack, implicitly using the stack pointer (SP) as the Operand address. The stack is a specific storage area in memory that is managed according to the "last-in, first-out (LIFO)" principle. The address of the read / write unit in this storage area is given using a specific register called stack pointer (SP) (similar to the program counter PC).

Based on stack addressing, the following figure shows an addition process:

Step1: the stack pointer SP points to R0; the corresponding data 0001 is popped into ACC, and SP now points to R1

Step2: the data 1001 that SP points to is popped into register X, and SP now points to R2

Step3: the ALU computes the sum of ACC and X (1010) and sends it to register Y

Step4: the result is pushed — SP points to R1, and Y (1010) is sent to stack register R1.
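The four steps above can be replayed as a sketch (register names follow the text; the bit patterns 0001 and 1001 come from the figure's example):

```python
# Sketch of the stack-addressed addition: pop two operands, add, push result.
stack = [0b0001, 0b1001]      # R0 on top, R1 below; SP indexes the top of stack
SP = 0

ACC = stack[SP]; SP += 1      # Step 1: pop 0001 into ACC, SP now points past R0
X = stack[SP]; SP += 1        # Step 2: pop 1001 into X
Y = ACC + X                   # Step 3: ALU adds ACC and X, result to Y
SP -= 1; stack[SP] = Y        # Step 4: push the result back onto the stack

print(bin(stack[SP]))         # 0b1010
```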

According to the location of stack data, it can be divided into hard stack and soft stack, the hard stack uses registers to store operands, the cost is high, but the speed is fast; the soft stack uses main memory to store operands, the cost is low, but the speed is relatively slow.

The calculation method of valid addresses with different addressing methods and the number of memory calls are summarized in the following table:

3.CISC and RISC

There are two main directions in instruction-set design: CISC (Complex Instruction Set Computer), represented by the x86 architecture, and RISC (Reduced Instruction Set Computer), represented by the ARM architecture. The two design philosophies head in opposite directions.

CISC: the design idea is that one instruction completes one complex basic function. An individual instruction may be implemented by a dedicated circuit, while the more complex instructions are implemented, via the "stored program" idea (microprograms), by a general-purpose circuit with storage components. The typical application is the x86 architecture, used mainly in laptops, desktops, etc.

RISC: the design idea is that one instruction completes one basic "action", and multiple instructions are combined to complete complex functions. The typical application is the ARM architecture, used mainly in mobile phones, tablets, and so on.

If "instruction" is the language of the computer hardware system, then CISC and RISC are the two language system specifications, in which CISC uses "words" as elements to build a language system, each word can express a meaning, and to express complex meaning, words can be combined. The advantage lies in simplicity, while the disadvantage is that there may be many words. RISC is based on "letters" to build a language system, each letter can not express the exact meaning, need to combine a lot of letters in order to express rich and diverse meaning, its advantage is that the number of "elements" is small, only 26, the disadvantage is that to express any meaning, you need to combine a lot of letters.

The comparison of the two instruction sets is shown in the following figure:

4. Summary

This chapter introduced the instruction system in detail: what the instruction system is and its role in the computer system, the composition of an instruction (instruction format), how instructions and data are addressed (addressing modes), and the two typical instruction systems (CISC and RISC).

Fourth, the CPU

As Randal E. Bryant said in "Computer Systems: A Programmer's Perspective", the modern processor is arguably one of the most complex systems ever created by human beings. A silicon chip the size of a fingernail can hold a complete high-performance processor and a large cache, plus the logic circuits that connect to external devices — this is the core of the computer.

As mentioned earlier, CPU is mainly composed of operators and controllers, but such an abstract model is obviously not the end point of our understanding. In this chapter, we will go deep into the interior of CPU. Understand the function and structure of CPU, how CPU completes the extraction and execution of instructions, how the extracted data flows within CPU, how the controller (CU) plays its scheduling role through control signals, and how to improve the efficiency of CPU through the concept of pipeline.

The function and structure of 1.CPU

The main functions of CPU are:

Instruction control: complete the operation of fetching, analyzing and executing instructions, that is, the sequential control of the program.

Operation control: the function of an instruction is often realized by a combination of several operation signals. The CPU manages and generates the operation signals for each instruction fetched from memory, and sends the various signals to the corresponding components, controlling them to act as the instruction requires.

Time control: time control over various operations. Time control should provide proper control signals for each instruction in chronological order.

Data processing: arithmetic and logical operations on data

Interrupt handling: to deal with abnormal situations and special requests during the operation of the computer.

The basic structure of the CPU comprises an arithmetic unit and a controller. The arithmetic unit's job is to process data; the controller's job is to coordinate and control each part of the computer to execute the program's instruction sequence. Its basic functions are: 1) fetch — automatically form the instruction address and issue the fetch command; 2) decode — analyze the fetched instruction (opcode + operand address): decode the opcode (work out what to do) and generate the operand's effective address EA; 3) execute — from the decoded "operation command" and "operand address", form the sequence of operation control signals and coordinate the ALU, memory and I/O devices to complete the corresponding operation; 4) interrupt handling — manage the bus and I/O, and handle exceptional situations.

1.1 basic structure of arithmetic unit

The arithmetic unit mainly consists of ALU with arithmetic / logic operation function and general register for temporarily storing various input / output results, which are connected by CPU internal bus, which simplifies the connection circuit between devices.

1.2 basic structure of the controller

The core of the controller (CU) is the instruction decoder (ID) and the micro-operation signal generator. The decoder analyzes the instruction that was fetched from main memory and placed in the instruction register (IR), and generates micro-operation signals that direct the devices in the CPU to carry out their tasks in a reasonable, orderly fashion. Instructions and operand data travel from main memory/cache to the CPU via the address bus and data bus, passing through the memory address register (MAR) and the memory data register (MDR); the address bus and data bus are like high-speed national highways.

1.3 combination

The computing system and the control system are assembled together to form a fully functional CPU, in which the data transfer between the two parts is carried out through the CPU internal bus.

A seemingly complex structural composition diagram can actually be divided into four parts according to its function.

2. Instruction execution process

2.1 instruction cycle

The time unit in CPU mainly includes these three: 1) clock cycle, also known as oscillation cycle, generated by the oscillating circuit in CPU, which is often defined as the reciprocal of clock pulse frequency, which is the smallest time unit in the time series; 2) machine cycle, also known as CPU cycle. In a computer, in order to facilitate management, the execution process of an instruction is often divided into several stages (such as fetch, decoding, execution, etc.), and each stage completes a basic operation. The time it takes to complete a basic operation is called the machine cycle. In general, a machine cycle consists of several clock cycles; 3) instruction cycle, each time CPU fetches an instruction and executes it, it has to complete a series of operations, and the time required for this series of operations is usually called an instruction cycle. In other words, the instruction cycle is the time it takes to fetch an instruction and execute it. Because the operation functions of each instruction are different, the instruction cycle of each instruction is different. For example, the instruction cycle of an addition instruction is different from that of a multiplication instruction.

Each instruction executes according to the flow diagram shown below; the computer determines which stage an instruction is in through the state of four flip-flops (fetch FE, indirect IND, execute EX, interrupt INT).

2.2 instruction data flow

This section looks at how data flows inside the CPU during the fetch, indirect, execute, and interrupt cycles.

Fetch cycle

The main task of the fetch cycle is to fetch the instruction from memory. The main process is as follows:

Step1: the current instruction address is sent to the memory address register MAR, that is, (PC)-> MAR

Step2:CU sends out the control signal, which is transmitted to the main memory through the control bus. Here is the read signal, that is, 1-> R

Step3: send the contents of the main memory referred to by MAR to MDR through the data bus, that is, M (MAR)-> MDR

Step4: put the contents of MDR into the instruction register IR, that is, (MDR)-> IR

Step5:CU sends out a control signal to form the next instruction address, namely (PC) + 1-> PC.
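The five steps can be sketched as register transfers in a toy model (the memory contents, addresses, and the Python variables standing in for MAR, MDR, IR, and PC are all illustrative, not a real ISA):

```python
# Toy register-transfer model of the fetch cycle; all names are illustrative.
memory = {0x0100: 0xA1B2}   # pretend main memory: address -> instruction word
PC = 0x0100                 # program counter holds the current instruction address

MAR = PC                    # Step 1: (PC) -> MAR
read_signal = 1             # Step 2: CU asserts the read signal, 1 -> R
MDR = memory[MAR]           # Step 3: M(MAR) -> MDR over the data bus
IR = MDR                    # Step 4: (MDR) -> IR
PC = PC + 1                 # Step 5: (PC) + 1 -> PC, the next instruction address

print(hex(IR), hex(PC))     # the fetched instruction and the updated PC
```

After the five transfers, IR holds the instruction word and PC already points at the next instruction.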

Indirect addressing cycle

The main task of the indirect addressing cycle is to obtain the effective address EA of the operand (data); the EA is derived from the formal address according to the instruction's addressing mode. Taking one level of indirection as an example, the data flow is:

Step 1: the address field of the instruction is sent to MAR, that is, Ad(IR) -> MAR

Step 2: the CU issues a control signal to start a memory read operation, that is, 1 -> R

Step 3: the content of the main memory location pointed to by MAR (the effective address EA) is sent to MDR over the data bus, that is, M(MAR) -> MDR

Step 4: the effective address EA is sent to the address field of the instruction, that is, (MDR) -> Ad(IR).

Execution cycle

In the execution cycle, the arithmetic unit performs the relevant operations according to the opcode and operands of the instruction word in IR and produces the result; there is no uniform data flow.

Interrupt cycle

The main task of the interrupt cycle is to suspend the current task in order to handle another one. The breakpoint must be saved before suspension, generally on a stack (SP holds the address of the top of the stack). The steps are:

Step 1: the controller decrements SP by 1 (preparing the stack) and sends the modified address to MAR, that is, (SP) - 1 -> SP, (SP) -> MAR

Step 2: the CU issues a control signal to start a memory write operation, that is, 1 -> W

Step 3: the breakpoint (the content of PC) is written via MDR to the location SP points to

Step 4: the CU sends the entry address of the interrupt service routine to PC, and execution of the interrupt routine begins.
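The breakpoint-saving steps can likewise be sketched (a toy model with an invented stack address, breakpoint, and ISR entry address; the stack grows downward):

```python
# Toy sketch of breakpoint saving in the interrupt cycle; all values invented.
memory = {}
SP = 0x0200                  # stack pointer (top of a downward-growing stack)
PC = 0x0150                  # breakpoint: address inside the interrupted program
ISR_ENTRY = 0x3000           # assumed entry address of the interrupt service routine

SP = SP - 1                  # Step 1: (SP) - 1 -> SP
MAR = SP                     #          (SP) -> MAR
write_signal = 1             # Step 2: CU asserts the write signal, 1 -> W
MDR = PC                     # Step 3: breakpoint travels PC -> MDR -> memory
memory[MAR] = MDR
PC = ISR_ENTRY               # Step 4: ISR entry address -> PC

print(hex(memory[SP]), hex(PC))
```

The saved breakpoint sits on the stack, so when the service routine finishes it can be popped back into PC to resume the original program.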

2.3 instruction execution scheme

Scheme 1: single instruction cycle; every instruction completes in the same execution time, set by the slowest instruction

Scheme 2: multiple instruction cycles; different instructions take different execution times

Scheme 3: pipelining, which lets as many instructions as possible execute in parallel
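The difference between the three schemes can be shown with a toy timing comparison (the per-instruction times, the stage count, and the stage time are all invented figures):

```python
# Hypothetical per-instruction times (ns) for a 5-instruction program.
instr_times = [8, 10, 6, 10, 4]
longest = max(instr_times)

single_cycle = longest * len(instr_times)     # every instruction takes the worst-case time
multi_cycle = sum(instr_times)                # each instruction takes only its own time

# Ideal 5-stage pipeline, 2 ns per stage: the first instruction fills the
# pipeline, then one instruction completes every stage time.
stage_ns, stages = 2, 5
pipelined = (stages + len(instr_times) - 1) * stage_ns

print(single_cycle, multi_cycle, pipelined)   # 50 38 18
```

Even on five instructions the ordering is visible: single cycle pays the worst case everywhere, multiple cycles pay each instruction's own cost, and the pipeline overlaps them.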

3. The function and basic structure of the data path

The CPU's functions are realized by running instructions, and the essence of instruction execution is the transfer of data between components (arithmetic units, registers, etc.). The data path is the route over which data is transferred between functional components. As shown in the figure below, the data path during CPU operation is introduced using the bus organization (taking the instruction cycle as an example).

A fetch cycle can be divided into several micro-operations, and the essence of each micro-operation is a flow of data; behind the data flow is a series of control signals switched on and off (that is, micro-operations are realized by triggering control signals).

The basic structure of the data path falls into two categories: 1) the CPU-internal bus organization (introduced above), whose advantage is a simple wiring layout and whose disadvantages are conflicts during data flow and relatively low efficiency; 2) the dedicated data path organization, whose advantage is that each pair of devices has its own path, so there are no conflicts and speed is high, and whose disadvantage is a complex wiring layout.

4. Function and working principle of the controller

When a program runs, the code written in a high-level language is compiled into lines of 0/1 binary code (instructions) placed in main memory. Each instruction can be divided into four machine cycles (fetch, indirect addressing, execute, and interrupt cycles), and each machine cycle can be divided into several micro-operations (data flows). Behind these micro-operations are the on/off operations of a series of control signals, and the controller is the component that centrally manages these control signals.

The functions of the controller are: 1) fetching instructions; 2) analyzing instructions; 3) generating control signals. Its structure consists mainly of three parts: 1) the program counter PC, which indicates the location in main memory of the current instruction to be executed; 2) the instruction register IR, which stores the instruction being executed and provides its opcode to the control unit for analysis; 3) the control unit CU, the core of the controller, which, according to the requirements of the opcode, combines the timing provided by the beat generator, the cycle flags provided by the machine cycle flip-flops, and the feedback signals of the execution units, and outputs a group of micro-commands to centrally manage the control signals and direct the execution of micro-operations.

The function, structure, and working mechanism of the CPU's controller were introduced above, and we know the control unit CU is its core. There are two ways to realize this core component: 1) hardwired control, based on hardware circuits; 2) the microprogram approach, based on software-like microinstructions. This already belongs to CU design, so the author only explains the principles below without going into detail.

4.1 Hardwired control (hardware approach)

In the previous section, the CU was abstracted as a black box whose input falls into four parts (timing signals, instruction signals, cycle flags, feedback signals) and whose output is a group of control signals. As shown in the figure below, the idea of hardwired control is to connect input signals to output signals through logic circuits: first establish the logical expressions between output and input variables, then use digital circuit techniques to implement those expressions.

Hardwired control is pure hardware: its advantage is fast response, and its disadvantages are a complex design and implementation process and relatively poor extensibility. It suits RISC instruction systems, whose relatively simple instructions are easy to realize with circuits.

4.2 Microprogrammed control (software approach)

At the computer system level, a software implementation is equivalent to a hardware one. The input and output signals of the control unit CU are in essence groups of 0/1-encoded signals. The microprogram approach establishes a mapping between input encodings and output encodings, stores it in a dedicated memory (the control memory CM), and extracts the control signal encodings through memory accesses to carry out micro-operations. The structure is shown schematically below.

At run time, the CU takes the input signal encoding, converts it through the micro-address forming unit into the address of the corresponding control signal encoding, accesses the CM to extract that encoding, and sends it out as control signals (a microinstruction) to carry out the corresponding micro-operations; combined, the micro-operations form instruction control. Because this process closely resembles a memory access, many of the components and steps prefix "micro" to the usual memory-access names. The advantages of a microprogrammed controller are simple design and implementation and good extensibility; its disadvantage is that executing microinstructions requires memory accesses and is therefore relatively slow. It suits CISC systems, whose relatively complex instructions are easier to realize with microinstructions.

5. Instruction pipeline

Compared with sequential execution, pipelining greatly reduces the total execution time of a program.

5.1 ideal state

Ideally, each stage takes the same time and, as soon as one stage ends, the instruction can immediately move on to the next. Simply arranging instructions in pipeline fashion then greatly improves execution efficiency.

There are two representations of a pipeline: 1) the instruction execution process diagram, mainly used to analyze the execution process; 2) the space-time diagram, mainly used to analyze pipeline performance.

There are three performance indicators for evaluating a pipeline: 1) throughput, the number of tasks the pipeline completes per unit time; 2) speedup, the ratio of the time taken without the pipeline to the time taken with it to complete the same batch of tasks; 3) efficiency, the utilization of the pipeline's hardware.
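Under the ideal assumptions, all three indicators follow directly from the stage count k, the task count n, and the stage time t (the figures below are invented):

```python
# Pipeline performance metrics for n tasks on a k-stage pipeline with
# stage time t, ideal case; formulas follow the three indicators above.
n, k, t = 10, 5, 2e-9          # 10 instructions, 5 stages, 2 ns per stage

T_pipelined = (k + n - 1) * t  # fill the pipeline, then one task per stage time
T_sequential = n * k * t       # no overlap at all

throughput = n / T_pipelined                   # tasks completed per second
speedup = T_sequential / T_pipelined           # sequential time / pipelined time
efficiency = T_sequential / (k * T_pipelined)  # utilization of the k stages

print(f"{throughput:.3g} {speedup:.3f} {efficiency:.3f}")
```

As n grows with k fixed, the speedup approaches k and the efficiency approaches 1, which is why pipelines shine on long instruction streams.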

5.2 non-ideal state

In reality, various conflicts arise when instructions run in batches, which reduces pipeline efficiency. The main factors affecting the pipeline fall into three categories:

Structural hazards (resource conflicts): conflicts caused by multiple instructions competing for the same resource at the same time

Data hazards (data conflicts): a later instruction needs the result of an earlier instruction before it can execute, i.e. the two instructions are data-dependent

Control hazards (control conflicts): when the pipeline encounters branch instructions or other instructions that change the PC, a control hazard arises

6. Summary

This chapter introduced the structure, operating mechanism, and data flow of the CPU in detail, went deep into its core by covering the function, structure, operating mechanism, and two implementation strategies of the controller, and finally introduced the pipeline as the CPU's strategy for executing instructions efficiently.

5. Bus

The bus is the "highway" of the computer system: it provides a common path for "communication" between different hardware components and coordinates the "data flow". This chapter proceeds in three parts: 1) an overview of basic bus concepts, classifications under different standards, and four typical structures; 2) bus arbitration and transmission, focusing on three arbitration methods that resolve contention when multiple devices compete for the bus, and on the process of transferring data over the bus; 3) bus standards, of which a wide variety have formed owing to advances in transmission speed and differing application scenarios.

1. Overview

1.1 basic concepts

A bus is a group of common information transmission lines that multiple components can share in time-division fashion, like a highway in public transportation. The bus greatly simplifies the data transmission lines between different devices.

Characteristics of a bus: 1) mechanical characteristics: size, shape, number and arrangement of pins; 2) electrical characteristics: transmission direction and valid voltage ranges; 3) functional characteristics: the function of each line (address, data, control); 4) timing characteristics: the timing relationships of the signals.

1.2 Bus classification and classic structures

1. Classification by data transmission format

By data transmission format, buses divide into serial and parallel: 1) a serial bus is like a one-way street where data must queue and pass bit by bit; it is cheap and suits long-distance transmission, but data must be disassembled before sending and reassembled after receiving; 2) a parallel bus is like a six-lane road where data travels side by side without queuing; its logical timing is relatively simple, no disassembly or reassembly is needed, and the circuits are relatively simple to realize, but it needs many signal lines and long-distance transmission is costly.

2. Classification by bus function

Classified according to the function of the bus, it can be divided into three categories:

On-chip bus

Common connection lines inside the CPU chip, between registers and between registers and the ALU

System bus

By the content of the information transmitted, the system bus divides into three categories: 1) the data bus, which transmits data between functional components; it is bidirectional, and its width is related to the machine word length and the memory word length; 2) the address bus, which indicates the address of the main memory unit or I/O port holding the source or destination of the data on the data bus; it is unidirectional, and its width is related to the size of the main memory address space; 3) the control bus, which transmits control information, including control commands issued by the CPU and feedback signals returned to the CPU by main memory (or peripherals).

By structure, system buses divide into: 1) single-bus; 2) dual-bus; 3) triple-bus; 4) quad-bus structures. The more complex the computer system, the more complex the corresponding bus structure.

Communication bus

It is used for information transmission between computer systems or between computer systems and other systems (remote communication equipment, test equipment, etc.).

3. Classification by timing control mode

It is divided into synchronous bus and asynchronous bus.

1.3 performance indicators

The main performance indicators of a bus are: 1) bus cycle, the time needed for one bus operation (including the request, addressing, and transmission phases), usually made up of several bus clock cycles; 2) bus clock cycle, i.e. the machine clock cycle, determined by the clock system; 3) bus working frequency, the reciprocal of the bus cycle, which in effect states how many times per second data can be transferred; 4) bus clock frequency, i.e. the machine clock frequency; 5) bus width, usually the number of data bus lines, which determines how many bits can be transferred simultaneously; 6) bus bandwidth, the bus data transfer rate: bus bandwidth = bus working frequency x bus width (bit/s); 7) bus multiplexing, one signal line carrying different types of information at different times (such as multiplexing the address and data lines); 8) number of signal lines, the total count of address, data, and control lines.
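The bandwidth formula can be checked with a quick calculation (a 66 MHz, 32-bit bus is an invented example, not a reference to any specific standard):

```python
# Bus bandwidth = working frequency * width; figures are illustrative.
working_freq_hz = 66_000_000       # a 66 MHz bus
width_bits = 32                    # a 32-bit data bus

bandwidth_bps = working_freq_hz * width_bits   # bits per second
print(bandwidth_bps / 8 / 1e6, "MB/s")         # 264.0 MB/s
```

Raising either factor, a faster clock or a wider bus, raises bandwidth proportionally; multiplexing trades some of that bandwidth for fewer lines.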

2. Bus arbitration and transmission

2.1 Arbitration

How do we resolve multiple devices competing for the bus? An arbitration strategy is needed. Arbitration divides into two categories, centralized and distributed; centralized arbitration comes in three forms: 1) chain (daisy-chain) query; 2) counter query; 3) independent request.

Chain (daisy-chain) query mode

After the bus control unit receives a bus request BR, it checks the request status through each device interface in order of distance along the chain. If device interface 0 did not raise BR, it checks device interface 1 in turn; finding that BR was raised there, it sends the bus grant BG to that interface and asserts bus busy BS, and device 1's interface acquires control of the bus.

Counter query mode

After the bus controller receives a bus request BR and judges the bus idle, a counter starts counting, and its value is sent to each device interface over the device address lines. When the value matches the address of a requesting device interface, that device acquires control of the bus, the counter stops counting and querying, and bus busy BS is asserted.

Independent request mode

Devices that need the bus each send a bus request BR to the bus controller; the controller sends the grant BG to one device according to a set priority and asserts bus busy BS, and that device acquires control of the bus.

Comparison of three centralized arbitration methods:

Distributed arbitration: when devices request the bus, each sends its unique arbitration number onto the shared arbitration bus; the numbers compete with one another, and the one with higher priority is granted the bus. The characteristic of this method is that no central arbiter is needed: every potential master module has its own arbiter and arbitration number, and multiple arbiters compete for use of the bus.
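Two of the arbitration ideas can be sketched minimally (the device lists and arbitration numbers are invented; chain query picks the requester nearest the controller, distributed arbitration picks the highest arbitration number):

```python
# Toy models of two arbitration schemes; device data is invented.

def chain_query(requests):
    """Daisy chain: the requesting device closest to the controller wins."""
    for device, wants_bus in enumerate(requests):
        if wants_bus:
            return device
    return None                       # nobody asked for the bus

def distributed(arbitration_numbers):
    """Distributed: every requester drives its number; highest number wins."""
    return max(arbitration_numbers)

print(chain_query([False, True, True]))   # 1: nearer device beats device 2
print(distributed([3, 7, 5]))             # 7: highest arbitration number wins
```

The sketch also shows the daisy chain's weakness: a device far down the chain can starve, because priority is fixed by wiring position.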

2.2 Transmission process

How does a pair of devices occupying the bus transfer data? A bus cycle (the bus transfer process) has four phases:

In the request and allocation phase, the master module that wants the bus submits a request, and bus arbitration grants use of the bus for the next transfer cycle to one of the applicants. This phase can be subdivided into two stages: transfer request and bus arbitration.

In the addressing phase, the master module that obtained the bus sends out, over the bus, the address of the slave module to be accessed and the related commands, starting the slave module that participates in this transfer.

In the transmission phase, the master and slave modules exchange data; data may be transferred in one direction or both.

In the ending phase, the master module's information is withdrawn from the system bus, relinquishing the right to use the bus.

3. Bus standard

A bus standard is the specification for interconnecting different modules of a computer. By position in the computer system, buses divide into: 1) system buses; 2) local buses; 3) device buses and communication buses.

The computer hardware architecture is shown in the figure, with the bus standards and interfaces that interconnect the main hardware modules. A few additional notes:

The Northbridge chip controls the interconnection between high-speed devices (main memory, display adapter (graphics card)) and the CPU; its bus transfer speed is high, and this bus is also called the system bus. Many modern computers integrate the Northbridge's functions into the CPU.

The Southbridge is mainly responsible for interconnecting and controlling the computer's lower-speed devices (network card, USB devices, audio, hard disk, etc.).

Super I/O is mainly responsible for controlling the interconnection of I/O devices, described in detail under the I/O system.

The common bus standards and their parameters are summarized below, focusing on the names of several common buses, their data transmission format (parallel or serial), application scenario (which hardware modules they connect), and bus position (system, local, or device bus).

4. Summary

The bus is a common information path shared in time-division fashion within the computer, like a highway: its advantage is that it greatly simplifies the circuit scale of the computer system, and the problem it introduces is allocating the right to use the bus, hence the focus on the "arbitration" mechanism and the "transmission process". Finally, some common bus standards, their main parameters, and applications were introduced.

6. I/O system

The I/O system is the "transfer station" between the computer's core (CPU, main memory) and external devices, buffering data and coordinating transfers. This chapter proceeds from four aspects: 1) basic concepts, introducing the composition of the I/O system and I/O control methods; 2) external devices, briefly surveying the various peripherals; 3) the I/O interface, the most important part of the I/O system, covering its main functions and general structure as well as the addressing of its internal ports; 4) I/O modes, focusing on three I/O control modes (program query, program interrupt, and DMA) and explaining the operating mechanism and the advantages and disadvantages of each.

1. Basic concept

1.1 composition of I / O system

Generally speaking, the I/O system consists of I/O hardware and I/O software: 1) I/O hardware includes external devices, I/O interfaces, and the I/O bus; 2) I/O software includes drivers, user programs, management programs, upgrade patches, and so on. I/O instructions and channel instructions are usually used to exchange information between the host and I/O devices.

1) I / O instruction

I/O instructions are part of the CPU's instruction set, with a format slightly different from ordinary instructions: the opcode states what the CPU should do with the I/O device, and the command code states what the I/O interface should do to the device.

2) Channel instruction

These are instructions the channel can recognize; channel programs written from them are placed in main memory in advance. In a computer with channels, the CPU executes I/O instructions to issue commands to the channel, and the channel executes a sequence of channel instructions to manage the I/O devices on the CPU's behalf.

I/O interface: also called an I/O controller (a device controller chip, usually integrated on the motherboard), responsible for coordinating data transfers between the host and external devices. It is not merely an interface but also a scheduler between host and peripheral.

1.2 I / O control mode

I / O control methods are mainly divided into four types:

Program query mode: the CPU continuously polls the "status register" in the I/O controller; once it detects the "completed" status, it takes the data out of the data register.

Program interrupt mode: while waiting for keyboard I/O, the CPU can execute other programs first; when the keyboard I/O completes, the I/O controller sends an interrupt request to the CPU, and the CPU responds to the interrupt request and takes away the input data.

DMA (direct memory access) mode: a direct data path (the DMA bus) is added between main memory and the I/O device. The DMA controller automatically controls the reading and writing of data between the disk and main memory, sending an interrupt request to the CPU only after a whole block of data has been read or written.

Channel mode: at the CPU's request, the channel executes a channel program in main memory, controlling the I/O device through a series of tasks, and sends an interrupt request to the CPU after the tasks complete.
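The cost of the first mode, busy-waiting, can be seen in a minimal sketch of program query I/O (the ToyDevice class, its timing, and the data value are all invented for illustration):

```python
# Minimal sketch of program query (polling) I/O: the CPU spins on a
# status register until the device reports ready. Device model invented.
class ToyDevice:
    def __init__(self, ticks_until_ready):
        self.ticks = ticks_until_ready
        self.data_register = 0x5A          # pretend input byte

    def status_ready(self):
        self.ticks -= 1                    # device edges closer to ready each poll
        return self.ticks <= 0

def polled_read(device):
    polls = 0
    while not device.status_ready():       # CPU busy-waits: CPU and I/O work serially
        polls += 1
    return device.data_register, polls

data, wasted_polls = polled_read(ToyDevice(ticks_until_ready=5))
print(hex(data), wasted_polls)             # 0x5a 4
```

Every iteration of the while loop is CPU time spent doing nothing useful; the interrupt and DMA modes exist precisely to reclaim that time.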

2. External devices

A computer system's external devices mainly divide into: 1) input/output devices; 2) external storage devices. They are numerous and varied, and this section does not describe them in detail.

Input / output device

Input device: keyboard, mouse.

Output devices: monitor (main parameters: screen size, resolution, grayscale levels, refresh rate, display memory capacity and bandwidth), printer.

External storage device

Disk (storage mechanism, structure, performance parameters, access process), optical disc storage, and solid-state drives (SSD).

2. I/O interface

2.1 main functions and components

The I/O interface is the bridge between the I/O bus and a peripheral. Its main functions are: 1) data buffering, matching the working speeds of host and peripheral through the data buffer register (DBR); 2) error and status monitoring, feeding back device errors and status information through the status register for the CPU's reference; 3) control and timing, receiving control and clock signals from the control bus; 4) data format conversion, serial-to-parallel and parallel-to-serial; 5) communication with host and device, realizing the host - I/O interface - peripheral chain.

The basic structure of I / O interface:

As shown in the figure, by functional requirement the I/O interface is mainly composed of: 1) the data buffer register DBR; 2) the device selection circuit; 3) the device status flags; 4) the command register and command decoder; 5) the control logic circuit.

The operating mechanism of the I/O interface is introduced by taking control of a peripheral as an example.

Step 1: the CPU sends the device-select signal over the address lines to the I/O interface; the device selection circuit checks whether this device is the one addressed, and if so updates the device status flag and notifies the CPU over the status lines: "I am the one you are looking for."

Step 2: the CPU sends the command for the peripheral over the command lines into the I/O interface's command register.

Step 3: the peripheral transfers data over the data lines into the I/O interface's DBR, then notifies the interface over the status lines that the transfer is complete.

Step 4: the I/O interface updates the status flag and sends an interrupt request to the CPU: "What you want is ready."

Step 5: the CPU responds to the interrupt request over the command lines ("I know, give it to me"), and the I/O interface passes the data from the DBR to the CPU over the data lines.

2.2 I/O ports and addressing

The essence of data transfer between the CPU and a peripheral is reading and writing certain registers in the I/O interface (the data buffer register, command register, etc.). These registers are also called ports; this is the distinction between a port and an interface: the interface comprises the ports together with the surrounding control logic.

To read or write these registers, they must be addressed first. There are two addressing methods:

Unified addressing: I/O ports are assigned addresses as if they were memory locations, so the same memory-access instructions can reach them; this is also called memory-mapped I/O. Its advantages are that no special input/output instructions are needed, the programming space is large, and CPU access to I/O is flexible; its disadvantages are that it occupies memory space, uses many address bits, and address decoding and execution are slower.

Independent addressing: I/O port addresses are separate from memory addresses, and the CPU needs dedicated input/output instructions to access the ports; this is also called I/O-mapped addressing. Input/output instructions are clearly distinct from memory instructions, making programs clearer; the disadvantage is an extra set of control signals, which increases the CPU's control complexity.
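The two styles can be contrasted in a sketch (the addresses, dictionaries, and the load/port_in helpers are all invented; only the mention of x86's IN/OUT refers to a real ISA that uses independent addressing):

```python
# Unified (memory-mapped): ports live in the same address space as memory,
# so the ordinary "load" access works for both. Addresses invented.
address_space = {0x0000: 42, 0xFF00: 0x1B}   # 0xFF00 pretends to be a port

def load(addr):                  # one instruction type covers memory and I/O
    return address_space[addr]

# Independent (I/O-mapped): ports get their own space and need a dedicated
# "port_in" instruction, in the spirit of x86 IN/OUT.
io_space = {0x60: 0x1B}

def port_in(port):               # separate instruction, separate address space
    return io_space[port]

print(load(0xFF00), port_in(0x60))
```

The trade-off in the text is visible here: unified addressing spends part of the memory map on ports, while independent addressing keeps the map whole at the cost of an extra access mechanism.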

3. I/O control modes

3.1 Program query mode

As the timing diagram of the program query mode shows, once the CPU starts I/O it must suspend the current program and insert a query routine into it. The main characteristics: the CPU waits in a "stop and check" fashion, and the CPU and I/O work serially. The corresponding I/O interface flow chart and structure are shown in the figure: 1) the flow chart shows that once the peripheral is started, during the peripheral's preparation phase the CPU repeatedly reads the peripheral's status and checks whether it is ready, unable to execute other instructions; 2) because the I/O interface's DBR is connected to a CPU register and data is transferred word by word, efficiency is low.

The advantages of the program query mode are a simple interface design and little hardware; the disadvantage is that the CPU spends a great deal of time querying and waiting during transfers, and with exclusive querying the CPU is tied to a single peripheral for the duration, so efficiency drops sharply.

3.2 Program interrupt mode

Before introducing the program interrupt I/O mode, let us first understand the interrupt system.

The basic concept and operation mechanism of interruption

A program interrupt occurs when an abnormal situation or special request arises during execution of the current program: the CPU temporarily suspends the current program and turns to handle the abnormal situation or special request; after handling it, the CPU automatically returns to the breakpoint and continues the original program.

The workflow of the interrupt system is divided into three steps: 1) interrupt request; 2) interrupt response; 3) interrupt processing.

Interrupt request: an interrupt source sends an interrupt request signal to the CPU; by querying the interrupt request flag register, the CPU can determine which device issued the request.

Interrupt response: three conditions must be met before an interrupt is answered: 1) an interrupt source has raised a request; 2) the CPU has interrupts enabled; 3) the current instruction has just finished executing and there is no more urgent task. When the conditions are met, interrupt judging takes place, and subsequent interrupt handling proceeds in the configured priority order.

Interrupt handling: there are two main tasks: 1) saving the breakpoint of the original program; 2) executing the interrupt service routine. The first is done by the interrupt implicit instruction (saving the breakpoint on the stack and sending the entry address of the interrupt service routine to PC); the interrupt service routine first protects the context (the register state of the original program) and then performs the service for the interrupting device. The details are in the flow chart below:

There are many kinds of interrupts. Broadly, interrupts divide into internal interrupts (requests originating inside the CPU) and external interrupts (requests from outside, unrelated to the currently executing program). Different kinds of interrupts have different priorities; when several interrupt sources raise requests at the same time, priority determines the order in which they are answered (in general, hardware failures rank above software faults, and non-maskable interrupts above maskable ones).

Interrupt judging and priority setting

As mentioned earlier, when multiple interrupt sources appear simultaneously, interrupt judging must come into play. Priority can be set in two ways: 1) in hardware, through a hardware queuer; 2) in software, through a polling program.

Flexible adjustment of priority can be achieved by adding interrupt masking on top of the hardware queuer. This technique serves multiple (nested) interrupts; its main difference from single-level interrupts is that interrupts remain enabled during interrupt handling, so that nesting is possible.

As shown in the figure, on top of the hardware queuer a mask word MASK is added at each interrupt request port. To give B the highest priority, set the mask bits for A/C/D to "1"; then, while B is being serviced, even if the A/C/D ports raise interrupt requests (that is, A=C=D=1), the outputs at 2-3-4 stay "0", i.e. they are masked, and the priority adjustment is achieved.

The procedure for designing each interrupt source's mask word from the priority order: for each interrupt source, set the bits for sources of higher priority to "0" and the bits for sources of lower priority to "1"; the source's own bit is also set to "1". Then collect the mask words of all the interrupt sources.

Program interrupt mode

After introducing the interrupt mechanism, let's look at how the interrupt strategy is applied to I/O control. The figure above shows the I/O interface structure of "program interrupt mode", which is more complex than that of "program query mode". Compared with program query mode, the share of time the CPU spends executing its own program increases greatly: during the I/O preparation stage the CPU need not participate and can continue executing the original program. Only when the I/O interface has exchanged data with the peripheral and needs to pass it to the CPU does the interface issue an interrupt request, and the CPU takes time to handle it: 1) execute the interrupt implicit instruction, save the breakpoint of the original program, and jump to the entry of the interrupt service routine; 2) execute the interrupt service routine, save the original program's context, exchange data with the I/O interface, and restore the context afterwards; 3) continue executing the original program. The time the CPU is occupied by I/O is thus significantly reduced.
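As a toy illustration (an event-loop sketch with made-up cycle counts, not real hardware behavior), the following Python fragment mimics the three steps above: the CPU does useful work every cycle except the one in which the device's interrupt request arrives:

```python
# Hypothetical sketch of interrupt-driven I/O: the CPU executes its own
# program and is diverted only when the device raises an interrupt request,
# at which point the three steps from the text run in sequence.

def run_interrupt_driven(device_ready_at, total_cycles):
    log, useful = [], 0
    for cycle in range(total_cycles):
        if cycle == device_ready_at:                      # interrupt request arrives
            log += ["save breakpoint",                    # 1) implicit instruction
                    "service routine: transfer data",     # 2) exchange data
                    "restore context"]                    # 3) resume original program
        else:
            useful += 1                                   # CPU runs its own program
    log.append(f"useful cycles: {useful}/{total_cycles}")
    return log

for line in run_interrupt_driven(device_ready_at=3, total_cycles=6):
    print(line)
```

Under polling, every cycle before the device became ready would have been wasted on queries; here only the interrupt-handling cycle is lost.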

3.3 DMA mode

Instead of the CPU, the DMA controller controls the I/O device and transfers data directly between the I/O peripheral and main memory, further freeing the CPU.

The CPU tells the DMA controller whether to input or output, how much data to transfer, and the addresses of the data in main memory and in the peripheral.

Before transmission: the DMA controller accepts the DMA request from the peripheral (the peripheral requests transfer of one word) and issues a bus request to the CPU; the CPU responds with a bus grant signal, the DMA controller takes over control of the bus, and the DMA operation cycle begins.

During transmission: determine the main memory address and the length of the data transfer, automatically updating the main memory address counter and the transfer length counter; determine the transfer direction between main memory and peripheral, issue read/write and other control signals, and carry out the data transfer.

After transmission: reports the end of the DMA operation to the CPU.

The following figure shows the structure of the DMA controller, which is more complex than the I/O interfaces of "program query mode" and "program interrupt mode": 1) the main memory address counter AR, which holds the main memory address of the data being exchanged; 2) the transfer length counter WC, which records the length of the data to be transferred and notifies the interrupt mechanism via an overflow signal when the transfer completes; 3) the data buffer DBR and the device selection circuit, whose functions are the same as in the other I/O modes; 4) the DMA request trigger and control/state logic, which, once a word has been exchanged between the I/O interface and the peripheral, requests main memory access (bus control) from the CPU; 5) the interrupt mechanism, which, when the data transfer between the I/O interface and main memory is finished, notifies the CPU that the transfer is complete.

Take the data transmission from the peripheral to the main memory as an example to introduce the transmission process of DMA:

Step 1: in the preprocessing stage, the CPU sends the DMA controller the main memory address where the data is to be placed, the number of words to transfer, and other information, and starts the I/O device.

Step 2: using the information provided by the CPU, the DMA controller extracts data from the peripheral into the data buffer DBR and requests bus control from the CPU via the DMA request trigger and control/state logic. Once granted, the data is transferred to main memory over the data bus; this repeats until the whole data block has been transferred.

Step 3: after the data block has been transferred, the transfer length counter notifies the interrupt mechanism through its overflow signal, and the interrupt mechanism tells the CPU that all the requested data has arrived in main memory. The CPU enters the interrupt service routine (whose task is completely different from that of program interrupt mode), checks in main memory whether the data meets the requirements, and decides whether further transfers are needed.

Step 4: the CPU continues to execute the main program.
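The four steps above can be sketched in Python as follows; `AR` and `WC` model the DMA controller's address and word-count registers, and the function name, data, and addresses are all hypothetical:

```python
# Hypothetical sketch of the four-step DMA transfer described above,
# for a peripheral-to-main-memory transfer.

def dma_transfer(peripheral_data, main_memory, start_addr):
    # Step 1: preprocessing - the CPU programs the DMA controller's registers
    AR, WC = start_addr, len(peripheral_data)
    # Step 2: the DMA controller moves words without CPU involvement
    for word in peripheral_data:          # one bus cycle per word
        main_memory[AR] = word            # DBR -> main memory over the data bus
        AR += 1                           # address counter auto-increments
        WC -= 1                           # transfer length counter counts down
    # Step 3: WC reaches zero -> overflow signal -> interrupt the CPU
    interrupt_raised = (WC == 0)
    # Step 4: the CPU's service routine verifies the data and resumes the main program
    return interrupt_raised

memory = [0] * 16
raised = dma_transfer([0xDE, 0xAD, 0xBE, 0xEF], memory, start_addr=4)
print(raised, memory[4:8])   # → True [222, 173, 190, 239]
```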

Given this transfer mode, when the CPU and an I/O device access main memory at the same time, the DMA controller and the CPU usually coordinate in one of three ways to avoid conflict: 1) stop the CPU's access to main memory; 2) DMA and CPU access memory alternately; 3) cycle stealing.
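A minimal sketch of the third approach, cycle stealing, assuming for illustration that the device has one word ready every few bus cycles (the period is made up): the DMA controller takes the bus only for those cycles and the CPU keeps it the rest of the time:

```python
# Hypothetical sketch of "cycle stealing": the DMA controller steals one
# bus cycle whenever the device has a word ready; the CPU uses all other
# bus cycles, so neither side is stopped outright.

def schedule_bus(total_cycles, word_ready_every):
    owners = []
    for cycle in range(total_cycles):
        if cycle % word_ready_every == 0:    # device has a word ready:
            owners.append("DMA")             # DMA steals this bus cycle
        else:
            owners.append("CPU")             # CPU keeps the bus otherwise
    return owners

print(schedule_bus(9, 3))
# → ['DMA', 'CPU', 'CPU', 'DMA', 'CPU', 'CPU', 'DMA', 'CPU', 'CPU']
```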

The comparison between program interrupt mode and DMA control mode is shown in the following figure.

5. Summary

This section introduced the I/O system: its function and composition, a brief survey of I/O peripherals, and, in detail, the I/O bus, the function and composition of the I/O interface that sits between the bus and the peripherals, and the three main I/O control modes.

From theory back to the real "motherboard": the author disassembled a long-unused notebook, hoping that, with this theoretical grounding, the complex motherboard could be understood a little more deeply.

On the motherboard, the seemingly dense circuit components and chips can be roughly divided into several parts: 1) CPU and GPU (processors); 2) storage system (main memory and secondary storage); 3) South Bridge chipset (bus control hub) + EC + I/O interfaces + peripherals (I/O system); 4) clock system; 5) power supply system.

These subsystems are organized according to the following architecture (the computer hardware architecture), in which the North Bridge chip (MCH) and South Bridge chip (PCH), as the bus control hubs, link the hardware subsystems together; the layout of the motherboard is correspondingly arranged around these two chipsets. The North Bridge sits close to the CPU and handles data transfer among the high-speed devices (CPU, graphics card, and main memory); the South Bridge interconnects the low-speed devices: hard disk, optical drive, and USB interfaces all pass data to the CPU / main memory through the South Bridge.

CPU and GPU (processors): the CPU and GPU are the two brains on the motherboard; the CPU excels at logic and arithmetic operations, the GPU at image-processing operations. The chips on the front of the CPU package are its core, with the ALU, control unit, cache, and other components integrated on them; such a chip is called a Die, a small square cut from a Silicon Wafer. Before being cut out, each Die goes through many processing steps that etch the circuit logic onto it. As shown in the figure, the main components of a graphics card are the graphics processor and the display memory, whose relationship mirrors that of the CPU and main memory. The graphics card integrated on this motherboard is called an integrated graphics card: it occupies little space and consumes little power, but its performance is also relatively poor. For applications with high image-processing demands (video production, large games), a discrete graphics card is generally needed; it connects to the motherboard through a bus interface, is plug-and-play and powerful, but its power consumption and size are much larger.

Main memory + hard disk (storage system): according to the storage pyramid mentioned earlier, a computer's storage is divided into three parts: 1) cache; 2) main memory; 3) auxiliary storage. The cache is integrated inside the CPU and is not visible on the motherboard.

Main memory: it consists of memory chips (based on DRAM storage technology), the SPD (Serial Presence Detect) chip, and the bus interface. The memory chips are the storage carrier of main memory; as shown in the figure, this memory module is built from 8 of them. The SPD is an 8-pin EEPROM chip that records the module's important parameters, such as working frequency, working voltage, speed, capacity, and row/column address widths, so that the computer system can configure the corresponding working timing and other settings accordingly. The bus interface determines the memory's bus standard and hence its transfer rate. Auxiliary storage: the motherboard above is connected to a 500 GB mechanical hard disk, whose storage medium is a platter; data access relies on the platter's mechanical rotation, so transfers are slow. Today most auxiliary storage is a solid-state drive with much faster access: its storage medium is a memory chip based on ROM-type (flash) storage technology, and its structure also includes a cache chip and a main controller chip.

South Bridge chip + EC + I/O interfaces + peripherals (I/O system): the I/O system hardware includes the I/O bus, I/O interfaces, and external devices. The South Bridge chip and the EC (embedded controller) are jointly responsible for bus control of all I/O peripherals. The many connectors along the edge of the motherboard are the I/O interfaces for peripherals, and the control circuitry adjacent to each connector is crucial: it determines the I/O control mode of the corresponding peripheral.

Clock system + power supply system (others): besides the major subsystems above, the motherboard also includes a clock system that provides timing signals and a power supply system for the board and specific components. Limited by the author's knowledge and by length, these are not detailed here.

Summary: this article is the author's systematic summary of a five-month study of "computer systems", from a popular-science "introduction", to a continuous "deepening" of professional knowledge, to simple "practice", organized by the principle of "complete system, clear structure, progressive levels", with the goal of making clear how the software and hardware at the bottom of the computer work together to ensure that code written in a high-level language runs smoothly. At more than 30,000 words, it still only scratches the surface of the complex body of computer knowledge. Even so, for an electromagnetic CAE designer the learning process was anything but easy: the relevant classics and course videos were studied over and over, and the frustrations along the way often raised doubts about its significance, since it neither solves the technical problems in a project nor quickly improves one's programming skills. This may be a common problem of all basic-theory or systematic learning. At such moments the author recalls Michael Faraday's answer to a lady who questioned the use of his disc generator: "Madam, of what use is a newborn baby? But he will grow up!"

references

"Crash Course Computer Science", bilibili video, presenter: Carrie Anne

"Designing a Computer from 0 to 1", bilibili video, author: Ele Lab

"How Computers Work", popular science book, by Yukio Yazawa (Japan)

"How Programs Work", popular science book, by Yukio Yazawa (Japan)

"Principles of Computer Organization", bilibili video, source: Wang Dao postgraduate entrance examination

"Computer Systems: A Programmer's Perspective" ("In-Depth Understanding of Computer Systems"), academic monograph, author: Randal E. Bryant (USA)

"Fully Explaining the Structure and Principles of the Computer Motherboard (Illustrated)", CSDN blog, author: stm32-cyy

This article comes from the official account of Wechat: electromagnetic CAEer (ID:lb1661057986), author: Liu Bing
