Basic computer knowledge - Linux operating system

2025-07-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

The operating system material is divided into two main parts: the operating system knowledge found in textbooks, and knowledge specific to Linux:

Linux-related knowledge

# (1) The difference between synchronous/asynchronous and blocking/non-blocking in Linux (very important)

First, the difference between synchronous and asynchronous, and between blocking and non-blocking:

Synchronous: when a function call is made, the call does not return until the result is obtained. In other words, things are done one at a time, and the next step cannot start until the previous one has finished.

For example, the ordinary B/S (browser/server) mode is synchronous: submit the request -> wait for the server to process it -> the response is returned; the client browser cannot do anything else during this period.

Asynchronous: the concept of asynchrony is relative to synchrony. When an asynchronous call is made, the caller does not get the result immediately. When the component that actually handles the call has finished, it informs the caller through status, notification, or a callback.

For example, an ajax request is asynchronous: the request is triggered by an event -> the server processes it (the browser can still do other things in the meantime) -> processing completes.

Blocking: a blocking call means that the current thread is suspended until the result of the call is returned (the thread enters a non-runnable state in which the CPU allocates it no time slices, that is, the thread pauses). The function returns only after the result is obtained.

Some people equate blocking calls with synchronous calls, but they are different. With a synchronous call, the current thread is often still active: logically the current function has not returned, but the thread can still take CPU time to run other logic and actively check whether the I/O is ready.

Non-blocking: the counterpart of blocking. The function does not block the current thread; it returns immediately even when the result is not yet available.

To put it more simply:

1. Synchronous: I call a function, and the call does not end until I have waited for the result.

2. Asynchronous: I call a function without waiting for its result; I am notified when the result is available (callback notification).

3. Blocking: I (the function) am called, and I do not return until I have received the data or obtained the result.

4. Non-blocking: I (the function) am called and return immediately; the caller learns of readiness through a mechanism such as select.

The difference between synchronous I/O and asynchronous I/O lies in whether the process is blocked while the data is being copied.

The difference between blocking I/O and non-blocking I/O lies in whether the application's call returns immediately.

To sum up: synchronous/asynchronous and blocking/non-blocking are sometimes used interchangeably, but they are not the same thing, and they describe different objects.
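The synchronous and asynchronous calling styles described above can be sketched in Python. This is only an illustration: `slow_square`, `call_synchronously`, and `call_asynchronously` are hypothetical names, and a thread pool stands in for "the component that actually handles the call".

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    """Stand-in for a slow operation (hypothetical example workload)."""
    time.sleep(0.05)
    return x * x


def call_synchronously(x):
    # The caller waits here until the result is available.
    return slow_square(x)


def call_asynchronously(x):
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_square, x)  # returns at once
        # The callback is how the caller is notified of completion.
        future.add_done_callback(lambda f: results.append(f.result()))
        returned_early = not future.done()  # caller is free to do other work
    # Leaving the with-block waits for the worker, so the callback has run.
    return returned_early, results
```

In the asynchronous variant the caller gets control back before the result exists and receives it later through the callback, which is the distinction the list above draws.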

The five I/O models in Linux:

Blocking I/O

Non-blocking I/O

I/O multiplexing (select and poll)

Signal-driven I/O (SIGIO)

Asynchronous I/O (the POSIX aio_ functions)

* The first four are synchronous; the last one is asynchronous.

1. Blocking I/O:

The application calls an I/O function, which blocks the application while it waits for the data to be ready. If the data is not ready, it keeps waiting. Once the data is ready, it is copied from the kernel to user space, and the I/O function returns a success indication.

Blocking I/O model diagram: the process of waiting for data and copying it from the kernel when recv()/recvfrom() is called.

When recv() is called, the system first checks whether any data is ready. If not, the call waits. When the data is ready, it is copied from the system buffer to user space, and the function returns. In a socket application, the data may not yet be available when recv() is called, so recv() stays in a waiting state.

2. Non-blocking I/O

With non-blocking I/O, the process calls the I/O function repeatedly (many system calls, each returning immediately); during the data copy itself, the process is still blocked.

Setting a socket to non-blocking tells the kernel not to put the process to sleep when the requested I/O operation cannot be completed, but to return an error instead. The I/O function then keeps testing whether the data is ready, and if not, tests again until it is. This continuous testing consumes a lot of CPU time.
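The busy-testing loop described above might look like the following minimal sketch. It assumes a Unix socket pair as the data source; `recv_by_polling` and `nonblocking_demo` are hypothetical helper names.

```python
import socket


def recv_by_polling(sock, max_attempts=1000):
    """Call recv() repeatedly on a non-blocking socket until data arrives."""
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        try:
            return sock.recv(1024), attempts
        except BlockingIOError:
            # The kernel returned EAGAIN/EWOULDBLOCK instead of sleeping.
            continue
    return None, attempts


def nonblocking_demo():
    reader, writer = socket.socketpair()
    try:
        reader.setblocking(False)
        # No data yet: every attempt fails immediately.
        empty_result, empty_attempts = recv_by_polling(reader, max_attempts=3)
        writer.send(b"hello")
        data, attempts = recv_by_polling(reader)
        return empty_result, empty_attempts, data, attempts
    finally:
        reader.close()
        writer.close()
```

Each failed attempt is a full system call that does no useful work, which is where the wasted CPU time comes from.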

3. I/O multiplexing

This mainly means select and epoll. For a single I/O descriptor it takes two calls and two returns, so it has no advantage over blocking I/O; the key is that it can monitor many I/O descriptors at the same time.

The I/O multiplexing model uses the select, poll, and epoll functions. They also block the process, but unlike blocking I/O, these functions can block on many I/O operations at once. They can watch many read and write operations simultaneously, and the actual I/O function is not called until some data is readable or writable (any one of the watched descriptors, not all of them).
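As a rough illustration of watching several descriptors with one blocking call, the sketch below uses Python's `select.select` on a few socket pairs. The function name `ready_indices` and the choice of socket pairs are assumptions made for the example.

```python
import select
import socket


def ready_indices(total=3, active=1):
    """Watch `total` sockets at once; report which ones became readable."""
    pairs = [socket.socketpair() for _ in range(total)]
    try:
        readers = [reader for reader, _ in pairs]
        pairs[active][1].send(b"x")  # only one connection has data
        # One call blocks on all readers (here for at most one second).
        readable, _, _ = select.select(readers, [], [], 1.0)
        return [readers.index(sock) for sock in readable]
    finally:
        for reader, writer in pairs:
            reader.close()
            writer.close()
```

The call returns as soon as any one descriptor is ready, and only then would the program issue the real read on that descriptor.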

4. Signal-driven I/O

First, we enable signal-driven I/O on the socket and install a signal handler; the process then continues to run without blocking. When the data is ready, the process receives a SIGIO signal and can call the I/O function in the signal handler to process the data.
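A minimal sketch of this model in Python, assuming Linux and that the code runs in the main thread (Python only allows signal handlers to be installed there). `sigio_demo` is a hypothetical name; the `F_SETOWN` and `O_ASYNC` steps are the usual way to ask the kernel for SIGIO on a descriptor.

```python
import fcntl
import os
import signal
import socket
import time


def sigio_demo(timeout=2.0):
    """Return the chunks that were read inside the SIGIO handler."""
    reader, writer = socket.socketpair()
    received = []

    def on_sigio(signum, frame):
        # Runs when the kernel reports I/O activity on `reader`.
        try:
            received.append(reader.recv(1024))
        except OSError:
            pass  # nothing to read (spurious signal or socket closed)

    previous = signal.signal(signal.SIGIO, on_sigio)
    try:
        reader.setblocking(False)
        # Deliver SIGIO for this descriptor to our own process.
        fcntl.fcntl(reader.fileno(), fcntl.F_SETOWN, os.getpid())
        flags = fcntl.fcntl(reader.fileno(), fcntl.F_GETFL)
        fcntl.fcntl(reader.fileno(), fcntl.F_SETFL, flags | os.O_ASYNC)
        writer.send(b"ping")
        # The process is free to do other work; here it just idles.
        deadline = time.monotonic() + timeout
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        reader.close()  # close first so no further SIGIO is generated
        writer.close()
        signal.signal(signal.SIGIO, previous)
    return received
```

The sockets are closed before the previous handler is restored because the default action for SIGIO terminates the process.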

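5. Asynchronous I/O

When an asynchronous call is made, the caller does not get the result immediately. After the I/O operation completes, the component that actually handles the call informs the caller through status, notification, or a callback.

6. Summary and comparison of the five I/O models:

The first four models differ only in how they wait for the data to become ready; in all four the process is blocked while the data is copied from the kernel to user space, which is why they count as synchronous. Only asynchronous I/O leaves the process unblocked in both phases.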

(2) Understanding file systems (ext4, XFS, Btrfs)

1. Ext4 file system

Ext4 has some clear limits. The maximum file size is 16 tebibytes (about 17.6 terabytes), far larger than any hard drive currently available to the average user. The largest volume/partition you can create with ext4 is 1 exbibyte (about 1,152,921.5 terabytes). Thanks to a variety of techniques, ext4 is much faster than ext3. Like most modern file systems, it is a journaling file system, meaning that it keeps a journal of where files are located on disk and of any other changes made to the disk. For all its features, it does not support transparent compression, deduplication, or transparent encryption. Snapshots are technically supported, but the feature is still experimental.

2. XFS file system

XFS is an extension of the extent file system. It is a 64-bit, high-performance journaling file system. Support for XFS was merged into the Linux kernel around 2002, and in 2009 Red Hat Enterprise Linux 5.4 added support for it. As a 64-bit file system, XFS supports a maximum file system size of 8 exbibytes. XFS has some drawbacks, such as being unable to shrink and performing poorly when deleting large numbers of files. RHEL 7.0 uses XFS as its default file system.

3. Btrfs file system

Btrfs goes by several names, such as Better FS, Butter FS, or B-Tree FS. It is a file system developed almost entirely from scratch. Btrfs came about because its developers wanted to extend what a file system can do to include snapshots, pooling, checksums, and other features. Although it is unrelated to ext4, it aims to keep the ext4 features that benefit consumers and businesses while integrating additional features that benefit everyone, businesses in particular. For companies running large software and large databases, a file system that makes multiple hard drives appear as one is a benefit and makes data consolidation easier. Deduplication reduces the space the data actually occupies, and btrfs makes mirroring easier when a single large file system needs to be mirrored.

Of course, users can still choose to create multiple partitions without mirroring anything. Given that btrfs can span multiple hard drives, it is just as well that it supports 16 times more disk space than ext4: the maximum partition size of a btrfs file system is 16 exbibytes, and the maximum file size is also 16 exbibytes.
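The size figures quoted in this section can be checked with a few lines of arithmetic. The sketch below only converts binary units (TiB, EiB) to decimal terabytes; the constant and function names are chosen for the example.

```python
TIB = 2 ** 40   # bytes in one tebibyte
EIB = 2 ** 60   # bytes in one exbibyte
TB = 10 ** 12   # bytes in one (decimal) terabyte


def to_terabytes(n_bytes):
    """Convert a byte count to decimal terabytes."""
    return n_bytes / TB


ext4_max_file = 16 * TIB
ext4_max_volume = 1 * EIB
btrfs_max_volume = 16 * EIB

print(round(to_terabytes(ext4_max_file), 1))    # 17.6
print(round(to_terabytes(ext4_max_volume), 1))  # 1152921.5
print(btrfs_max_volume // ext4_max_volume)      # 16
```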

(3) The three file-processing commands you must know: grep, awk, and sed

1. grep, sed, and awk are all text-processing tools. Although they share that purpose, each has its own strengths and weaknesses; none can completely replace another, otherwise there would not be three of them. In comparison, sed and awk are more powerful, and each is effectively a language in its own right.

2. grep: a text filter. If you only need to filter text, use grep; it is much more efficient at this than the other tools.

3. sed: Stream EDitor. By default it operates only on the pattern space and does not modify the original data. If you are processing data line by line, you can use sed.

4. awk: a report generator that formats its output for display. If you need to generate reports from the data, or the data is processed by column, awk is the best choice.
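To make the division of labor concrete, here is a rough Python analogue of the three roles: filtering lines, editing each line, and aggregating a column. It is not how the tools are implemented; the sample data and function names are invented for illustration.

```python
import re

# Hypothetical sample input: user, status code, bytes sent.
LOG = """alice 200 512
bob 404 0
carol 200 2048
"""


def grep_like(text, pattern):
    """Keep only the lines that match, roughly like: grep ' 200 '"""
    return [line for line in text.splitlines() if re.search(pattern, line)]


def sed_like(text, pattern, replacement):
    """Rewrite every line without touching the input, like: sed 's/404/ERR/'"""
    return [re.sub(pattern, replacement, line) for line in text.splitlines()]


def awk_like_sum(text, column):
    """Sum one whitespace-separated column, like: awk '{s += $3} END {print s}'"""
    return sum(int(line.split()[column - 1])
               for line in text.splitlines() if line.split())
```

For example, `grep_like(LOG, r" 200 ")` keeps the two successful requests, while `awk_like_sum(LOG, 3)` totals the third column.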

(4) In-depth understanding of the three I/O multiplexing methods (select, poll, epoll): their differences and internal implementation

1. Understanding I/O multiplexing:

select, poll, and epoll are all I/O multiplexing mechanisms. I/O multiplexing monitors multiple descriptors through one mechanism; once a descriptor is ready (usually ready for reading or writing), the program is told so that it can perform the corresponding read or write. However, select, poll, and epoll are essentially synchronous I/O, because the program itself has to do the reading and writing after the event is ready, and that read/write step blocks. With asynchronous I/O the program does not do the read or write itself; the asynchronous I/O implementation takes care of copying the data from the kernel to user space.

At this point, two concepts are needed:

Blocking mode (block): when a process or thread executes such a function, it must wait for an event to occur. If the event has not occurred, the process or thread is blocked and the function cannot return immediately.

Non-blocking mode (non-block): when a process or thread executes such a function, it does not have to wait for the event; the call returns at once, and the return value reflects what happened. If the event has occurred, the behavior is the same as in blocking mode. If it has not, a code is returned to tell the process or thread so, and it carries on executing, which is why this mode is more efficient.

2. select analysis

The select() mechanism provides a data structure called fd_set, which is actually an array of long. Each bit of the array can be associated with an open file handle (a socket handle, another file, a named pipe, or a device handle); making that association is the programmer's job. When select() is called, the kernel modifies the contents of the fd_set according to the I/O state, thereby telling the process that called select() which sockets or files are readable or writable. It is mainly used in socket communication.

Using select: it monitors the file descriptors we care about for changes: readable, writable, or exceptional. It returns the number of ready descriptors, 0 on timeout, and -1 on error.

1. If one descriptor is found to have input and, while it is being read, another also receives input, nothing happens at that moment; the program learns of the new input only when it next calls select.

2. When the program reaches select and there is no input, it waits (blocks) until data arrives, so no loop-and-sleep is needed in the program.

Function analysis:

#include <sys/select.h>

int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

Return behavior: select() returns when a file in readfds or writefds becomes readable or writable, or when the timeout expires. After select() returns, the programmer uses a set of system-provided macros to determine which files are readable or writable; readfds is particularly useful in socket programming.

Note: different timeval settings give select() three behaviors: returning on timeout, blocking with no timeout, and polling (timeval has a resolution of 1/1,000,000 of a second).
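The three timeout behaviors mentioned in the note can be observed through Python's `select.select`, whose last argument plays the role of the timeval (0 for polling, a positive number for a timed wait, `None` for no timeout). This is a sketch with a hypothetical function name, using a socket pair as the watched descriptor.

```python
import select
import socket
import time


def select_timeout_modes():
    reader, writer = socket.socketpair()
    try:
        # Polling: a zero timeout returns immediately.
        start = time.monotonic()
        polled, _, _ = select.select([reader], [], [], 0)
        poll_elapsed = time.monotonic() - start

        # Timed wait: gives up after roughly 0.2 seconds.
        start = time.monotonic()
        timed, _, _ = select.select([reader], [], [], 0.2)
        timed_elapsed = time.monotonic() - start

        # No timeout: blocks until ready (data is already queued here).
        writer.send(b"x")
        blocked, _, _ = select.select([reader], [], [], None)
        return polled, poll_elapsed, timed, timed_elapsed, blocked == [reader]
    finally:
        reader.close()
        writer.close()
```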

The steps select performs, in detail:

(1) Use copy_from_user to copy the fd_set from user space to kernel space.

(2) Register the callback function __pollwait.

(3) Iterate over all fds and call each one's poll method (for a socket this is sock_poll, which calls tcp_poll, udp_poll, or datagram_poll as appropriate).

(4) Taking tcp_poll as an example, its core is __pollwait, the callback registered above.

(5) The main job of __pollwait is to hang current (the current process) on the device's wait queue. Different devices have different wait queues; for tcp_poll the wait queue is sk->sk_sleep (note that hanging the process on the wait queue does not mean the process is already asleep). After the device receives a message (network device) or finishes filling in file data (disk device), it wakes the processes sleeping on its wait queue, and current is woken.

(6) When the poll method returns, it returns a mask describing whether read and write operations are ready, and the fd_set is assigned according to this mask.

(7) If all fds have been traversed and none returned a readable/writable mask, schedule_timeout is called to put the process that called select (that is, current) to sleep. When a device driver finds its own resources readable or writable, it wakes the processes sleeping on its wait queue. If nobody wakes it within the timeout (specified by schedule_timeout), the process that called select is woken anyway, gets the CPU, and traverses the fds again to check whether any is ready.

(8) Copy the fd_set from kernel space back to user space.

The characteristics of select follow from the workflow above:

a. There is an upper limit on the number of descriptors that can be monitored for each type of event.

printf("%zu\n", sizeof(fd_set));

On my Linux system this prints 128, so at most 128 bytes * 8 = 1024 descriptors can be watched.

b. Polling before and after the call

With select, you must keep the descriptors of interest in an auxiliary array. Because the descriptor sets passed to select are both input and output parameters, you loop over the array to rebuild the sets before each call, and loop over the sets afterwards to see which events are ready.

c. Data is copied between the system and the user: copy_from_user copies the fd_set from user space to kernel space.

d. The sets must be reset before each call (the descriptor set is an input-output parameter).
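The 128-byte, 1024-descriptor limit in point a follows from fd_set being a fixed-size bitmap with one bit per descriptor. The class below is a toy Python model of that idea, not the real C macros; `FdSet` and its methods are hypothetical names, and 1024 is the value reported above.

```python
FD_SETSIZE = 1024  # assumed limit, matching 128 bytes * 8 bits


class FdSet:
    """Toy model of fd_set: one bit per file descriptor."""

    def __init__(self):
        self.bits = bytearray(FD_SETSIZE // 8)

    def _check(self, fd):
        if not 0 <= fd < FD_SETSIZE:
            raise ValueError(f"fd {fd} does not fit in the bitmap")

    def set(self, fd):            # plays the role of FD_SET
        self._check(fd)
        self.bits[fd // 8] |= 1 << (fd % 8)

    def clear(self, fd):          # plays the role of FD_CLR
        self._check(fd)
        self.bits[fd // 8] &= ~(1 << (fd % 8)) & 0xFF

    def isset(self, fd):          # plays the role of FD_ISSET
        self._check(fd)
        return bool(self.bits[fd // 8] & (1 << (fd % 8)))
```

A descriptor numbered 1024 or higher simply has no bit to occupy, which is the limit described in point a.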

3. poll analysis

The implementation of poll is very similar to that of select; only the way the fd collection is described differs. poll uses the pollfd structure instead of select's fd_set, and everything else is much the same.

#include <poll.h>

int poll(struct pollfd fds[], nfds_t nfds, int timeout);

Monitored descriptor and event options:

fds: an array of struct pollfd holding the socket descriptors whose state is to be checked. The system does not clear this array after each call, which makes it convenient to use and, especially with many socket connections, improves processing efficiency to some extent. This differs from select(): after a call, select() has overwritten the descriptor sets it was given, so the socket descriptors must be re-added to the sets before every call. select() is therefore better suited to a small number of socket descriptors, and poll() to a large number. As with select(), a positive return value is the number of file descriptors with events, and 0 means that no event occurred within the specified time. A negative return value indicates an error, and errno should be checked immediately.

Note: if no event occurs, revents is cleared.

poll features:

1. There is no upper limit on the number of monitored descriptors; instead of a "maximum descriptor + 1" argument, the count is given by the size of the fds array.

2. The monitored events are kept separate from the event state returned, so nothing needs to be reset before or after the call.

3. After the call, the array still has to be polled to find out which monitored events occurred.

4. Data is copied between the system and the user: copy_from_user copies fds from user space to kernel space.
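The "register once, call repeatedly" property can be seen with Python's `select.poll` wrapper around poll(). In this sketch the descriptors are registered a single time and the same poll object is queried twice; `poll_twice` and the labels `r1`/`r2` are made up for the example.

```python
import select
import socket


def poll_twice():
    r1, w1 = socket.socketpair()
    r2, w2 = socket.socketpair()
    try:
        poller = select.poll()
        # Requested events are registered once and survive across calls.
        poller.register(r1.fileno(), select.POLLIN)
        poller.register(r2.fileno(), select.POLLIN)

        first = poller.poll(0)        # timeout in milliseconds; nothing ready
        w2.send(b"x")
        second = poller.poll(1000)    # no re-registration needed

        names = {r1.fileno(): "r1", r2.fileno(): "r2"}
        return first, [(names[fd], bool(ev & select.POLLIN)) for fd, ev in second]
    finally:
        for sock in (r1, w1, r2, w2):
            sock.close()
```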

4. epoll analysis

epoll is the Linux kernel's improved poll for handling large numbers of file descriptors; it is an enhanced version of the select/poll multiplexed I/O interface on Linux. It can significantly improve the system's CPU efficiency when only a few of a large number of concurrent connections are active. A key reason is that, when collecting events, it does not traverse the whole set of monitored descriptors; it only walks the descriptors that kernel I/O events have asynchronously woken and added to the ready queue. Besides the level-triggered (Level Triggered) I/O events that select/poll provide, epoll also offers edge-triggered (Edge Triggered) mode, which lets user-space programs cache the I/O state, make fewer epoll_wait/epoll_pwait calls, and run more efficiently.

epoll features:

1. epoll's calling interface differs from that of select and poll.

select and poll each provide a single function, select or poll. epoll provides three: epoll_create, epoll_ctl, and epoll_wait. epoll_create creates an epoll handle, epoll_ctl registers the event types to listen for, and epoll_wait waits for events to occur.

2. Less copying between the kernel and user space.

select and poll copy the descriptor set between user space and the kernel on every call. epoll's saving is often attributed to the kernel and user space sharing a block of memory through mmap, but mainline Linux epoll does not use mmap; the saving comes from epoll_ctl. Each time a new event is registered on the epoll handle (EPOLL_CTL_ADD in epoll_ctl), that fd is copied into the kernel, instead of all fds being copied again on every epoll_wait. epoll thus ensures that each fd is copied only once over the whole process.

3. No need to poll after the call to find out whether descriptor events are ready.

select and poll must be followed by a scan to check which events occurred. epoll does not, as select and poll do, add current in turn to the wait queue of every fd's device on each call. It hangs current only once, at epoll_ctl time (this one time is unavoidable), and assigns each fd a callback function that is invoked when the device becomes ready and wakes the waiters on its wait queue. The callback adds the ready fd to a ready list. The job of epoll_wait is then just to check whether this ready list contains any ready fd (using schedule_timeout() to sleep for a while and check for a while, similar to step 7 of select).

4. There is no upper limit on the number of monitored descriptors.

epoll has no such limit; the upper bound on the fds it supports is the maximum number of files that can be opened, which is generally far greater than 2048. For example, on a machine with 1 GB of memory it is about 100,000; the exact number can be seen with cat /proc/sys/fs/file-max. Generally speaking, this number depends heavily on system memory.

5. I/O efficiency does not fall linearly as the number of fds grows.

Another fatal weakness of traditional select/poll shows up when you have a large set of sockets of which, because of network latency, only some are "active" at any moment: select/poll linearly scans the whole set on every call, so efficiency falls linearly. epoll does not have this problem; it only deals with the "active" sockets, because in the kernel epoll is implemented through the callback function attached to each fd. Only "active" sockets invoke the callback; idle sockets do not.

Extension: the kernel maintains a red-black tree (a balanced binary search tree, so performance is stable) to store the monitored descriptors, and a linked list to store the ready descriptors. Each time a file descriptor is registered, modified, or deleted on the epoll handle, the corresponding node in this red-black tree is updated (the tree makes adding, deleting, modifying, and looking up easy). When epoll_wait returns, it checks whether the linked list has any nodes and copies them into the descriptor array supplied by the user.

Compared with select and poll, epoll has the following significant advantages:

(1) select and poll must themselves keep polling the whole fd collection until a device is ready, possibly alternating between sleeping and waking several times in the meantime. epoll also has to call epoll_wait to keep checking the ready list, and may likewise alternate between sleeping and waking many times, but when a device becomes ready it invokes the callback, puts the ready fd on the ready list, and wakes the process sleeping in epoll_wait. Although both alternate between sleeping and waking, select and poll traverse the entire fd collection each time they wake, whereas epoll only has to check whether the ready list is empty, which saves a great deal of CPU time. This is the performance gain brought by the callback mechanism.

(2) On every call, select and poll copy the fd collection from user mode to kernel mode and hang current on each device wait queue once. epoll copies each fd only once, and hangs current on a wait queue only once (at the start of epoll_wait; note that this wait queue is not a device wait queue but one defined internally by epoll). This also saves considerable overhead.
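The "only active descriptors are returned" behavior can be illustrated with Python's `select.epoll` (Linux only), whose `register` and `poll` methods wrap epoll_ctl and epoll_wait. The function name, the number of connections, and which ones are active are all arbitrary choices for this sketch.

```python
import select
import socket


def active_only(total=50, active=(3, 17)):
    """Register many sockets, make a few active, see what epoll reports."""
    pairs = [socket.socketpair() for _ in range(total)]
    ep = select.epoll()                      # epoll_create
    try:
        index_of = {}
        for i, (reader, _) in enumerate(pairs):
            ep.register(reader.fileno(), select.EPOLLIN)   # epoll_ctl, once per fd
            index_of[reader.fileno()] = i
        for i in active:
            pairs[i][1].send(b"x")
        events = ep.poll(1.0)                # epoll_wait: ready fds only
        return sorted(index_of[fd] for fd, _ in events)
    finally:
        ep.close()
        for reader, writer in pairs:
            reader.close()
            writer.close()
```

The caller receives a list containing just the ready descriptors, so there is nothing to scan afterwards, unlike the fd_set or pollfd array.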

# 5. Summary:

poll and epoll suit applications that watch a large number of descriptors. epoll's advantage shows when only a few descriptors are ready at a time (it uses a callback mechanism to track descriptor readiness).

To sum up: of the three functions, epoll is generally the most efficient.

(5) ET mode and LT mode of epoll (ET requires non-blocking)

LT (level-triggered) is the default mode and supports both blocking and non-blocking sockets. In this mode the kernel tells you whether a file descriptor is ready, and you can then perform I/O on the ready fd. If you do nothing, the kernel keeps notifying you, so programming errors are less likely in this mode. Traditional select/poll are representative of this model.

ET (edge-triggered) is the high-speed mode and supports only non-blocking sockets. In this mode the kernel tells you through epoll when a descriptor changes from not ready to ready. It then assumes you know the file descriptor is ready and sends no further readiness notifications for it until you do something that makes it not ready again (for example, a send, a receive, or an accept, or transferring less than the requested amount of data, produces an EWOULDBLOCK error).
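The difference between the two modes shows up when data is left unread. In this sketch (Linux only, hypothetical function name) the same unread data is polled twice, once in the default level-triggered mode and once with `EPOLLET`.

```python
import select
import socket


def notifications(flags):
    """Count events on two consecutive waits while data stays unread."""
    reader, writer = socket.socketpair()
    ep = select.epoll()
    try:
        reader.setblocking(False)   # ET mode expects non-blocking sockets
        ep.register(reader.fileno(), flags)
        writer.send(b"data")
        first = len(ep.poll(0.1))
        second = len(ep.poll(0.1))  # nothing was read in between
        return first, second
    finally:
        ep.close()
        reader.close()
        writer.close()


level = notifications(select.EPOLLIN)                  # LT (default)
edge = notifications(select.EPOLLIN | select.EPOLLET)  # ET
```

In LT mode both waits report the descriptor; in ET mode only the first does, which is why an ET handler should keep reading until it gets EWOULDBLOCK.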

(6) Commands for checking the CPU used by processes (be sure to understand the meaning of used, buffers, and cache)

The top command is very common; its memory fields mean the following:

used: the amount of physical memory in use

total: the total amount of physical memory

free: the amount of free physical memory

buffers: the amount of memory used for kernel buffers

cached: the amount of memory used by the page cache (older versions of top print this value on the Swap line, but it is not swap space)

buffers differs from cached: buffers is the read/write buffer for block devices, and cached is the page cache of the file system itself. Both are low-level mechanisms Linux uses to speed up disk access.
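These fields can also be read from /proc/meminfo, which uses the names MemTotal, MemFree, Buffers, and Cached. The sketch below parses text in that format; the sample numbers are invented, and the "used" formula is an assumption (one common definition that subtracts buffers and cache; tool versions differ on this).

```python
# Invented sample in the /proc/meminfo format (values in kB).
SAMPLE_MEMINFO = """MemTotal:        8000000 kB
MemFree:         1000000 kB
Buffers:          200000 kB
Cached:          2800000 kB
"""


def parse_meminfo(text):
    """Return a dict mapping field name to its integer value in kB."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def memory_summary(text):
    m = parse_meminfo(text)
    # Assumption: "used" excludes buffers and page cache.
    used = m["MemTotal"] - m["MemFree"] - m["Buffers"] - m["Cached"]
    return {"total": m["MemTotal"], "free": m["MemFree"],
            "buffers": m["Buffers"], "cached": m["Cached"], "used": used}
```

On a Linux machine, the real file could be passed in with `memory_summary(open("/proc/meminfo").read())`.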

(7) Other common Linux commands (kill, find, cp, etc.)

The kill command is used to terminate a running program or job. kill sends the specified signal to the process.

kill 3268

The find command searches for files under the specified directory. Any string that comes before the options is treated as the name of a directory to search. If the command is run with no arguments at all, find searches the current directory for subdirectories and files, and prints every subdirectory and file it finds.

find /home -name "*.txt"

The cp command copies one or more source files or directories to a specified destination file or directory. It can copy a single source file to a file with a given name, or into an existing directory. cp also supports copying several files at once; in that case the destination must be an existing directory, otherwise an error occurs.

cp file /usr/men/tmp/file1

(8) Using shell scripts

Shell is a scripting language, so an interpreter is needed to execute its scripts. The most common interpreter on Linux is bash.

A scripting language does not need to be compiled; it is an interpreted language that the interpreter reads and runs directly.

(9) The difference between hard links and soft links

A hard link is a link made through an index node (inode). In a Linux file system, every file stored on a disk partition, whatever its type, is assigned a number called the inode number (Inode Index). In Linux, several file names may point to the same inode. For example, if A is a hard link to B (A and B are both file names), the inode number in A's directory entry is the same as the inode number in B's directory entry; that is, one inode corresponds to two different file names, both names refer to the same file, and A and B are completely equal as far as the file system is concerned. Deleting either one does not affect access through the other.

The other kind of link is the symbolic link (Symbolic Link), also called a soft link. A soft link is similar to a Windows shortcut. It is actually a special file: in effect a text file that contains the location of another file. For example, if A is a soft link to B (A and B are both file names), the inode number in A's directory entry differs from the one in B's directory entry; A and B point to two different inodes and therefore to two different data blocks. What A's data block stores, however, is only the path name of B (from which B's directory entry can be found). There is a "master-slave" relationship between A and B: if B is deleted, A still exists (because they are two different files) but becomes a dangling link.
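The inode behavior described above can be checked with a short Python sketch that creates both kinds of link in a temporary directory. The file names `B`, `A_hard`, and `A_soft` are chosen to mirror the A/B example; `link_demo` is a hypothetical helper.

```python
import os
import tempfile


def link_demo():
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "B")
        hard = os.path.join(d, "A_hard")
        soft = os.path.join(d, "A_soft")
        with open(target, "w") as f:
            f.write("payload")

        os.link(target, hard)      # hard link: second name, same inode
        os.symlink(target, soft)   # soft link: new inode storing B's path

        same_inode = os.stat(hard).st_ino == os.stat(target).st_ino
        # lstat inspects the link itself instead of following it.
        soft_inode_differs = os.lstat(soft).st_ino != os.stat(target).st_ino

        os.remove(target)          # delete B
        with open(hard) as f:
            hard_content = f.read()            # data is still reachable
        soft_dangling = os.path.islink(soft) and not os.path.exists(soft)
        return same_inode, soft_inode_differs, hard_content, soft_dangling
```

After B is removed, the hard link still reads the original content while the soft link exists but no longer resolves.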

(10) Understanding file permissions (rwx)

r: read, value 4

w: write, value 2

x: execute, value 1

Combinations: adding 4, 2, and 1 gives the following permissions: 0 (no permission), 4 (read), 5 (4+1: read + execute), 6 (4+2: read + write), 7 (4+2+1: read + write + execute)

From left to right:

Characters 1-3 are the permissions of the file owner

Characters 4-6 are the permissions of users in the same group

Characters 7-9 are the permissions of other users.

For example: chmod 777 a
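The mapping between the octal digits and the nine rwx characters can be written out in a few lines. This is a small sketch with hypothetical function names; it handles only the nine basic permission bits.

```python
def digit_to_rwx(digit):
    """Turn one octal digit (0-7) into its three-character rwx form."""
    return (("r" if digit & 4 else "-")
            + ("w" if digit & 2 else "-")
            + ("x" if digit & 1 else "-"))


def mode_to_string(mode):
    """Owner, group, other: three bits each, from left to right."""
    return "".join(digit_to_rwx((mode >> shift) & 7) for shift in (6, 3, 0))


print(mode_to_string(0o777))  # rwxrwxrwx
print(mode_to_string(0o754))  # rwxr-xr--
```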

(11) When do a file's three timestamps (mtime, atime, ctime) change?

A file has three timestamps: access time (atime), modification time (mtime), and status-change time (ctime), that is, Access time, Modify time, and Change time.

Access time: changes when the file's content is read, for example by cat or more (subject to mount options such as relatime or noatime); commands such as stat and ls do not affect atime.

Modification time: the time the file's content was last modified. This is the time shown by the commonly used ls -l command. When a file is edited with vim, its mtime changes accordingly.

Status-change time: changes whenever the file's status changes; for example, changing file attributes with chmod, chown, and similar operations changes the file's ctime.
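The mtime and ctime rules can be observed with `os.stat`. The sketch below assumes a file system with sub-second timestamps (the short sleeps would need to be longer on one with coarser resolution) and leaves atime out because it depends on mount options. `timestamp_changes` is a hypothetical name.

```python
import os
import tempfile
import time


def timestamp_changes(pause=0.05):
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "f")
        with open(path, "w") as f:
            f.write("one")
        os.chmod(path, 0o644)
        before = os.stat(path)

        time.sleep(pause)
        os.chmod(path, 0o600)            # status change only
        after_chmod = os.stat(path)

        time.sleep(pause)
        with open(path, "a") as f:       # content change
            f.write("two")
        after_write = os.stat(path)

        return {
            "chmod_changes_ctime": after_chmod.st_ctime_ns > before.st_ctime_ns,
            "chmod_keeps_mtime": after_chmod.st_mtime_ns == before.st_mtime_ns,
            "write_changes_mtime": after_write.st_mtime_ns > after_chmod.st_mtime_ns,
            "write_changes_ctime": after_write.st_ctime_ns > after_chmod.st_ctime_ns,
        }
```

Note that writing to the file updates ctime as well as mtime, since changing the content also changes the inode's metadata.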

(12) Linux commands for monitoring network bandwidth and viewing network usage by specific processes

Overall bandwidth usage: nload, bmon, slurm, bwm-ng, cbm, speedometer, and netload

Overall bandwidth usage (batch output): vnstat, ifstat, dstat, and collectl

Bandwidth usage per socket connection: iftop, iptraf, tcptrack, pktstat, netwatch, and trafshow

Bandwidth usage per process: nethogs

Copyright notice: this is an original article by the CSDN blogger "Zuoer Madness", released under the CC 4.0 BY-SA license. Please include the original source link and this notice when reprinting.

Original link: https://blog.csdn.net/u012414189/article/details/83830848
