2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains the difference between a database index query and a full table scan, using a simple model of disk access costs to show when each one is faster.
Disk structure and basic time consumption
A disk is organized as platter -> track -> sector. Because the platters operate in parallel, the time needed to select a platter can be ignored. Finding a piece of data therefore comes down to finding the right track (tracks are concentric, like tree rings) and then the right sector (a fan-shaped segment of that track).
The main metrics of disk performance are as follows:
Access time: the time between issuing a read or write request and the start of data transfer. This is the time the disk needs to locate the data, which corresponds to a seek in a program. Access time consists of seek time (finding the track) and rotational latency (waiting for the sector). It is usually a few milliseconds.
Data transfer rate: once the data has been located, it is transferred from disk to memory. The rate is typically tens to a few hundred MB per second.
Sequential access vs random access
Files on disk are organized in blocks. A block is a logical unit, typically between 512 bytes and a few KB. Data is read from disk one block at a time: even if you only need 1 byte, a whole block is read.
Sequential access: reading adjacent blocks on disk one after another. The disk needs only one seek for the whole read.
Random access: reading blocks at scattered locations on disk, usually only a small amount of data each time. Every random access request requires its own seek, so random access is much less efficient than sequential access.
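The gap between the two access patterns can be made concrete with a rough cost model. The sketch below is only an illustration: the function names and the numbers (4 KB blocks, 5 ms access time, 100 MB/s transfer rate) are assumptions chosen for the example, not measurements.

```python
# Rough cost model: every seek costs a fixed access time, and every byte
# read costs 1 / transfer_rate seconds. All numbers are illustrative.

def sequential_read_time(num_blocks, block_size, access_time, transfer_rate):
    """One seek, then all blocks are read back to back."""
    return access_time + num_blocks * block_size / transfer_rate


def random_read_time(num_blocks, block_size, access_time, transfer_rate):
    """One seek per block, because each block is somewhere else on disk."""
    return num_blocks * (access_time + block_size / transfer_rate)


if __name__ == "__main__":
    # Assumed values: 1000 blocks of 4 KB, 5 ms access time, 100 MB/s.
    args = dict(num_blocks=1000, block_size=4096,
                access_time=0.005, transfer_rate=100e6)
    print(f"sequential: {sequential_read_time(**args):.3f} s")
    print(f"random:     {random_read_time(**args):.3f} s")
```

Under these assumptions the seek cost dominates the random case, which is the effect the rest of the article builds on.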
Storage model
Hardware: the disk's data transfer rate is denoted T, and its average access time is denoted S.
Data: a dataset of N records, where records can be compared with one another. The records are stored on disk in no particular order and are evenly distributed. Each record occupies X bytes, so the total size of the data is NX.
[Figure: the order in which the data is stored on disk]
Index: an index is built on the data. The index can be seen as a mapping of the data, a compact representation of it. We assume it fits entirely in memory and can locate the original data on disk exactly.
Query process
Query mode: the query has a filter condition. Let F be the selectivity of the filter condition, meaning the result set is a fraction F of the total data, with 0 <= F <= 1.
There are two ways to run the query: a full table scan or an index query. Both are logical concepts.
Full table scan: the simplest query operation. Records are read one by one from disk into memory, filtered there, and the matches are returned. Every record is read whether or not it is useful, so the total amount of data read from disk is large, but only one seek is needed. This corresponds to sequential access on disk.
[Figure: a full table scan; yellow marks the data that must be read from disk into memory, which here is all of it]
Full table scan time = IO time = NX/T
Index: because the data on disk is unordered, we build a B+ tree index and keep it in memory. The index sorts all the data and records each record's location on disk. A query first uses the index to find the disk locations of all records in the result set, then reads exactly those records from disk. This involves a small amount of disk IO plus a large number of seeks, and corresponds to random access on disk.
[Figure: an index query; the disk locates one record, reads it, then locates the next record]
Seek time: NFS (NF matching records, one seek of S each)
IO time: NFX/T
Total index query time = Seek time + IO time = NFS + NFX/T
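The two cost formulas translate directly into code. This is a minimal sketch of the model described above, not a benchmark; the function names and the sample values in the usage section are hypothetical.

```python
def full_scan_time(n, x, t):
    """Full table scan: read all N*X bytes sequentially (the single seek is ignored)."""
    return n * x / t


def index_query_time(n, x, t, s, f):
    """Index query: N*F matching records, each costing one seek plus its own transfer."""
    matches = n * f
    return matches * s + matches * x / t


if __name__ == "__main__":
    # Hypothetical dataset: 10 million records of 128 bytes,
    # 300 MB/s transfer rate (1 MB taken as 10**6 bytes), 5 ms access time.
    n, x, t, s = 10_000_000, 128, 300e6, 0.005
    print(f"full scan:          {full_scan_time(n, x, t):.2f} s")
    for f in (0.00001, 0.001):
        print(f"index, F = {f}: {index_query_time(n, x, t, s, f):.2f} s")
```

With these sample values, the very selective query (F = 0.00001) finishes faster through the index, while the less selective one (F = 0.001) is slower than the full scan.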
Comparison
Let's look at the parameters. Ignoring hardware upgrades, the disk throughput T, the average access time S, the number of records N, and the record size X are all constants that cannot be changed.
Of the five parameters N, T, F, S, and X, only F is a variable, and it depends on the query's filter condition. For example, if you search for boys taller than 150 cm, the filter is barely selective: most records match, say F = 0.8. If you search for boys taller than 190 cm, only a small fraction match.
Because F varies, the strategy can vary too: we can choose between the index and a full table scan according to F. The index query is faster than the full table scan exactly when the following inequality holds:
NFS + NFX/T < NX/T
That is, F < X / (TS+X)
As you can see, the threshold does not depend on the total number of records N. When F is small enough, the index is the better choice. When the result set is too large, the index needs too many seeks, and a full table scan is better.
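The step from the inequality to the bound on F can be spelled out as follows, using only the symbols already defined (N, F, S, X, T, all positive):

```latex
\begin{align*}
NFS + \frac{NFX}{T} &< \frac{NX}{T} \\
FS + \frac{FX}{T} &< \frac{X}{T} && \text{(divide both sides by } N > 0\text{)} \\
F\,(TS + X) &< X && \text{(multiply both sides by } T > 0\text{)} \\
F &< \frac{X}{TS + X}
\end{align*}
```

N cancels in the first step, which is why the threshold is independent of the size of the dataset.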
Examples
Here is a concrete example:
Average seek time: S = 5 ms
Disk throughput: T = 300 MB/s
Single record size: X = 128 bytes
With these values, X / (TS + X) = 128 B / (1.5 MB + 128 B), so the selectivity of the filter condition must be below roughly 0.008% for the index to be faster.
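The threshold can be checked numerically. The sketch below plugs the example values into F < X / (TS + X); the function name is hypothetical, and 1 MB is taken as 10**6 bytes, which is an assumption about the units.

```python
def break_even_selectivity(x, t, s):
    """Largest F for which the index query is still faster: X / (T*S + X)."""
    return x / (t * s + x)


S = 0.005   # average seek time: 5 ms, in seconds
T = 300e6   # throughput: 300 MB/s, assuming 1 MB = 10**6 bytes
X = 128     # record size in bytes

f_max = break_even_selectivity(X, T, S)
print(f"index is faster when F < {f_max:.4%}")
```

Changing S, T, or X shows how sensitive the threshold is to the hardware: a shorter access time or larger records both raise the selectivity at which the index still wins.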