2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article looks at how to understand MySQL in depth from a programmer's point of view. The content is concise and easy to follow, and I hope you get something out of it.
Preface
As a programmer with four years of work experience, today I will explore the mysteries of the database from a programmer's point of view, taking MySQL as the example!
Basic principles of database
My understanding of DB
First, the composition of the database: storage + instance
Needless to say, the data needs to be stored, but storage alone is not enough. The provider obviously has to encapsulate the storage operations and expose an API for adding, deleting, modifying and querying data; that is the instance.
One storage can correspond to multiple instances, which improves load capacity and availability. Multiple storages can be distributed across different data centers and regions to achieve disaster recovery.
Second, data is read by Block or Page
It takes no effort to see that a database cannot read data row by row (Why? ^_^). In essence, databases such as Oracle and MySQL schedule and manage data in physical blocks of a fixed size, such as 16K (Block or Page; I call it Block here). A Block is a database concept, so how does it map to the file system? Obviously something has to record where the address of each Block is. Once the address is found, reading a fixed-size chunk of data from it is the same as reading the Block.
The database is smart: it reads not only the Block that is requested but also loads nearby Blocks into memory. This reduces the number of IO operations and improves the hit rate, because the Blocks near a requested Block are likely to be hot data too, so the read-ahead is worthwhile!
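The idea of "find the address, then read a fixed size" plus read-ahead can be sketched in a few lines of Python. This is only an illustration under assumptions: the 16K size comes from the example above, while the READ_AHEAD value, the function names and the throwaway file are hypothetical and do not describe how MySQL actually lays out its files.

```python
import tempfile

PAGE_SIZE = 16 * 1024  # assumed block size, matching the 16K example above
READ_AHEAD = 2         # hypothetical: how many following blocks to prefetch


def read_block(f, block_no):
    """Read one fixed-size block; its address is simply block_no * PAGE_SIZE."""
    f.seek(block_no * PAGE_SIZE)
    return f.read(PAGE_SIZE)


def read_with_read_ahead(f, block_no, cache):
    """Return the requested block and load its neighbours into the cache."""
    for n in range(block_no, block_no + 1 + READ_AHEAD):
        if n not in cache:
            data = read_block(f, n)
            if data:  # empty bytes means we ran past the end of the file
                cache[n] = data
    return cache[block_no]


def demo():
    # Build a throwaway "data file" of 8 blocks, each filled with its own number.
    with tempfile.TemporaryFile() as f:
        for n in range(8):
            f.write(bytes([n]) * PAGE_SIZE)
        cache = {}
        block = read_with_read_ahead(f, 3, cache)
        return block[0], sorted(cache)
```

Calling demo() asks for block 3 and leaves blocks 3, 4 and 5 in the cache, so a later request for block 4 or 5 would need no further IO.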
Third, disk IO is the performance bottleneck of the database.
There is no doubt that the data lives on disk, so disk IO cannot be avoided. We will not go into head movement, track positioning and addressing; we are programmers and cannot control these anyway. But the process really is time-consuming, not even in the same order of magnitude as a memory read, so there are many ways to reduce IO and improve database performance.
For example, add memory so that the database can load more data into it. Memory is good, but it cannot be abused. Why? Suppose the database holds 100 GB of data. If all of it is loaded into memory, the database has to manage 100 GB of disk data plus 100 GB of memory data. Tiring, isn't it? The database has to handle the mapping between disk and memory, the synchronization of data, and the eviction of in-memory data, and once transactions are involved this becomes a series of complex operations. It is worth pointing out that, to speed up lookups in memory, databases generally organize the in-memory data with a hash structure.
For example, use indexes. Compared with memory, an index is very cost-effective; the principle of MySQL indexes is covered in detail below.
For example, use disks with better performance. (This has nothing to do with us.)
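The first of these points, caching Blocks in memory behind a hash lookup and cleaning out cold ones, can be sketched as a tiny buffer pool. This is a simplified illustration: the BufferPool class, its capacity and the LRU eviction rule are assumptions made for the sketch, not a description of MySQL's actual buffer pool.

```python
from collections import OrderedDict


class BufferPool:
    """Tiny in-memory block cache: a hash map from block number to block data.

    LRU eviction is an assumption made for this sketch.
    """

    def __init__(self, capacity, load_block):
        self.capacity = capacity
        self.load_block = load_block   # function that performs the "disk IO"
        self.blocks = OrderedDict()    # hash lookup + recency order
        self.disk_reads = 0

    def get(self, block_no):
        if block_no in self.blocks:            # hit: no disk IO
            self.blocks.move_to_end(block_no)
            return self.blocks[block_no]
        self.disk_reads += 1                   # miss: go to disk
        data = self.load_block(block_no)
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:   # clean out the coldest block
            self.blocks.popitem(last=False)
        return data


def demo():
    pool = BufferPool(capacity=2, load_block=lambda n: f"block-{n}")
    for n in [1, 2, 1, 3, 1, 2]:
        pool.get(n)
    return pool.disk_reads, list(pool.blocks)
```

In demo(), six requests cost four disk reads because block 1 stays hot in memory; the more memory the pool has, the fewer reads reach the disk, at the price of more bookkeeping.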
Fourth, ask some questions and think about it:
Why do we say that using delete to clear a table is slower than truncating it?
[delete has to find and remove rows one by one, which takes a lot of effort; truncate removes data at the Block level]
Why do we say that the small table should drive the big table?
[Is it really faster when the small table drives the big table? Aren't M*N and N*M the same? The catch is the index!]
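The second answer can be made concrete with a rough cost model. The sketch below assumes a nested-loop join in which each row of the driving table triggers one index lookup on the other table, and it models a lookup as log2 of that table's row count. Both the model and the row counts are illustrative assumptions, not measurements.

```python
import math


def index_nested_loop_cost(outer_rows, inner_rows):
    """Rough cost: one index lookup on the inner table per outer row.

    A B+Tree lookup is modelled as log2(inner_rows) comparisons; this is a
    simplification for illustration only.
    """
    return outer_rows * math.log2(inner_rows)


small, big = 100, 1_000_000
small_drives_big = index_nested_loop_cost(small, big)   # about 2,000
big_drives_small = index_nested_loop_cost(big, small)   # about 6.6 million
```

Without an index both orders cost M*N row comparisons, which is why the two look identical at first; with an index on the driven table, the number of lookups is set by the driving table, so the smaller one should drive.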
Explore the principles behind MySQL indexing
For the vast majority of applications the read-write ratio is around 10:1, or even 100:1, and insert/update rarely cause performance problems. The problems met most often, and the hardest ones, come from select, so select optimization is the top priority, and it obviously cannot be done without indexes!
Speaking of MySQL indexes, a whole list of terms comes up: BTree index / B+Tree index / Hash index / clustered index / nonclustered index. So many that it makes you dizzy!
What exactly is the index and what problem do you want to solve?
It is a cliché. The official website says that a MySQL index is a data structure and that the purpose of an index is to improve query efficiency.
To put it bluntly, without an index there are more disk IO operations! What should we do to reduce the number of disk IOs?
We want to reach the final result by continually narrowing the range of data to look at, and keep the number of disk IOs for each lookup at a very small order of magnitude, preferably a constant.
To deal with these problems, the B+Tree index steps in!
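To see why the number of disk IOs stays nearly constant, it helps to estimate the height of the tree, since a lookup reads roughly one page per level. The sketch below is a back-of-the-envelope calculation: the 16K page follows the earlier example, while the 8-byte key, 6-byte pointer and 1 KB row are assumed sizes chosen only for illustration.

```python
import math

PAGE_SIZE = 16 * 1024   # assumed page size (16K, as in the example earlier)


def btree_height(rows, key_bytes=8, pointer_bytes=6, row_bytes=1024):
    """Estimate how many levels (page reads) a lookup needs.

    All sizes are illustrative assumptions, not measured MySQL values.
    """
    fanout = PAGE_SIZE // (key_bytes + pointer_bytes)   # children per inner page
    rows_per_leaf = PAGE_SIZE // row_bytes              # rows per leaf page
    pages = math.ceil(rows / rows_per_leaf)             # number of leaf pages
    height = 1
    while pages > 1:                                    # add inner levels up to the root
        pages = math.ceil(pages / fanout)
        height += 1
    return height
```

Under these assumptions btree_height(20_000_000) is 3, so a lookup in a 20-million-row table touches about three pages. With a 200-byte key the same call returns 5, because each inner page holds far fewer keys.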
Hello, B+Tree
In MySQL, different storage engines implement indexes differently. Here we focus on MyISAM and InnoDB.
B+Tree index structure of MyISAM engine
We know that for the MyISAM engine, data files and index files are separate. As can be seen from the figure, an index lookup yields the physical address of the data, and the record in the data file is then located by that address. This kind of index is also called a nonclustered index.
For the InnoDB engine, the data file itself is an index file! Put simply, on the leaf nodes MyISAM stores the physical address of the record, while InnoDB stores the data itself; this is called a "clustered index".
Another point to note is that for InnoDB, the leaf nodes of the primary key index store the data, while the leaf nodes of an ordinary index store the primary key value! In other words, a lookup on an ordinary index field in InnoDB first finds the primary key through the B+Tree of the ordinary index and then finds the row through the B+Tree of the primary key index. As you can see, choosing the primary key is very important for InnoDB!
For MyISAM, the only difference between the primary key index and an ordinary index is that a primary key lookup can stop after finding one record, whereas an ordinary index allows duplicates, so the search has to continue after the first match. There is no difference in structure, as shown in the figure above.
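The difference in what the leaf nodes store can be sketched with plain dictionaries standing in for B+Trees. The table contents, index names and helper functions below are all hypothetical; only the shape of the two lookup paths follows the description above.

```python
# Plain dicts stand in for B+Trees; only what the leaf nodes store matters here.

rows = [
    {"id": 1, "name": "alice", "age": 20},
    {"id": 2, "name": "bob", "age": 25},
]

# MyISAM style: the data file is separate, and every index leaf holds the
# record's physical address (here, its position in data_file).
data_file = list(rows)
myisam_primary = {row["id"]: addr for addr, row in enumerate(data_file)}
myisam_by_name = {row["name"]: addr for addr, row in enumerate(data_file)}

# InnoDB style: the primary key index leaf holds the row itself, and an
# ordinary index leaf holds the primary key value.
innodb_primary = {row["id"]: row for row in rows}
innodb_by_name = {row["name"]: row["id"] for row in rows}


def myisam_find_by_name(name):
    addr = myisam_by_name[name]        # one index lookup gives the address
    return data_file[addr]             # then read the record at that address


def innodb_find_by_name(name):
    pk = innodb_by_name[name]          # first tree: ordinary index -> primary key
    return innodb_primary[pk]          # second tree: primary key index -> row
```

Both functions return the same row for "bob", but the InnoDB path walks two trees, which is why a lookup through an ordinary index depends on the primary key.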
Go deep into B+Tree
Ask a few questions:
Why does B+Tree put the real data on the leaf node instead of the inner node?
Why do we say that index fields should be as short as possible, and preferably monotonically increasing?
Why is there a leftmost matching principle for composite indexes?
With a range query (for example, a condition ending in > 26 and sex = 'man'), only the name column of the composite index is actually used.
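The leftmost matching principle follows from how a composite index is sorted. As a minimal sketch, assume a composite index on (name, age, sex); the column order is inferred from the example above and the sample rows are made up. A sorted list of tuples plays the role of the B+Tree keys.

```python
import bisect

# Entries of a hypothetical composite index (name, age, sex), kept sorted the
# way a B+Tree keeps its keys: by name first, then age, then sex.
index = sorted([
    ("alice", 30, "woman"),
    ("bob", 22, "man"),
    ("bob", 27, "man"),
    ("bob", 31, "woman"),
    ("carol", 26, "woman"),
])


def find_by_name(name):
    """Leftmost column: entries with this name are contiguous, so binary search works."""
    lo = bisect.bisect_left(index, (name,))
    hi = bisect.bisect_left(index, (name + "\0",))
    return index[lo:hi]


def find_by_age(age):
    """Not a leftmost prefix: matching ages are scattered, so every entry is checked."""
    return [entry for entry in index if entry[1] == age]
```

find_by_name("bob") jumps straight to the three contiguous "bob" entries, while find_by_age(26) has to look at every entry, because ages are only ordered within a single name.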
If you want to use the index, keep it "clean".
What does "clean" mean? Just don't let the indexed column take part in a calculation! For example, applying a function to an indexed column is likely to make the index unusable. Why?
It does not take much thought. What the B+Tree stores is the raw data. To compare against the result of a function, the function would have to be applied to every stored value, which is obviously too expensive.
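A small sketch shows the difference between filtering on a function of the column and filtering on the raw column. A sorted list of date strings stands in for an index on a date column; the sample values and the year_of helper are hypothetical.

```python
import bisect

# A sorted list stands in for an index on a date column (ISO strings sort by date).
index = sorted(["2019-11-02", "2020-01-15", "2020-07-30", "2021-03-08"])


def year_of(value):
    return int(value[:4])


def find_with_function(year):
    """Like WHERE year_of(col) = ...: the function must run on every entry."""
    return [v for v in index if year_of(v) == year]


def find_with_range(year):
    """Like WHERE col >= 'YYYY-01-01' AND col < 'YYYY+1-01-01': binary search on raw values."""
    lo = bisect.bisect_left(index, f"{year}-01-01")
    hi = bisect.bisect_left(index, f"{year + 1}-01-01")
    return index[lo:hi]
```

Both functions return the same rows, but only the second one can use the sort order of the index. Rewriting a condition so that the column stands alone on one side is the usual way to keep the index "clean".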
If you want to build an index, look at the selectivity.
Indexes are good and cheap, but do not create them carelessly. count(distinct col) / count(*) gives the selectivity of col, which is obviously 1 for the primary key. If the selectivity is too low, consider whether the index is worth creating at all.
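The selectivity formula can be tried out directly. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, purely so that it runs without a server; the student table and its four rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # SQLite stands in for MySQL in this sketch
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, sex TEXT)")
conn.executemany(
    "INSERT INTO student (id, sex) VALUES (?, ?)",
    [(1, "man"), (2, "woman"), (3, "man"), (4, "man")],
)


def selectivity(column):
    # The "* 1.0" forces floating-point division in SQLite.
    # The column name is interpolated directly, so only pass trusted names.
    sql = f"SELECT count(DISTINCT {column}) * 1.0 / count(*) FROM student"
    return conn.execute(sql).fetchone()[0]
```

Here selectivity("id") is 1.0 and selectivity("sex") is 0.5. On a large real table a column with only two distinct values would score close to 0, which makes it a poor index candidate on its own.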
Hash index
The point here is not to analyze the Hash index in depth, but to show that the idea of hashing is everywhere!
MySQL's Memory storage engine has hash indexes: given a key, a hash function computes the address, so a hash index lookup is usually very fast, O(1). There are also hash collisions, which, as in HashMap, are resolved with a singly linked list.
Think about it: does a hash index support range queries?
Obviously not; it can only look up by an exact KEY. Just as with HashMap, would it be quick to find the keys that contain "zhangfengzhe"?
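A toy chained hash table makes both points visible: exact lookups go straight to one bucket, while anything else has to visit every entry. The HashIndex class below is a hypothetical sketch of the idea, not the Memory engine's implementation.

```python
class HashIndex:
    """Toy hash index: key -> bucket, with collisions chained in a singly linked list."""

    def __init__(self, buckets=8):
        self.buckets = [None] * buckets      # each slot is the head of a chain

    def put(self, key, value):
        slot = hash(key) % len(self.buckets)
        # New node points at the old head: (key, value, next_node).
        self.buckets[slot] = (key, value, self.buckets[slot])

    def get(self, key):
        node = self.buckets[hash(key) % len(self.buckets)]
        while node is not None:              # walk the chain on collisions
            if node[0] == key:
                return node[1]
            node = node[2]
        return None

    def keys_containing(self, text):
        """No shortcut exists for this: every bucket and every chain is visited."""
        found = []
        for node in self.buckets:
            while node is not None:
                if text in node[0]:
                    found.append(node[0])
                node = node[2]
        return sorted(found)
```

get() only touches one bucket, but keys_containing() and any range condition degrade to a full scan, because hashing destroys the ordering of the keys.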
SQL optimization artifact: explain
There are many SQL optimization scenarios and many tips on the Internet. I can't remember them all!
To solve this problem for good, I think the only way is to properly understand the data structures and principles behind indexes. Then, when writing SQL or facing a slow query, we have a basis for analysis, and we can verify it with the explain tool; that should be enough.
The result of explain tells you which indexes are used, how the table is scanned, and so on. Here is a demo.
Data Table student:
Note the composite index (age,address)
Matching the leftmost prefix
Composite index failure
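The two cases above can be reproduced without a MySQL server. The sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in; its output format differs from MySQL's explain, and the id and name columns and the index name idx_age_address are assumptions. Only the table name student and the composite index (age, address) come from the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # SQLite stands in for MySQL in this sketch
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, address TEXT)"
)
conn.execute("CREATE INDEX idx_age_address ON student (age, address)")


def plan(where_clause):
    """Return SQLite's plan text for a query on student with the given WHERE clause."""
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT * FROM student WHERE {where_clause}"
    ).fetchall()
    return " | ".join(row[-1] for row in rows)   # last column is the readable detail


uses_prefix = plan("age = 20 AND address = 'Beijing'")
skips_prefix = plan("address = 'Beijing'")
```

The plan in uses_prefix names idx_age_address, because the condition starts from the leftmost column age. The plan in skips_prefix falls back to scanning the table, because address alone is not a leftmost prefix of the index.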
The above is how to understand MySQL from a programmer's point of view.