2025-01-31 Update From: SLTechnology News&Howtos (Shulou.com)
This article walks through an example of optimizing query performance in MySQL on large data sets. It is shared here as a practical reference.
MySQL performance optimization includes table optimization and column type selection. Table optimization breaks down into three points: 1. separate fixed-length fields from variable-length fields; 2. separate commonly used fields from rarely used ones; 3. in one-to-many relationships, add redundant fields for values that would otherwise need associated statistics.
Optimization of tables and selection of column types
I. Table optimization
1. Separate fixed-length fields from variable-length fields
For example, id int occupies 4 bytes, and char(4) occupies 4 characters, which is also a fixed length. Time types are fixed length as well. Fixed length means that every cell occupies the same number of bytes.
Core, frequently used fields should be fixed length and kept together in one table.
Variable-length fields such as varchar, text, and blob are better placed in a separate table that is associated with the core table through the primary key.
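The split described above can be sketched as follows. This is a minimal illustration, not a schema from the article: the table names member and member_profile and all of their columns are hypothetical.

```sql
-- Core table: only fixed-length, frequently read columns (hypothetical names).
CREATE TABLE member (
    id        INT UNSIGNED     NOT NULL AUTO_INCREMENT,
    username  CHAR(20)         NOT NULL DEFAULT '',
    age       TINYINT UNSIGNED NOT NULL DEFAULT 0,
    reg_time  INT UNSIGNED     NOT NULL DEFAULT 0,
    PRIMARY KEY (id)
);

-- Side table: variable-length columns, linked to the core table by primary key.
CREATE TABLE member_profile (
    member_id INT UNSIGNED NOT NULL,
    intro     VARCHAR(500) NOT NULL DEFAULT '',
    resume    TEXT         NOT NULL,
    PRIMARY KEY (member_id)
);

-- The wide columns are joined in only when a page actually needs them.
SELECT m.username, p.intro
FROM member AS m
JOIN member_profile AS p ON p.member_id = m.id
WHERE m.id = 1;
```

Most queries then touch only the compact core table, and the join is paid for only on pages that display the long fields.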
2. Separate commonly used fields from rarely used fields
This has to be decided against the site's actual business: analyze the scenarios in which each field is queried, and move the fields with a low query frequency into a separate table.
3. In one-to-many relationships, add redundant fields for values that need associated statistics
Consider the following case:
A forum has several boards, and each board has N posts. The home page shows each board's information together with the number of posts under it.
How is this done?
If the board table holds only the board's own columns (for example its id and name), you first have to fetch the boards,
then query the post table with select count(*) from post group by board_id to get the number of posts in each board. Storing the post count as a redundant column in the board table avoids this second query.
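The redundant-counter idea can be sketched like this. The column name post_num and the table layouts are assumptions made for illustration; the article only names the post table and its board_id column.

```sql
-- Hypothetical layout: board carries a redundant post counter.
CREATE TABLE board (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name     CHAR(30)     NOT NULL DEFAULT '',
    post_num INT UNSIGNED NOT NULL DEFAULT 0,   -- redundant: number of posts in this board
    PRIMARY KEY (id)
);

CREATE TABLE post (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
    board_id INT UNSIGNED NOT NULL,
    title    VARCHAR(100) NOT NULL DEFAULT '',
    PRIMARY KEY (id),
    KEY idx_board (board_id)
);

INSERT INTO board (name) VALUES ('General');

-- Keep the counter in step with the post table inside one transaction.
START TRANSACTION;
INSERT INTO post (board_id, title) VALUES (1, 'hello');
UPDATE board SET post_num = post_num + 1 WHERE id = 1;
COMMIT;

-- The home page now needs a single cheap query instead of a GROUP BY over post.
SELECT id, name, post_num FROM board;
```

The trade-off is that every code path that inserts or deletes posts must also maintain the counter.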
II. Column type selection
1. Field type priority
integer > date, time > enum, char > varchar > blob, text
Integer: fixed length, with no locale or character set differences. For example:
tinyint values 1, 2, 3, 4, 5 versus char(1) values such as 'a', 'b'.
In terms of space, both occupy 1 byte, but for order by sorting the former is faster. The reason is that the latter has to take the character set and the collation into account.
Time: fixed length, fast to operate on, and space saving. The drawbacks are that time zones have to be considered and that conditions such as where > '2018-08-08' are awkward to write in SQL.
Enum: serves as a constraint and is stored internally as an integer, but when it is queried together with char, an internal conversion between strings and numbers takes place.
Char: fixed length; the character set and the collation have to be considered.
Varchar: variable length; character set conversion and the collation used for sorting have to be considered, so it is slower.
Text/blob: cannot use in-memory temporary tables (sorting and similar operations can only be done on disk).
Addendum: on the choice of date/time types, one expert's explicit advice is to use int unsigned not null directly and store timestamps.
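A minimal sketch of storing timestamps in an integer column follows. The table login_log and its columns are hypothetical; UNIX_TIMESTAMP() and FROM_UNIXTIME() are built-in MySQL functions.

```sql
-- Hypothetical table: login time stored as a Unix timestamp.
CREATE TABLE login_log (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id  INT UNSIGNED NOT NULL,
    login_at INT UNSIGNED NOT NULL DEFAULT 0,   -- seconds since 1970-01-01 UTC
    PRIMARY KEY (id),
    KEY idx_login_at (login_at)
);

INSERT INTO login_log (user_id, login_at) VALUES (1, UNIX_TIMESTAMP());

-- Range conditions become plain integer comparisons.
-- UNIX_TIMESTAMP('...') interprets its argument in the session time zone.
SELECT id, user_id, FROM_UNIXTIME(login_at) AS login_time
FROM login_log
WHERE login_at >= UNIX_TIMESTAMP('2018-08-08 00:00:00');
```

The conversion to a readable date happens only at display time, so comparisons and sorting stay on a fixed-length integer.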
For example, consider a gender column, taking utf8 as the character set:
char(1): takes up to 3 bytes
enum('male', 'female'): converted to a number for internal storage, which adds one conversion step
tinyint: fixed length, 1 byte
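The three candidates can be put side by side in one hypothetical table. The table gender_demo and the numeric coding in the comments are assumptions for illustration only.

```sql
-- Hypothetical table holding the same information in three column types.
CREATE TABLE gender_demo (
    g_char CHAR(1) CHARACTER SET utf8 NOT NULL DEFAULT '',
    g_enum ENUM('male', 'female')     NOT NULL DEFAULT 'male',
    g_tiny TINYINT UNSIGNED           NOT NULL DEFAULT 0   -- assumed coding: 1 = male, 2 = female
);

INSERT INTO gender_demo (g_char, g_enum, g_tiny) VALUES ('m', 'male', 1), ('f', 'female', 2);

-- Using an enum in a numeric context exposes the integer it is stored as.
SELECT g_enum, g_enum + 0 AS internal_index FROM gender_demo;

-- Sorting on the tinyint needs no character set or collation handling.
SELECT g_tiny FROM gender_demo ORDER BY g_tiny;
```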
2. Use just enough; don't be generous (e.g. smallint, varchar(N))
Reason: oversized fields waste memory and slow queries down.
Take age as an example: tinyint unsigned not null can store values up to 255, which is enough. Using int would waste 3 bytes.
The same content can be stored in varchar(10) and varchar(300), but varchar(300) consumes more memory in join queries.
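One way to apply this rule to an existing table is to measure what the data actually needs before choosing a width. The sketch below assumes a hypothetical person table; MAX() and CHAR_LENGTH() are standard MySQL functions.

```sql
-- Hypothetical table with right-sized columns.
CREATE TABLE person (
    id   INT UNSIGNED     NOT NULL AUTO_INCREMENT,
    name VARCHAR(30)      NOT NULL DEFAULT '',
    age  TINYINT UNSIGNED NOT NULL DEFAULT 0,   -- 0..255 is enough for an age
    PRIMARY KEY (id)
);

INSERT INTO person (name, age) VALUES ('Alice', 30), ('Bartholomew', 71);

-- Check the largest values actually stored before widening or narrowing a column.
SELECT MAX(age) AS max_age, MAX(CHAR_LENGTH(name)) AS max_name_len
FROM person;
```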
3. Try to avoid NULL
Reason: NULL is unfriendly to indexes, because it has to be flagged with extra bytes.
It also takes up more space on disk in practice (MySQL 5.5 improved the handling of NULL, but querying it is still inconvenient).
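The query inconvenience can be seen in a small sketch. The tables t_null and t_notnull are hypothetical; the behaviour shown follows from SQL's three-valued logic, where a comparison with NULL is neither true nor false.

```sql
-- Hypothetical table with a nullable column.
CREATE TABLE t_null (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    nick VARCHAR(20)  NULL,
    PRIMARY KEY (id)
);

INSERT INTO t_null (nick) VALUES ('tom'), (NULL);

-- Returns no rows: NULL != 'tom' evaluates to NULL, not to true.
SELECT * FROM t_null WHERE nick != 'tom';

-- The NULL row has to be asked for explicitly.
SELECT * FROM t_null WHERE nick != 'tom' OR nick IS NULL;

-- Alternative: forbid NULL and use a sentinel default instead.
CREATE TABLE t_notnull (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    nick VARCHAR(20)  NOT NULL DEFAULT '',
    PRIMARY KEY (id)
);
```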
III. Index optimization strategy
1. Index type
1.1 B-tree Index
This is called the btree index. Broadly speaking, every engine uses a balanced tree, but the concrete implementation differs slightly between engines. Strictly speaking, for example, the NDB engine uses a T-tree.
Abstracting over these differences, a B-tree can be understood as a "sorted structure for fast lookup".
1.2 Hash index
In MEMORY tables the default index type is hash, and the theoretical lookup time complexity of a hash is O(1).
Question: since hash lookups are so efficient, why not use hash indexes everywhere?
Answer:
1. The result of a hash function is effectively random. If the data is stored on disk, taking a primary key id as an example, then as id grows, the rows for successive ids end up scattered randomly across the disk.
2. Range queries cannot be optimized.
3. Prefix lookups cannot use the index. In a btree, for example, if an indexed column x holds the value "helloworld", a query for x = 'helloworld' can naturally use the index, and a prefix lookup such as x like 'hello%' can use it as well (left-prefix index). With a hash, the hash of "hello" has no relation to the hash of "helloworld".
4. Sorting cannot be optimized.
5. The row must always be fetched: the index only gives the location of the data, and the table must then be read to get the data itself.
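The differences listed above can be tried out on a MEMORY table, which supports both index types. This is a sketch with a hypothetical table mem_demo; the plans EXPLAIN reports depend on the MySQL version and on the data, so no output is shown.

```sql
-- Hypothetical MEMORY table with one hash index and one btree index.
CREATE TABLE mem_demo (
    id   INT UNSIGNED NOT NULL,
    name CHAR(20)     NOT NULL DEFAULT '',
    PRIMARY KEY (id) USING HASH,
    KEY idx_name (name) USING BTREE
) ENGINE=MEMORY;

INSERT INTO mem_demo (id, name) VALUES (1, 'hello'), (5, 'helloworld'), (9, 'world');

-- Equality lookup: the case a hash index is built for.
EXPLAIN SELECT * FROM mem_demo WHERE id = 5;

-- Range lookup: a hash index cannot narrow this down.
EXPLAIN SELECT * FROM mem_demo WHERE id > 5;

-- Prefix lookup and sorting: candidates for the btree index.
EXPLAIN SELECT * FROM mem_demo WHERE name LIKE 'hello%';
EXPLAIN SELECT * FROM mem_demo ORDER BY name;
```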
2. Common misunderstandings about btree indexes
2.1 Adding an index to every column commonly used in where conditions, for example:
where cat_id = 3 and price > 100; (query goods in category 3 priced over 100 yuan)
Mistaken approach: add one index on cat_id and another on price.
Why it is wrong: only the cat_id index or the price index can be used, because they are independent indexes and only one of them can be used at a time for this query.
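One common alternative to two independent indexes is a single composite index that covers both conditions. The sketch below is a suggestion under assumed names (the goods table and the index idx_cat_price are hypothetical); only cat_id and price come from the text.

```sql
-- Hypothetical goods table with one composite index instead of two single-column ones.
CREATE TABLE goods (
    id     INT UNSIGNED      NOT NULL AUTO_INCREMENT,
    cat_id SMALLINT UNSIGNED NOT NULL DEFAULT 0,
    price  DECIMAL(10,2)     NOT NULL DEFAULT 0.00,
    name   VARCHAR(60)       NOT NULL DEFAULT '',
    PRIMARY KEY (id),
    KEY idx_cat_price (cat_id, price)
);

-- The equality on cat_id fixes the first index column,
-- and the range on price can then be applied to the second one.
EXPLAIN SELECT id, name
FROM goods
WHERE cat_id = 3 AND price > 100;
```

The equality column goes first because a range condition ends the usable part of a composite index.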
2.2 After an index is built on multiple columns (a composite index), the index works whichever column is queried.
This is a misunderstanding: for a multi-column index to be used, the query has to satisfy the left-prefix requirement.
Take a multi-column index as an example (note that the order of the columns matters).
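The left-prefix requirement can be sketched with a hypothetical three-column index. The table t_abc and the index idx_abc (a, b, c) are assumptions for illustration, and the comments describe what the rule permits rather than a guaranteed plan, since the optimizer's choice also depends on the data.

```sql
-- Hypothetical table; column d is outside the index so that SELECT * is not covered by it.
CREATE TABLE t_abc (
    a INT NOT NULL DEFAULT 0,
    b INT NOT NULL DEFAULT 0,
    c INT NOT NULL DEFAULT 0,
    d INT NOT NULL DEFAULT 0,
    KEY idx_abc (a, b, c)
);

-- Left prefix (a): the index can be used.
EXPLAIN SELECT * FROM t_abc WHERE a = 1;

-- Left prefix (a, b): both columns can be used.
EXPLAIN SELECT * FROM t_abc WHERE a = 1 AND b = 2;

-- Gap at b: only the a part can be used to seek; c is checked afterwards.
EXPLAIN SELECT * FROM t_abc WHERE a = 1 AND c = 3;

-- No leading column a: the index cannot be used to seek at all.
EXPLAIN SELECT * FROM t_abc WHERE b = 2 AND c = 3;
```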
IV. Index experiment
For example: select * from t4 where c1 = 3 and c2 = 4 and c4 > 5 and c3 = 2
Which index columns are used?
explain select * from t4 where c1 = 3 and c2 = 4 and c4 > 5 and c3 = 2\G
In the output, note the value key_len: 4.
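The experiment can be reproduced with a setup along the following lines. The schema is an assumption, since the article's table definition is not included here: five 1-byte tinyint columns and a composite index named c1234. Under that assumption, a key_len of 4 would correspond to all four index columns being used at 1 byte each, but the actual plan depends on the data and the MySQL version.

```sql
-- Assumed schema for the experiment (not taken from the article).
CREATE TABLE t4 (
    c1 TINYINT UNSIGNED NOT NULL DEFAULT 0,
    c2 TINYINT UNSIGNED NOT NULL DEFAULT 0,
    c3 TINYINT UNSIGNED NOT NULL DEFAULT 0,
    c4 TINYINT UNSIGNED NOT NULL DEFAULT 0,
    c5 TINYINT UNSIGNED NOT NULL DEFAULT 0,
    KEY c1234 (c1, c2, c3, c4)
);

INSERT INTO t4 (c1, c2, c3, c4, c5) VALUES
    (1, 3, 5, 6, 7), (2, 3, 9, 8, 3), (3, 4, 2, 7, 5), (3, 4, 2, 4, 1), (4, 3, 1, 7, 3);

-- The order of conditions in WHERE does not matter; the optimizer matches them
-- to the index columns c1, c2, c3 (equalities) and then c4 (range).
-- In the mysql client, ending the statement with \G instead of ; prints the result vertically.
EXPLAIN SELECT * FROM t4 WHERE c1 = 3 AND c2 = 4 AND c4 > 5 AND c3 = 2;
```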
Clustered index and non-clustered index
How the MyISAM and InnoDB engines differ in their index and data files
MyISAM: a table consists of two files, news.myd (the data file) and news.myi (the index file). The index file and the data file are separate, which is called a non-clustered index. Both the primary and the secondary indexes point to the physical row (its location on disk).
InnoDB: indexes and data are clustered together, hence the name clustered index. The row data is stored directly in InnoDB's primary index, and a secondary index holds a reference to the primary key.
Notes for InnoDB:
1. The primary key index stores not only the index values but also the row data, in its leaves.
2. If there is no primary key, a unique key (on non-null columns) is used as the primary key.
3. If there is no such unique key either, the system generates an internal rowid as the primary key.
4. In InnoDB, the primary key's index structure stores both the primary key values and the row data. This structure is called a clustered index.
Clustered index
Advantage: when a query by primary key returns only a few rows, no extra row lookup is needed (the data sits under the primary key node).
Disadvantage: irregular, non-sequential inserts cause frequent page splits.
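A common way to limit page splits is to let the clustered key grow sequentially and keep any random identifier in a secondary index. The sketch below assumes a hypothetical article_seq table; this is one possible design, not a recommendation made in the text.

```sql
-- Hypothetical InnoDB table: sequential clustered key, random identifier in a secondary index.
CREATE TABLE article_seq (
    id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,   -- new rows are appended at the end of the clustered index
    uuid CHAR(36)        NOT NULL,
    body TEXT            NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uk_uuid (uuid)
) ENGINE=InnoDB;

INSERT INTO article_seq (uuid, body) VALUES (UUID(), 'first article');

-- Lookup by primary key: the row data is found in the clustered index itself.
SELECT body FROM article_seq WHERE id = 1;

-- Lookup by uuid: the secondary index yields the primary key,
-- which is then used to find the row in the clustered index.
SELECT a.body
FROM article_seq AS a
WHERE a.uuid = (SELECT uuid FROM (SELECT uuid FROM article_seq WHERE id = 1) AS t);
```

If the uuid were the primary key instead, each insert would land at a random position in the clustered index, which is the irregular insertion pattern described above.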