A reference for optimizing MySQL, reposted from the network.
Optimization is a complex task, because it ultimately requires understanding of the whole system. While it may be possible to do some local optimizations with little knowledge of your system or application, the more optimal you want your system to become, the more you will have to know about it.
This chapter therefore tries to explain and give some examples of different ways to optimize MySQL. But remember that there is always something (progressively harder and harder) left to do that would make the system even faster.
10.1 Overview of Optimization
The most important part of making a system fast is, of course, the basic design. You also need to know what kinds of things your system will be doing and what your bottlenecks are.
The most common bottlenecks are:
Disk seeks. It takes time for the disk to find a piece of data; modern disks (in 1999) average less than 10ms per seek, so in theory we can do about 1000 seeks a second. This time improves slowly with new disks and is very hard to optimize for a single table. The way to optimize it is to spread the data over more than one disk.
Disk reading/writing. Once the disk head is at the correct position, we need to read the data. A modern disk (in 1999) transfers something like 10-20MB/s. This is easier to optimize than seeks, because you can read in parallel from multiple disks.
CPU cycles. Once the data is in main memory (or was already there), we need to process it to get our result. This is the most common limiting factor when tables are small relative to memory, but with small tables speed is usually not the problem.
Memory bandwidth. When the CPU needs more data than fits in the CPU cache, main memory bandwidth becomes a bottleneck. This is an uncommon bottleneck for most systems, but one to be aware of.
10.2 Tuning system/compile-time and startup parameters
We start with the system-level things, because some of these decisions have to be made very early. In other cases a quick look at this section may suffice, because it is not important for the big gains; however, it is always nice to have a feeling of how much one can gain at this level.
The operating system you use is really important! To get the best use of multiple CPUs, you should use Solaris (because its threads work really well) or Linux (because the 2.2 kernel has really good SMP support). Also note that, by default, Linux has a 2GB file size limit on 32-bit machines. Hopefully this will be fixed soon when new file systems (XFS) are released.
Because we do not run production MySQL on many platforms, we advise you to test the platform you intend to run on before choosing it.
Other suggestions:
If you have enough RAM, you could remove all swap devices. Some operating systems will use a swap device in some contexts even if you have free memory.
Use the --skip-locking MySQL option to avoid external locking. Note that this will not impact MySQL's functionality as long as you only run one server. Just remember to take down the server (or flush and lock the relevant tables) before you run myisamchk. On some systems this switch is mandatory, because the external locking does not work in all cases. The --skip-locking option is on by default when compiling with MIT-pthreads, because flock() is not fully supported by MIT-pthreads on all platforms. The only case when you cannot use --skip-locking is if you run multiple MySQL servers (not clients) on the same data, or if you run myisamchk on a table without first flushing and locking the mysqld server tables. You can still use LOCK TABLES / UNLOCK TABLES even if you are using --skip-locking. A sample check sequence is sketched below.
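For example, one possible (hedged) sequence for checking a table with myisamchk while running with --skip-locking is to flush the tables and take the server down first; the path, database, and table names here are placeholders:
shell> mysqladmin flush-tables
shell> mysqladmin shutdown
shell> myisamchk /path/to/datadir/db_name/tbl_name.MYI
shell> safe_mysqld &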
10.2.1 how compilation and linking affect the speed of MySQL
Most of the following tests were done on Linux with the MySQL benchmarks, but they should give some indication for other operating systems and workloads.
You get the fastest executable when you link with -static. Using Unix sockets rather than TCP/IP to connect to a database also gives better performance.
On Linux, you will get the fastest code when compiling with pgcc and -O6. To compile "sql_yacc.cc" with these options, you need about 200MB of memory, because gcc/pgcc needs a lot of memory to make all functions inline. You should also set CXX=gcc when configuring MySQL to avoid inclusion of the libstdc++ library (it is not needed).
Just by using a better compiler or better compiler options, you can get a 10-30% speedup in your application. This is particularly important if you compile the SQL server yourself!
On Intel, you should, for example, use the pgcc or the Cygnus CodeFusion compiler to get maximum speed. We have tested the new Fujitsu compiler, but it is not yet bug-free enough to compile MySQL with optimizations on.
Here are some of the measurements we have made:
If you use pgcc and compile everything with -O6, the mysqld server is 11% faster than with gcc (with the strings-99 version).
If you link dynamically (without -static), the result is 13% slower. Note that you can still use a dynamically linked MySQL library; it is only the server that is critical for performance.
If you connect using TCP/IP rather than Unix sockets, the result is 7.5% slower.
On a Sun SPARCstation 10, gcc 2.7.3 is 13% faster than Sun Pro C++ 4.2.
On Solaris 2.5.1, MIT-pthreads is 8-12% slower than Solaris native threads on a single processor. With more load/CPUs the difference should get bigger.
The MySQL-Linux distribution provided by TcX is compiled with pgcc and linked statically.
10.2.2 Disk issues
As mentioned before, disk seeks are a big performance bottleneck. This problem gets more and more apparent when the data starts to grow so large that effective caching becomes impossible. For large databases, where you access data more or less randomly, you can be sure that you will need at least one disk seek to read and a couple of disk seeks to write things. To minimize this problem, use disks with low seek times.
To increase the number of available disk spindles (and thereby reduce the seek overhead), it is possible to either symlink files to different disks or stripe the disks.
Using symbolic links
This means that you symlink the index and/or data files from their normal location in the data directory to another disk (which may also be striped). This makes both the seek and read times better (assuming the disk is not used for other things). See 10.2.2.1 Using symbolic links for databases and tables.
Striping
Striping means that you have many disks and put the first block on the first disk, the second block on the second disk, and the Nth block on the (N mod number_of_disks)th disk, and so on. This means if your normal data size is less than the stripe size (or perfectly aligned), you will get much better performance. Note that whether striping helps depends a lot on the OS and on the stripe size, so benchmark your application with different stripe sizes. See 10.8 Using your own benchmarks. The speed difference from striping is very dependent on the parameters: depending on how you set the striping parameters and the number of disks, you may get differences measured in orders of magnitude. Note that you have to choose to optimize for either random or sequential access.
For reliability you may want to use RAID 0+1 (striping + mirroring), but in this case you will need 2*N drives to hold N drives of data. This is probably the best option if you have the money for it! You may, however, also have to invest in some volume-management software to handle it efficiently.
A good option is to keep semi-important data (that can be regenerated) on a RAID 0 disk, while storing really important data (such as host information and logs) on a RAID 0+1 or RAID N disk. RAID N can be a problem if you have many writes, because of the time needed to update the parity bits.
You can also set the parameters of the file system that the database uses. One easy change is to mount the file system with the noatime option. That makes it skip updating the last access time in the inode, and this will avoid some disk seeks.
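As a sketch (the device and mount point below are placeholders), on Linux you could remount the file system holding the data directory with noatime, or add noatime to the options field of the matching /etc/fstab entry:
shell> mount -o remount,noatime /dev/sda1 /var/lib/mysql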
10.2.2.1 use symbolic links for databases and tables
You can move tables and databases from the database directory to other locations and replace them with symbolic links pointing to the new locations. You might want to do this, for example, to move a database to a file system with more free space.
If MySQL notices that a table is a symbolic link, it resolves the symlink and uses the table it actually points to. This works on all systems that support the realpath() call (at least Linux and Solaris support realpath()). On systems that do not support realpath(), you should not access the table through the real path and through the symlink at the same time! If you do, the table will be inconsistent after any update.
MySQL does not support linking of databases by default. Things will work fine as long as you do not make symbolic links between databases. Suppose you have a database db1 under the MySQL data directory and make a symlink db2 that points to db1:
shell> cd /path/to/datadir
shell> ln -s db1 db2
Now, for any table tbl_a in db1, there also appears to be a table tbl_a in db2. If one thread updates db1.tbl_a and another thread updates db2.tbl_a, there will be problems.
If you really need this, you must change the following code in "mysys/mf_format.c":
If (! lstat (to,&stat_buff)) / * Check if it's a symbolic link * /
If (S_ISLNK (stat_buff.st_mode) & & realpath (to,buff))
Change the code like this:
If (realpath (to,buff))
10.2.3 Adjusting server parameters
You can get the default buffer size of the mysqld server with this command:
shell> mysqld --help
This command generates a table of all mysqld options and configurable variables. The output includes default values and looks like something like this:
Possible variables for option --set-variable (-O) are:
back_log current value: 5
connect_timeout current value: 5
delayed_insert_timeout current value: 300
delayed_insert_limit current value: 100
delayed_queue_size current value: 1000
flush_time current value: 0
interactive_timeout current value: 28800
join_buffer_size current value: 131072
key_buffer_size current value: 1048540
lower_case_table_names current value: 0
long_query_time current value: 10
max_allowed_packet current value: 1048576
max_connections current value: 100
max_connect_errors current value: 10
max_delayed_threads current value: 20
max_heap_table_size current value: 16777216
max_join_size current value: 4294967295
max_sort_length current value: 1024
max_tmp_tables current value: 32
max_write_lock_count current value: 4294967295
net_buffer_length current value: 16384
query_buffer_size current value: 0
record_buffer current value: 131072
sort_buffer current value: 2097116
table_cache current value: 64
thread_concurrency current value: 10
tmp_table_size current value: 1048576
thread_stack current value: 131072
wait_timeout current value: 28800
If there is a mysqld server running, you can see the value of the variable it actually uses by executing this command:
shell> mysqladmin variables
Each option is described below. Values for buffer sizes, lengths, and stack sizes are given in bytes; you can specify them with a suffix of "K" or "M" to indicate kilobytes or megabytes. For example, 16M indicates 16 megabytes. The case of suffix letters does not matter; 16M and 16m are equivalent.
You can also use the command SHOW STATUS to see some statistics from a running server. See 7.21 SHOW syntax (get information about tables and columns).
Back_log
The number of outstanding connection requests MySQL can have. This comes into play when the main MySQL thread gets very many connection requests in a very short time. It then takes the main thread some (albeit very short) time to check the connection and start a new thread. The back_log value indicates how many requests can be stacked during this short time before MySQL momentarily stops answering new requests. Only if you expect many connections in a short period of time do you need to increase it. In other words, this value is the size of the listen queue for incoming TCP/IP connections. Your operating system has its own limit on this queue size; the manual page for the Unix listen(2) system call should have more details. Check your OS documentation for the maximum value for this variable. Attempting to set back_log higher than your operating system limit will have no effect.
Connect_timeout
The number of seconds the mysqld server waits for a connect packet before responding with Bad handshake.
Delayed_insert_timeout
The amount of time an INSERT DELAYED thread should wait for an INSERT statement before terminating.
Delayed_insert_limit
After inserting delayed_insert_limit rows, the INSERT DELAYED handler checks whether any SELECT statements are pending. If so, it allows them to execute before continuing.
Delayed_queue_size
How large a queue (in rows) should be allocated for handling INSERT DELAYED. If the queue becomes full, any client doing an INSERT DELAYED will wait until there is room in the queue again.
Flush_time
If this is set to a non-zero value, all tables will be closed (to free resources and sync data to disk) every flush_time seconds.
Interactive_timeout
The number of seconds the server waits for activity on an interactive connection before closing it. An interactive client is defined as a client that uses the CLIENT_INTERACTIVE option to mysql_real_connect(). See also wait_timeout.
Join_buffer_size
The size of the buffer used for full joins (joins that do not use indexes). The buffer is allocated once for each full join between two tables. Increasing this value gives a faster full join when adding an index is not possible. (Normally the best way to get fast joins is to add an index, though.)
Key_buffer_size
Index blocks are buffered and shared by all threads. key_buffer_size is the size of the buffer used for index blocks. Increase it to get better index handling (for all reads and multiple writes) to as much as you can afford. If you make it too big, however, the system will start paging and become really slow. Remember that because MySQL does not cache data reads, you have to leave some room for the OS file system cache. To get even more speed when writing many rows at the same time, use LOCK TABLES. See 7.24 LOCK TABLES/UNLOCK TABLES syntax.
Long_query_time
If a query takes longer than this (in seconds), the Slow_queries counter is incremented.
Max_allowed_packet
The maximum size of one packet. The message buffer is initialized to net_buffer_length bytes, but can grow up to max_allowed_packet bytes when needed. This value is small by default, to catch big (possibly wrong) packets. You must increase this value if you are using big BLOB columns; it should be as big as the biggest BLOB you want to use.
Max_connections
The number of simultaneous clients allowed. Increasing this value increases the number of file descriptors that mysqld requires. See the comments below about file descriptor limits. See 18.2.4 Too many connections error.
Max_connect_errors
If there are more than this number of interrupted connections from a host, that host is blocked from further connections. You can unblock a host with the FLUSH HOSTS command.
Max_delayed_threads
Do not start more than this number of threads to handle INSERT DELAYED statements. If you try to insert data into a new table after all INSERT DELAYED threads are in use, the row will be inserted as if the DELAYED attribute had not been specified.
Max_join_size
Joins that would probably need to read more than max_join_size records return an error. Set this value if your users tend to execute joins that lack a WHERE clause, take a very long time, and return millions of rows.
Max_sort_length
The number of bytes used when sorting BLOB or text values (only the first max_sort_length bytes of each value are used; the rest are ignored).
Max_tmp_tables
This option does not do anything yet. It will be the maximum number of temporary tables a client can keep open at the same time.
Net_buffer_length
The communication buffer is reset to this size between queries. This should not normally be changed, but if you have very little memory, you can set it to the expected size of a query (that is, the expected length of SQL statements sent by clients). If statements exceed this length, the buffer is automatically enlarged, up to max_allowed_packet bytes.
Record_buffer
Each thread that does a sequential scan allocates a buffer of this size for each table it scans. If you do many sequential scans, you may want to increase this value.
Sort_buffer
Each thread that needs to sort allocates a buffer of that size. Increase this value to speed up ORDER BY or GROUP BY operations. See 18.5 where MySQL stores temporary files.
Table_cache
The number of open tables for all threads. Increasing this value increases the number of file descriptors that mysqld requires. MySQL needs 2 file descriptors for each unique open table; see the comments below about file descriptor limits. For information about how the table cache works, see 10.2.4 How MySQL opens and closes database tables.
Tmp_table_size
If a temporary table exceeds this size, MySQL generates an error of the form The table tbl_name is full. If you do many advanced GROUP BY queries, increase the tmp_table_size value.
Thread_stack
The stack size for each thread. Many of the limits detected by the crash-me test depend on this value. The default is large enough for normal operation. See 10.8 Using your own benchmarks.
Wait_timeout
The number of seconds the server waits for activity on a connection before closing it. See also interactive_timeout.
MySQL uses very scalable algorithms, so you can usually run it with very little memory, or give MySQL more memory to get better performance.
If you have much memory and many tables, and want maximum performance with a moderate number of clients, you should use something like this:
shell> safe_mysqld -O key_buffer=16M -O table_cache=128 \
           -O sort_buffer=4M -O record_buffer=1M &
If you have less memory and lots of connections, use something like this:
shell> safe_mysqld -O key_buffer=512k -O sort_buffer=100k \
           -O record_buffer=100k &
or even:
shell> safe_mysqld -O key_buffer=512k -O sort_buffer=16k \
           -O table_cache=32 -O record_buffer=8k -O net_buffer=1K &
If there are many connections, "swapping problems" may occur unless mysqld has been configured to use very little memory for each connection. mysqld performs better if you have enough memory for all connections, of course.
Note that if you change an option for mysqld, it remains in effect only for that instance of the server.
To see the effect of a parameter change, do something like this:
shell> mysqld -O key_buffer=32M --help
Make sure that the --help option is last; otherwise, the effect of any options listed after it on the command line will not be reflected in the output.
10.2.4 how MySQL opens and closes database tables
Table_cache, max_connections, and max_tmp_tables affect the maximum number of files that the server keeps open. If you increase one or two of these values, you can encounter limits imposed on the number of file descriptors that can be opened by each process in your operating system. However, you can add this limit on many systems. Consult your OS documentation to find out how to do this, because the ways to change limits vary greatly from system to system.
table_cache is related to max_connections. For example, for 200 concurrent connections, you should have a table cache of at least 200 * n, where n is the maximum number of tables in a join.
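As a rough sketch (the numbers here are purely illustrative): if you expect 200 connections and your biggest join touches 4 tables, a reasonable starting point would be a table cache of 200 * 4 = 800:
shell> safe_mysqld -O max_connections=200 -O table_cache=800 &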
The cache of open tables can grow to a maximum of table_cache (default 64; this can be changed with the -O table_cache=# option to mysqld). A table is never closed, except when the cache is full and another thread tries to open a table, or if you use mysqladmin refresh or mysqladmin flush-tables.
When the table cache is full, the server uses the following procedure to find a cache entry to use:
Tables that are not currently in use are released in LRU order.
If the cache is full and there are no tables to release, but a new table needs to be opened, the cache must be temporarily expanded.
If the cache is in a temporarily enlarged state and a table changes from being in use to not in use, it is closed and released from the cache.
A table is opened once for each concurrent access. This means that if you have two threads accessing the same table, or access the table twice in the same query (with AS), the table needs to be opened twice. The first open of any table takes two file descriptors; each additional use of the table takes only one file descriptor. The extra descriptor from the first open is used for the index file; this descriptor is shared among all threads.
10.2.5 disadvantages of creating a large number of database tables in the same database
If you have many files in a directory, open, close, and create operations will be slow. If you execute SELECT statements on many different tables, there will be a little overhead when the table cache is full, because for every table that has to be opened, another must be closed. You can reduce this overhead by making the table cache larger.
10.2.6 Why are there so many open tables?
When you run mysqladmin status, you will see something like this:
Uptime: 426 Running threads: 1 Questions: 11082 Reloads: 1 Open tables: 12
This can be somewhat perplexing if you only have six tables.
MySQL is multithreaded, so it may have many queries on the same table at the same time. To minimize the problem of two threads having different states on the same file, the table is opened independently by each concurrent thread. This takes some memory and one extra file descriptor for the data file. The index file descriptor is shared among all threads.
10.2.7 how MySQL uses memory
The list below indicates some of the ways the mysqld server uses memory. Where applicable, the name of the server variable relevant to the memory use is given.
The key buffer (variable key_buffer_size) is shared by all threads; the other buffers used by the server are allocated as needed. See 10.2.3 Adjusting server parameters.
Each connection uses some thread-specific space: a stack (default 64K, variable thread_stack), a connection buffer (variable net_buffer_length), and a result buffer (variable net_buffer_length). The connection buffer and result buffer are dynamically enlarged up to max_allowed_packet when needed. While a query is running, a copy of the current query string is also allocated.
All threads share the same base memory.
Nothing is memory-mapped yet (except compressed tables, but that's another story). This is because the 32-bit memory space of 4GB is not large enough for most big databases. When systems with a 64-bit address space become more common, we may add general support for memory mapping.
Each request that does a sequential scan is allocated a read buffer (variable record_buffer).
All joins are done in one pass, and most joins can be done without even using a temporary table. Most temporary tables are memory-based (HEAP) tables. Temporary tables with a large record length (calculated as the sum of all column lengths) or that contain BLOB columns are stored on disk. One problem in MySQL versions before 3.23.2 is that if a HEAP table exceeds the size of tmp_table_size, you get the error The table tbl_name is full. In newer versions this is handled by automatically converting the in-memory (HEAP) table to a disk-based (MyISAM) table as necessary. To work around this problem, you can increase the temporary table size by setting the tmp_table_size option to mysqld, or by setting the SQL option SQL_BIG_TABLES in the client program. See 7.25 SET OPTION syntax. In MySQL 3.20, the maximum size of a temporary table was record_buffer*16, so if you are using this version, you have to increase the value of record_buffer. You can also start mysqld with the --big-tables option to always store temporary tables on disk; however, this will affect the speed of many complicated queries.
Most sorting requests allocate a sort buffer and one or two temporary files. See 18.5 where MySQL stores temporary files.
Almost all parsing and calculating is done in local memory. No memory overhead is needed for small items, and the normal slow memory allocation and freeing is avoided. Memory is allocated only for unexpectedly large strings (this is done with malloc() and free()).
Each index file is opened only once, and the data file is opened once for each thread running concurrently. For each concurrent thread, a table structure, a column structure for each column, and a buffer of size 3 * n are allocated (where n is the maximum row length, excluding BLOB columns). A BLOB uses 5-8 bytes plus BLOB data.
For each table having BLOB columns, a buffer is enlarged dynamically to read in larger BLOB values. If you scan a table, a buffer as large as the largest BLOB value is allocated.
Table handlers for all in-use tables are saved in a cache and managed as a FIFO. Normally the cache has 64 entries. If a table has been used by two running threads at the same time, the cache contains two entries for the table. See 10.2.4 How MySQL opens and closes database tables.
A mysqladmin flush-tables command closes all tables that are not in use and marks all in-use tables to be closed when the currently executing thread finishes. This will effectively free most in-use memory.
ps and other system status programs may report that mysqld uses a lot of memory. This may be caused by thread stacks on different memory addresses. For example, the Solaris version of ps counts the unused memory between stacks as used memory. You can verify this by checking available swap with swap -s. We have tested mysqld with a commercial memory-leak detector, so there should be no memory leaks.
10.2.8 how does MySQL lock database tables
All locking in MySQL is deadlock-free. This is managed by always requesting all needed locks at once at the beginning of a query and always locking the tables in the same order.
The locking method MySQL uses for WRITE locks works as follows:
If there is no lock on the table, put a write lock on it.
Otherwise, put the lock request in the write lock queue.
The locking method MySQL uses for READ locks works as follows:
If there is no write lock on the table, put a read lock on it.
Otherwise, put the lock request in the read lock queue.
When a lock is released, the lock can be obtained by the thread in the write lock queue, followed by the thread in the read lock queue.
This means that if you have many changes on a table, the SELECT statement will wait until there are no more changes.
To solve the problem of many INSERT and SELECT operations in a table, you can insert rows in a temporary table and occasionally update the real table with records from the temporary table.
This can be done with the following code:
mysql> LOCK TABLES real_table WRITE, insert_table WRITE;
mysql> INSERT INTO real_table SELECT * FROM insert_table;
mysql> DELETE FROM insert_table;
mysql> UNLOCK TABLES;
If you want to prioritize retrieval in certain cases, you can use INSERT with the LOW_PRIORITY option. See 7.14 INSERT syntax.
You can also change the lock code in "mysys/thr_lock.c" to use a single queue. In this case, write locking and read locking will have the same priority, which may help some applications.
10.2.9 problems with database table-level locking
MySQL's table locking code is deadlock-free.
MySQL uses table-level locking (instead of row-level or column-level locking) to achieve a very high locking speed. For large tables, table-level locking is much better than row-level locking for most applications, but there are, of course, some pitfalls.
In MySQL 3.23.7 and above, one can insert rows into a MyISAM table at the same time as other threads are reading from the table. Note that currently this only works if there are no holes from deleted rows in the table.
Table-level locking enables many threads to read from a table at the same time, but if a thread wants to write to a table, it must first get exclusive access. During an update, all other threads that want to access this particular table wait until the update is ready.
Because updates to databases are normally considered more important than SELECT, all statements that update a table have higher priority than statements that retrieve information from it. This should ensure that updates are not "starved" just because someone issues a lot of heavy queries against a particular table.
Starting with MySQL 3.23.7, you can use the max_write_lock_count variable to force MySQL to grant waiting READ locks after a specific number of WRITE locks on a table.
A major problem with this is as follows:
A client issues a SELECT that takes a long time to run.
Another client then issues an UPDATE on the same table; this client will wait until the SELECT is finished.
Another client issues another SELECT statement on the same table; because UPDATE has higher priority than SELECT, this SELECT will wait for the UPDATE to finish. It will also wait for the first SELECT to finish!
Some possible solutions to this problem are:
Try to make SELECT statements run faster; you may have to create some summary tables to do this.
Start mysqld with --low-priority-updates. This gives all statements that update (modify) a table lower priority than SELECT statements. In this case, the last SELECT statement in the previous scenario would execute before the UPDATE statement.
You can give a specific INSERT, UPDATE, or DELETE statement lower priority with the LOW_PRIORITY attribute (a combined example follows this list).
Start mysqld with a low value of max_write_lock_count so that READ locks are granted after a certain number of WRITE locks.
With the SQL command SET SQL_LOW_PRIORITY_UPDATES=1, you can specify from within a specific thread that all its updates should be done with low priority. See 7.25 SET OPTION syntax.
You can use the HIGH_PRIORITY attribute to indicate that a particular SELECT is important. See 7.12 SELECT syntax.
If you have problems with mixed INSERT and SELECT statements, switch to the new MyISAM tables, as these support concurrent SELECTs and INSERTs.
If you mostly mix INSERT and SELECT statements, INSERT with the DELAYED attribute will probably solve your problems. See 7.14 INSERT syntax.
If you have problems with mixed SELECT and DELETE statements, the LIMIT option to DELETE may help. See 7.11 DELETE syntax.
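As a small illustrative sketch of the priority attributes mentioned above (the table and column names are hypothetical):
mysql> INSERT LOW_PRIORITY INTO log_table VALUES (1,'abc');
mysql> UPDATE LOW_PRIORITY log_table SET counter=counter+1 WHERE id=1;
mysql> INSERT DELAYED INTO log_table VALUES (2,'def');
mysql> SELECT HIGH_PRIORITY * FROM log_table WHERE id=1;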
10.3 make your data as small as possible
One of the most basic optimizations is to get your data (and indexes) to take as little space as possible on disk (and in memory). This can give huge improvements, because disk reads are faster and less main memory is normally used. Indexing also takes fewer resources if done on smaller columns.
You can use the following techniques to improve table performance and minimize storage space:
Use the most efficient (smallest) types possible. MySQL has many specialized types that save disk space and memory.
Use smaller integer types if possible to make tables smaller. For example, MEDIUMINT is often better than INT.
Declare columns NOT NULL if possible. It makes everything faster and you save one bit per column. Note that if you really need NULL in your application, you should by all means use it; just avoid having it on all columns by default.
If you do not have any variable-length columns (VARCHAR, TEXT, or BLOB columns), a fixed-size record format is used. This is faster but unfortunately may waste some space. See 10.6 Choosing a table type.
Each table should have as short a primary index as possible. This makes identification of a row easy and efficient.
For each table, you have to decide which storage/index method to use. See 9.4 MySQL table types; see also 10.6 Choosing a table type.
Create only the indexes you really need. Indexes are good for retrieval but bad when you need to store things quickly. If you mostly access a table by searching on a combination of columns, make an index on them. The first index part should be the most used column. If you always use many columns, you should use the column with more duplicated values first, to get better compression of the index.
If it is very likely that a column has a unique prefix in the first few characters, it is better to index only that prefix (see the example after this list). MySQL supports an index on a part of a character column. Shorter indexes are faster, not only because they take less disk space, but also because they give you more hits in the index cache and therefore fewer disk seeks. See 10.2.3 Adjusting server parameters.
In some circumstances it can be beneficial to split a table that is scanned very often into two. This is especially true if it is a dynamic-format table and it is possible to use a smaller static-format table to find the relevant rows when scanning the table.
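For example, a sketch of a prefix index (the table and column names are hypothetical) that indexes only the first 10 characters of a name column:
mysql> CREATE TABLE customer (name CHAR(200), INDEX part_of_name (name(10)));
or, on an existing table:
mysql> CREATE INDEX part_of_name ON customer (name(10));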
10.4 How MySQL uses indexes
Indexes are used to find rows with a specific value of one column quickly. Without an index, MySQL has to start with the first record and read through the whole table until it finds the relevant rows. The bigger the table, the more this costs. If the table has an index for the queried column, MySQL can quickly seek to a position in the middle of the data file without having to look at all the data. If a table has 1000 rows, this is at least 100 times faster than reading sequentially. Note that if you need to access almost all 1000 rows, sequential reads are faster, because they avoid disk seeks.
All MySQL indexes (PRIMARY, UNIQUE, and INDEX) are stored in B-trees. Strings are automatically prefix- and end-space compressed. See 7.27 CREATE INDEX syntax.
Indexes are used to:
Quickly find the rows matching a WHERE clause.
Retrieve rows from other tables when performing joins.
Find the MAX() or MIN() value for a specific indexed column.
Sort or group a table if the sorting or grouping is done on the leftmost prefix of a usable key (for example, ORDER BY key_part_1, key_part_2). If all key parts are followed by DESC, the key is read in reverse order.
In some cases, a query can be optimized to retrieve values without consulting the data file. If all used columns for some table are numeric and form a leftmost prefix of some key, the values may be retrieved from the index tree for much greater speed.
Suppose you issue the following SELECT statement:
mysql> SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;
If a multiple-column index exists on col1 and col2, the appropriate rows can be fetched directly. If separate single-column indexes exist on col1 and col2, the optimizer decides which index will find fewer rows and uses that index to fetch the rows.
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).
MySQL cannot use a partial index if the columns do not form a leftmost prefix of the index. Suppose you issue the following SELECT statements:
mysql> SELECT * FROM tbl_name WHERE col1=val1;
mysql> SELECT * FROM tbl_name WHERE col2=val2;
mysql> SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;
If an index exists on (col1, col2, col3), only the first query shown above uses the index. The second and third queries do involve indexed columns, but (col2) and (col2, col3) are not leftmost prefixes of (col1, col2, col3).
If the argument to LIKE is a constant string that does not start with a wildcard character, MySQL can also use an index for LIKE comparisons. For example, the following SELECT statements use indexes:
mysql> SELECT * FROM tbl_name WHERE key_col LIKE "Patrick%";
mysql> SELECT * FROM tbl_name WHERE key_col LIKE "Pat%_ck%";
In the first statement, only rows with "Patrick" <= key_col < "Patricl" are considered. In the second statement, only rows with "Pat" <= key_col < "Pau" are considered.
The following SELECT statements will not use indexes:
mysql> SELECT * FROM tbl_name WHERE key_col LIKE "%Patrick%";
mysql> SELECT * FROM tbl_name WHERE key_col LIKE other_col;
In the first statement, the LIKE value begins with a wildcard character. In the second statement, the LIKE value is not a constant.
Searching with column_name IS NULL will use an index if column_name is indexed.
MySQL normally uses the index that finds the smallest number of rows. An index is used for columns that you compare with the following operators: =, >, >=, <, <=, BETWEEN, and a LIKE with a non-wildcard prefix such as 'something%'.
10.5 Speed of queries that access or update data
If you want to test the speed of a specific expression or function, the BENCHMARK(loop_count, expression) function can be used. For example:
mysql> SELECT BENCHMARK(1000000,1+1);
+------------------------+
| BENCHMARK(1000000,1+1) |
+------------------------+
|                      0 |
+------------------------+
1 row in set (0.32 sec)
This shows that MySQL can execute 1,000,000 simple + expressions in 0.32 seconds on a PentiumII 400MHz.
All MySQL functions should be very optimized, but there may be some exceptions, and BENCHMARK(loop_count, expression) is an excellent tool to find out if there is a problem with your query.
10.5.1 estimate query performance
In most cases, you can estimate performance by counting disk seeks. For small tables, you can usually find the row in one disk seek (as the index is probably cached). For bigger tables, you can estimate (using B++ tree indexes) that you will need log(row_count) / log(index_block_length / 3 * 2 / (index_length + data_pointer_length)) + 1 seeks to find a row.
In MySQL, an index block is usually 1024 bytes and the data pointer is usually 4 bytes. For a 500,000-row table with an index length of 3 (medium integer), this gives log(500000) / log(1024 / 3 * 2 / (3 + 4)) + 1 = 4 seeks.
Since the above index would require about 500000 * 7 * 3/2 = 5.2MB (assuming index buffers are filled to 2/3, which is typical), you will probably have most of the index in memory, and you will probably need only 1-2 calls to read data from the OS to find the row.
For writes, however, you will need 4 seek requests (as above) to find where to place the new index and normally 2 seeks to update the index and write the row.
Note that the above does not mean that your application will slowly degenerate by N log N! As long as everything is cached by the OS or SQL server, things will become only marginally slower as the table gets bigger. After the data gets too big to be cached, things will start to go much slower, until your application is bound only by disk seeks (which increase by N log N). To avoid this, grow the index cache as the data grows. See 10.2.3 Adjusting server parameters.
10.5.2 Speed of SELECT queries
In general, when you want to make a slow SELECT ... WHERE faster, the first thing to check is whether you can add an index. See 10.4 How MySQL uses indexes. All references between different tables should usually be done with indexes. You can use the EXPLAIN command to determine which indexes are used for a SELECT. See 7.22 EXPLAIN syntax (get information about a SELECT).
Some general suggestions:
To help MySQL optimize queries better, run myisamchk --analyze on a table after it has been loaded with relevant data. This updates a value for each index part that indicates the average number of rows that have the same value. (For unique indexes, this is always 1, of course.)
To sort an index and the data according to an index, use myisamchk --sort-index --sort-records=1 (if you want to sort on index 1). If you have a unique index from which you want to read all records in order of the index, this is a good way to make that faster. Note, however, that this sorting is not written optimally and will take a long time for a large table!
10.5.3 how MySQL optimizes the WHERE clause
The WHERE optimizations are described for SELECT here because they are mostly used with SELECT, but the same optimizations apply to WHERE in DELETE and UPDATE statements.
Also note that this section is incomplete. MySQL does many optimizations, and we have not had time to document them all.
Some of the optimizations implemented by MySQL are listed below:
Removal of unnecessary parentheses:
((a AND b) AND c OR (((a AND b) AND (c AND d))))
-> (a AND b AND c) OR (a AND b AND c AND d)
Constant folding:
(a<b AND b=c) AND a=5
-> b>5 AND b=c AND a=5
Constant condition removal (needed because of constant folding):
(B>=5 AND B=5) OR (B=6 AND 5=5) OR (B=7 AND 5=6)
-> B=5 OR B=6
The constant expression used by the index is evaluated only once.
COUNT(*) on a single table without a WHERE is retrieved directly from the table information. This is also done for any NOT NULL expression when used with only one table.
Early detection of invalid constant expressions. MySQL quickly detects that some SELECT statements are impossible and returns no rows.
HAVING is merged with WHERE if you do not use GROUP BY or group functions (COUNT(), MIN(), and so on).
For each sub-join, a simpler WHERE is constructed to get a fast WHERE evaluation for each sub-join and also to skip records as soon as possible.
All constant tables are read first, before any other tables in the query. A constant table is:
An empty table or a table with 1 row.
A table that is used with a WHERE clause on a UNIQUE index or a PRIMARY KEY, where all index parts are compared with constant expressions and the index parts are defined as NOT NULL.
All the following tables are used as constant tables:
mysql> SELECT * FROM t WHERE primary_key=1;
mysql> SELECT * FROM t1,t2
WHERE t1.primary_key=1 AND t2.primary_key=t1.id;
The best join combination to join the tables is found by trying all possibilities. If all columns in ORDER BY and GROUP BY come from the same table, this table is preferred first when joining.
If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.
If you use SQL_SMALL_RESULT, MySQL will use an in-memory table.
Because DISTINCT is converted to a GROUP BY on all columns, DISTINCT combined with ORDER BY will in many cases also need a temporary table.
Each table's indexes are queried, and the index that spans fewer than 30% of the rows is used. If no such index can be found, a quick table scan is used instead.
In some cases, MySQL can read rows from the index without even consulting the data file. If all columns used from the index are numeric, only the index tree is used to resolve the query.
Before each record is output, those that do not match the HAVING clause are skipped.
Here are some examples of queries that are very fast:
mysql> SELECT COUNT(*) FROM tbl_name;
mysql> SELECT MIN(key_part1),MAX(key_part1) FROM tbl_name;
mysql> SELECT MAX(key_part2) FROM tbl_name
WHERE key_part_1=constant;
mysql> SELECT ... FROM tbl_name
ORDER BY key_part1,key_part2,... LIMIT 10;
mysql> SELECT ... FROM tbl_name
ORDER BY key_part1 DESC,key_part2 DESC,... LIMIT 10;
The following query can be solved using only the index tree (assuming the index column is numeric):
mysql> SELECT key_part1,key_part2 FROM tbl_name WHERE key_part1=val;
mysql> SELECT COUNT(*) FROM tbl_name
WHERE key_part1=val1 AND key_part2=val2;
mysql> SELECT key_part2 FROM tbl_name GROUP BY key_part1;
The following queries are retrieved in sort order using an index, without another sort:
mysql> SELECT ... FROM tbl_name ORDER BY key_part1,key_part2,...;
mysql> SELECT ... FROM tbl_name ORDER BY key_part1 DESC,key_part2 DESC,...;
10.5.4 how MySQL optimizes LEFT JOIN
In MySQL, the A LEFT JOIN B implementation is as follows:
Table B is set to be dependent on table A.
Table A is set to be dependent on all tables (except B) that are used in the LEFT JOIN condition.
All LEFT JOIN conditions are moved to the WHERE clause.
Perform all standard join optimizations, except that a table is always read after all the tables it depends on. If there is a circular dependency, MySQL will issue an error.
Perform all standard WHERE optimizations.
If there is a row in A that matches the WHERE clause, but no row in B matches the LEFT JOIN condition, then generate a row in B with all columns set to NULL.
If you use LEFT JOIN to find rows that do not exist in some table, and you have the test column_name IS NULL in the WHERE part (where column_name is a column declared NOT NULL), then MySQL will stop searching for more rows (for a particular key combination) after it has found one row that matches the LEFT JOIN condition.
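A sketch of that technique (the tables t1 and t2 and the id column are hypothetical): find all rows in t1 that have no counterpart in t2, where t2.id is declared NOT NULL:
mysql> SELECT t1.id FROM t1 LEFT JOIN t2 ON t1.id=t2.id WHERE t2.id IS NULL;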
10.5.5 how MySQL optimizes LIMIT
In some cases, MySQL will handle the query differently when you use LIMIT # and do not use HAVING:
If you select only a few rows with LIMIT, MySQL will use indexes in some cases when it would normally prefer to do a full table scan.
If you use LIMIT # with ORDER BY, MySQL will end the sorting as soon as it has found the first # rows, instead of sorting the whole table.
When combining LIMIT # with DISTINCT, MySQL will stop as soon as it finds # unique rows.
In some cases, a GROUP BY can be resolved by reading the key in order (or doing a sort on the key) and then calculating summaries until the key value changes. In this case, LIMIT # will not calculate any unnecessary GROUP BY values.
As soon as MySQL has sent the first # rows to the client, it will abandon the query.
LIMIT 0 will always quickly return an empty set. This is useful for checking a query and for getting the column types of the result columns (an example follows this list).
The size of temporary tables uses LIMIT # to calculate how much space is needed to resolve the query.
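For example, a quick way to check a query and obtain the result's column types without retrieving any rows:
mysql> SELECT * FROM tbl_name LIMIT 0;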
10.5.6 Speed of INSERT queries
The time to insert a record consists approximately of the following:
Connecting: (3)
Sending the query to the server: (2)
Parsing the query: (2)
Inserting the record: (1 x size of record)
Inserting the indexes: (1 x number of indexes)
Closing: (1)
The numbers here are somewhat proportional to the overall time. This does not take into account the initial overhead of opening tables (which is done once for each concurrently running query).
The size of the table slows down the insertion of indexes by N log N (B-trees).
Some ways to speed up insertion:
If you insert many rows from the same client at the same time, use multiple-value INSERT statements. This is much faster (several times, in some cases) than using separate INSERT statements; a sketch appears at the end of this section.
If you insert a lot of rows from different clients, you can get higher speed by using the INSERT DELAYED statement. See 7.14 INSERT syntax.
Note that with MyISAM you can insert rows at the same time as SELECT statements are running, provided there are no holes from deleted rows in the table.
When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using a lot of INSERT statements. See 7.16 LOAD DATA INFILE syntax.
When a table has many indexes, it is possible to do even more to make LOAD DATA INFILE faster. Use the following procedure (a consolidated transcript of these steps appears after the list):
Optionally create the table with CREATE TABLE, for example using mysql or Perl-DBI.
Execute a FLUSH TABLES statement or the shell command mysqladmin flush-tables.
Use myisamchk --keys-used=0 -rq /path/to/db/tbl_name. This removes all use of indexes from the table.
Insert the data into the table with LOAD DATA INFILE. This will not update any indexes and will therefore be very fast.
If you have myisampack and want to compress the table, run myisampack on it. See 10.6.3 Characteristics of compressed tables.
Re-create the indexes with myisamchk -r -q /path/to/db/tbl_name. This creates the index tree in memory before writing it to disk, which is much faster because it avoids lots of disk seeks. The resulting index tree is also perfectly balanced.
Execute a FLUSH TABLES statement or the shell command mysqladmin flush-tables.
This procedure will be built into LOAD DATA INFILE in some future version of MySQL.
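Put together, the procedure might look like this (the paths, database, table, and file names are placeholders):
shell> mysqladmin flush-tables
shell> myisamchk --keys-used=0 -rq /path/to/db/tbl_name
shell> mysql -e "LOAD DATA INFILE '/tmp/tbl_name.txt' INTO TABLE tbl_name" db_name
shell> myisamchk -r -q /path/to/db/tbl_name
shell> mysqladmin flush-tables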
You can lock your tables to speed up insertion:
mysql> LOCK TABLES a WRITE;
mysql> INSERT INTO a VALUES (1,23),(2,34),(4,33);
mysql> INSERT INTO a VALUES (8,26),(6,29);
mysql> UNLOCK TABLES;
The main speed difference is that the index buffer is flushed to disk only once, after all INSERT statements have completed. Normally there would be as many index buffer flushes as there are different INSERT statements. Locking is not needed if you can insert all rows with a single statement. Locking will also lower the total time of multi-connection tests, but the maximum wait time for some threads will go up (because they wait for locks). For example:
Thread 1 does 1000 inserts
Threads 2, 3, and 4 do 1 insert
Thread 5 does 1000 inserts
If you do not use locking, 2, 3, and 4 will finish before 1 and 5. If you use locking, 2, 3, and 4 will probably not finish before 1 or 5, but the total time should be about 40% faster. Because INSERT, UPDATE, and DELETE operations are very fast in MySQL, you will obtain better overall performance by adding locks around anything that does more than about 5 inserts or updates in a row. If you do very many inserts in a row, you could do a LOCK TABLES followed by an occasional UNLOCK TABLES (about every 1000 rows) to let other threads access the table. This would still result in good performance. Of course, LOAD DATA INFILE is still much faster for loading data.
To get some more speed for both LOAD DATA INFILE and INSERT, enlarge the key buffer. See 10.2.3 Adjusting server parameters.
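As mentioned at the start of this section, here is a sketch of a multiple-value INSERT (the table and values are illustrative):
mysql> INSERT INTO tbl_name (a,b) VALUES (1,'x'),(2,'y'),(3,'z');
This is much faster than issuing three single-row INSERT statements.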
10.5.7 Speed of UPDATE queries
An update query is optimized as a SELECT query with the additional overhead of a write. The speed of the write depends on the size of the data being updated and the number of indexes that are updated.
Another way to get fast updates is to delay updates and then do many updates in a row later. Doing many updates in a row is much quicker than doing them one at a time if you lock the table.
Note that with the dynamic record format, updating a record to a longer total length may split the record. So if you do this often, it is very important to run OPTIMIZE TABLE sometimes. See 7.9 OPTIMIZE TABLE syntax.
10.5.8 Speed of DELETE queries
The time to delete a record is exactly proportional to the number of indexes. To delete records more quickly, you can increase the size of the index cache. See 10.2.3 Adjusting server parameters.
It is also much faster to delete all rows in a table than to delete a large part of the rows.
10.6 Choosing a table type
With MySQL you can currently (version 3.23.5) choose between 4 usable table formats from a speed standpoint.
Static MyISAM
This is the default format. It is the simplest and most secure format, and it is also the fastest of the on-disk formats. The speed comes from the easy way in which data can be found on disk: when looking something up with an index in a static-format table, the position is simply the row length multiplied by the row number. Also, when scanning a table, it is easy to read a constant number of records with each disk read. The security comes from the fact that if your computer crashes while writing to a static MyISAM file, myisamchk can easily figure out where each row starts and ends, so it can usually reclaim all records except the partially written one. Note that in MySQL all indexes can always be reconstructed.
Dynamic MyISAM
This format is a bit more complex, because each row has to have a header that says how long it is. A record can also end up in more than one location when it is made longer by an update. You can use OPTIMIZE TABLE or myisamchk to defragment a table. If you have static data that is accessed/changed in the same table as some VARCHAR or BLOB columns, it might be a good idea to move the dynamic columns to another table to avoid fragmentation.
Compressed MyISAM
This is a read-only type generated with the optional myisampack tool.
Memory (HEAP heap)
This table format is extremely useful for small or medium-sized lookup tables. It is possible to copy or create a commonly used lookup table (used in joins) into a (maybe temporary) HEAP table to speed up joins of many tables. The same join could take several times longer using the ordinary tables. Suppose we want to do the following join:
SELECT tab1.a, tab3.a FROM tab1, tab2, tab3
WHERE tab1.a = tab2.a and tab2.a = tab3.a and tab2.c != 0;
To speed this up, we could create a temporary table with the join of tab2 and tab3, since they are looked up with the same column (tab1.a). Here are the commands to create that table and run the resulting selects:
CREATE TEMPORARY TABLE test TYPE=HEAP
SELECT
tab2.a as a2, tab3.a as a3
FROM
tab2, tab3
WHERE
tab2.a = tab3.a and c = 0;
SELECT tab1.a, test.a3 from tab1, test where tab1.a = test.a2;
SELECT tab1.b, test.a3 from tab1, test where tab1.a = test.a2 and something;
10.6.1 characteristics of static (fixed length) tables
This is the default format. It is used when the table does not contain VARCHAR, BLOB, or TEXT columns.
All CHAR, NUMERIC, and DECIMAL columns are space-padded to the column width.
Very fast.
It's easy to buffer.
It is easy to rebuild after a crash because the record is in a fixed location.
It doesn't have to be reorganized (with myisamchk) unless a huge amount of records are deleted and you want to return free disk space to the operating system.
Usually requires more disk space than dynamic tables.
10.6.2 characteristics of dynamic tables
Use this format if the table contains any VARCHAR, BLOB, or TEXT columns.
All string columns are dynamic (except those with a length less than 4).
Each record is preceded by a bitmap indicating which string columns are empty ('') or which numeric columns contain zero (this is not the same as columns containing NULL values). If a string column has zero length after removal of trailing whitespace, or a numeric column has a zero value, it is marked in the bitmap and not saved to disk. Non-empty strings are saved as a length byte plus the string contents.
It usually takes up more disk space than fixed-length tables.
Each record uses only the required space. If a record becomes larger, it is cut into multiple segments as needed, which results in record fragments.
If you update a row with information that extends the row length, the row will be fragmented. In this case, you may have to run myisamchk -r from time to time to get better performance. Use myisamchk -ei tbl_name for some statistics.
Not as easy to reconstruct after a crash, because a record may be fragmented into many pieces and a link (fragment) may be missing.
The expected row length for a dynamic-sized record is:
3
+ (number of columns + 7) / 8
+ (number of char columns)
+ packed size of numeric columns
+ length of strings
+ (number of NULL columns + 7) / 8
There is a penalty of 6 bytes for each link. A dynamic record is linked whenever an update causes an enlargement of the record. Each new link will be at least 20 bytes, so the next enlargement will probably go in the same link. If not, there will be another link. You can check how many links there are with myisamchk -ed. All links can be removed with myisamchk -r.
10.6.3 characteristics of compression tables
A read-only table made with the myisampack utility. All customers with extended MySQL email support are entitled to a copy of myisampack for their internal usage.
The uncompress code exists in all MySQL distributions, so that even customers who do not have myisampack can read tables that were compressed with myisampack.
Takes very little disk space, thus minimizing disk usage.
Each record is compressed separately (with little access overhead). The header of a record is fixed in length (1-3 bytes), depending on the largest record in the table. Each column is compressed in a different way. Some types of compression are:
There is usually a different Huffman table for each column.
Suffix blank compression.
Prefix blank compression.
Numbers with a value of zero are stored using 1 bit.
If the value of an integer column has a small range, the column is stored using the smallest possible type. For example, if all values range from 0 to 255, a BIGINT column (8 bytes) can be stored as a TINYINT column (1 byte).
If the column has only a small collection of possible values, the column type is transformed to ENUM.
Columns can use the combination of compression methods above.
Can handle fixed-length or dynamic-length records, but not BLOB or TEXT columns.
Can be decompressed with myisamchk.
MySQL can support different index types, but the normal type is ISAM. This is a B-tree index, and you can roughly calculate the size of the index file as (key_length+4) * 0.67, summed over all keys. (This is for the worst case, when all keys are inserted in sorted order.) For example, for a MEDIUMINT key (3 bytes) over 500,000 rows, this gives roughly (3+4) * 0.67 * 500000, about 2.3MB.
String indexes are space compressed. If the first index part is a string, it will also be prefix compressed. Space compression makes the index file smaller if the string columns have a lot of trailing whitespace or are full-length VARCHAR columns that are not always used to the full length. Prefix compression helps if there are many strings with an identical prefix.
10.6.4 Features of memory tables
Heap tables exist only in memory, so they are lost if mysqld is taken down or crashes, but since they are very fast they are useful anyway.
The HEAP tables inside MySQL use 100% dynamic hashing without overflow areas and do not have problems with delete.
You can only access rows in a heap table by equality on an index (usually with the = operator); a small sketch follows the list below.
The downsides of heap tables are:
You need enough extra memory for all the heap tables you want to use at the same time.
You cannot search on only part of the index.
You cannot search for the next entry in order (that is, you cannot use the index to do an ORDER BY).
MySQL also cannot compute the approximate number of rows between two values. This information is used by the optimizer to choose an index. On the other hand, no disk seeks are needed at all.
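A minimal sketch of these properties (the table, column names, and values are hypothetical, using the 3.23-era TYPE=HEAP syntax):

  CREATE TABLE session_cache (
    user_id INT NOT NULL,
    last_banner CHAR(32),
    INDEX (user_id)
  ) TYPE = HEAP;

  -- Fast: an equality lookup can use the hash index
  SELECT last_banner FROM session_cache WHERE user_id = 42;

  -- The hash index cannot help here: a range condition plus ORDER BY
  SELECT * FROM session_cache WHERE user_id > 42 ORDER BY user_id;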
10.7 Other optimization techniques
Unsorted suggestions for speeding up a system:
Use persistent connections to the database to avoid connection overhead.
Always check that all your queries really use the indexes you have created in the tables. In MySQL, you can do this with the EXPLAIN command. See 7.22 EXPLAIN syntax (get information about a SELECT).
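For instance (table and column names are hypothetical):

  EXPLAIN SELECT * FROM customer WHERE name = 'Smith';

If the type column of the output shows ALL and the key column is empty, the query performs a full table scan and could benefit from an index on name.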
Try to avoid complex SELECT queries on tables that are updated a lot. This avoids problems related to table locking.
In some cases, it makes sense to introduce a "hash" column that is based on information from other columns. If this column is short and reasonably unique, it can be much faster than a big index over many columns. In MySQL, it is very easy to use this extra column: SELECT * FROM tbl_name WHERE hash='calculated hash on col1 and col2' AND col_1='constant' AND col_2='constant' AND ...
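A minimal sketch of this pattern (all names are hypothetical, and MD5() is just one possible way to calculate the hash):

  CREATE TABLE lookup (
    col_1 CHAR(200),
    col_2 CHAR(200),
    hash CHAR(32),
    INDEX (hash)
  );

  INSERT INTO lookup VALUES ('a', 'b', MD5(CONCAT('a', 'b')));

  -- The short hash index locates candidate rows quickly; the extra
  -- comparisons guard against hash collisions.
  SELECT * FROM lookup
  WHERE hash = MD5(CONCAT('a', 'b'))
    AND col_1 = 'a' AND col_2 = 'b';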
For tables that change a lot, you should try to avoid all VARCHAR and BLOB columns. You get dynamic row length as soon as you use a single VARCHAR or BLOB column. See 9.4 MySQL table types.
It is normally not useful to split a table into different tables just because the rows get "big". To access a row, the biggest performance hit is the disk seek to find the first byte of the row. After the data is found, most modern disks can read the whole row fast enough for most applications. The only cases where it really matters to split a table are if a table with dynamic row size (see above) could be changed to a fixed row size, or if you very often need to scan the table while not needing most of the columns. See 9.4 MySQL table types.
If you very often need to calculate things based on information from a lot of rows (such as counts), it is probably much better to introduce a new table and update the counter in real time. An update of the type UPDATE tbl_name SET count=count+1 WHERE index_column=constant is very fast! This is really important when you use a database like MySQL that has only table-level locking (multiple readers / single writer). It will also give better performance with most databases, because the lock manager has less to do in this case.
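A sketch of such a counter table (the names are hypothetical):

  CREATE TABLE page_counts (
    page_id INT NOT NULL PRIMARY KEY,
    count INT NOT NULL
  );

  -- A keyed in-place update; no rows need to be read back first
  UPDATE page_counts SET count = count + 1 WHERE page_id = 17;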
If you need to collect statistics from big log tables, use summary tables instead of scanning the whole table. Maintaining the summaries should be much faster than trying to do statistics "live". When something changes, it is much faster to regenerate new summary tables from the logs than to have to change the running application (depending on business decisions)!
If possible, reports should be classified as "live" or "statistical", where the data needed for statistical reports is created only from summary tables that are generated from the actual data.
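For example, a nightly job could rebuild such a summary table (the table layout is hypothetical):

  DELETE FROM sales_summary;
  INSERT INTO sales_summary (store_id, sales_day, total)
    SELECT store_id, sales_day, SUM(amount)
    FROM transactions
    GROUP BY store_id, sales_day;

Statistical reports then read only sales_summary instead of scanning the much larger transactions table.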
Take advantage of the fact that columns have default values. Insert values explicitly only when the value to be inserted differs from the default. This reduces the parsing that MySQL needs to do and improves insert speed.
In some cases, it is convenient to pack and store data in a BLOB. In this case, you have to add some extra code in your application to pack and unpack the things in the BLOB, but this approach can save a lot of accesses at some stage. This is practical when you have data that does not conform to a static table structure.
Normally, you should try to keep all data non-redundant (what is called third normal form in database theory), but you should not be afraid of duplicating things or creating summary tables if you need them to gain more speed.
Stored procedures or UDFs (user-defined functions) may be a good way to get better performance. In this case, however, you should always have an alternative (slower) way to do things if you use a database that does not support them.
You can always gain something by caching queries or answers in your application and by trying to do many inserts or updates at the same time. If your database supports lock tables (like MySQL and Oracle), this should help to ensure that the index cache is flushed only once, after all the updates.
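A sketch of batching inserts under one lock (the table name is hypothetical):

  LOCK TABLES log_table WRITE;
  INSERT INTO log_table (msg) VALUES ('first');
  INSERT INTO log_table (msg) VALUES ('second');
  INSERT INTO log_table (msg) VALUES ('third');
  UNLOCK TABLES;

This way the index cache needs to be flushed only once, after UNLOCK TABLES, instead of once per insert.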
Use INSERT /*! DELAYED */ when you do not need to know when your data is written. This speeds things up because many records can be written with a single disk write.
Use INSERT /*! LOW_PRIORITY */ when you want your selects to be more important.
Use SELECT /*! HIGH_PRIORITY */ to get selects that jump the queue; that is, the select is done even if there is somebody waiting to do a write.
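A sketch of the three hints together (log_table is hypothetical):

  INSERT /*! DELAYED */ INTO log_table (msg) VALUES ('queued write');
  INSERT /*! LOW_PRIORITY */ INTO log_table (msg) VALUES ('yield to readers');
  SELECT /*! HIGH_PRIORITY */ COUNT(*) FROM log_table;

Other SQL servers treat the /*! ... */ parts as plain comments and execute the unhinted statements.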
Use multi-row INSERT statements to store many rows with one SQL command (many SQL servers support this).
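For example (the table is hypothetical):

  INSERT INTO points (x, y) VALUES (1, 2), (3, 4), (5, 6);

One statement is parsed and sent once, but stores three rows.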
Use LOAD DATA INFILE to load a large amount of data. This is faster than normal insertion and will be even faster when myisamchk is integrated into mysqld.
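A sketch, assuming a tab-separated text file at a hypothetical path and the hypothetical points table from the previous example:

  LOAD DATA INFILE '/tmp/points.txt' INTO TABLE points
  FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';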
Use AUTO_INCREMENT columns to generate unique values.
Use OPTIMIZE TABLE once in a while to avoid fragmentation when using a dynamic table format. See 7.9 OPTIMIZE TABLE syntax.
Use HEAP tables when possible to get more speed. See 9.4 MySQL table types.
With a normal Web server setup, images should be stored as files, with only a reference to each file stored in the database. The main reason for this is that a normal Web server is much better at caching files than database contents, so it is much easier to get a fast system if you use files.
Use in-memory tables for non-critical data that is accessed often, such as information about the last shown banner for users who do not have cookies.
Columns with identical information in different tables should be declared identically and have identical names. Before Version 3.23, you otherwise got slow joins. Try to keep the names simple (use name instead of customer_name in the customer table). To make your names portable to other SQL servers, you should keep them shorter than 18 characters.
If you need really high speed, you should look at the low-level interfaces for data storage that the different SQL servers support! For example, by accessing the MySQL MyISAM layer directly, you could get a speed increase of two to five times compared to using the SQL interface. To be able to do this, however, the data must be on the same server as the application, and usually it should be accessed by only one process (because external file locking is really slow). The above problems could be eliminated by introducing low-level MyISAM commands in the MySQL server (this could be one easy way to get more performance if needed). With a carefully designed database interface, it should be quite easy to support this kind of optimization.
In many cases, it is faster to access data from a database (using a live connection) than from a text file, just because the database is likely to be more compact than the text file (if you use numeric data), and this involves fewer disk accesses. You also save code because you do not have to parse your text files to find line and column boundaries.
You can also use replication as a speedup. See 19.1 Database replication.
10.8 Using your own benchmarks
You should definitely benchmark your application and database to find out where the bottlenecks are. After fixing one bottleneck (or by replacing it with a "dummy module"), you can then easily identify the next one, and so on. Even if the overall performance of your application is currently "good enough", you should at least make a plan for each bottleneck, so you know how to solve it if someday you really need the extra performance.
For some examples of portable benchmark programs, look at the MySQL benchmark suite. See 11 The MySQL benchmark suite. You can take any program from this suite and modify it for your needs. By doing this, you can try different solutions to your problem and test which is really the fastest solution for you.
It is very common for problems to occur only when the system is very heavily loaded. We have had many customers who contact us when they have a (tested) system in production and encounter load problems. In every one of these cases so far, the problem has been either basic design (table scans do not perform well under high load) or OS/library issues. Most of these would have been a lot easier to fix if the systems were not already in production.
To avoid problems like this, you should put some effort into benchmarking your whole application under the worst possible load!
10.9 Design choices
MySQL keeps row data and index data in separate files. Many (almost all) other databases mix row and index data in the same file. We believe that the MySQL choice is better for a very wide range of modern systems.
Another way to store row data is to keep the information for each column in a separate area (examples are SDBM and Focus). This causes a performance hit for every query that accesses more than one column. Because this degrades quickly when more than one column is accessed, we believe that this model is not good for general-purpose databases.
The more common case is that the index and data are stored together (as in Oracle/Sybase). In this case, you find the row information on the leaf pages of the index. The good thing about this layout is that it, in many cases (depending on how well the index is cached), saves a disk read. The bad things about this layout are:
Table scanning is much slower, because you have to read through the indexes to get at the data.
You lose a lot of space, because you have to duplicate indexes from the nodes (you cannot store the rows in the nodes).
Deletes will degrade the table over time (because indexes in nodes are usually not updated on delete).
You cannot use only the index table to retrieve data for a query.
The index data is harder to cache.
10.10 MySQL Design limitations / tradeoffs
Because MySQL uses extremely fast table locking (multiple readers / single writer), the biggest remaining problem is a mix of a steady stream of inserts and slow selects on the same table.
We believe that for a huge number of systems, the extremely fast performance in all other cases makes MySQL a winner. This case is usually also solvable by having multiple copies of the table, but that takes more effort and hardware.
We are also working on some extensions to solve this problem for some common application niches.
10.11 Portability
Because all SQL servers implement different parts of SQL, it takes work to write portable SQL applications. Very simple selects and inserts are easy to keep portable, but the more features you need, the more difficult it becomes. And if you want the application to be fast on many databases, it becomes even harder!
To make a complex application portable, you need to decide on a number of SQL servers with which it should work.
You can use MySQL's crash-me program (http://www.mysql.com/crash-me-choose...) to find the functions, types, and limits you can use with a selection of database servers.
For example, you should not have column names longer than 18 characters if you want to be able to use Informix or DB2.
Both the MySQL benchmarks and crash-me are database independent. By looking at how we have handled this, you can get a feeling for what you have to do to make your own application database independent. The benchmarks themselves can be found in the "sql-bench" directory of the MySQL source distribution. They are written in Perl, using the DBI database interface (which solves the access part of the problem).
See http://www.mysql.com/benchmark.html for the results.
As you can see from these results, all databases have some weak points; that is, they show different behavior caused by their different design compromises.
If you strive for database independence, you need to get a good feeling for each SQL server's bottlenecks. MySQL is very fast at retrieving and updating things, but has a problem mixing slow readers and writers on the same table. Oracle, on the other hand, has a big problem when you try to access rows that you have recently updated (until they are flushed to disk). Transactional databases in general are not very good at generating summary tables from log tables, because in this case row-level locking is almost useless.
To make your application "really database independent", you need to define an easy and extensible interface through which you manipulate your data. Because C++ is available on most systems, it makes sense to use a C++ interface to the database.
If you use some functionality that is specific to one database (such as the REPLACE command in MySQL), you should code a method for the other SQL servers to implement the same functionality (but more slowly). With MySQL, you can use the /*! */ syntax to add MySQL-specific keywords to a query. The code inside /* */ will be treated as a comment (and ignored) by most other SQL servers.
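A sketch of this approach (the table t is hypothetical). On MySQL you can run:

  REPLACE INTO t (id, val) VALUES (1, 'x');

while the fallback path for other SQL servers achieves the same effect in two slower steps:

  DELETE FROM t WHERE id = 1;
  INSERT INTO t (id, val) VALUES (1, 'x');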
If really high performance is more important than exactness, as it is in some Web applications, one possibility is to create an application layer that caches all results to give you even higher performance. By simply letting old results "expire" after a while, you can keep the cache reasonably fresh. This works quite well even under extremely high load, in which case you can dynamically increase the cache size and set the expiration time higher until things get back to normal.
In this case, the table creation information should contain information about the initial size of the cache and how often the table should normally be refreshed.
10.12 Where have we used MySQL?
During the initial development of MySQL, its features were made to fit our largest customer, who handles data warehouses for a couple of the biggest retailers in Sweden.
From all stores, we get weekly summaries of all bonus card transactions, and we are expected to provide useful information for the store owners to help them find out how their advertising campaigns affect their customers.
The data is quite large (about 7 million transactions per month), and we keep data for 4-10 years that needs to be presented to the users. We get weekly requests from the customers, who want "instant" access to new reports from this data.
We solve this by storing all information per month in compressed "transaction" tables. We have a set of simple macros/scripts that generate summary tables grouped by different criteria (product group, customer id, store...) from the transaction tables. The reports are Web pages dynamically generated by a small Perl script that parses a page, executes the SQL statements in it, and inserts the results. We would have used PHP or mod_perl, but they were not available at the time.
For graphical data, we wrote a simple tool in C that produces GIFs based on the result of a SQL query (with some processing of the result). This tool is also dynamically executed from the Perl script that parses the HTML files.
In most cases, a new report can be done simply by copying an existing script and modifying the SQL query in it. In some cases, we need to add more fields to an existing summary table or generate a new one, but this is also quite simple, since we keep all transaction tables on disk. (Currently we have at least 50G of transaction tables and 200G of other customer data.)
We also let our customers access the summary tables directly through ODBC so that advanced users can experiment with the data themselves.
We handle this with a quite modest Sun Ultra SPARCstation (2x200 MHz) without any problems. We recently upgraded one of our servers to a 2-CPU 400 MHz UltraSPARC, and we are now planning to handle transactions at the product level, which will mean a tenfold increase in data. We think we can keep up with this just by adding more disks to our systems.
We are also experimenting with Intel-Linux to be able to get more CPU power more cheaply. Now that we have the binary portable database format (new in Version 3.23), we will start to use it for some parts of the application.
Our initial impression is that Linux performs much better under low-to-medium load, but Solaris performs better when the load gets high, because of extreme disk I/O; we do not yet have anything conclusive about this, though. After some discussion with Linux kernel developers, this might be a side effect of Linux giving so many resources to batch jobs that interactive performance becomes very low. This makes the machine feel very slow and unresponsive while big batches are running. Hopefully this will be handled better in future Linux kernels.