2025-03-30 Update From: SLTechnology News & Howtos
Shulou (Shulou.com) 06/02 Report --
This article focuses on database optimization techniques for handling massive data. The methods described here are simple, fast, and practical, so let's take a look.
(1) Separation of active data
Some tables hold a large amount of data of which only a small fraction is active. For example, a site may have many registered users but few active ones. In that case, active data can be separated from inactive data: active records are kept in a dedicated active table, and records that become inactive are migrated out periodically by an offline task. A query first looks in the active table and falls back to the inactive table only if nothing is found, which improves lookup efficiency for the common case.
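The lookup-with-fallback described above can be sketched as follows, using SQLite in memory; the table and column names (`users_active`, `users_archive`) are illustrative, not from the original text.

```python
import sqlite3

# Minimal sketch of the active/inactive split. In production these would
# be two real tables, with an offline job moving stale rows to the archive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users_active (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE users_archive (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users_active VALUES (1, 'alice')")
conn.execute("INSERT INTO users_archive VALUES (2, 'bob')")

def find_user(user_id):
    """Query the small active table first; fall back to the archive."""
    row = conn.execute(
        "SELECT name FROM users_active WHERE id = ?", (user_id,)).fetchone()
    if row is None:
        row = conn.execute(
            "SELECT name FROM users_archive WHERE id = ?", (user_id,)).fetchone()
    return row[0] if row else None

print(find_user(1))  # hit in the active table
print(find_user(2))  # fallback to the archive
```

Most requests touch only the small active table, so the common-case query stays fast regardless of how large the archive grows.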
(2) Read-write separation
The essence of read-write separation is to cluster the database, spreading operations from a single database server across several and reducing the load on any one of them. Because every server in the cluster must hold consistent data, a one-master/many-slaves architecture is typically used: the master handles writes while multiple slaves handle reads, dispersing read pressure under high concurrency. If there are many slaves, the master can replicate to a subset of them first, and those slaves then relay the data to the rest, reducing the replication load on the master.
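A toy sketch of the routing idea, under the assumption that the "servers" are plain dicts standing in for real connections and that replication is simulated by copying each write to every replica:

```python
import random

class ReadWriteRouter:
    """One-master/many-slaves routing sketch (not a real replication setup)."""

    def __init__(self, n_replicas=2):
        self.master = {}
        self.replicas = [{} for _ in range(n_replicas)]

    def write(self, key, value):
        self.master[key] = value
        for r in self.replicas:   # stand-in for asynchronous replication
            r[key] = value

    def read(self, key):
        # Reads are spread across replicas to take load off the master.
        return random.choice(self.replicas).get(key)

router = ReadWriteRouter()
router.write("user:1", "alice")
print(router.read("user:1"))
```

In a real system the replication step is asynchronous, so a read may briefly return stale data; that lag is the price of spreading the read load.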
(3) Batch reading and delayed modification
Batch reads and delayed modification improve efficiency by reducing the number of operations. Several data queries within one upstream request can be merged into a single batch query, or queries from multiple requests over a short window can be coalesced into one. Delayed modification targets data that is modified frequently under high concurrency: modifications are held temporarily in a cache and written back to the database on a schedule. When reading, the program must consult both the database and the cache. The trade-off is temporary inconsistency, and if the cache fails, data not yet written to the database may be lost.
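The delayed-modification idea can be sketched as a write-behind counter; the dicts here are hypothetical stand-ins for a cache and a database, and `flush` represents the periodic persistence job mentioned above.

```python
class WriteBehindCounter:
    """Accumulate increments in a cache, flush to the DB in batches."""

    def __init__(self):
        self.db = {}        # stand-in for the real database
        self.pending = {}   # cached, not-yet-persisted deltas

    def incr(self, key, delta=1):
        self.pending[key] = self.pending.get(key, 0) + delta

    def read(self, key):
        # Readers must combine the DB value with the cached delta.
        return self.db.get(key, 0) + self.pending.get(key, 0)

    def flush(self):
        # Run periodically; anything still in `pending` is lost on a crash.
        for key, delta in self.pending.items():
            self.db[key] = self.db.get(key, 0) + delta
        self.pending.clear()

c = WriteBehindCounter()
for _ in range(3):
    c.incr("page:views")
print(c.read("page:views"))  # 3, visible even before the flush
c.flush()
print(c.db["page:views"])    # 3, now persisted
```

Three increments cost one database write instead of three, which is exactly where the efficiency gain comes from.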
(4) Distributed database
A distributed database stores different tables in different databases on different servers. When a request needs several tables, multiple servers can work on it simultaneously, which speeds up processing. The distributed approach addresses the case where a single request is itself very complex, by assigning one complex request to several servers. Each node can additionally use read-write separation, forming node groups, and tables belonging to different businesses can be placed on different nodes so that each business calls its own database. The costs are data-consistency problems and harder multi-table queries.
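The fan-out described above can be sketched as follows; the table-to-node mapping and the `query_node` helper are illustrative assumptions, not part of any real driver API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical mapping: each business table lives on its own node.
NODES = {
    "orders": "node-a",
    "inventory": "node-b",
}

def query_node(node, table):
    # Placeholder for a real remote query executed against `node`.
    return f"rows from {table}@{node}"

def handle_request(tables):
    """Fan one request out to the nodes owning each table, in parallel."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(query_node, NODES[t], t) for t in tables]
        return [f.result() for f in futures]

print(handle_request(["orders", "inventory"]))
```

Because the two table lookups run on different nodes concurrently, the request's latency approaches that of the slowest single lookup rather than their sum.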
(5) Database optimization
Database optimization mainly includes: table structure optimization, index optimization, SQL statement optimization, partitioning and table splitting, and the use of stored procedures.
Table structure optimization: split a large table into several sub-tables, keep fields that are frequently queried together in the same table, and redundantly store some business data to avoid joins.
Index optimization: an index is a structure, maintained as data changes, that orders records by pre-specified fields; a query on an indexed field walks this structure to obtain a pointer to the record and then fetches the record from the table. An index scan is usually far more efficient than a full table scan. But indexes must be used judiciously: check whether each index brings a real efficiency gain, which index the optimizer actually uses when several coexist, whether too many indexes hamper the storage engine's query optimization and slow down updates, and whether a clustered index covering too much data overflows the cache and forces disk access.
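The scan-versus-index difference can be observed directly with SQLite's `EXPLAIN QUERY PLAN` (the exact plan wording varies by SQLite version, but the before/after contrast holds); the table and index names here are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")

def plan(sql):
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(str(r) for r in rows)

before = plan("SELECT * FROM orders WHERE customer = 'x'")
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer)")
after = plan("SELECT * FROM orders WHERE customer = 'x'")

print(before)  # a full table scan of orders
print(after)   # a search using idx_orders_customer
```

Checking the plan like this is also how one answers the questions above: it shows which index, if any, the optimizer actually chose for a given query.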
SQL statement optimization: covers both syntax and business logic, used together with indexes and caching. Do not return data for all columns, and do not return too many records in one batch query, since query results are held in the database server's cache. Set up monitoring and the slow-query log so that slow queries are found and analyzed promptly. Note that the same statement may execute very differently under different query conditions.
Partitioning: divide the data in one table into separate regions according to some rule, for example by time. If the range of a query falls within a single region, only that region's data needs to be touched.
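A minimal sketch of time-based partitioning, with per-month partitions simulated as dict entries (the `log_YYYY_MM` naming scheme is an assumption for illustration):

```python
from datetime import date

partitions = {}  # partition name -> rows; stand-in for real partitions

def partition_name(d):
    return f"log_{d.year}_{d.month:02d}"  # e.g. log_2024_05

def insert(d, row):
    partitions.setdefault(partition_name(d), []).append(row)

def query_month(d):
    # Only the matching partition is scanned, not the whole logical table.
    return partitions.get(partition_name(d), [])

insert(date(2024, 5, 1), "a")
insert(date(2024, 6, 1), "b")
print(query_month(date(2024, 5, 15)))  # only May's partition is read
```

Real databases (e.g. MySQL's `PARTITION BY RANGE`) do this pruning automatically once the partitioning rule is declared.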
Table splitting: if a table's rows fall into distinct types that are rarely operated on together, consider storing each type in its own table. Alternatively, split one large table into several small ones to reduce lock granularity; for example, in a heavily accessed table where a write changes only a few columns, the unmodified columns can be moved to a separate table so that other queries are not blocked. The costs of splitting are locating the right sub-table for a row and needing multi-table operations to assemble a complete record.
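Locating the right sub-table is usually a simple routing function; here is a hash-based sketch, with the `user_N` table names and the modulo rule chosen purely for illustration.

```python
N_TABLES = 4  # number of sub-tables the big user table was split into

def table_for(user_id):
    """Route a user id to its sub-table by modulo hashing."""
    return f"user_{user_id % N_TABLES}"

print(table_for(10))  # user_2
print(table_for(7))   # user_3
```

The same function must be used by every reader and writer, and note that changing `N_TABLES` later remaps almost every row, which is why consistent hashing is often preferred when resharding is expected.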
Use stored procedures: a stored procedure is compiled once and can then perform complex operations repeatedly without being re-parsed.
PS: common causes of poor performance include unreasonable index use; too many indexes, which degrades the storage engine's query optimization and slows updates; too many fields or too many records returned; poorly written query statements; joins across tables; overly large table structures; coarse lock granularity; high-concurrency queries; and cache breakdown, penetration, and avalanche.
(6) NoSQL
A relational database normally requires defining the table structure up front: which fields exist and the type of each, with a field holding a single value rather than nested content. NoSQL breaks free of these constraints, enabling unstructured data storage, and improves data-operation performance by storing data across multiple blocks.
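The contrast with a single-value column can be sketched with a toy document store; the dict stands in for a real NoSQL system (Redis, MongoDB, etc.), and the document shape is an invented example.

```python
import json

store = {}  # key -> serialized document; stand-in for a NoSQL store

def put(key, doc):
    store[key] = json.dumps(doc)

def get(key):
    return json.loads(store[key])

# One value holds nested structure that a single relational column cannot.
put("user:1", {"name": "alice",
               "tags": ["admin", "beta"],
               "address": {"city": "Beijing"}})

print(get("user:1")["address"]["city"])
```

No schema was declared anywhere: adding a new field to future documents requires no migration, which is the flexibility the paragraph above refers to.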
(7) Hadoop
Hadoop provides a solution for storing and processing data in big-data scenarios. Its underlying structure combines distribution with clustering: the data of one table is split into blocks stored on different nodes (distribution), and each block is replicated on several nodes (clustering). No single node holds a complete table, though a node may hold blocks from several tables. To process a query, the node responsible for each block handles its part, and the partial results from the nodes are then aggregated.
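The split/process/aggregate flow can be sketched in miniature; threads stand in for cluster nodes, and summing is an arbitrary example of per-block work.

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(100))
# "Distribution": cut the table into 4 blocks, one per simulated node.
blocks = [data[i:i + 25] for i in range(0, 100, 25)]

def node_process(block):
    """Each node computes a partial result over its own block."""
    return sum(block)

with ThreadPoolExecutor() as pool:
    partials = list(pool.map(node_process, blocks))

# Aggregation step: merge the nodes' partial results.
print(sum(partials))  # 4950, same as sum(range(100))
```

This is the essence of the MapReduce model Hadoop popularized: `node_process` plays the map/combine role and the final `sum` plays the reduce role.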
(8) Caching and static pages
Both reduce calls to back-end services and access to the database. A local cache is typically a map, while distributed caches include Redis, Memcache, and Ehcache. The main design questions are when the cache is populated, updated, and deleted, and, when the application must operate on both the cache and the DB, in what order the two are touched.
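One common answer to the ordering question is the cache-aside pattern, sketched below with dicts standing in for the cache and the database; writing the DB first and then deleting (not updating) the cache entry is a widely used ordering that limits stale reads.

```python
cache = {}           # stand-in for Redis/Memcache/Ehcache
db = {"k": "v1"}     # stand-in for the database

def read(key):
    """Cache-aside read: check the cache, fall back to the DB, backfill."""
    if key in cache:
        return cache[key]
    value = db.get(key)
    if value is not None:
        cache[key] = value
    return value

def write(key, value):
    db[key] = value        # 1. write the database first...
    cache.pop(key, None)   # 2. ...then invalidate the cache entry

print(read("k"))   # v1, loaded into the cache on the first miss
write("k", "v2")
print(read("k"))   # v2, because the stale entry was invalidated
```

Deleting rather than updating the cache on write avoids a race where two concurrent writers leave the cache holding the older value.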
At this point you should have a deeper understanding of database optimization for massive data; the best next step is to try these techniques in practice.