The editor would like to share how to handle tens of millions of records a day in SQL Server. I hope you get something out of this article; let's discuss it together.
Project background
This was a project for a data center. The difficulty of the project was appalling; it genuinely made me feel that the marketplace is a battlefield and that I was just a foot soldier in it: too many tactics, too many battles among the higher-ups, too many inside stories. I may write separate posts about the specifics of this project when I have time.
The project is an environmental monitoring system. For now, call the monitored devices "acquisition devices" and their attributes "monitoring indicators". The requirements: the system must support no fewer than 100,000 (10w) monitoring indicators, each indicator must be refreshed at least every 20 seconds, and storage latency must not exceed 120 seconds. A simple calculation gives the ideal load: 100,000 indicators refreshed every 20 seconds is 5,000 records per second, i.e. 300,000 records per minute, 18 million per hour, and roughly 430 million records per day. In practice the volume runs about 5% higher than that. (Most of it is actually junk data that could be eliminated by compression, but what can you do when the other side just wants to mess with you?)
Those are the figures the project demanded. I expect many readers with real big-data experience will turn up their noses: is that all? Well, I had also read plenty about big-data processing, but I had never actually done it; distributed this, read-write splitting that, it all looks easy to solve. The problem, however, is not that simple. As I said above, this was a very bad project, a textbook case of vicious competition in the industry.
There were no additional servers to be had. This one server not only hosted the database and the centralized collector (the program that parses data, raises alarms and stores records), it also had to serve a 300,000-point northbound SNMP interface. Before the program was optimized, CPU usage stayed above 80% all year round. Because the project required dual-machine hot standby, and to save ourselves unnecessary trouble, we put the related services together on the same machine so that we could make full use of the HA features (an externally purchased HA system).
The correctness requirements on the data were extreme: from the bottom acquisition system all the way up to the top monitoring system, not a single record was allowed to be wrong.
Our system architecture is as follows. We can see that the pressure on the database is very high, especially on the LevelA node:
The hardware configuration is as follows:
CPU: Intel® Xeon® processor E5-2609 (4 cores, 2.40 GHz, 10 MB, 6.4 GT/s)
Memory: 4 GB (2 x 2 GB) DDR3 RDIMM, 1333 MHz
Hard disk: 500 GB 7200 RPM 3.5" SATA3, RAID 5
Database version
We used SQL Server 2012 Standard Edition, the licensed copy supplied by HP, which lacks many of the Enterprise Edition's powerful features (table partitioning, for example, is Enterprise-only in SQL Server 2012).
Write bottleneck
The first obstacle we hit was that, with the existing program, SQL Server simply could not cope with this much data. What exactly was going on?
Our storage structure
Generally speaking, to store a large amount of historical data we split it into physical sub-tables; otherwise, at millions of records a day, a single table would hold hundreds of millions within a year. So our original table structure looked like this:
CREATE TABLE [dbo].[His20140822](
    [No] [bigint] IDENTITY(1,1) NOT NULL,
    [Dtime] [datetime] NOT NULL,
    [MgrObjId] [varchar](36) NOT NULL,
    [Id] [varchar](50) NOT NULL,
    [Value] [varchar](50) NOT NULL,
    CONSTRAINT [PK_His20140822] PRIMARY KEY CLUSTERED ([No] ASC)
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
              ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
No is the unique identity; MgrObjId is the acquisition device Id (a GUID), Id is the monitoring indicator Id (varchar(50)), Dtime is the record time and Value is the recorded value. The acquisition device Id and the monitoring indicator Id are indexed for fast lookups.
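The article does not show the index definition itself; as a rough sketch of what it might have looked like (the index name and the use of a single composite index are my assumptions), it would be something like:

-- hypothetical index name; the article only says MgrObjId and Id were indexed
CREATE NONCLUSTERED INDEX IX_His20140822_MgrObjId_Id
    ON [dbo].[His20140822] ([MgrObjId], [Id]);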
Batch write
At the time we wrote with SqlBulkCopy. Yes, that's the one, the class that claims to write millions of records in seconds.
public static int BatchInsert(string connectionString, string desTable, DataTable dt, int batchSize = 500)
{
    using (var sbc = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.UseInternalTransaction)
    {
        BulkCopyTimeout = 300,
        NotifyAfter = dt.Rows.Count,
        BatchSize = batchSize,
        DestinationTableName = desTable
    })
    {
        foreach (DataColumn column in dt.Columns)
            sbc.ColumnMappings.Add(column.ColumnName, column.ColumnName);
        sbc.WriteToServer(dt);
    }
    return dt.Rows.Count;
}

So what went wrong?
Under the original architecture, 40 million records a day was fine. But when we moved to the configuration described in the background above, the centralized monitoring program overflowed its memory. Analysis showed that too much of the incoming data was being held in memory waiting to be written to the database; data was being produced faster than it could be consumed, memory overflowed, and the program stopped working.
Where on earth is the bottleneck?
Was it the RAID disks? The data structure? The hardware? The SQL Server version? The lack of a partitioned table? Or a problem in the program itself?
At that point we had only a week, maybe a week, before the project supervisor would have thrown us off the site. So we pulled off the feat of working 48 hours straight, while phoning everyone we could think of for help...
But what was needed at that moment was to stay calm, and then calmer still. The SQL Server version? The hardware? Neither could realistically be changed right then. The RAID array? Probably not the cause. Then what was it? I really couldn't calm down.
You probably can't feel how tense the atmosphere was; in fact, after all this time, it's hard even for me to put myself back into that situation. But it's fair to say that while we may now have all kinds of methods, or the wider perspective of an outsider, when a project is pressing you to the point of giving up, your thinking at that moment is constrained by the circumstances on the ground and can go badly astray. The pressure may make you think faster, or it may make you stop thinking altogether. In that high-pressure environment some colleagues made even more elementary mistakes; their thinking became completely muddled and their efficiency dropped... Staying awake for 36 hours, or snatching two or three hours' sleep on the construction site (mud everywhere on rainy days, dried mud once it stopped), and then carrying on like that for a week straight! And on it went!
People offered plenty of ideas, but they all seemed at once useful and useless. Wait, why did they "seem to work but also seem not to"? Vaguely, I felt I had caught a hint of a direction. What was it? Right: verification. We were now running in the field environment; the fact that there had been no problem before didn't mean there was no problem under the current load. And analysing one small function inside a large system means too many confounding influences, so we had to decompose it. Yes, a "unit test", a test of each individual method: we needed to verify each function separately. Where was each individual step spending its time?
Testing step by step to verify the system's bottleneck
Modify the parameters of BulkCopy
The first thing I thought of was SqlBulkCopy's parameters, BulkCopyTimeout and BatchSize. I tested and adjusted them constantly, but the results only fluctuated within a certain range and had no real effect. They may influence some CPU counters, but the improvement fell far short of my expectations: the write speed still hovered around 10,000-20,000 records per 5 seconds, nowhere near the required 200,000 records in 20 seconds.
Storage by acquisition device
Yes, the structure above stores one record per value of each indicator. Isn't that wasteful? So would it be feasible to store one record per acquisition device per acquisition time? The question then is how to handle the fact that different acquisition devices have different attributes. Here a colleague showed his worth: store the monitoring indicators and their values as XML. Wow, you can do that? And queries can use the FOR XML form.
So we arrived at this structure: No, MgrObjId, Dtime, XMLData
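The article does not give the DDL for this variant; a minimal sketch of what such a table and an XML-based query might look like (the table name, sample XML shape and XPath are my assumptions) is:

CREATE TABLE [dbo].[HisXml20140822](          -- hypothetical table name
    [No]       [bigint] IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [MgrObjId] [varchar](36) NOT NULL,        -- acquisition device Id
    [Dtime]    [datetime] NOT NULL,           -- acquisition time
    [XMLData]  [xml] NOT NULL                 -- all indicator Ids and values for this device at this time
);

-- one row now holds every indicator of one device at one acquisition time, e.g.
-- <Indicators><I Id="temp01" Value="23.5"/><I Id="fan02" Value="1200"/></Indicators>
SELECT Dtime,
       XMLData.value('(/Indicators/I[@Id="temp01"]/@Value)[1]', 'varchar(50)') AS Value
FROM dbo.HisXml20140822
WHERE MgrObjId = '...';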
The result was slightly better than before, but not by much.
Table partitioning?
I hadn't learned this technique at the time. After reading articles online it seemed very complicated, and with so little time I didn't dare try it.
Stop other programs
I knew this could never be the real solution, because the software and hardware architecture could not be modified for the time being; but I wanted to verify how much these factors contributed. The improvement was indeed noticeable, but it still didn't meet the requirements.
Is it the bottleneck of SQLServer?
There seemed to be nothing left to try. Was this simply SQL Server's bottleneck? Searching online suggested it might be an IO bottleneck. Damn it, what then, upgrade the server, swap out the database? Would the project ever pay for that?
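The article does not show how the IO suspicion was checked; one common way to look for it in SQL Server (my sketch, not from the original text) is to inspect the accumulated wait statistics, where PAGEIOLATCH_* and WRITELOG waits dominating would point at disk IO:

-- top waits since the instance started; heavy PAGEIOLATCH_*/WRITELOG waits suggest an IO bottleneck
SELECT TOP 10 wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;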
Wait, there was something else: the index, the index! The existence of an index affects insert and update performance.
Remove the index
Yes, queries would certainly get slower after removing the index, but first I had to verify whether removing it would speed up writes. So I decisively removed the indexes on the MgrObjId and Id fields.
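The statement itself is not shown in the article; dropping a nonclustered index looks roughly like this (the index name is assumed, matching the sketch earlier):

DROP INDEX IX_His20140822_MgrObjId_Id ON [dbo].[His20140822];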
We ran it, and a miracle happened: every batch of 100,000 records was written in 7 to 9 seconds, which met the system's requirements.
How to solve the query?
But a table now accumulates more than 400 million records a day, and that is impossible to query without an index. What to do? I went back to our old trick, physical sub-tables. Where we used to split by day, we now split by hour: 24 tables a day, each holding only about 18 million records.
Then we tried querying the history of one attribute over an hour or several hours. The result: slow! Slow! Slow! Querying more than 10 million records without an index is simply unthinkable. What else could be done?
Keep splitting, I thought. We could split further by the underlying collector, because the acquisition devices differ from collector to collector, and when we query a historical curve we only ever look at the historical curve of a single indicator, so the data can safely be scattered across different tables.
So we did. The result: by splitting across 10 embedded collectors and 24 hours, generating 240 tables per day (history table names look like His_001_2014112615), the problem of writing more than 400 million records a day while supporting simple queries was solved.
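The article does not show how these per-collector, per-hour tables were created; a minimal sketch of the naming scheme described above (generated dynamically, details assumed) might be:

-- His_<collector>_<yyyyMMddHH>, e.g. His_001_2014112615 = collector 001, 2014-11-26, hour 15
DECLARE @table sysname = N'His_001_2014112615';
DECLARE @sql nvarchar(max) = N'
CREATE TABLE dbo.' + QUOTENAME(@table) + N' (
    [No]       bigint IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [Dtime]    datetime    NOT NULL,
    [MgrObjId] varchar(36) NOT NULL,
    [Id]       varchar(50) NOT NULL,
    [Value]    varchar(50) NOT NULL
);';
EXEC sp_executesql @sql;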
Query optimization
With the problems above solved, half of the project's difficulties were dealt with, and the project supervisor was too embarrassed to come looking for faults. I don't know what kind of tactical arrangement that was.
After a long while, now that the end of the year was approaching, the problem came back; the point was to tie you up so that you couldn't take on other projects at year end.
This time the requirement went like this: the design above assumed 100,000 monitoring indicators, but in the actual deployment there were only about 50,000 devices' worth. That obviously could not satisfy the tender requirements and would not pass acceptance. So what did these clever people do? They reasoned: since the monitoring targets have been halved, let's halve the time as well, shall we? In other words, with the current 50,000 devices everything now has to be persisted within 10 seconds. Good grief, by that logic, if we had only 500 monitoring indicators, would we have to persist them within 0.1 seconds? Do you ever consider the feelings of the monitored devices?
But what can you do when the other side wants to play games with you? We took it. After reducing the interval to 10 seconds the problem appeared. Looking carefully at the logic above, the tables are split by collector; now there are fewer collectors but more records in each table. Writing could still be supported, but each table was approaching 4 million records, and for acquisition devices with many monitoring indicators it was close to 6 million. How to crack that?
So the technical staff met to discuss the relevant measures.
How to optimize the query without indexing?
A colleague suggested that the order of the conditions in the where clause affects how the query performs, because the conditions filter in sequence: the first condition narrows down part of the data, and the following conditions then filter what is left. That sounds reasonable, but doesn't the SQL Server query optimizer handle this automatically? Forgive me for being a rookie; I was going purely by feel, assuming it would optimize automatically just like the VS compiler.
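To make the idea concrete (this example is mine, not from the article; the actual filter values are omitted), the claim was that on a table with no index the two forms below could behave differently:

-- the colleague's suggestion: put the most selective conditions first
SELECT Dtime, Value FROM dbo.his20140825
WHERE MgrObjId = '...' AND Id = '...' AND Dtime >= '...' AND Dtime <= '...';

-- rather than leading with the time range
SELECT Dtime, Value FROM dbo.his20140825
WHERE Dtime >= '...' AND Dtime <= '...' AND MgrObjId = '...' AND Id = '...';

Whether that actually matters is exactly what the test below was meant to settle.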
To be concrete, we had to let the facts speak:
Result: after the colleague modified the client accordingly, the test feedback was that it improved greatly. I took a look at the code:
Could it really make that much difference? Wait, had he forgotten to clear the cache, creating an illusion?
So I asked the colleague to execute the following statements to get more information:
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
SET STATISTICS IO ON
SELECT Dtime, Value FROM dbo.his20140825 WHERE Dtime >= '' AND Dtime <= '' AND Dtime

Before optimization:
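For context (not in the original text beyond the statements above): DBCC FREEPROCCACHE clears the cached execution plans and DBCC DROPCLEANBUFFERS clears the buffer pool, so each timed run starts from cold caches and the comparison is not skewed by previously cached pages, which is exactly the "illusion" worried about earlier; SET STATISTICS IO ON then reports the logical and physical reads for each statement.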