In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >
Share
Shulou(Shulou.com)06/01 Report--
This article is about how to speed up SQL with two Unix commands. The editor thinks it is very practical, so share it with you as a reference and follow the editor to have a look.
I tried to run a simple join query on MariaDB (MySQL), but the performance was simply terrible. How to reduce the query time from 380 hours to less than 12 hours.
Here is the query, which is part of the GHTorrent analysis, which I implemented using the relational online analysis processing framework simple-rolap.
Select distinct project_commits.project_id, date_format (created_at,'% x% v1') as week_commit from project_commits left join commits on project_commits.commit_id = commits.id
Both connection fields have indexes. However, MariaDB implements the join by doing a full table scan of project_commits and an index lookup of commits. This can be seen in the output of EXPLAIN.
There are more records in these two tables: project_commits has 5 billion rows and commits has 847 million rows. The memory of the server is relatively small, only 16GB. So it is probably because there are not enough indexes in it and you need to read the disk, which has a serious impact on performance. According to pmonitor's analysis of the temporary table, the query has been running for half a day and will take another 373 hours to complete.
/ home/mysql/ghtorrent/project_commits#P#p0.MYD 6.68% ETA 373mono 38purl 11
In my opinion, this is too much, because the Imax O time required to sort merge joins (sort-merge join) should be an order of magnitude lower than the expected execution time. I asked for help on dba.stackexchange.com, and some suggestions were given to me to try, but I was not confident that they would solve my problems. I tried some suggestions, but the result was not optimistic. It took at least half a day to try each suggestion, and then I decided to adopt a solution that I thought could solve the problem effectively.
I export the two tables to a file, wire them together using Unix's join command, pass the results to uniq, remove the duplicate lines, and then import the results back to the database. The import process, including re-indexing, starts at 20:41 and ends at 9:53 the next day. The following are the specific steps.
1. Export database tables as text files
I first export the fields needed to join the two tables and sort them by join field. To ensure that the sort order is compatible with the sort order of the Unix tool, I convert the fields to character types.
I saved the output of the following SQL query to the file commits_week.txt.
Select cast (id as char) as cid, date_format (created_at,'% x% v1') as week_commit from commits order by cid
Then save the output of the following SQL query to the project_commits.txt file:
Select cast (commit_id as char) as cid, project_id from project_commits order by cid
The following two files are generated.
-rw-r-r- 1 dds dds 15G Aug 4 21:09 commits_week.txt-rw-r-r- 1 dds dds 93G Aug 5 00:36 project_commits.txt
To avoid running out of memory, I use the-quick option to run the mysql client, otherwise the client will try to collect all records before outputting the results.
two。 Working with files using the Unix command line tool
Next, I use Unix's join command to concatenate the two text files. This command linearly scans two files and combines * records with the same fields. Because the records in the file have been sorted, the whole process is completed very quickly, almost as fast as Imax O. I also passed the result of the join to uniq to eliminate duplicate records, which solved the distinct problem in the original query. Similarly, on the ordered output, the weight can be removed by a simple linear scan.
This is the Unix command I ran.
Join commits_week.txt project_commits.txt | uniq > joined_commits.txt
After an hour of processing, I got the desired result.
-rw-r-r- 1 dds dds 133G Aug 5 01:40 joined_commits.txt
3. Import text files back to the database
*, I import the text file back to the database.
Create table half_life.week_commits_all (project_id INT (11) not null,week_commit CHAR (7)) ENGINE=MyISAM;load data local infile 'joined_commits.txt'into table half_life.week_commits_allfields terminated by'; thank you for reading! This is the end of the article on "how to use 2 Unix commands to speed up SQL". I hope the above content can be of some help to you, so that you can learn more knowledge. if you think the article is good, you can share it for more people to see!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.