How to solve the problem of MySQL data analysis

2025-07-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains how to solve the problem of MySQL data analysis. The method introduced here is simple, fast and practical.

As one of the most popular open source databases, MySQL is the choice of more and more enterprises. It is widely used in business systems. Besides serving the reads and writes of online business logic, it often has to meet additional data analysis requirements such as BI reports, large-screen visualizations and big data applications. However, because of its architecture, MySQL often falls short in data analysis scenarios.

The industry offers many solutions to this problem. This article recommends a newer one, data lake analysis, which is a good choice when cost matters. Before the main content, here is a brief introduction to the relatively unfamiliar concept of data lake analysis.

Data lake analysis (Data Lake Analytics, DLA) is a serverless, interactive federated query service. Using standard SQL, you can analyze and integrate data from object storage (OSS), databases (PostgreSQL, MySQL, etc.) and NoSQL stores (TableStore, etc.).

01. Background of the scheme

Demand scenario 1

Besides online reads and writes, MySQL often has to serve data analysis requirements such as BI reports, large-screen visualizations and big data applications. As the business grows and a stand-alone MySQL database reaches a certain data volume, running data analysis directly on MySQL performs poorly and also degrades the read and write performance of the online business. In this case a new data analysis solution is needed.

Demand scenario 2

The data in MySQL needs to be analyzed together with log data. In this scenario, some companies use open source big data systems (such as Hive, Hadoop and Spark) to build a data warehouse. This approach solves the problem, but it carries the highest manpower cost as well as server and other resource costs. How can the data in MySQL and other systems be analyzed together at low cost?

Demand scenario 3

When the amount of data in MySQL exceeds what a single machine can handle, DBAs usually protect online business performance by sharding: the data of a single table in one database is split into multiple tables across multiple databases. Because one logical table is now split into many physical tables, analyzing the data becomes very complicated, and a new analysis solution is needed.
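To make the difficulty concrete, here is a minimal sketch of how a sharded layout might route rows. The database names, table names, shard counts and the modulo rule are all hypothetical; the article does not describe a specific sharding scheme.

```python
from __future__ import annotations

# Hypothetical sharding layout: 2 databases x 4 tables per database.
DB_COUNT = 2
TABLES_PER_DB = 4


def route(user_id: int) -> tuple[str, str]:
    """Return the (database, table) pair that holds the row for user_id."""
    slot = user_id % (DB_COUNT * TABLES_PER_DB)
    db_index, table_index = divmod(slot, TABLES_PER_DB)
    return f"shop_{db_index}", f"orders_{table_index}"


def all_shards() -> list[tuple[str, str]]:
    """List every physical table an analytical query would have to visit."""
    return [
        (f"shop_{d}", f"orders_{t}")
        for d in range(DB_COUNT)
        for t in range(TABLES_PER_DB)
    ]


if __name__ == "__main__":
    print(route(13))          # a single row lives in exactly one shard
    print(len(all_shards()))  # a full-table analysis touches every shard
```

An online lookup by key touches one shard, while an analytical query over the logical table has to visit all of them and combine the partial results, which is where the complexity comes from.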

02. Solution evaluation factors

To solve the three scenarios above, what are the main factors to consider in MySQL analysis? And if several solutions are available, how should you choose among them? The following key factors can serve as a reference.

1. Cost factor

Cost here is a comprehensive concept: it covers not only money but also time, manpower and risk. Users should weigh overall cost-effectiveness when choosing a solution.

2. Capability factor

Capability has two aspects, functionality and performance. Functionally, does the solution provide complete analysis and extension capabilities? In terms of performance, does it meet users' requirements for timeliness and concurrency, especially at massive data scale?

3. Maintainability

A good product should be easy to maintain. Users can operate it in a simple way, and when a problem occurs it can be easily diagnosed and solved.

4. Ease of use

The product itself should be easy to use, so that users face a very low threshold for using data analysis services.

03. Choice of scheme

There are several solutions for MySQL data analysis scenarios: analysis directly on MySQL read-only instances, a self-built open source data warehouse, an analytical database, and the data lake construction solution. Let's take a closer look at their pros and cons.

Analysis based on MySQL read-only instances

Purchase additional servers to build MySQL read-only instances, then run data analysis on those instances. The pros and cons of this solution:

Disadvantages:

Limited functionality: it cannot meet the requirements of scenarios 2 and 3. Even for scenario 1, analysis on read-only instances becomes very slow as the data volume grows (see the TPC-H 10 GB query timings below).

Higher cost: the additional read-only instances have to be paid for.

Advantages:

The solution is simple and avoids any impact on the online business. It is easy to use and has good compatibility.

Self-built open source data warehouse

Build a data warehouse with open source big data systems (such as Hive, Hadoop and Spark), synchronize the MySQL data into the warehouse, and then analyze the data with Spark or Hive.

Disadvantages:

Poor ease of use: open source big data systems have a relatively high threshold and need dedicated big data engineers for operation and maintenance. In addition, Sqoop synchronization does not support table structure changes, so adding or deleting columns causes synchronization to fail.

Highest cost: additional servers must be purchased to build the system, which adds hardware cost on top of the manpower cost and makes this the most expensive solution overall.

Advantages:

It can handle demand scenarios 1 and 2, and its analysis performance is good.

Analytical database

Use an open source or commercial analytical database, load the data with a data synchronization tool, and then analyze the data with SQL.

Disadvantages:

Poor maintainability: dedicated operation and maintenance personnel are needed.

High cost: additional resources need to be purchased.

Advantages:

It supports data analysis at massive scale.

Construction scheme of data lake

The Aliyun data lake analysis and construction solution meets the need for low-cost analysis of MySQL data well.

Advantages:

Easy to use: one-click warehouse building (Jiancang) synchronizes an entire database to the data lake.

Strong analytical ability: the Data Lake Analytics (DLA) experience is the same as the MySQL experience, and growth in data volume has little impact on analysis performance.

Extremely low cost: there are no servers to buy and no maintenance cost. Charges are based on query volume, so no queries means no charge.

No impact on the source database: data analysis does not affect the online business.

04. Evaluation data and technical principle of data lake construction scheme

Next, let's take a detailed look at the evaluation data and technical principles of the data lake construction scheme.

Low cost and high performance

Low cost

The following is a cost comparison. Purchasing an additional high-performance RDS (MySQL database) instance costs 2344 CNY per month. Taking TPC-H 10 GB as an example, if you execute the 22 TPC-H queries once a day, using DLA for one month costs only 26.64 CNY, which is less than 1 CNY per day. High-performance analysis is therefore available at roughly 1% of the cost. In addition, the compressed columnar storage used by DLA takes only about 3 GB, while native MySQL storage takes about 20 GB.
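The arithmetic behind these figures can be checked with a few lines. The two monthly prices come from the article; the 30-day month is an assumption, since the article does not state the billing period length.

```python
RDS_MONTHLY_CNY = 2344.0  # extra high-performance RDS instance (from the article)
DLA_MONTHLY_CNY = 26.64   # 22 TPC-H queries once a day for a month (from the article)
DAYS_PER_MONTH = 30       # assumption: month length is not stated in the article


def daily_cost(monthly: float, days: int = DAYS_PER_MONTH) -> float:
    """Spread a monthly charge evenly over the days of the month."""
    return monthly / days


def cost_ratio(cheap: float, expensive: float) -> float:
    """Fraction of the expensive option's price that the cheap option costs."""
    return cheap / expensive


if __name__ == "__main__":
    print(f"DLA per day: {daily_cost(DLA_MONTHLY_CNY):.3f} CNY")
    print(f"DLA / RDS:   {cost_ratio(DLA_MONTHLY_CNY, RDS_MONTHLY_CNY):.2%}")
```

Under that assumption the daily DLA cost comes out below 1 CNY and the ratio lands slightly above 1%, which matches the article's rounded claims.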

High performance

After synchronizing the data from the source database, data lake construction stores it in a compressed columnar format. Taking the TPC-H 10 GB data set as an example, storage in MySQL consumes about 20 GB, while compressed columnar storage consumes only about 3 GB.

With Aliyun Data Lake Analytics (DLA), efficient analysis is available at very low cost. Taking the TPC-H 10 GB data set as an example, the average execution time of the 22 TPC-H queries is 5.5 s in DLA and 345.5 s in MySQL, and 4 of the queries cannot finish in MySQL.

The following figure compares the execution times of the 22 TPC-H 10 GB queries in MySQL and DLA.
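One reason columnar layouts compress well is that the values of a single column sit next to each other, so runs of repeated values can be collapsed. The sketch below illustrates this with run-length encoding on hypothetical rows. The article does not say which file format or encoding DLA uses, so run-length encoding is only an illustrative choice here.

```python
from itertools import groupby

# Hypothetical rows: (order_id, status). Real TPC-H tables are much wider.
rows = [
    (1, "paid"), (2, "paid"), (3, "paid"),
    (4, "shipped"), (5, "shipped"), (6, "paid"),
]


def to_columns(records):
    """Pivot row tuples into one list per column."""
    return [list(column) for column in zip(*records)]


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


if __name__ == "__main__":
    order_ids, statuses = to_columns(rows)
    # Six status strings shrink to three (value, count) pairs.
    print(run_length_encode(statuses))
```

In a row layout the status values are interleaved with order IDs, so the same runs never appear contiguously and this kind of encoding has nothing to collapse.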

Ease of use

Support for rich data sources

The Aliyun data lake analysis and construction solution supports a wide range of data sources, including self-built MySQL, SQL Server, PostgreSQL and Oracle, as well as the cloud databases RDS, PolarDB and ADB. Unlike a traditional data warehouse, its design goal is simplicity, so that users can synchronize data to DLA through simple configuration.

Automatic synchronization to keep data consistent

Data lake construction automatically synchronizes updated data, including metadata operations such as creating tables, deleting tables, and adding, modifying or deleting columns. In sharding scenarios, it can merge a logical table that is distributed across multiple databases into a single table, so that analysis runs against one table. In addition, there is no upper limit on the number of tables that data lake construction can synchronize.
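The effect of merging shards into one logical table can be sketched locally. In this example the standard-library sqlite3 module stands in for the MySQL shards, and the table names, columns and sample rows are hypothetical. The article describes a synchronized copy in the data lake; the view used here only illustrates the logical result, not how DLA implements it.

```python
import sqlite3

# Hypothetical shard tables; sqlite3 stands in for MySQL so the sketch runs anywhere.
SHARDS = ["orders_0", "orders_1"]


def build_demo() -> sqlite3.Connection:
    """Create two shard tables and one view that presents them as a single table."""
    conn = sqlite3.connect(":memory:")
    for i, shard in enumerate(SHARDS):
        conn.execute(f"CREATE TABLE {shard} (order_id INTEGER, amount REAL)")
        conn.executemany(
            f"INSERT INTO {shard} VALUES (?, ?)",
            [(i * 10 + 1, 10.0), (i * 10 + 2, 20.0)],
        )
    # One logical name over every shard, so analysis queries a single table.
    union = " UNION ALL ".join(f"SELECT order_id, amount FROM {s}" for s in SHARDS)
    conn.execute(f"CREATE VIEW orders AS {union}")
    return conn


if __name__ == "__main__":
    conn = build_demo()
    print(conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone())
```

Once the shards appear under one name, an aggregate query no longer needs to know how many physical tables exist or where they live.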

Incremental construction

The Data Lake Analytics (DLA) team is developing incremental data lake construction, which synchronizes source database data in incremental mode. According to the team, this will remove the impact on the source database and greatly improve the timeliness of data analysis. Incremental construction is planned for release in the near future.

Impact on the source database

Queries in data lake analysis have no impact on the source database at all. When data lake construction synchronizes data from the source database, the impact on the source database is kept below 10%. The following figure shows the CPU consumption caused by data lake construction on source databases of different specifications: as the machine specification increases, the number of connections increases automatically, and the average CPU consumption on the source database stays below 10%.

To minimize the impact of synchronization on the source database, data lake construction applies several optimizations:

Data lake construction automatically adjusts the number of connections according to the machine specification of the source database, so that the pressure on the source database stays below 10%.

When synchronizing a table concurrently, it prefers index columns for splitting the table and uses the index to locate each data range quickly, which reduces the impact of synchronization on the source database.

By default, data lake construction runs synchronization during business off-peak hours to avoid affecting the online business.

As a result, the pressure on the source database is almost negligible. Users who want faster synchronization can also increase the number of connections manually.
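The index-based splitting described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the even split by key range and the generated SQL are hypothetical, and the article does not show how data lake construction actually chooses its ranges.

```python
from __future__ import annotations


def split_key_range(min_id: int, max_id: int, parts: int) -> list[tuple[int, int]]:
    """Split the inclusive range [min_id, max_id] into up to `parts` contiguous chunks."""
    if parts < 1 or max_id < min_id:
        raise ValueError("need parts >= 1 and max_id >= min_id")
    total = max_id - min_id + 1
    parts = min(parts, total)  # never create empty chunks
    base, extra = divmod(total, parts)
    chunks, start = [], min_id
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append((start, start + size - 1))
        start += size
    return chunks


def chunk_query(table: str, key: str, lo: int, hi: int) -> str:
    """Build a range scan that the source database can answer from the index on `key`."""
    return f"SELECT * FROM {table} WHERE {key} BETWEEN {lo} AND {hi}"


if __name__ == "__main__":
    # One chunk per connection; fewer connections means less load on the source.
    for lo, hi in split_key_range(1, 10, 3):
        print(chunk_query("orders", "order_id", lo, hi))
```

Each chunk maps to one connection, so the number of parts is the knob that trades synchronization speed against pressure on the source database.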

05. Practice of Aliyun data Lake

If you want to try data lake analysis for low-cost MySQL analysis, the following steps are all you need to start a trial.

1. Log in to the Data Lake Analytics management console (https://datalakeanalytics.console.aliyun.com). In the upper left corner of the page, select the region where DLA is located.

2. Click Solutions in the left navigation bar. On the solutions page, click the wizard entry under one-click warehouse building (Jiancang).

3. Configure the parameters according to the prompts on the page.

4. After the parameters are configured, click Create, and you can start analyzing with the data lake.
