Supporting "de-O" database evaluation with a self-developed database portrait tool

2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

"De-O" (moving away from Oracle) has been a hot topic in recent years, and it raises many questions, including how to evaluate the existing database and how to select a replacement technology. De-O is a systematic project that needs thorough evaluation. This article uses a self-developed tool to generate a database portrait that provides first-hand data for de-O evaluation, in the hope that it serves as a useful reference.

1. Common doubts

When considering de-O, many companies find that they do not know their own database well enough, and the following doubts arise:

[manager]

Is de-O expensive?

Is there a lot of work?

Will the project take long?

Are there any risks?

[architect]

Can MySQL carry the existing business scale?

Are there any technical risks?

Is database and table sharding necessary?

Is it necessary to introduce caching?

Is development complex?

How long will the project take to build?

What are the data access characteristics?

How large is the data volume before and after migration?

[developer]

Is there a lot of complicated SQL?

Is the amount of modification very large?

Are Oracle dialects and proprietary objects in use that will need to be modified?

And so on.

To answer these questions, we need to quickly understand the objects, statements, access characteristics, performance and other aspects of the existing Oracle database, and evaluate the technical solution, migration plan and follow-up workload accordingly. In other words, we need to draw a "portrait" of our database. Such a portrait helps guide the whole cycle of de-O work, including the following:

Decision-making stage: overall difficulty, cost (labor, money, working hours), technical risk

Architecture stage: technical solution, object structure, performance evaluation

Development stage: compatibility, complexity, testing

Migration stage: structure migration, data migration, data verification

It is this kind of demand that has led some companies to launch evaluation products, such as Alibaba's database and application migration service (ADAM). However, such products often require deploying an agent and uploading analysis packages, which is not feasible for security-sensitive enterprises. My company ran into this problem when it started its de-O work two years ago, so we developed a small, portable ("green") program that runs locally to make the evaluation easier.

Address: https://github.com/bjbean/oracle-estimate-report

2. Design ideas

The tool collects and summarizes information about the Oracle database in six areas: environment, space, objects, access characteristics, resource overhead and SQL statements. Together these cover how the database actually runs. To make collection more targeted, some thresholds can be set through parameters. Running the tool from the command line collects the information and produces a web-based evaluation report that presents it visually. The report can serve both as a basis for de-O evaluation and as reference data for the follow-up transformation.

3. Interpretation of the portrait

The following interprets the report data, with notes on the most common de-O target, MySQL.

3.1 Summary information

Shows a summary of the collection target, including IP, instance, users and so on. Pay attention to the analysis time: the script extracts the database's execution characteristics for the previous 24 hours, so it is best run after the business peak.

3.2 Space information

Space size is one of the key indicators in database selection, and it also affects the subsequent migration. If the database is large, splitting should be considered. The guiding principle is to keep the size of each single database under control as far as possible. In general, the splitting options below can be applied in the following order of priority:

1) Vertical split of the business layer

At the application level, data is split by business line. An e-commerce platform, for example, can be split into orders, users, goods, inventory and so on. Each part is separate and internally cohesive, with no strong data dependency on the others.

2) Horizontal split of the business layer

Within a single business, establish data life-cycle management and separate hot data from cold data in layers. Each layer can then be split further according to its access characteristics. On an e-commerce platform, for example, orders can be divided into active orders (within two weeks, still refundable), inactive orders (two weeks to half a year, still handled by customer service) and historical orders (more than half a year).
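The hot/cold layering of orders described above can be sketched as a simple age-based classifier. This is only an illustration: the function name and tier labels are hypothetical, and "half a year" is approximated as 183 days, which is an assumption.

```python
from datetime import date

# Tier boundaries follow the example in the text. Treating "half a year"
# as 183 days is an assumption made for this sketch.
ACTIVE_DAYS = 14
INACTIVE_DAYS = 183


def order_tier(created: date, today: date) -> str:
    """Return the storage tier of an order based on its age in days."""
    age_days = (today - created).days
    if age_days <= ACTIVE_DAYS:
        return "active"
    if age_days <= INACTIVE_DAYS:
        return "inactive"
    return "historical"
```

A periodic job could use such a rule to move rows from the hot store to cheaper storage as they age.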

3) Application-layer database and table sharding

If a single database is still large after the splits above, consider database and table sharding. The usual practice is to introduce a database middle layer that presents one logical database while physically dividing it into several. This is a less "elegant" solution, because it is hard to make transparent to the application. In other words, development has to compromise and give up some database capabilities. Common technical solutions fall into three categories: Client, Proxy and SideCar. Proxy mode is recommended (SideCar mode can be considered for container deployments).
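To make the idea of one logical database backed by several physical ones concrete, here is a minimal routing sketch of the kind a middle layer performs. The shard counts, naming scheme and the choice of a user ID as sharding key are all assumptions for illustration, not a description of any particular product.

```python
import zlib

DB_COUNT = 4        # hypothetical number of physical databases
TABLES_PER_DB = 8   # hypothetical number of table shards per database


def route(shard_key: str, logical_table: str = "orders") -> tuple[str, str]:
    """Map a sharding key to a (physical database, physical table) pair."""
    # crc32 is stable across processes, unlike Python's built-in hash().
    slot = zlib.crc32(shard_key.encode("utf-8")) % (DB_COUNT * TABLES_PER_DB)
    db_index, table_index = divmod(slot, TABLES_PER_DB)
    return f"{logical_table}_db_{db_index}", f"{logical_table}_{table_index:02d}"
```

The compromise mentioned above shows up here: any query that does not carry the sharding key has to be sent to every shard, and cross-shard joins and transactions are no longer handled by the database itself.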

4) Infrastructure-layer distributed database

More thorough than sharding is to use a distributed database directly. It offers a solution that can carry a larger scale (capacity, throughput). Distributed databases have matured and become more widespread in recent years, and they have begun to be tried in critical scenarios.

3.3 Object information

Different kinds of Oracle objects call for different considerations during the transformation. The report gives summary data, and detailed data can also be produced for easy lookup.

1) Tables

An excessive number of tables directly affects the size of the data dictionary and, in turn, the overall efficiency of the database. For MySQL, issues such as file handles also need to be considered. There is no fixed rule for this indicator, and it has to be judged case by case. It is mainly a data architecture concern: avoid too many tables in a single database. I have seen a case where a single database held 100,000 tables and performed poorly, and which was consolidated down to 20,000 tables through optimization. If MySQL is selected, it is recommended to have no more than 5,000 tables in a single database and no more than 20,000 tables in total.

2) Tables (large tables)

Controlling the size of a single table is one of the key points of the design, because it directly affects access performance. If a table is too large, consider splitting it according to the principles above. There is no general rule for table size, so the threshold is configurable through parameters and can be set on two dimensions: physical size or number of records. What matters most is how the table is accessed. If access is all simple key-value lookups, a larger table is acceptable; if access is more complex, a lower threshold is recommended. MySQL, for example, is not good at complex queries on large tables or at multi-table joins, so consider handling complex queries asynchronously with tools such as ES or Solr + HBase.
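A check on the two dimensions mentioned above, physical size and number of records, might look like the following sketch. The `TableStat` record and the default thresholds are hypothetical and do not reflect the tool's actual parameters.

```python
from dataclasses import dataclass


@dataclass
class TableStat:
    name: str
    size_mb: float
    num_rows: int


def large_tables(
    stats: list[TableStat],
    max_size_mb: float = 10_240,    # hypothetical default: 10 GB
    max_rows: int = 10_000_000,     # hypothetical default: 10 million rows
) -> list[str]:
    """Return the names of tables that exceed either threshold."""
    return [
        t.name
        for t in stats
        if t.size_mb > max_size_mb or t.num_rows > max_rows
    ]
```

Tables with complex access patterns would be checked with lower thresholds than tables that only serve key-value lookups.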

3) Tables (partitioned tables)

Since 9i and 10g, Oracle's partitioning has been steadily improved and strengthened, and it can fairly be called a sharp weapon for handling massive data in Oracle. For MySQL, however, partitioning is still not recommended. On the one hand, as hardware grows more capable, a single table can carry more; on the other hand, MySQL partitions bring problems such as "DDL amplification" and "lock changes". If the team has a good command of the database middle tier, table sharding, which is less complex, is recommended instead. This may slightly increase the development workload, but it brings many benefits for operations and maintenance.

4) Columns (large objects)

Large objects are not recommended in any database. If they are in use, remove them as part of the transformation. Large-object support is of marginal value in a database: the database's ACID capability should be reserved for more important data.

5) Indexes (B-tree)

Too many indexes reduce DML efficiency and take up a lot of space. The ratio of indexes to tables gives a rough indication of whether the number of indexes is reasonable. There is no recommended value; judge it case by case. Every database faces the same question of how to build a sound indexing strategy. A table from Li Huazhi's book "Mass Database Solutions" can be used as a reference for sorting out index requirements, so that indexes are created and maintained methodically.
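The index-to-table ratio mentioned above is simple arithmetic. The sketch below assumes the input is a list of (table, index) pairs plus the list of table names; both shapes are hypothetical.

```python
from collections import Counter


def indexes_per_table(
    index_rows: list[tuple[str, str]], table_names: list[str]
) -> tuple[float, dict[str, int]]:
    """Return the average indexes per table and the per-table index counts."""
    counts = Counter(table for table, _index in index_rows)
    # Tables without any index are kept with a count of zero.
    per_table = {name: counts.get(name, 0) for name in table_names}
    ratio = sum(per_table.values()) / len(table_names) if table_names else 0.0
    return ratio, per_table
```

The per-table counts are often more useful than the average, since a few heavily indexed hot tables can hide behind a modest overall ratio.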

6) Indexes (other)

Oracle supports other index types in addition to the usual B-tree index. If another database is chosen, these indexes need to be reworked and implemented in some other way.

7) Views

A view, as a logical encapsulation of SQL statements, makes sense in some scenarios, such as security. However, views place high demands on the optimizer, and Oracle has done a lot of work in this area (see the author's book "SQL Optimization Best Practices"). For MySQL, views are not recommended, so consider reworking them.

8) Triggers / stored procedures / functions

A database carries two kinds of capability: computing and storage. As the component that is hardest to scale in the whole infrastructure, the database should be left to concentrate on its core capability. Unlike storage, computing can be handled in the application layer, which is usually easy to scale. Taking future maintainability, portability and other factors into account as well, this logic should be moved to the application side.

9) Sequences

A sequence in Oracle provides increasing serial numbers that are not guaranteed to be gap-free. MySQL offers a similar facility through the auto-increment attribute. This part should migrate without much trouble, but under very high concurrency a dedicated ID generator can also be considered.
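One common shape for such an ID generator is a segment allocator, which reserves a block of numbers at a time and then hands them out from memory. The sketch below is a hypothetical in-memory illustration; in practice the block reservation would be a single atomic update on a shared counter.

```python
import threading


class SegmentIdGenerator:
    """Hand out increasing IDs from blocks that are reserved one at a time."""

    def __init__(self, step: int = 1000, start: int = 1) -> None:
        self._step = step
        self._next_block = start  # stands in for a shared, persistent counter
        self._current = start
        self._limit = start       # exclusive end of the current block
        self._lock = threading.Lock()

    def _reserve_block(self) -> None:
        # Assumption: a real system would do this as one atomic UPDATE.
        self._current = self._next_block
        self._limit = self._next_block + self._step
        self._next_block = self._limit

    def next_id(self) -> int:
        with self._lock:
            if self._current >= self._limit:
                self._reserve_block()
            value = self._current
            self._current += 1
            return value
```

Like an Oracle sequence with a cache, this design trades gap-free numbering for throughput: if a process restarts, the unused remainder of its block is simply skipped.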

10) Synonyms

Synonyms are a sign of data coupling and should be abandoned in any database. Consider splitting on the business side so that the system no longer depends on this feature.

3.4 Access characteristics

This part collects the Top 20 objects with the highest DML counts in the database over the past 24 hours. It directly reflects the "hot" objects of the running system. The performance of these objects needs particular attention after technology selection and before migration. Splitting, caching and similar measures can be considered to reduce the pressure on these hot spots. Beyond these objects, it is also recommended to build a "business stress model": through a full understanding and evaluation of the business, abstract the business logic and turn it into a data stress model. The difficulty lies in abstracting the business logic and in estimating the proportion of traffic each module contributes.
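The hot-object ranking described at the start of this section amounts to a sort on total DML count. The following sketch assumes the counters are already available as (table, inserts, updates, deletes) rows; how the tool actually gathers them is not covered here.

```python
def top_dml_tables(
    rows: list[tuple[str, int, int, int]], n: int = 20
) -> list[tuple[str, int]]:
    """Rank tables by total DML (inserts + updates + deletes), highest first."""
    totals = [(table, ins + upd + dele) for table, ins, upd, dele in rows]
    return sorted(totals, key=lambda item: item[1], reverse=True)[:n]
```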

The business stress model can then be written down as pseudocode.

Stress-test code can be written from that pseudocode and invoked by a testing tool to generate simulated load. This is valuable for system transformation, upgrades, capacity-expansion evaluation, new hardware selection and so on. In de-O work specifically, this method can be used to evaluate and verify whether the new technical solution meets the requirements, and to compare carrying capacity before and after de-O in business terms. This is also one of the factors in deciding whether a technical solution is feasible. Note that the information above covers only DML and not queries; query data can be obtained from Oracle AWR. For a more complete picture, consider a full-link stress test together with the application.
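As an illustration of what a business stress model could look like once it is turned into code, the sketch below describes the business as a weighted mix of operations and generates a call sequence from it. The operation names and proportions are invented for this example and would have to come from an analysis of the real business.

```python
import random

# Hypothetical business mix: operation name -> share of total traffic.
BUSINESS_MIX = {
    "create_order": 0.10,
    "pay_order": 0.08,
    "query_order": 0.60,
    "update_inventory": 0.22,
}


def build_workload(
    total_calls: int, mix: dict[str, float] = BUSINESS_MIX, seed: int = 0
) -> list[str]:
    """Generate a reproducible sequence of operations that follows the mix."""
    rng = random.Random(seed)
    names = list(mix)
    weights = [mix[name] for name in names]
    return rng.choices(names, weights=weights, k=total_calls)
```

A load-testing tool would map each operation name to the SQL it issues and replay the sequence against both the existing Oracle database and the candidate replacement, so that the two can be compared in business terms.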

3.5 Resource consumption

This part lists resource usage over the last 24 hours. The data has two main purposes:

1) Evaluate the overall load

Because these metrics are Oracle metrics, they cannot be compared directly with other databases. The load can be evaluated with expert experience and historical data, and it serves as one of the bases for evaluating other technical options. Some of the indicators (such as user calls) can be turned into quantitative targets to guide follow-up testing.

2) Evaluate bottlenecks

A very prominent indicator shows that the existing business also has a bottleneck there. When moving to another solution, take it into account at the design stage and focus testing on it to reduce possible technical risk.

3.6 SQL statements

Rewriting SQL statements is the most troublesome part of the whole migration. Unless the system is completely refactored, SQL rewriting needs attention. It involves many aspects, such as the number of statements to rewrite, their complexity and performance comparison, and much of it still has to be screened manually.

In one project the author experienced, the team took one month to complete the "structure + SQL" migration, but then needed another three months to optimize statements and even adjust structures, because the migrated statements could not meet performance requirements once they went live. The adjustments had to be made while the system was live, which was extremely painful. Identifying the state of the existing SQL early is therefore valuable for estimating workload, rewriting difficulty and performance. This part of the report collects the historical SQL of the analyzed user (a detail switch can be turned on to show the full SQL) along the following dimensions.

1) Total number of SQL statements

This indicator roughly reflects how busy the business is. It can also serve as the base when calculating the proportion of problematic statements.

2) Very long SQL

Statements that exceed a specified number of characters are listed here; the threshold is configurable through parameters. If you are considering MySQL, short and simple SQL is recommended, because MySQL generally performs poorly on complex SQL. These long statements therefore deserve attention, since they are the ones most likely to cause problems.

3) ANTI SQL

Anti (negated) queries are hard for any database to process and are a test of the optimizer. Although newer versions of MySQL optimize them much better, they still deserve attention.

4) Oracle Syntax SQL

These are statements written with Oracle-specific features, that is, the Oracle dialect (such as proprietary functions and pseudo columns). All of them have to be dealt with during migration. Some vendors do claim that their products are compatible with Oracle syntax, but dedicated tests of these statements are still recommended.

5) Join 3+ Table SQL

Multi-table joins are also a test of the optimizer. MySQL in particular joins tables less efficiently, so joins across more than two tables are not recommended. Listed here are queries that join three or more tables and need to be modified. For particularly complex queries, consider offloading them to a big data platform.

6) SubQuery SQL

Subqueries are in a similar position: MySQL is not good at them. Although the optimizer can improve them to some extent, they still deserve attention.
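The dimensions above lend themselves to a first-pass, pattern-based screening. The sketch below is a rough heuristic, not the tool's implementation: the tag names, the default length threshold and the reading of "ANTI" as NOT IN / NOT EXISTS / inequality are assumptions, and the Oracle pattern list covers only a few well-known dialect features (ROWNUM, NVL, DECODE, SYSDATE, CONNECT BY, the (+) outer-join operator, FROM DUAL).

```python
import re

ORACLE_PATTERNS = [
    r"\bROWNUM\b", r"\bNVL\s*\(", r"\bDECODE\s*\(", r"\bSYSDATE\b",
    r"\bCONNECT\s+BY\b", r"\(\+\)", r"\bFROM\s+DUAL\b",
]
# Assumption: "anti" is taken to mean negated predicates.
ANTI_PATTERNS = [r"\bNOT\s+IN\b", r"\bNOT\s+EXISTS\b", r"<>", r"!="]


def classify_sql(sql: str, max_length: int = 1000) -> set[str]:
    """Tag a statement with the screening dimensions it appears to match."""
    text = sql.upper()
    tags = set()
    if len(sql) > max_length:
        tags.add("long")
    if any(re.search(p, text) for p in ANTI_PATTERNS):
        tags.add("anti")
    if any(re.search(p, text) for p in ORACLE_PATTERNS):
        tags.add("oracle_syntax")
    # Two or more JOIN keywords imply at least three joined tables.
    # Comma-style joins in the FROM clause are not detected by this check.
    if len(re.findall(r"\bJOIN\b", text)) >= 2:
        tags.add("join_3plus")
    # A SELECT directly after an opening parenthesis is treated as a subquery.
    if re.search(r"\(\s*SELECT\b", text):
        tags.add("subquery")
    return tags
```

Because the matching is purely textual, keywords inside string literals or comments produce false positives, so the result is a shortlist for the manual screening mentioned earlier rather than a verdict.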
