In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)06/01 Report--
This article is based on the speech delivered by Yang Tingkun at the Ninth China Database Technology Conference on May 11, 2018.
Lecturer introduction:
Yang Tingkun, a senior consultant and moderator of ITPUB Oracle database management, known as "Elder Yang", has persisted in Oracle technology research and writing for more than a decade, known as "the encyclopedia of Oracle". So far, he has published more than 3000 technical articles on his blog. In 2010, he co-edited and published the book "Oracle DBA Notes" with Eygle, and was awarded the title of ACE by Oracle in 2007.
Sharing outline:
Basic concept of zoning
Zoning evolution history
Zoning best practices
The latest features of the partition
Body speech:
It's a pleasure to be reunited with you at the database performance optimization session of DTCC 2018. This time I would like to share with you the optimization features related to partitioning. I will mainly introduce some of Oracle's latest partitioning technologies and our best practices in our day-to-day operations.
one。 Basic concept of zoning
Oracle's definition of partition is to split the data of a table into multiple segments according to internally defined rules. After the partition, each partition is a separate data segment. The biggest advantage of Oracle partitioning is transparency, applications do not need to know the underlying data, and access is the same as before. In other words, when the table is made into a partitioned table, the program can run without any modification and adjustment, and do not make any adjustments if you change the partition table back to the regular table. However, this does not mean that after the application, we do not need to know whether the table is partitioned or not. If you want partitioning to bring additional performance benefits, then the partitioning strategy must be combined with applications and business access methods.
Oracle provides several ways to access partitions, the most commonly intuitive of which is to specify to a partition directly through a partition extension statement. However, we do not recommend this approach, and our more recommended way is to let Oracle help you locate some of the data you need to access by really controlling access to data and adding regulatory conditions.
What are the benefits of zoning? It turns out that all operations are based on a large table, but after partitioning, a large table is transformed into several small units, based on which we can do operations such as deletion, truncation, migration, indexing, and so on. Partition provides a good means of fine-grained operation.
When it comes to partitioning, many people's first reaction is the performance advantage it brings, but in fact, I think the biggest advantage of partitioning is that in terms of manageability and maintainability, the cost of managing many small tables or small data segments is much lower than that of managing a large table. and the average maintenance time will be less, but faster.
The second benefit is enhanced usability, which is officially claimed by Oracle. Originally, if there was a problem with a data file corresponding to a table, it would affect the entire table, but now it may affect only one of the partitions because it is logically isolated.
Based on the above two points, we can begin to talk about performance improvement. In earlier versions, Oracle only said directly that there was a certain improvement in performance, but in fact this is not accurate, because if the partition is not well designed, your performance will degrade. Later, Oracle itself realized that, so since Oracle 11, Oracle has split performance into two different scenarios: OLTP and OLAP.
OLTP deals with a large number of concurrency in a short period of time, so the advantage of partitioning is to reduce the contention of shared resources and eliminate hot blocks; OLAP is faced with massive data processing, so we need to make better use of parallelism to improve performance.
II. Oracle Partition Evolution History
Oracle first introduced partitions from version 8. The table above lists the partitioning features of each version and interprets them from the perspectives of functionality, performance, and management. Although many people are using partitions, as far as I know, many of the features they apply are still concentrated in versions 8 and 9, such as commonly used range partitions, historical partitions, list partitions, and so on. But in fact, in later versions, Oracle has always had a lot of better and updated features for us.
Many practical new features have been added to Oracle 12 and 18c, which greatly simplify your daily operation. Next, I will focus on these features.
three。 Zoning best practices
Next, let's introduce the best use scenarios for partitions.
The best use scenario for range partitioning is for data with natural time attributes. For example, the order time, generation time, startup time recorded in the system, and so on. Of course, it is not possible to use range partitioning as long as there is time attribute data, but also depends on whether the business has visibility requirements for time. In addition, it would be better for a business to be more concerned about recent data, so range partitioning can be a great help for performance improvement and cleaning of expired historical data.
As the data accumulates, we all worry that the database is getting bigger and bigger. If the business has clear requirements for the life cycle of the data, then we can keep the existing system in a relatively stable state by cleaning up the data.
How to clean up the data? Traditional Delete data cleaning will face many problems, such as inefficiency, unable to free space and so on. But if our data store has a clear time limit, then partitioning is a good choice.
The cost and speed of cleaning up the partition is something to look forward to. When cleaning up data, we recommend that you build a global index as much as possible. The data cleanup operation will affect the usage of the global index, and even cause the index to fail. However, if we do partition data cleaning on a regular basis, it will not affect the role of the global index.
What are the benefits of range partitioning? First of all, the distribution of data is relatively uniform, because it is divided according to time, and the quantity is controllable; the speed of cleaning up data through DDL is very fast, and there will not be a large number of redo and undo problems. At the same time, in the design, we should also consider to let a query be completed in a partition as far as possible to improve the query efficiency. By cleaning partitions by regular DDL, the size of partitions and tables can be maintained at a stable level, and the index will not grow rapidly over time.
We just introduced that range partitioning is the most common way to clean up out-of-date data, but in a real production environment, we are faced with a variety of different scenarios. For example, we cannot simply delete all the data, because there may be a small amount of data that needs to be retained.
Faced with such a scenario, if you use Delete to delete, you will find that although we use partitions, we do not enjoy the benefits of partitions. And if you clean up the data in the original way, you will inevitably encounter the original problem.
Do we have a better solution? We can do this in the way of Insert+Exchange. I will delete this part of the data from the table first, and if there is a small amount of data needed, I can insert it back in. This approach not only ensures efficiency, but also avoids previous problems.
The picture above is that I simply show you the whole process through the code.
There is a T_PART table and a P (3) partition, assuming that seven or eight thousand pieces of data are unimportant, but 30 of them may need to be retained, so I will use the Insert+Exchange approach mentioned earlier. Here is another tip, we are first Insert, then Exchange. Why do you do this? if the consistency of this table is very important, we should lock up the table before operation to prevent others from manipulating it, and then Insert the data that needs to be retained into a temporary table, and then do Exchange. In this way, we can keep the data consistent at all times.
In addition to the above challenges, we may also face the challenge of master and child tables. Assuming that both my main table and child table are partitioned, we may face the following challenges. First, the child table may not have a partitioned time column for the main table, for example, there are two tables, the order table and the order detail table, and the two are master-child table relationships. The order table is partitioned according to the order time, so the partition time column of the order details will be controversial. If it is created according to the order details, then it may be inconsistent with the order time, and the two themselves have an one-to-many relationship, so in the data cleaning, the main table may not be cleaned up, if you want to use the order time to partition, then you need to redundant order time in the table.
If it is only redundant order time, I believe many people can accept it, but the challenge does not stop there. Once there are external key constraints, the master table cannot perform truncate operations, so you must first remove the constraints disable, which will cause a lot of inconvenience to the operation and maintenance staff.
How to solve this problem? Oracle 11g provides us with a new function called reference partitioning, which is solved like this: the fields of the main table are still partitioned according to the time column of the primary table, but the child tables no longer need redundant primary table fields, but directly rely on the primary foreign key relationship with the primary table to do partitioning.
The reference partition applies to the master child table to establish the same data policy, while the child table does not have appropriate partition fields, and the master child table is often associated with the scenario of access. In addition, Oracle 12 also enhances this to support cascading, in other words, when I have a master child table, I don't have to truncate the child table first, but truncate it directly in the main table, and it will recursively truncate all the child tables.
Hash partitioning is relatively simple, and its applicable scenarios are those with no time attribute and lack of business fields that distinguish data. If the system is facing competition for shared resources, you can also use hash partitions.
I suggest that the hash partition key value column should choose fields with low repetition as far as possible, so that it is not easy to lead to uneven data distribution; the number of partitions is preferably power of 2, which is also to avoid uneven distribution of partitions; for tables without time attributes and clear business attributes, we usually do not clean up regularly, so I recommend using a global index. In addition, the hash partition index can effectively solve the problem of index hot blocks.
Some people may wonder, since my data has no business characteristics, why partition it? We have encountered such a case before, the customer's order of magnitude is very large, 400T of data may have 395T of data in the same table, the problem faced by the customer is that the table may not hold so much data, Oracle has no limit on the capacity of the table, but there is a limit on the capacity of the tablespace, then you will find that if you do not use partitioning, this problem is unsolved.
One of the problems that hash partitioning can solve is the hotspot block problem. Why is there a hot block problem? For OLTP systems, there will be a lot of data insertion. There are generally two types of data insertion, one is the primary key, and the other is the time class. What these two sets of data have in common is that the newly inserted data is always the largest. When multiple people insert at the same time, there is no effect because the table is disordered, but the index is ordered, so there will be resource contention when updating the index. If you are an RAC architecture, the competition between GC multiple nodes will further aggravate the hotspot problem.
The traditional solution is to use the inverse key index provided by Oracle to reverse the key value and disperse the hot spot block. However, the inverse key index has a big disadvantage, that is, although it can solve hot issues, it does not support range scanning.
How to solve the hash partition? We create an index, specify that the index is partitioned in a hash manner, and then specify the number of partitions. What are the benefits of this? It turns out that I am an index with only one highest value, and everyone is fighting for this highest value, but now that I have 32 partitions with 32 highest values of the index, there will be much less competition for resources. Of course, technology has two sides, in this case, when you do an index range scan, you will not only have to scan one index tree, but 32 index trees, and the extra cost will be much higher. Therefore, the choice of the number of zones is also very particular.
List partitioning is most common for scenarios with certain business attribute data. We can do zoning according to clear business characteristics.
The region field is the list partition common candidate key value column, data distribution and access method to determine the partition key value division, at the same time, I also suggest setting a DEFAULT partition, assuming that we save most of the domestic to the list out, but one day there is no corresponding partition of the provincial data came in, if there is no DEFAULT partition will directly report an error, and with this partition, the data will have accommodation.
The list partition matches the business better, and the business can clearly know where the data is stored and find it efficiently, without too much use of Oracle internal associations and queries.
There are two common indexes, one is global index, the other is local index. There is no difference between a global index on a partition and an index on a single table, no matter how many partitions there are on the table, there is only one index tree, and all the data comes from the partition. The advantage of this is that even if the table is partitioned, the access cost will not increase, but the disadvantage is that if you do some DDL operations on the partitioned table below, it can easily lead to index invalidation.
Local index and table partition is an one-to-one corresponding relationship, when you do data operations on the table, Oracle will also operate on the index partition, will not lead to the unavailability of the partitioned index, while the local index also supports parallel scanning and creation.
The disadvantage of the local index is that if you want to create a cost index with the primary key, then the primary key must be included in the index, and there is no guarantee of data consistency between each independent index tree, so the key value column must be added to ensure uniqueness, but the query with undefined partitions will scan all index partitions, which will add extra overhead.
Finally, let's take a look at the hash partition index used to eliminate hot spot blocks. although the index can also be partitioned, it does not have any relationship with table data. Any index partition can access all tables. Its advantage is to disperse hot spots. However, the disadvantage is that when accessing the index, you need to access each partition of the index to get a complete record.
four。 The latest features of the partition
Oracle 12 provides many new features that are valuable for day-to-day maintenance, so let's introduce them in more detail.
First of all, let's introduce some partitions. We just mentioned that there are two commonly used indexes, the global index and the local index, so what are the partial partitioned indexes? In the real environment, it is not necessary for me to create indexes on all partitions. The access frequency of many historical data partitions is very small and there is no need to pay attention. Some partition indexes just meet such application scenarios. It can compress the index range to a reasonable range.
Suppose you create a partition table with 9 partitions that store 17-year and 18-year data respectively, and four partitions that store 17-year data we do INDEX OFF operation. When creating the index, we will find that although 9 index segments have been created, the four partitions that have done the operation are not available. In fact, only 5 index segments have been created.
When you access a partial partitioned index, it directly divides the execution plan into two parts, one scanning in the index segment and the other scanning in the corresponding content. The biggest change for some partitioned indexes is that they can tell which indexes are available and which are not.
If you add a time to the specified key value, then this time is our partition line. When we access the data, Oracle can know which zone I want to visit according to time. Other areas that do not need to access also have data, but Oracle will have a filter with a false condition here. When you visit here, you will skip to the next section.
One of the most powerful features of Oracle 12.1 is index asynchronous maintenance. When creating a range partition, it is not recommended to build a global index because you often have to clean up the DDL. But Oracle 12 solves this problem, assuming that there are 310000 records in the corresponding data partition on January 4 and nearly 300000 records in the corresponding data partition on March 5. When we go to create the partition, it takes about 0.18 seconds, and then when we check the status, there is no accident that the global index is invalid. After the reconstruction, we add a Update Index operation to the partition creation, because to synchronize the DDL operation, it updates all states, and many people think it will become very slow, but it only takes 0.17 seconds, which is faster than the previous 0.18 seconds. This is because after Oracle 12, it will automatically identify the data, not really synchronous maintenance, but asynchronous maintenance in the background, not only improve the availability of the index, but also improve the efficiency.
A very interesting feature of Oracle 12.2 is automatic list partitioning. We previously suggested that you must add the default value when partitioning the list, otherwise the data insertion will report an error, especially when a table has only one Keyboard, it is very difficult to insert data in the automatic list partition. The convenient function provided by Oracle is to convert a regular table into a partition table online, so that I don't introduce downtime, maintenance and so on because it is a partition table.
Oracle 18c provides stronger changes to this situation. For example, I created a partitioned table and three indexes. When I changed from a regular table to a partitioned table, but I was not satisfied with the partitioned table policy, I could change the data table policy directly through the Oracle statement, and this operation could be done online when DDL occurred.
The last function is for the changed index, we can do partition MERGE operation online and merge several partitions online.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.