In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
This article mainly introduces "what are the reasons and benefits of using database partitioning". In the daily operation, I believe that many people have doubts about the reasons and benefits of using database partitioning. The editor consulted all kinds of materials and sorted out simple and easy-to-use operation methods. I hope it will be helpful to answer the doubts about "what are the reasons and benefits of using database partitioning?" Next, please follow the editor to study!
What is a database partition?
Partitioning is dividing tables and indexes into smaller parts, or even subdividing them into smaller parts. Think of it as a large warehouse (a large table) with millions of different magazines with different themes and years (for example, 2000-2019). Partitioning means that you will organize them in different rooms in that large warehouse. They still belong to a warehouse, but now you group them at the logical level according to the database partitioning policy.
What are the benefits of using database partitioning?
A good partitioning strategy is feasible for a number of reasons, such as better performance, better manageability, higher availability, or load balancing.
(1) better performanc
If you take the magazine warehouse above as an example, you are looking for the October 2009 issue of Sports Illustrated, and you partition the warehouse in the year of release, just look at the room or find the 2009 partition you want. Now, imagine that in a "2009" room in the warehouse, you put several large boxes, and then put magazines in these boxes according to the subject or publication (Sports Illustrated, Professional photographer, fishing Heroes, …) . This will create a subpartition, which will make it easier to find the magazines you need, as you will no longer need to browse all 2009 magazines, but you can go directly to the 2009 conference room and go directly to the Sports Illustrated box. This is the division of performance. Now, if you put different magazines in the cabinet and all the back of the magazine (title and month / year) are outside, it will be the same as adding an index. In this way, you can quickly find the right magazine and pick it out without having to pick each magazine and read the whole magazine until you find the one you need. Also, if you need data (magazines) from two or more different partitions (rooms, boxes), you can have two or more processes (people) looking for it, and you can still make sure they don't look at the same part twice.
(2) better manageability and load balancing
You can store different logical partitions on different physical disks and, in some cases, even on the server. In this way, working on one partition / disk / server does not necessarily mean that other partitions have to be offline. The same is true of maintenance, where you can easily delete (or back up and delete) the oldest partitions if you keep only the last 10 years of data and partition the table on an annual basis. Going back to the warehouse for comparison, if you divide the warehouse into multiple rooms each year and finish the work in the room in 2011, then all other rooms can still be used. If you want to delete all the magazines from 2000, you only need to take everything out of one room instead of browsing every magazine every year.
(3) higher availability
Downtime due to failures or other reasons does not necessarily affect other partitions. The same is true for backup and recovery, where you can operate on each partition independently of the others, reducing downtime.
When should I partition my table?
The following things need to be considered in the partitioning strategy:
Large tables are always a good candidate for partitioning, and everything that goes beyond 2Gb is a start.
If your table contains historical data, and the data will be added in chronological order. For example, this is especially true if you load data in batches on a monthly or daily basis. In this type of partition, one partition can only keep the latest partition updatable, while other partitions (old) are read-only.
If you need to allocate tables between different types of storage (and in some cases even servers). For example, most queries run on partitions for the last three years, but you need to keep 10 years of data in the table. The oldest seven years are rarely queried. This allows you to place the latest three partitions on high-performance SSD drives and the oldest seven partitions on older and cheaper but slower HD drives.
Why not use indexes but partitions?
Okay, the index is good. They're even great! They are absolutely essential tools in your database performance toolbox. But the index is very good at retrieving a small part of the data, especially the data scattered throughout the table. If the index can provide you with more than 20% of the data, they can be considered to perform best in the end, and the real turning point is about 1% of the data. 5%. But, of course, it depends largely on many different factors, such as servers, settings, data, weather (no, not weather!),. The most important thing: don't think about partitioning or indexing.
In a good partitioning strategy, there is still plenty of space and opportunities to make good use of indexes. For example, the magazine warehouse example I mentioned above is partitioned annually inside the warehouse and by publication inside the room. By making the title / month more visible in the cabinet rather than in the box (which creates an index), it leaves more opportunities for performance improvement.
What database partitioning strategy can I base on?
Each row you enter in the partition table should explicitly belong to a specific partition. The partition key defines the partition in which the row will be stored. This partitioning key can be a field or a combination of multiple fields, which is unique as long as it is ambiguous. Going back to the magazine example, you can use "year of publication" as the partition key, but you cannot use "pictures containing car brands" as the partition key and partition for each brand, because this means that the latest "Top Gear Magazine" needs to be in both the "Ferrari" and "Aston Martin" sections, and there are more.
What are the different types of partitions?
Range partition
List partition
Hash partition
Subpartition
(1) range Subarea
Partition by range of values. This means that if the value of the partition key is within the partition range, the table is partitioned so that the row belongs to a specific partition. For example, magazine: we divide the warehouse into multiple rooms every year. This means that we have range partitions, where each partition ranges from 01-01-YYYY to 31-12-YYYY. Therefore, if I have a magazine published in May 2015, it will enter the range from May 1, 2015 to January 31, 2015 to December 31, 2015.
Interval partitions are a special type of range partitions where ranges in interval partitions are always time-based and are automatically created when new data is added to the table that should belong to the new interval range.
(2) list Partition
By partitioning the database through a list, you can explicitly control and define which value enters which partition by defining a partition key in the definition of the table and each partition:
SQL > Create table magazines (Pub_Title varchar, Pub_Date (date)) partition by list (Pub_Title) (partition Pub_Title1 values ("Sports Illustrated", "Mens Health"), partition Pub_Title2 values ("Vanity", "Flair"), partition Pub_Title3 values ("Yoga Magazine", "Zen Magazine"))
Or, in the case of a magazine, you might already be able to read information from SQL: we divide the warehouse into rooms for each publication title, but we don't have enough rooms for each different publication title, so we "list" several for each room. In the first conference room, we put Sports Illustrated and Mental Health; in the second conference room, we put vanity and genius; in the third conference room, we put Yoga Magazine and Zen Magazine. Does that make sense?
(3) Hash partition
In a hash partition, data rows are mapped to partitions based on the hash value of the partition key. This feature is useful if you want to use partitions on tables that do not have explicit business or data logical partitions and require evenly distributed partitions. For example, you have an entire warehouse full of magazines, but the publication title and publication date are random. There is no complete year, and so on. However, you want to distribute them evenly among ten different rooms in the warehouse so that when you need to retrieve one (or more) magazines later, you can send ten people to look for them at the same time. Ten different rooms. It's more useful than hash partitions. For example, you select a combination of publication title and publication date as the partition key, then have the database hash the key, then select the appropriate partition, and distribute all magazines equally to all rooms.
(4) Sub-partition
A subpartition (also known as a composite partition) creates a partition within the partition. It is a further partition of each partition of the partitioned table. Let's go back to the magazine warehouse. First of all, we put the whole set of magazines in different rooms every year. This is the first partition based on the range partition. Next, put boxes in each of these grade rooms, and then put all magazines with the same publication name in each box. Since we already have only one publication for a specific year in that room, we will only have a specific publication name and a magazine for a specific year in that box. Therefore, it is very easy and quick to find "Yoga Magazine", "November 2017 edition". Or, in the code:
SQL > Create table magazines (Pub_Title (varchar), Pub_Date (date) Partition by rage (Pub_Year (Pub_Date)) Subpartition by list (Pub_Title) (partition Pub_Title1 values ("Sports Illustrated", "Mens Health"), partition Pub_Title2 values ("Vanity", "Flair"), partition Pub_Title3 values ("Yoga Magazine", "Zen Magazine"))
As you can see, partitioning is a valuable tool in addition to many other tools in the database toolbox, and is useful for improving database performance, availability, and maintainability. Which partitioning strategy you choose certainly depends on your data, but it also depends on the technical possibilities of the specific database system you are going to implement.
At this point, the study on "what are the reasons and benefits of using database partitioning" is over. I hope to be able to solve your doubts. The collocation of theory and practice can better help you learn, go and try it! If you want to continue to learn more related knowledge, please continue to follow the website, the editor will continue to work hard to bring you more practical articles!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.