What are the C-Store knowledge points of Vertica?

2025-07-16 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Background knowledge

Vertica is the commercial product that grew out of C-Store. C-Store itself has not been developed since version 0.2 was released in 2006. Some members of the C-Store team started the Vertica project in 2006, and Vertica was acquired by Hewlett-Packard (HP) in 2011. Vertica does not use any code from the C-Store prototype; it only borrows its ideas.

As of 2012, Vertica had been deployed in more than 500 production environments, at least 3 of which had reached petabyte (PB) scale. Like C-Store, Vertica provides a classic relational interface, and it shows that one system can support both full ACID transactions and efficient queries over PB-scale data. In my view, that claim goes beyond what current NewSQL distributed relational databases offer.

Business scenario

Transactional: there are many requests per second (thousands), and each request processes only a small portion of the data. Most transactions insert or update a row of data.

Analytical: there are only a few requests per second (dozens), but each request traverses a large portion of the data in the table. For example, aggregate sales data by time and space.

Now that a single table at a commercial company can hold millions or billions of rows, the difference between transactional and analytical scenarios is becoming more and more obvious. Optimizing for analytical scenarios alone can improve performance by several orders of magnitude compared with a one-size-fits-all system.

The difference between Projection and materialized view

A projection can be thought of as a restricted materialized view, but it differs from a standard materialized view because a projection is the physical storage of the data itself, not an auxiliary index. Traditional materialized views usually also contain aggregates, joins, and other query results; projections do not. Materialized views are also expensive to maintain in a distributed system, and supporting aggregation and filtering in them is especially impractical.

All in all, materialized views are more complex than projections and harder to implement, so they have to be given up in a distributed system.

Join index

The join index described in C-Store has been dropped: it is too expensive to maintain and requires storing a lot of extra IDs. So how is a complete row reconstructed? Vertica maintains a super projection that contains all the columns, that is, a complete table.

Storage model

For each projection, the segmentation strategy determines which data goes into which segment and on which node that segment is stored. Data is sorted only within each segment. In the paper's example, the first projection is segmented by hash(sale_id) and sorted by date; the second projection is segmented by hash(cust) and sorted by cust.
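The two example projections can be imitated with a small Python sketch. This is only an illustration under stated assumptions: `build_projection` is a hypothetical helper, `zlib.crc32` with a modulo stands in for whatever hash and node mapping Vertica really uses, and the sample rows are made up.

```python
import zlib
from collections import defaultdict

def build_projection(rows, segment_key, sort_key, num_nodes):
    """Split rows across nodes by hash(segment_key), then sort each segment locally."""
    segments = defaultdict(list)
    for row in rows:
        # crc32 is a deterministic stand-in for the real hash function (assumption).
        h = zlib.crc32(str(row[segment_key]).encode())
        segments[h % num_nodes].append(row)
    # Sorting happens per segment only; there is no global order across nodes.
    return {node: sorted(seg, key=lambda r: r[sort_key])
            for node, seg in segments.items()}

# Made-up sample data.
sales = [
    {"sale_id": 1, "cust": "bob", "date": "2012-03-01"},
    {"sale_id": 2, "cust": "amy", "date": "2012-01-15"},
    {"sale_id": 3, "cust": "bob", "date": "2012-02-10"},
    {"sale_id": 4, "cust": "cat", "date": "2012-01-02"},
]

# Projection 1: segmented by hash(sale_id), sorted by date.
by_sale = build_projection(sales, "sale_id", "date", num_nodes=2)
# Projection 2: segmented by hash(cust), sorted by cust.
by_cust = build_projection(sales, "cust", "cust", num_nodes=2)
```

Both projections hold the same four rows, but each places and orders them differently, which is why a query can pick whichever projection suits it best.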

Inter-node segmentation: Segmentation

The segmentation discussed here is between nodes: it determines which data is assigned to which node. The segmentation method is specified when the projection is defined and is based on an integer expression. Given a row of projection data, the expression yields an integer, and the row is assigned to a node according to the range that integer falls in. The author gives a piecewise formula for this mapping.

In effect, this is a consistent hash ring, which is covered later.
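The article does not reproduce the piecewise formula, so the following is only a sketch of one plausible reading: the integer space is cut into equal-width contiguous ranges, one per node. The constant `CMAX` (assumed here to be 2^64) and the function `node_for` are illustrative names, not Vertica identifiers.

```python
CMAX = 2 ** 64  # assumed size of the integer space of the segmentation expression

def node_for(expr_value, num_nodes):
    """Return the 0-based node whose range contains expr_value.

    Node i is assumed to own the half-open range
    [i * CMAX / num_nodes, (i + 1) * CMAX / num_nodes).
    """
    if not 0 <= expr_value < CMAX:
        raise ValueError("segmentation expression out of range")
    # Integer arithmetic avoids floating-point error on 64-bit values.
    return expr_value * num_nodes // CMAX
```

With four nodes, the value 0 lands on node 0, the value `CMAX // 4` is the first value owned by node 1, and `CMAX - 1` lands on node 3.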

Internal data partition of a node: Partitioning

Partitioning means that each data partition is stored in its own physically separate files.

The first benefit of partitioning is bulk deletion. Data is usually split into multiple files by year and month, so deleting a time range of data simply means deleting the corresponding files. If the data is not partitioned in advance, the records have to be traversed one by one.

Bulk deletion only works when all projections of a table are partitioned the same way; otherwise only the partitions of some projections could be dropped. That is why Vertica specifies partitioning at the table level.

Another benefit of partitioning is faster queries: each partition carries summary information, so a query can quickly skip partitions that cannot contain matching data.
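Both benefits can be sketched in a few lines of Python. This is a toy model, not Vertica's storage format: `PartitionedTable` is a hypothetical class, rows are grouped by a (year, month) key, and that key alone plays the role of the per-partition summary.

```python
from collections import defaultdict

class PartitionedTable:
    """Toy table that keeps one row list per (year, month) partition."""

    def __init__(self):
        self.partitions = defaultdict(list)  # (year, month) -> rows

    def insert(self, year, month, day, amount):
        self.partitions[(year, month)].append({"day": day, "amount": amount})

    def drop_partition(self, year, month):
        # Bulk delete: discard the whole partition without scanning its rows.
        self.partitions.pop((year, month), None)

    def total(self, lo, hi):
        """Sum amounts over partitions whose key lies in [lo, hi].

        Returns (sum, number_of_partitions_scanned).
        """
        scanned = 0
        result = 0
        for key, rows in self.partitions.items():
            if key < lo or key > hi:
                continue  # pruned using the partition key alone
            scanned += 1
            result += sum(r["amount"] for r in rows)
        return result, scanned

t = PartitionedTable()
t.insert(2012, 1, 5, 10)
t.insert(2012, 1, 9, 20)
t.insert(2012, 2, 1, 30)
t.insert(2012, 3, 7, 40)
feb_to_mar = t.total((2012, 2), (2012, 3))  # the January partition is skipped
t.drop_partition(2012, 1)                   # January is removed in one step
```

In a real system the summary would more likely be per-column minimum and maximum values, but the pruning logic has the same shape.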

I find the author's use of the term "partition" awkward: in consistent hashing, a partition is what controls which node the data is stored on.

Three components

Like C-Store, Vertica includes a Read Optimized Store (ROS) and a Write Optimized Store (WOS); the third component is the tuple mover described below. In general, each file can store one column or several columns, which makes it similar to a hybrid row-column architecture.

Data in WOS is not compressed, because WOS is very small, and in memory there is no difference between row and column layout. Vertica's WOS changed from row-oriented to column-oriented and then back to row-oriented, mainly for software engineering reasons; there was no difference in performance.

The tuple mover has two main functions: (1) moveout, which moves data from WOS to ROS, that is, a flush; and (2) mergeout, which merges small files in ROS into larger ones. This is essentially the LSM concept under different names.
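The two tuple mover operations map naturally onto an LSM-style sketch. The class below is a minimal illustration under assumptions, not Vertica's implementation: `Projection`, `wos_limit`, and the use of plain integers as rows are all hypothetical.

```python
import heapq

class Projection:
    """Toy WOS/ROS pair for one projection; integers stand in for sorted rows."""

    def __init__(self, wos_limit=4):
        self.wos = []               # unsorted in-memory buffer
        self.ros = []               # list of sorted runs ("files")
        self.wos_limit = wos_limit  # hypothetical flush threshold

    def insert(self, value):
        self.wos.append(value)
        if len(self.wos) >= self.wos_limit:
            self.moveout()

    def moveout(self):
        # Flush: sort the buffered rows and write them as one new ROS run.
        if self.wos:
            self.ros.append(sorted(self.wos))
            self.wos = []

    def mergeout(self):
        # Compact: merge all sorted runs into a single larger sorted run.
        if len(self.ros) > 1:
            self.ros = [list(heapq.merge(*self.ros))]

p = Projection(wos_limit=3)
for v in [5, 1, 4, 2, 9, 7, 3]:
    p.insert(v)
p.moveout()   # flush the remaining buffered row
p.mergeout()  # p.ros is now [[1, 2, 3, 4, 5, 7, 9]]
```

Sorting happens at moveout time, so every ROS run is ordered even though WOS is not; mergeout then only needs a streaming merge of already sorted runs.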

Vertica has a feature that lets new writes go directly to ROS instead of passing through WOS. I don't understand this: how is the sort order maintained? The author mentions the feature again at the end, saying that writing data into WOS during an initial bulk load is a waste of memory. But that memory is what the sort uses; without it, wouldn't ROS end up out of order?

Fault tolerance

To ensure that every projection is recoverable, each projection must have at least one buddy projection that contains the same columns and uses the same segmentation.

Because each projection can have its own sort key, there are two cases of recovery:

(1) If the sort keys are the same, the files can be copied directly, so recovery is a plain copy.

(2) If the sort keys are different, the data has to be queried and then written again; there is no better way.
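The two recovery cases can be expressed as a small decision function. This is a hedged sketch: `recover` and its return labels are hypothetical, and rows are plain dictionaries rather than real storage files.

```python
def recover(failed_sort_key, buddy_sort_key, buddy_rows):
    """Rebuild a lost projection segment from its buddy projection."""
    if failed_sort_key == buddy_sort_key:
        # Case 1: same order, so the buddy's data can be copied as-is.
        return "copy", list(buddy_rows)
    # Case 2: different order, so read the buddy and re-sort for the target.
    return "query-and-rewrite", sorted(buddy_rows, key=lambda r: r[failed_sort_key])

# Made-up buddy data, sorted by date.
buddy = [
    {"date": "2012-01-02", "cust": "cat"},
    {"date": "2012-01-15", "cust": "amy"},
    {"date": "2012-02-10", "cust": "bob"},
]
same_order = recover("date", "date", buddy)
other_order = recover("cust", "date", buddy)
```

The second case costs a full read plus a sort, which is the overhead the text refers to.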

In addition, Vertica can tolerate K failures, so when designing projections the database has to ensure that every segment is backed up on at least 1 node. I take this to mean that it generates 1 projection outright, rather than simply copying segments.

Limitation

Vertica solves one of C-Store's big problems, the join index, but there are still faults to pick:

The paper does not say how projections are generated, how sort orders are chosen, how many copies are allocated, or whether storing different projections in different orders slows down writes.

Users generally set only the number of copies, not a maximum amount of space. No user gives the system a space limit and then lets the database fill up all of that space on its own. At most, a user would state how much space the raw data takes and what proportion of extra space the database may use. Having the database choose the number of copies from a preset space budget is unreasonable.

How load balancing is done is not mentioned.
