
The Replacement Advantages of Vertica in the Communications Industry

2025-03-01 Update


I. Background analysis

Traditional relational databases have long held a solid, dominant position in the enterprise market, and many people do not realize that other kinds of databases exist. Traditional relational databases are very good at transactional operations such as updates, but they struggle with bulk operations over large volumes of data. DB2, the relational database management system developed by IBM, is widely used in large data warehouse projects, especially in the mobile industry: since business analysis systems were first built, DB2 has generally been used for the main BI warehouse that focuses on data analysis, supporting internal management decision-making, marketing, customer service, and other work.

Big data brings new business models and business growth, posing greater challenges for IT support and new requirements for the existing technical architecture, including data warehouse capacity. With business requirements changing rapidly, traditional database software is increasingly stretched and shows several shortcomings:

1. Performance is difficult to meet business needs

As the business develops, data volume grows geometrically over time, and business complexity places high demands on database performance. Because traditional relational databases use row storage and centralized processing, complex SQL puts heavy pressure on I/O, CPU, and other resources, and it is difficult to meet business requirements.

2. Lack of scalability

Traditional databases can only scale up (vertical expansion), adding processors, high-end storage, and other resources to meet application performance requirements, but larger and more powerful servers are also expensive. After expansion, data rebalancing or even database downtime is required, which directly affects production.

3. Poor high availability of products

Traditional databases such as DB2 keep only a single copy of metadata such as configuration files and logical node definitions; by default it sits in a fixed directory that cannot be changed or separately backed up, so its safety is very low.

4. High investment cost

DB2 hosts require IBM minicomputers, Oracle requires Exadata, and the storage requires devices costing millions, such as EMC arrays. Licenses for this software are also very expensive, and in terms of staffing, administrators for these databases do not come cheap.

II. Scheme selection

Distributed databases support deployment on x86 servers, and replacing proprietary hosts with x86 has become a trend: the cost is low, the scalability is good, and the traditional approach of scaling a database by upgrading the host has reached its end. It was therefore decided to introduce an MPP (massively parallel processing) distributed database to replace the existing DB2 database. An MPP database is a shared-nothing cluster in which every node has its own disk storage and memory. Business data is divided across the nodes according to the database model and application characteristics, and the nodes are interconnected over a private or commodity network, cooperating to provide database services as a whole. A shared-nothing database cluster offers full scalability, high availability, high performance, an excellent price/performance ratio, and resource sharing. After data modeling is completed in the core data warehouse, the data is synchronized through MPP to the data analysis mart, which, as the big data mart, carries all applications developed against standard SQL.
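To make the shared-nothing data distribution concrete, here is a minimal sketch of how a large table might be segmented across nodes and a small table replicated in Vertica; the schema, table, and column names are hypothetical.

```sql
-- Hypothetical fact table: rows are hash-distributed across all nodes on
-- the subscriber key, so every node stores and scans only its own share.
CREATE TABLE dw.cdr_detail (
    subscriber_id   INT NOT NULL,
    call_start_time TIMESTAMP NOT NULL,
    duration_sec    INT,
    fee             NUMERIC(12, 2)
)
SEGMENTED BY HASH(subscriber_id) ALL NODES;

-- Hypothetical small dimension table: replicated (unsegmented) on every
-- node, so joins against it need no data movement over the network.
CREATE TABLE dw.dim_region (
    region_id   INT NOT NULL,
    region_name VARCHAR(64)
)
UNSEGMENTED ALL NODES;
```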

The comparison of products is as follows:

1. Comparison with IBM DB2

| Item | Vertica | DB2 |
| --- | --- | --- |
| Hardware architecture | True shared-nothing MPP architecture; no special nodes | Has a master node and a management module |
| Software architecture | Pure column-store database | Traditional row-store database |
| Compression | More than 12 compression algorithms, including Lempel-Ziv; a different algorithm can be specified for each column of a table; compression ratio above 10x | Standard Lempel-Ziv; compression specified only at table level; compression ratio of 1-3x |
| High availability | Built-in K-safety; all nodes are active; no HA configuration required; simple to configure | Hot-standby mode; requires at least one idle server; takeover takes 1-5 minutes after a failure; complex configuration requiring additional HA software |
| Daily management | No concepts of tablespaces, indexes, MDC, etc.; automatic database design | 3 different levels of database parameters (db2set, dbm, db); tablespaces, indexes, partitions, MDC, etc. must be managed |
| Performance | Supports real-time loading and real-time query at the same time; 50-1000x performance improvement over traditional databases | Cannot perform real-time loading and real-time query at the same time |
| Scalability | Nodes added and removed online; node expansion completed in minutes to hours | Adding a node requires a restart; expansion usually takes hours or even days |
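As a concrete illustration of the per-column compression noted in the table above, here is a minimal Vertica projection sketch that assigns a different encoding to each column; the table, columns, and encoding choices are hypothetical examples.

```sql
-- Hypothetical projection: each column gets its own encoding, chosen to
-- match its data characteristics (sorted or low-cardinality columns
-- compress especially well with RLE or delta encodings).
CREATE PROJECTION dw.cdr_detail_p1 (
    subscriber_id   ENCODING DELTAVAL,
    call_start_time ENCODING RLE,
    duration_sec    ENCODING COMMONDELTA_COMP,
    fee             ENCODING AUTO
) AS
SELECT subscriber_id, call_start_time, duration_sec, fee
FROM dw.cdr_detail
ORDER BY call_start_time, subscriber_id
SEGMENTED BY HASH(subscriber_id) ALL NODES;
```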

2. Comparison with Oracle Exadata

Hardware configuration comparison

| Item | Exadata 1/2 Rack | Vertica | Description |
| --- | --- | --- | --- |
| Server configuration | 11 servers in total: 4 database servers and 7 storage servers | 11 HP DL380p Gen8 | Same number of servers as Exadata |
| CPU cores | Database nodes: 64 cores (2.9 GHz E5-2690); storage nodes: 84 cores (2.0 GHz E5-2630L) | 176 cores (2.6 GHz E5-2670) | About 20% more CPU cores |
| CPU processing capacity (SPECint_rate2006) | 5559 | 7084 | Overall CPU processing capacity about 28% higher |
| Memory | 1024 GB | 1408 GB | About 40% more memory |
| Hard disk model | 600 GB 15,000 rpm SAS (3.5") | 900 GB 10,000 rpm SAS (2.5") | Performance basically the same |
| Number of hard disks | 84 | 275 | About 3.3 times as many disks |
| Available capacity | 22.5 TB | 900 GB × 22 × 11 = 218 TB; 218 TB × 70% (RAID 5 loss) × 50% (K-safe = 1) ≈ 76 TB | About 3.4 times the available capacity |
| Data loading speed | 8 TB/hour (theoretical maximum) | 200 MB/s per node, about 8 TB/hour (measured average) | Basically the same |

Comparison of software features

| Item | Exadata | Vertica | Description |
| --- | --- | --- | --- |
| Data storage mode | Row storage plus hybrid columnar compression | Pure columnar storage | Exadata applies columnar compression only to data loaded in direct-path mode |
| Compression | 6 compression algorithms (2 row, 4 column) | 12 compression methods | Exadata specifies compression only at the table level; Vertica can specify a different compression algorithm for each column of a table |
| Simultaneous loading and real-time query | Direct-path loading usually requires disabling indexes, so real-time queries cannot run at the same time | Loading and querying can run at the same time | Vertica supports highly concurrent queries while data is being loaded |
| Deployment architecture | Shared-everything architecture | Shared-nothing MPP architecture | A shared-everything architecture cannot scale out to many nodes; a shared-nothing MPP architecture is more scalable and better suited to parallel processing of large data volumes |
| Database management | Complex; requires very experienced DBAs and dedicated OEM tools | Simple and automatic, with little human intervention | |
| Analysis functions | A few simple analysis functions | Many built-in analysis functions and flexible analytical queries | |
| Hadoop interface | Not supported | Supported | Vertica has a built-in Hadoop interface, supporting both structured and unstructured analysis |

Cost comparison

| Item | Exadata 1/2 Rack | Vertica | Description |
| --- | --- | --- | --- |
| Hardware price (transaction price, RMB) | 8-10 million | 2 million | DL380 Gen8 quoted at about 100,000 per unit, plus peripherals and services |
| Software price (transaction price) | 8.64 million | 3 million | The Vertica price is an estimate; Oracle's margins are usually high |
| 3-year service charge | (8 million × 8% + 8.64 million × 22%) × 2 ≈ 5.08 million | (2 million × 8% + 3 million × 21%) × 2 ≈ 1.58 million | Hardware at 8%; software at 22% (Oracle) and 21% (Vertica) |
| Total investment over 3 years | 21.72 million | 6.58 million | The Vertica scheme costs about 30% of the Exadata investment |

III. Construction plan

1. Environment deployment

For disaster recovery, the primary and standby database clusters, deployed at different sites in the same city, can be synchronized incrementally to achieve off-site data disaster recovery. For backup, the product provides a dedicated backup and recovery tool, vbr, which can easily perform full or incremental backups of the whole cluster at table or database granularity. It supports parallel backup and parallel restore, backup files can be distributed across different backup storage areas, and the number of backup and recovery nodes is configurable.

The general design of the platform architecture is as follows:

1.1. Functional architecture

The MPP resource pool cluster is built mainly as two independent MPP databases: the core data warehouse and the data mart.

1.2. Physical deployment

The cluster can be deployed with three or more nodes, and the K-safety value can be set to n, meaning every piece of data has n redundant copies, each placed in a different rack.
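As a rough illustration of how the redundancy level is declared and checked in Vertica (the value 1 below is only an example; the text above uses n):

```sql
-- Declare the fault tolerance of the physical design; with K-safety = 1
-- every data segment has one buddy copy on another node.
SELECT MARK_DESIGN_KSAFE(1);

-- Check the designed and currently achievable fault tolerance.
SELECT designed_fault_tolerance, current_fault_tolerance FROM system;
```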

To make future expansion easier, we recommend a hierarchical network, with two 10-gigabit access switches per cabinet cascaded up to the core switches.

Switch:

Configure at least 2 core switches to form a high availability cluster.

Each cabinet is equipped with two 10-gigabit switches forming a high-availability pair, and each top-of-rack switch connects to both core switches through two 40G uplink ports.

A VLAN is set up for the MPP database's internal cluster communication network to isolate it from the external service network.

Server connectivity: each server's two 10GE network cards connect over optical fiber to the 10G optical ports of the two top-of-rack switches and are bonded into one logical NIC on the server side. Two sets of IP addresses are configured for the internal and external networks, and a VLAN is set up for the internal cluster communication network to isolate it from the external service network, forming a reliable internal communication network and avoiding network failures.

On each compute node, two disks are mirrored as RAID 1 for the operating system, providing hot redundancy; the data disks are organized as RAID 5 groups of seven disks each, and three RAID 5 groups are then striped together with RAID 0 into a single data volume, improving storage and read/write performance while preserving data safety.

A 10-gigabit service network, a gigabit management network, and 20-gigabit bonded links are used to remove potential network bottlenecks in batch data processing.

2. Application migration

To reduce interaction between the provincial and group-level business analysis systems and decouple them, all existing applications on DB2, such as the warehouse model, prefecture-level applications, and the mobile business analysis application, are migrated to the MPP platform.

This migration and transformation adopts a strategy of gradual, application-by-application implementation, following these steps:

Port the program code running on DB2 directly to MPP, adjust the syntax according to the errors reported, and use batch adjustment to ensure that all programs execute normally.

After importing the data sources, run the programs and compare the result data on MPP with DB2; when differences are found, locate and resolve the problems so that the data on both sides is consistent (a consistency-check sketch follows these steps).

Optimize the programs (for example, make small tables replicated and choose reasonable partition/segmentation keys for large tables) and ensure that data is generated no later than it was on DB2.

After the programs have run stably for half a month, switch the external interface data source, achieving a smooth migration.
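A minimal sketch of the consistency check mentioned in the steps above: the same summary query is run on DB2 and on the MPP database and the outputs are compared (the table, columns, and filter are hypothetical).

```sql
-- Run identically on both DB2 and the MPP database; matching row counts
-- and aggregates indicate the migrated data and logic are consistent.
SELECT COUNT(*)                       AS row_cnt,
       SUM(fee)                       AS total_fee,
       COUNT(DISTINCT subscriber_id)  AS subscriber_cnt
FROM dw.monthly_revenue
WHERE bill_month = '2014-06';
```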

3. System optimization

As provincial business applications gradually migrated to Vertica, business data volume grew faster than expected, and performance degradation and capacity shortfalls could appear; these issues fall broadly into two categories.

(1) Loading problems, including: an unstable loader network, low loading efficiency and poor stability, loading timeouts, data quality problems in the loaded data, and load files whose fields do not match the table structure.

(2) Performance problems, including: database performance degrading after running for a period of time, slow execution of query and insert statements, junk SQL exhausting system resources, unreasonable projection design, and large numbers of invalid views and expired tables not being cleaned up in time.

These problems span both the platform level and the business level (SQL quality and development standards), so the system was optimized at both the technical level and the business level.
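Before turning to the two optimization tracks, here is a hedged sketch of one platform-level measure against the "junk SQL exhausting system resources" problem: capping ad-hoc workloads with a Vertica resource pool (the pool name, limits, and user are hypothetical).

```sql
-- Hypothetical resource pool that caps memory, concurrency, and runtime
-- for ad-hoc users so runaway SQL cannot exhaust cluster resources.
CREATE RESOURCE POOL adhoc_pool
    MEMORYSIZE '4G'
    MAXMEMORYSIZE '20%'
    MAXCONCURRENCY 10
    RUNTIMECAP '10 minutes';

-- Route ad-hoc analyst accounts to the capped pool.
ALTER USER analyst_user RESOURCE POOL adhoc_pool;
```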

3.1. Technical level optimization

Network optimization: upgrade the NIC drivers, change the NIC bonding mode, adjust the switch hash algorithm to balance data traffic, tune TCP-related kernel parameters, and increase the kernel socket receive buffer size. After these improvements the bonded NICs run steadily at about 20 Gb/s, NIC efficiency and stability improve markedly, and packet loss and retransmission essentially disappear.

Database optimization: tune the control file parameters and load parallelism, enlarge the read-ahead cache, tune the number of CPU threads, increase the communication timeout between load servers and clients, and tune the load timeout parameters, reducing the load failure rate from about 30% to under 0.5%. Tuning projection configurations greatly shortens the execution time of the same SQL; for large-table joins such as the customer table against the account table, overall Vertica performance improved by about 60% compared with the traditional database. Database memory parameters were also tuned to improve system stability.
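One common projection-level tuning for large-table joins such as the customer/account join above is to segment both tables on the join key so the join is co-located on every node; a hedged sketch with hypothetical table names:

```sql
-- Segment both large tables on the same join key so matching rows land
-- on the same node and the join requires no network data movement.
CREATE PROJECTION dw.customer_p1 AS
SELECT * FROM dw.customer
ORDER BY customer_id
SEGMENTED BY HASH(customer_id) ALL NODES;

CREATE PROJECTION dw.account_p1 AS
SELECT * FROM dw.account
ORDER BY customer_id
SEGMENTED BY HASH(customer_id) ALL NODES;

-- Refresh the new projections and update optimizer statistics.
SELECT START_REFRESH();
SELECT ANALYZE_STATISTICS('dw.customer');
SELECT ANALYZE_STATISTICS('dw.account');
```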

3.2. Business layer optimization

Because the platform's application developers were not yet familiar with the new product, some SQL was of poor quality (for example, reading data directly from partitions, unreasonable choice of table hash keys, and unreasonable choice between replicated and distributed table attributes) or performed poorly (redundant subqueries, Cartesian products, unreasonable join fields); these problems were adjusted and optimized one by one.
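As an example of how such problem SQL can be spotted for targeted optimization, here is a sketch against Vertica's query monitoring view (the one-minute threshold is arbitrary):

```sql
-- List the slowest recent statements so their projections, join keys,
-- and subqueries can be reviewed and rewritten.
SELECT user_name,
       request_duration_ms,
       LEFT(request, 120) AS sql_text
FROM v_monitor.query_requests
WHERE request_type = 'QUERY'
  AND request_duration_ms > 60000   -- longer than one minute
ORDER BY request_duration_ms DESC
LIMIT 20;
```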

At the same time, we provided product training and guidance and issued, and continuously updated, multiple versions of MPP database development standards to help application developers make better use of the MPP platform. As a result, at least 99% of SQL now runs normally; a few long-running statements and newly developed SQL still need targeted optimization.

IV. Domestic mobile usage

1. Distribution of domestic mobile usage

Vertica-based data analysis systems are widely used in the domestic telecom industry.

2. Current application scope of Vertica in the mobile industry

BSS

- Business domain: business analysis (BASS); call detail record (CDR) analysis; traffic management analysis; precision marketing analysis; history library and reporting; partner and electronic channel analysis; user classification and revenue analysis; ...

- Challenges: large data volume; high analysis and processing capability required

- HP Vertica advantages: ultra-fast analysis speed and unlimited scalability; SQL standards and openness

OSS

- Business domain: signaling analysis; network error and performance optimization; network analysis; comprehensive network analysis; comprehensive monitoring and analysis; network capacity management; customer experience management; roaming analysis; ...

- Challenges: huge data volume; high real-time requirements

- HP Vertica advantages: real-time analysis and aggregation with unlimited scalability; SQL standards and openness

MSS

- Business domain: ERP data warehouse; financial statement system; centralized data analysis; ...

- Challenges: high performance requirements for analysis

- HP Vertica advantages: ultra-fast analysis speed; SQL standards and openness

Other

- Business domain: value-added service analysis platform; big data analysis platform; Internet analysis; user behavior analysis; user location analysis; clickstream analysis; ...

- Challenges: huge data volume; high analysis and processing capability required; rich analysis algorithms required

- HP Vertica advantages: Vertica's openness and integration with Hadoop/R; ultra-fast analysis speed and unlimited scalability; unstructured analysis

3. Project cases

3.1. Signaling analysis

Quasi-real-time analysis realized with big data

60 nodes

HP BL460c G7 blades (VC Flex-10) with D2200sb storage (12 × 1 TB 7200 rpm SATA)

Real-time data ingestion exceeds 200 MB/s, covering Gb, IuPS, Gn, DPI, and other interfaces

Raw detail tables are loaded at minute-level intervals

Summary tables are loaded every 15 minutes

Data is kept for 1 month, totaling more than 500 TB

Summaries at multiple granularities: 5 minutes, hour, day
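A minimal sketch of the kind of bulk load behind the minute-level ingestion described above, with hypothetical paths, table name, and file format:

```sql
-- Load one minute's batch of signaling detail files, writing rejected
-- rows to a side file for later inspection; DIRECT writes straight to
-- disk storage, which suits large batches.
COPY dw.signaling_detail
FROM '/data/incoming/gn_201406011205_*.csv' ON ANY NODE
DELIMITER '|'
REJECTED DATA '/data/rejects/gn_201406011205.rej'
DIRECT;
```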

3.2. Network optimization

Quasi-real-time analysis to improve network service quality

Processing cloud: a Hadoop cluster completes data analysis and summarization

Storage cloud: an HP Vertica cluster provides big data storage

6-node cluster with more than 30 TB of data

The main detail table exceeds 20 TB, loaded quasi-real-time at minute-level intervals

3.3. Business analysis

Big data's high-performance analysis supports precision marketing and differentiated competitiveness.

32 nodes, 360 TB of data

Greatly improved performance:

-- data compression ratio as high as 9x

-- 10-billion-row analytical queries answered in seconds, a performance improvement of about 50x

-- supports self-service analysis by hundreds of users

Data self-service: business users analyze data on their own

Marketing big data analysis:

-- big data analysis of various marketing modes and promotion methods

-- association analysis of customer attribute data, including customer profile queries, tag-based customer base refresh, and tag analysis data refresh

The main database is being migrated to another 32-node cluster

3.4. Mobile big data analysis platform

Integrates BSS/OSS data: 280 TB on 40 nodes

O-domain HTTP/socket data first enters the Hadoop cluster for cleansing

A-interface signaling data, 1-5 TB per day, first enters the Hadoop cluster for cleansing

B-domain data, 300 GB per day, extracted directly from DB2

Comprehensive big data applications:

Analysis of user behavior and user profiles

Self-service analysis

Marketing analysis

Data retention cycle:

Detailed data: 7-15 days

Summary data: 6 months to 1 year

Highly summarized data: 1-3 years

Manufacturer

Basic platform: Huawei

ETL/BI: Huawei, Chuangwo

Precision Marketing: Asian Union

3.5. Mobile network-wide intelligent monitoring platform

An intelligent analysis platform for in-depth analysis of network-level monitoring data, providing data analysis services for the group, for departments within the company, and for provincial companies.

Requirements: the ability to expand storage at low cost to keep pace with rapidly growing data volumes; the ability to handle more unstructured data, such as web logs and Internet of Things application logs; and the ability to perform deep data mining on large-scale data, moving beyond traditional random-sampling analysis to full-data analysis.
