[Lecture record] The practice of PB-level massive unstructured data management in banks

2026-03-16 Update From: SLTechnology News&Howtos

Hao Dawei

Recently, Hao Dawei, Technical Director of Giant Sequoia Database, was invited to speak at the Seventh Data Technology Carnival. His talk, "Bank PB-level massive unstructured data management practice", shared Giant Sequoia Database's practices and thinking on database management, database technology, and applications in the financial industry.

Data explosion: data is growing rapidly, which raises the requirements for storage capacity, concurrency, and response speed. Large commercial banks, for example, usually run hundreds of business systems and hold data on hundreds of millions of users. The volume is growing exponentially, from the TB level to the PB level, and will soon reach the EB level, all of which requires effective management and real-time access.

Data fusion: in the past, and not only in the financial industry, each business kept its data independently, as an isolated island. What is needed now is unified data management and maintenance across businesses and business systems, and even data exchange on top of a unified architecture. Breaking down data islands has become a real demand in the financial industry.

Unstructured data: unstructured data is gradually coming to dominate the data volume of the financial industry. Images, pictures, voice recordings, and formatted documents are all unstructured data, and the amount of unstructured data is growing by about 80% a year. This rapid growth, coupled with the banking industry's "two places, three centers" data security requirement, raises the bar for storing and managing unstructured data. This, too, is a demand of the financial industry.

As banks build and upgrade systems for remote account opening, paperless counters, double recording (audio and video), and accounting file management, the image system must not only meet the ever-rising access performance requirements of commercial banks' online business systems. As an online system itself, it must also provide high availability, disaster recovery, and even active-active ("double live") capabilities to keep system data secure.

Core capabilities of a financial-grade database

Facing these new demands, the new generation of financial-grade databases needs to rethink the traditional database architecture in terms of distributed architecture, unstructured data management, multi-model data processing, standardized data access, data reliability, and mixed workloads.

1) Distributed architecture

The single-node architecture of a traditional database cannot meet the data volume and concurrency needs of new financial technology applications, so the new generation of financial databases must adopt a distributed architecture. A distributed architecture spreads large amounts of data evenly across multiple physical devices to avoid the bottleneck of any single device. At the same time, the elastic scalability of a distributed database provides flexible capacity and performance as financial business grows, which gives it a clear technical advantage in large-scale data applications.

Take the Giant Sequoia distributed architecture as an example. Both the data itself and metadata, such as file system metadata, should be stored in a distributed way, and metadata management should likewise be distributed and highly available, with no single point of failure. A distributed architecture must scale elastically, with performance growing linearly. At the same time, a distributed architecture can effectively reduce TCO and overall application cost, and its good manageability lowers the cost of development, operation, and maintenance.
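To make "stores data evenly in multiple physical devices" concrete, here is a minimal sketch of hash-based sharding. It is not Giant Sequoia Database's actual partitioning scheme, which the talk does not describe; the function names, the key format, and the choice of four shards are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard with a stable hash (hypothetical scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then take the remainder.
    return int.from_bytes(digest[:8], "big") % num_shards


def distribution(keys, num_shards: int) -> Counter:
    """Count how many keys land on each shard."""
    return Counter(shard_for(k, num_shards) for k in keys)


# Hypothetical account keys spread over 4 shards.
keys = [f"account-{i}" for i in range(100_000)]
counts = distribution(keys, 4)
for shard in sorted(counts):
    print(shard, counts[shard])
```

Each shard should receive roughly a quarter of the keys. Plain modulo hashing moves most keys when the shard count changes, so systems that need elastic expansion usually layer a partition map or consistent hashing on top of this idea.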

2) Multi-model data management: unstructured data management

As financial business becomes more Internet-based and retail-oriented, financial institutions are beginning to offer users more personalized and customized products and services. Unstructured data, in particular, is growing the fastest.

Generally speaking, structured data is data stored in tabular form, typically used by traditional businesses such as bank core transactions. Semi-structured data is widely used in scenarios such as user profiling, log collection from Internet of Things devices, and application clickstream analysis. Unstructured data corresponds to the large volume of picture, video, and document processing business, which is growing rapidly with the development of financial technology.

To achieve unified management and fusion of financial business data, the new database needs multi-model (Multi-Model) data management and storage capabilities, so that applications can manage structured, semi-structured, and unstructured data.

Multi-model data management lets a financial-grade database store and manage cross-departmental and cross-business data in one place, fuse data from multiple businesses, and support diversified financial services.
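As a rough illustration of what "multi-model" means here, the toy store below keeps structured rows, semi-structured JSON documents, and unstructured binary objects under one business key. It is an in-memory sketch only; the class, its methods, and the sample data are hypothetical and do not represent any real database API.

```python
import json
from dataclasses import dataclass, field


@dataclass
class MultiModelStore:
    """Toy in-memory store; every name here is hypothetical."""
    rows: dict = field(default_factory=dict)       # structured: fixed columns
    documents: dict = field(default_factory=dict)  # semi-structured: JSON
    blobs: dict = field(default_factory=dict)      # unstructured: raw bytes

    def put_row(self, key: str, account_no: str, balance: float) -> None:
        self.rows[key] = (account_no, balance)

    def put_document(self, key: str, doc: dict) -> None:
        # Round-trip through JSON to confirm the document is serializable.
        self.documents[key] = json.loads(json.dumps(doc))

    def put_blob(self, key: str, data: bytes) -> None:
        self.blobs[key] = bytes(data)

    def customer_view(self, key: str) -> dict:
        """Fuse the three kinds of data held for one business key."""
        return {
            "row": self.rows.get(key),
            "document": self.documents.get(key),
            "blob_size": len(self.blobs.get(key, b"")),
        }


store = MultiModelStore()
store.put_row("c1", "6222-0001", 1500.0)
store.put_document("c1", {"tags": ["mobile", "retail"], "channel": "app"})
store.put_blob("c1", b"\x89PNG...scanned-id-card")
view = store.customer_view("c1")
print(view)
```

The point of the sketch is the single `customer_view` call: one key returns the account row, the profile document, and the scanned image together, which is the kind of cross-business fusion the section describes.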

3) Standard data access and mixed workloads

According to Gartner's definition, a mixed workload, or HTAP (Hybrid Transactional/Analytical Processing), retains the original online transaction function while also emphasizing the database's native computing and analysis capabilities. A database that supports mixed workloads avoids the heavy data exchange between online and offline databases found in traditional architectures, and can run real-time statistical analysis on the latest business data.

To keep online real-time reads and writes from competing for resources with batch jobs, mixed-workload databases are usually implemented with read-write separation or in-memory processing. Generally speaking, the multi-replica architecture of a distributed database supports read-write separation naturally, while databases built on a traditional architecture often rely on in-memory processing.
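Read-write separation on a multi-replica architecture can be sketched as follows: online transactions go to a primary copy, while batch or analytic scans are routed to replicas. This is a simplified in-memory model with hypothetical names; replication is shown as synchronous only to keep the example short.

```python
import itertools


class ReplicaGroup:
    """One primary plus read replicas (hypothetical, in-memory only)."""

    def __init__(self, replica_names):
        self.primary = {}
        self.replicas = {name: {} for name in replica_names}
        self._round_robin = itertools.cycle(list(replica_names))

    def write(self, key, value):
        # Online writes land on the primary, then are copied to every replica.
        self.primary[key] = value
        for data in self.replicas.values():
            data[key] = value

    def read_online(self, key):
        # Latency-sensitive reads are served by the primary.
        return self.primary.get(key)

    def run_batch(self, predicate):
        # Analytic scans go to a replica, chosen round-robin.
        name = next(self._round_robin)
        data = self.replicas[name]
        return name, [k for k, v in data.items() if predicate(v)]


group = ReplicaGroup(["replica-a", "replica-b"])
group.write("t1", 120.0)
group.write("t2", 9800.0)
group.write("t3", 15000.0)
served_by, large = group.run_batch(lambda amount: amount > 5000)
print(served_by, large)
```

Because the scan never touches the primary, a long-running batch job cannot slow down online transactions, which is the resource isolation the paragraph above refers to.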

4) Data security

As the value of data inside the enterprise keeps rising, data has become the lifeline and core asset of financial enterprises. For the database that carries an enterprise's key data, security, reliability, and stability have always been the core value of a financial-grade database.

An important concept in data security is disaster recovery. The China Banking Regulatory Commission (CBRC) requires the banking industry to meet the "two places, three centers" requirement. This is essentially the idea of keeping multiple copies of data: if any one copy is lost, the remaining copies can still support data management and data services. This is particularly important for financial companies.
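The multi-copy idea can be sketched in a few lines. The example assumes the common reading of "two places, three centers": a production center, a same-city disaster recovery center, and a remote disaster recovery center. The center names and the class are illustrative, not a description of any bank's deployment.

```python
class ThreeCenterStore:
    """Keeps one copy of every record in each of three centers (toy model)."""

    # Assumed layout; the order is also the read preference.
    CENTERS = ("production", "same-city DR", "remote DR")

    def __init__(self):
        self.copies = {center: {} for center in self.CENTERS}
        self.online = set(self.CENTERS)

    def write(self, key, value):
        # Every center that is currently online receives the write.
        for center in self.online:
            self.copies[center][key] = value

    def fail(self, center):
        # Simulate losing a whole center.
        self.online.discard(center)

    def read(self, key):
        # Serve from the first surviving center that holds the record.
        for center in self.CENTERS:
            if center in self.online and key in self.copies[center]:
                return center, self.copies[center][key]
        raise KeyError(key)


store3 = ThreeCenterStore()
store3.write("voucher-1", b"scanned voucher")
first = store3.read("voucher-1")
store3.fail("production")
after_one_loss = store3.read("voucher-1")
store3.fail("same-city DR")
after_two_losses = store3.read("voucher-1")
print(first[0], after_one_loss[0], after_two_losses[0])
```

The record stays readable after one center, and even two centers, are lost, which is the property the regulatory requirement is meant to guarantee.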

Application cases of financial-grade databases

1) Distributed image platform for the banking industry

The banking image platform case was implemented at a large joint-stock bank. The platform is built on Giant Sequoia Database at its bottom layer and has been put into production.

Giant Sequoia Database is suited to storing structured, semi-structured, and unstructured data. At the application level, the platform exposes an image file management service. Two or more application servers sit behind load balancing for high availability and connect to the bank's internal business systems. When a business system needs to look up unstructured data, it accesses the image management platform. Giant Sequoia Database provides PB-level data storage and high availability at the same time.

In addition, Giant Sequoia Database supports multiple indexes and millisecond-level real-time data access even at this data volume, and the overall application cost is about one third lower than that of the previous image platform. Both follow from the distributed architecture of Giant Sequoia Database.
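A multi-index lookup over image files can be sketched as a blob store plus one secondary index per searchable attribute. The attribute names (`account`, `doc_type`), the class, and the sample IDs are assumptions for illustration; the talk does not say which fields the bank actually indexes.

```python
from collections import defaultdict


class ImageCatalog:
    """Image blobs plus secondary indexes on metadata (hypothetical names)."""

    def __init__(self):
        self._blobs = {}
        self._indexes = {
            "account": defaultdict(set),
            "doc_type": defaultdict(set),
        }

    def add(self, image_id, data, **attrs):
        self._blobs[image_id] = data
        # Register the image under every indexed attribute it carries.
        for field_name, index in self._indexes.items():
            if field_name in attrs:
                index[attrs[field_name]].add(image_id)

    def find(self, **criteria):
        # Intersect the ID sets from each index instead of scanning all blobs.
        result = set(self._blobs)
        for field_name, value in criteria.items():
            result &= self._indexes[field_name].get(value, set())
        return sorted(result)

    def get(self, image_id):
        return self._blobs[image_id]


catalog = ImageCatalog()
catalog.add("img-1", b"id card scan", account="A100", doc_type="id_card")
catalog.add("img-2", b"voucher scan", account="A100", doc_type="voucher")
catalog.add("img-3", b"id card scan", account="B200", doc_type="id_card")
print(catalog.find(account="A100"))
print(catalog.find(account="A100", doc_type="id_card"))
```

Lookups touch only the index entries for the requested values, so their cost depends on the number of matches rather than on the total number of stored images.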

2) Ultra-high-concurrency data access for securities

The main feature of securities trading is its high frequency: there may be hundreds of millions of trade records every day. Securities trading data is generally structured, and the system must ingest this large volume of structured data under high concurrency.

This system lets users query the full historical transaction details of their securities trades. Queries still return quickly, in under 100 milliseconds even over massive data.

Results achieved:

More than 200 million records are written per day on average.

During peak hours, data on the order of 10 billion records must be retrieved and served.

The system retains all transaction and position data for the past 3 years.

Peak concurrency exceeds 10,000.

During peak hours, query return time is under 100 ms.
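A quick back-of-the-envelope calculation puts these figures in perspective. The daily write volume and the 3-year retention come from the list above; counting 365 calendar days per year and spreading writes evenly over 24 hours are simplifying assumptions, so the real peak write rate during trading hours would be considerably higher.

```python
RECORDS_PER_DAY = 200_000_000   # "more than 200 million records" per day (from the talk)
RETENTION_YEARS = 3             # data kept for 3 years (from the talk)
DAYS_PER_YEAR = 365             # assumption: calendar days, not trading days
SECONDS_PER_DAY = 24 * 60 * 60  # assumption: writes spread evenly over the day

# Lower bound on the number of records the system must keep queryable.
retained_records = RECORDS_PER_DAY * RETENTION_YEARS * DAYS_PER_YEAR

# Average sustained write rate if the load were perfectly even.
average_writes_per_second = RECORDS_PER_DAY / SECONDS_PER_DAY

print(f"retained records: {retained_records:,}")
print(f"average writes per second: {average_writes_per_second:,.0f}")
```

Under these assumptions the system holds roughly 219 billion records and averages about 2,300 writes per second, so a sub-100 ms query has to be answered from indexes rather than scans.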

3) Massive data management for banks

The bank's massive data management platform brings the structured data of the bank's many business systems together into a unified query platform. Users run their queries through this platform instead of against the original business systems, which reduces the load on those systems' databases. Each original business system database keeps only the data needed for online transactions, and all other data is stored in Giant Sequoia Database.
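The split between the online database and the query platform can be sketched as a periodic archive job. The date-based cutoff, the 90-day window, and the record layout are assumptions for illustration; the talk only states that data not needed for online transactions is moved out of the original system.

```python
from datetime import date, timedelta


def archive_old(online, history, today, keep_days):
    """Move records older than keep_days from the online list to history."""
    cutoff = today - timedelta(days=keep_days)
    still_online = []
    moved = 0
    for record in online:
        if record["date"] < cutoff:
            history.append(record)
            moved += 1
        else:
            still_online.append(record)
    online[:] = still_online  # shrink the online store in place
    return moved


def query_history(history, account):
    """Serve a history query without touching the online store."""
    return [r for r in history if r["account"] == account]


# Hypothetical sample data.
online = [
    {"account": "A100", "date": date(2018, 1, 5), "amount": 50.0},
    {"account": "A100", "date": date(2019, 3, 1), "amount": 75.0},
    {"account": "B200", "date": date(2017, 6, 9), "amount": 20.0},
]
history = []
moved = archive_old(online, history, today=date(2019, 3, 16), keep_days=90)
print(moved, len(online), query_history(history, "A100"))
```

After the job runs, the online store holds only the recent record, and older transactions are answered entirely from the history store.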

SequoiaDB relies on horizontal scaling, standard SQL support, and dual engines to provide online query and analysis capabilities while storing large amounts of historical data. This lets banks turn traditionally offline data into nearline data and make effective use of cold data.

A number of Giant Sequoia Database's banking customers use SequoiaDB to provide highly concurrent data query and access, so that their customers can look up their full transaction history since account opening, anytime and anywhere, at the counter, through online banking, and through mobile banking. The platform also supports judicial inquiries, so the bank's IT department no longer has to go back and forth between the historical tape library and the database to serve complex and changing query requests.

4) Other cases

In government, Giant Sequoia Database can centrally store and query electronic documents, helping administrative service halls and other government departments look up information and work more efficiently.

In transportation, large volumes of pictures and video captured by cameras in real time need to be stored, and there is now growing demand for real-time processing and analysis of license-plate violations and similar behavior. This also calls for a powerful data storage and query engine that can support huge amounts of data. Giant Sequoia Database can effectively meet this demand.
