
Production practice of a Dragonfly-based unified file distribution platform for the Zhejiang Mobile container cloud

Source: SLTechnology News&Howtos (Shulou.com). Updated 2025-02-24; reported 06/02.

In November 2018, Dragonfly, a cloud-native image distribution system that originated at Alibaba, was showcased at KubeCon Shanghai as a CNCF sandbox-level project.

Dragonfly mainly addresses image distribution for distributed application orchestration systems built on Kubernetes. It was open-sourced in 2017 and is one of Alibaba's core infrastructure technologies. In the year or so since it was open-sourced, Dragonfly has been adopted in many industries.

DCOS is Zhejiang Mobile's container cloud platform. It currently hosts 185 application systems, including core systems such as the mobile business hall app and CRM. This article describes how the Zhejiang Mobile container cloud (DCOS) platform used Dragonfly as a key tool in its transformation, solving the low distribution efficiency, low success rates, and hard-to-control network bandwidth that operators face in large-scale clusters, and how it has contributed back to the community by extending Dragonfly with a graphical interface and a highly available production deployment.

DCOS container cloud in the production environment

Challenges encountered

As the Zhejiang Mobile container cloud (DCOS) platform has matured, the number of hosted applications has kept growing, and nearly 10,000 containers are now running. The distribution service, built on a traditional client-server architecture, has increasingly struggled to handle code package releases and file transfers for large-scale distributed applications:

Compute nodes fail to download code packages because of network anomalies and other causes, which undermines the integrity and consistency of application code packages.

With many users and high concurrency, transfers can reach the terabyte level, and the performance bottleneck of a single server lengthens application release times.

Introduction to Dragonfly

P2P (peer-to-peer) is a networking technology in which interconnected nodes each hold part of the network's resources and services. Information is transmitted and services are delivered directly between nodes, which avoids the single-point bottleneck of the traditional C/S (client-server) architecture.
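
To make the single-point bottleneck concrete, the sketch below uses the standard textbook lower bounds on the time to send one file of F bits to N nodes; this is a general networking model, not a Dragonfly measurement. With client-server distribution the server must upload every copy itself, so D_cs >= max(N*F/u_s, F/d_min). With P2P, nodes that already hold data also upload it, so D_p2p >= max(F/u_s, F/d_min, N*F/(u_s + sum of peer uplinks)). The function names and all bandwidth figures are illustrative assumptions, and every peer is assumed to have the same upload rate.

```python
def client_server_min_time(n_nodes: int, file_bits: float,
                           server_up: float, min_down: float) -> float:
    """Lower bound (seconds) when one server uploads every copy itself."""
    return max(n_nodes * file_bits / server_up,  # server must push N full copies
               file_bits / min_down)             # slowest node must receive F bits


def p2p_min_time(n_nodes: int, file_bits: float, server_up: float,
                 min_down: float, peer_up: float) -> float:
    """Lower bound (seconds) when nodes re-share data; assumes equal peer uplinks."""
    total_up = server_up + n_nodes * peer_up   # aggregate upload capacity of the swarm
    return max(file_bits / server_up,          # server must send at least one copy
               file_bits / min_down,           # slowest node must receive F bits
               n_nodes * file_bits / total_up) # N copies spread over all uplinks


if __name__ == "__main__":
    # hypothetical figures: 1 Gbit file, 10 Gbit/s server, 1,000 nodes at 1 Gbit/s
    args = dict(n_nodes=1000, file_bits=1e9, server_up=1e10, min_down=1e9)
    print(client_server_min_time(**args))    # 100.0
    print(p2p_min_time(**args, peer_up=1e9))  # 1.0
```

With these hypothetical figures the client-server bound is 100 s, while the P2P bound drops to 1 s, limited only by each node's own download speed.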

Dragonfly, the CNCF open-source file distribution solution, combines P2P and CDN technology into a distribution system for container images and files. It effectively addresses low distribution efficiency, low success rates, and poor bandwidth control for files and images in large-scale enterprise clusters.

Core components of Dragonfly:

SuperNode: a super node that downloads files from the file source and produces seed data blocks in a passive CDN manner. In the P2P network it acts as the controller that schedules block transfers between nodes.

Dfget proxy: an agent deployed on compute nodes that downloads data blocks and shares them with other P2P nodes.

How Dragonfly distribution works (using image distribution as an example):

Unlike ordinary files, container images consist of multiple storage layers and are downloaded layer by layer rather than as a single file. Each layer is split into data blocks that serve as seeds. After download, each layer is reassembled and checked using its unique ID and SHA-256 digest, and the layers are combined into the complete image, which keeps the download process consistent.
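
The split, verify, and reassemble idea can be illustrated with Python's standard hashlib module. Docker does identify layers by a "sha256:" content digest, but the block size, the function names, and the rule of rejecting a layer whose digest does not match are assumptions for illustration; this is not Dragonfly's actual piece format.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical and tiny for the demo; real pieces are far larger


def split_into_blocks(layer: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut one image layer into fixed-size blocks (the last may be shorter)."""
    return [layer[i:i + block_size] for i in range(0, len(layer), block_size)]


def layer_digest(layer: bytes) -> str:
    """Content digest in the "sha256:<hex>" form Docker uses for layers."""
    return "sha256:" + hashlib.sha256(layer).hexdigest()


def reassemble(blocks: dict[int, bytes], expected_digest: str) -> bytes:
    """Join blocks received in any order by index, then verify the layer digest."""
    layer = b"".join(blocks[i] for i in sorted(blocks))
    if layer_digest(layer) != expected_digest:
        raise ValueError("digest mismatch: layer is incomplete or corrupted")
    return layer


if __name__ == "__main__":
    layer = b"pretend this is a tar layer"
    digest = layer_digest(layer)
    blocks = dict(enumerate(split_into_blocks(layer)))
    # simulate blocks arriving from different peers in reverse order
    arrived = dict(reversed(list(blocks.items())))
    print(reassemble(arrived, digest) == layer)
    del arrived[0]  # lose one block
    try:
        reassemble(arrived, digest)
    except ValueError as err:
        print(err)
```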

The Dragonfly image download process works as follows:

Dfget-proxy intercepts the image pull request (docker pull) issued by the Docker client and converts it into a dfget download request to the SuperNode.

The SuperNode downloads the image from the source image registry and splits it into multiple seed blocks.

Dfget downloads data blocks and shares the blocks it has downloaded. The SuperNode records which blocks each node has downloaded and directs subsequent requests to fetch those blocks peer-to-peer between nodes.

Finally, the Docker daemon's image pull mechanism assembles the downloaded layers into the complete image.
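
The SuperNode's scheduling role in the steps above can be modeled as a toy in-memory scheduler: it remembers which peers hold each block and points later requests at one of them, falling back to its own seed only when no peer has the block yet. The class and method names are hypothetical, and the real SuperNode also weighs peer load and failures, which this sketch ignores.

```python
import random
from collections import defaultdict


class ToyScheduler:
    """Hypothetical model of SuperNode block scheduling (not Dragonfly's code)."""

    SUPERNODE = "supernode"

    def __init__(self, seed: int = 0):
        self.holders: defaultdict[int, set[str]] = defaultdict(set)  # block -> peers holding it
        self.rng = random.Random(seed)

    def pick_source(self, block: int, requester: str) -> str:
        """Point the requester at a peer that already has the block, if any."""
        peers = sorted(self.holders[block] - {requester})
        if not peers:
            return self.SUPERNODE  # nobody has it yet: serve from the CDN seed
        return self.rng.choice(peers)

    def report_done(self, block: int, peer: str) -> None:
        """A peer finished a block and can now share it."""
        self.holders[block].add(peer)


def simulate(peers: list[str], num_blocks: int) -> dict[str, int]:
    """Download every block to every peer in turn; count who served each block."""
    sched = ToyScheduler()
    served = {"supernode": 0, "peers": 0}
    for peer in peers:
        for block in range(num_blocks):
            source = sched.pick_source(block, peer)
            served["supernode" if source == ToyScheduler.SUPERNODE else "peers"] += 1
            sched.report_done(block, peer)
    return served


if __name__ == "__main__":
    print(simulate([f"node-{i}" for i in range(10)], num_blocks=8))
    # {'supernode': 8, 'peers': 72}
```

In this sequential toy, only the first node pulls from the SuperNode; every later node is served by peers, which is the load shift the P2P design relies on.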

Given these characteristics of Dragonfly and the realities of production, the Zhejiang Mobile container cloud platform decided to adopt Dragonfly to transform its code package release process: the P2P network spreads the transfer load and relieves the bandwidth bottleneck of a single file server, while Docker's own image pull mechanism keeps image files consistent throughout the release.

Solution: unified distribution platform

Architecture design and implementation

Functional architecture design

Building on Dragonfly and Zhejiang Mobile's container cloud production practice, the unified distribution platform has the following overall design goals:

Use Dragonfly together with file download and verification to solve inconsistent application code package releases and long release times in production.

Provide a graphical client that hides the underlying command-line details and simplifies the workflow for greater efficiency.

Support distribution to Mesos, K8s, host, VM, and other cloud environments with automatic cluster discovery, so that users can manage their target clusters centrally through the unified distribution platform.

Add user permission control and per-task bandwidth limits to support multi-tenant, multi-task distribution.

Optimize how the P2P agent is deployed so that compute nodes can join the P2P network faster.

Based on the above objectives, the overall architecture of the unified distribution platform is designed as follows:

The P2P network layer is a distribution network of compute nodes that can incorporate heterogeneous clusters (host clusters, K8s clusters, Mesos clusters).

The distribution service layer, made up of functional modules and storage modules, is the core of the whole distribution system. The user access authentication module provides login and auditing; the distribution control module implements Dragonfly-based P2P task distribution; the flow control module lets tenants set bandwidth for different tasks; the configuration database records basic information such as the network layer's target clusters and task status; and the status query module lets users track the progress of distribution tasks.

The user operation layer consists of any number of graphical user clients.
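
The flow control module's per-task bandwidth setting could be enforced with a token bucket, a common rate-limiting technique; the article does not say which algorithm the platform uses, so treat this as one possible approach. Here `rate` is the task's allowed bytes per second and `capacity` the largest burst; the injectable clock and the (tenant, task) key scheme are assumptions added for illustration and deterministic testing.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket limiter for one task: `rate` bytes/s, bursts up to `capacity` bytes."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a task can begin immediately
        self.clock = clock
        self.last = clock()

    def try_consume(self, nbytes: float) -> bool:
        """Return True and deduct tokens if `nbytes` may be sent now."""
        now = self.clock()
        # refill in proportion to elapsed time, never beyond the burst capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


# one bucket per (tenant, task); the key scheme and limits are hypothetical
limits: dict[tuple[str, str], TokenBucket] = {
    ("tenant-a", "task-42"): TokenBucket(rate=50 * 1024 * 1024, capacity=8 * 1024 * 1024),
}
```

A sender would call `try_consume` before writing each chunk and back off when it returns False, so one tenant's task cannot exceed the bandwidth assigned to it.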

Technical architecture implementation

Following these design goals and the overall architecture, the container cloud team built the platform features on top of open-source components, including:

A graphical user client (Client).

The open-source Harbor image registry for image storage and the MinIO object storage service for file storage.

MySQL and Redis as the CMDB: MySQL manages cluster status, user information, and similar data, supporting cluster-oriented "one-click" task creation, while Redis stores distribution task status and serves status queries with high concurrency and low latency.

The platform core service layer (Docktrans) and the API service gateway layer (Edgetrans), both of which are stateless, cluster-oriented, and dynamically scalable:

The API gateway encapsulates the system's internal architecture. It receives and forwards task requests from the Client, performs user access authentication for each functional module, and provides customized API services.

The core service layer is the business-logic engine for each of the platform's functional modules. During distribution, it issues download requests to the P2P agent nodes simultaneously through a unified remote call, completing the "one-to-many" distribution from a client task to a cluster.

Df-master and df-client are both Dragonfly components: df-master is Dragonfly's SuperNode, and df-client is the peer agent (dfget proxy) in the P2P network.
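
The "one-to-many" remote call in the core service layer amounts to sending the same command to many nodes concurrently and collecting a per-node outcome. Below is a minimal sketch using the standard concurrent.futures module; the `send` transport (for example, an RPC to the agent on each node) is a hypothetical parameter, since the article does not describe the actual protocol.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fan_out(nodes: list[str], command: list[str],
            send: Callable[[str, list[str]], str],
            max_workers: int = 32) -> dict[str, tuple[bool, str]]:
    """Send one command to every node concurrently; map node -> (ok, output or error)."""
    results: dict[str, tuple[bool, str]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(send, node, command): node for node in nodes}
        for future, node in futures.items():
            try:
                results[node] = (True, future.result())
            except Exception as exc:  # one failed node must not abort the others
                results[node] = (False, str(exc))
    return results


if __name__ == "__main__":
    def fake_send(node: str, command: list[str]) -> str:
        # stand-in transport: pretend the node ending in .3 is unreachable
        if node.endswith(".3"):
            raise ConnectionError("unreachable")
        return f"{node} ran: {' '.join(command)}"

    outcomes = fan_out(["10.0.0.1", "10.0.0.2", "10.0.0.3"],
                       ["docker", "pull", "nginx:1.25"], fake_send)
    for node, outcome in outcomes.items():
        print(node, outcome)
```

Recording failures per node rather than raising keeps a partially failed task visible to the status query module instead of aborting the whole distribution.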

Technical features

Df-client is containerized. Lightweight container deployment speeds up networking: a host newly joining the network layer only needs to pull and start the image, so a P2P agent node can be up within seconds.

The core service layer (Docktrans) hides the underlying dfget command-line details behind interface functions and simplifies user operations. Commands are sent to multiple P2P task nodes through a unified remote call, so users no longer have to run dfget and related operations node by node, which simplifies "one-to-many" task initiation.

Core function module | Distribution control interaction flow

As shown in the following figure, the core module of the unified distribution platform handles task distribution as follows:

Users create image or file distribution tasks through the Client.

The distribution module first checks, through the authentication function of the platform's API service gateway (Edgetrans), whether the user has permission to distribute.

After authentication, the user sets the distribution task parameters and supplies a cluster ID, and the platform reads the cluster configuration from the MySQL database to discover the cluster nodes automatically. Users can also specify multiple node IPs as a custom cluster.

Depending on the distribution type, the distribution module of the core service layer (Docktrans) converts the front-end request into a dfget command (files) or a docker pull command (images) and sends it, through a unified remote call to the Docker Service, to the df-client on each target node for processing.

While a task runs, its progress is written to Redis and its event log to MySQL, so that users can query the task's status.
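
The node resolution and command conversion in these steps might look roughly like the following. The in-memory CLUSTERS table stands in for the MySQL cluster configuration and the `progress` dict for the Redis task-status store; both, along with all function names and sample addresses, are hypothetical. The file command uses dfget's -u (URL) and -o (output path) options; any other flags the platform passes are not covered.

```python
from typing import Optional

# Hypothetical stand-in for the MySQL cluster configuration table.
CLUSTERS: dict[str, list[str]] = {
    "k8s-prod-a": ["10.1.0.11", "10.1.0.12", "10.1.0.13"],
}


def resolve_targets(cluster_id: Optional[str] = None,
                    custom_ips: Optional[list[str]] = None) -> list[str]:
    """Custom node IPs take precedence; otherwise look up the cluster's nodes by ID."""
    if custom_ips:
        return list(custom_ips)
    if cluster_id in CLUSTERS:
        return list(CLUSTERS[cluster_id])
    raise KeyError(f"unknown cluster: {cluster_id!r}")


def build_command(task: dict) -> list[str]:
    """Translate a front-end task into the command each df-client node runs."""
    if task["type"] == "image":
        return ["docker", "pull", task["image"]]
    if task["type"] == "file":
        return ["dfget", "-u", task["url"], "-o", task["output"]]
    raise ValueError(f"unsupported distribution type: {task['type']!r}")


def start_task(task_id: str, nodes: list[str], progress: dict) -> dict[str, str]:
    """Mark every target node as pending (stand-in for writing status to Redis)."""
    progress[task_id] = {node: "pending" for node in nodes}
    return progress[task_id]


if __name__ == "__main__":
    progress: dict = {}
    nodes = resolve_targets(cluster_id="k8s-prod-a")
    cmd = build_command({"type": "file", "url": "http://files.example.com/app.tar.gz",
                         "output": "/opt/app/app.tar.gz"})
    print(cmd, start_task("task-1", nodes, progress))
```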

Results of the production environment transformation

To date, more than 200 business systems and more than 1,700 application modules run in production, and all of them have switched to the image-based release mode. Release time and release success rate have both improved significantly:

Since the switch to P2P image release, the monthly release success rate of several business applications has held steady at 98%.

Starting in April, the container cloud platform began replacing code package releases through the traditional distribution system with P2P image releases. The time taken by several centralized online releases is significantly lower than before the transformation, with an average reduction of 67%.

The platform also selected several application clusters to test P2P image release for individual applications. Single-application release time is likewise significantly lower than before the transformation, with an average reduction of 81.5%.

Follow-up promotion

The unified file distribution platform has effectively solved the efficiency and consistency problems of code releases for applications on the Zhejiang Mobile container cloud and has become an important part of the platform. It also supports efficient file distribution in large-scale clusters more generally and can be extended to scenarios such as batch distribution of installation media and batch configuration file updates across clusters.

Community co-building | Graphical interface features

Needs that emerged after adopting Dragonfly directly:

No graphical interface, so usage costs are high and operations are inefficient.

No user permission management or distribution auditing, and therefore no control over distribution.

No support for one-to-many cluster operations. In cloud environments, users usually need to distribute to all the clusters they manage at once, but the existing mode only supports distribution on a single node.

Deploying the agent as a traditional application package is inefficient and hampers rapid expansion of large clusters; installed as system software, it is also more intrusive to the host.

Development of the client interface is essentially complete and has entered production testing and deployment. Of the four core functions planned for the distribution platform (task management, target management, rights management, and system analysis), the first three are now available.

Rights management interface

Rights management, that is, user management, provides tailored permission features for different users:

Create, delete, and modify users in different roles (super administrator, task cluster administrator, task administrator).

Create custom roles by combining different permission sets, and grant those permissions to users.

Access and authorization for users of external systems (not yet available).
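
Custom roles built from permission sets map naturally onto role-based access control. In the sketch below, the three built-in role names follow the article, but the permission strings and what each role is granted are assumptions made for illustration.

```python
# Built-in roles named in the article; the permission strings are hypothetical.
ROLES: dict[str, set[str]] = {
    "super_admin": {"user:manage", "cluster:manage", "task:create",
                    "task:delete", "task:view"},
    "task_cluster_admin": {"cluster:manage", "task:view"},
    "task_admin": {"task:create", "task:delete", "task:view"},
}


def create_role(name: str, *base_roles: str,
                extra: frozenset[str] = frozenset()) -> set[str]:
    """Define a custom role as the union of existing roles plus extra permissions."""
    perms = set(extra)
    for base in base_roles:
        perms |= ROLES[base]
    ROLES[name] = perms
    return perms


def is_allowed(user_roles: list[str], permission: str) -> bool:
    """A user may act if any of their roles grants the permission."""
    return any(permission in ROLES.get(role, set()) for role in user_roles)
```

For example, `create_role("release_operator", "task_cluster_admin", extra=frozenset({"task:create"}))` yields a role that can manage clusters and create tasks but not delete them.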

Target management interface

Target management covers the cluster nodes that users distribute tasks to. It provides P2P networking and node health and status information for the clusters users manage:

Create and delete clusters for different users.

Quickly add and remove P2P network nodes in user-managed clusters, with containerized, automated agent deployment and node status monitoring.

Onboard different cluster types: host clusters (virtual and physical machines), K8s clusters, and Mesos clusters. Node information can be read directly from K8s and Mesos clusters and added to the P2P network layer in batches.
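
Reading node information directly from a K8s cluster can start from the NodeList returned by the Kubernetes API (for example, the JSON printed by `kubectl get nodes -o json`), in which each node lists its addresses and conditions. The sketch keeps only nodes whose Ready condition is "True" and returns their InternalIP; the function name and the choice to skip not-ready nodes are assumptions, and Mesos discovery is not covered.

```python
import json


def ready_node_ips(node_list_json: str) -> dict[str, str]:
    """Map node name -> InternalIP for every node whose Ready condition is "True"."""
    result: dict[str, str] = {}
    for node in json.loads(node_list_json).get("items", []):
        status = node.get("status", {})
        ready = any(cond.get("type") == "Ready" and cond.get("status") == "True"
                    for cond in status.get("conditions", []))
        internal_ip = next((addr["address"] for addr in status.get("addresses", [])
                            if addr.get("type") == "InternalIP"), None)
        if ready and internal_ip:
            result[node["metadata"]["name"]] = internal_ip
    return result
```

The resulting IPs are the candidates that would then receive the containerized agent in a batch.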

Task management

Task management provides creation, deletion, stopping, and viewing of file or image distribution tasks, among other functions:

Image warm-up mode (schedule distribution tasks to push images or files to each node in advance).

Distribution of container images and files in multiple formats.

One-click creation, execution, deletion, and termination of multi-node tasks in a specified task cluster, plus "one-click replication" of tasks that have already run.

Creation and deletion of release file versions.

Viewing of distribution task status and task logs.

System analysis (planned)

Once available, the system analysis function will give platform administrators and users data such as task distribution time, success rate, and charts of task execution efficiency, using statistics and prediction to support the platform's evolution toward intelligent operation.

Community co-building | High-availability production deployment

The image registry is deployed in primary/backup mode for disaster recovery, with image replication keeping the primary and backup consistent.

P2P release consists of df-master and df-client (the blue part). Df-master pulls images from the image registry to form P2P seeds. Each machine room has two df-master instances for high availability.

P2P distribution stays within each machine room to avoid cross-room traffic.

Each data center is also equipped with two mirrors (backup image registries). When P2P distribution is unavailable, compute nodes automatically pull images from a mirror, and the mirrors are made highly available through load balancing.
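
The routing rules in this deployment (stay within the machine room, prefer a healthy df-master, otherwise fall back to a backup registry) can be summarized as a small decision function. The class name, the health set passed in, and the round-robin choice standing in for the load balancer are all illustrative assumptions.

```python
import itertools


class RoomRouter:
    """Decide where a compute node in a given machine room pulls an image from."""

    def __init__(self, df_masters: dict[str, list[str]], mirrors: dict[str, list[str]]):
        self.df_masters = df_masters  # room -> its (two) df-master addresses
        # round-robin over each room's backup registries, standing in for the load balancer
        self.mirror_cycle = {room: itertools.cycle(addrs) for room, addrs in mirrors.items()}

    def route(self, room: str, healthy: set[str]) -> tuple[str, str]:
        """Prefer a healthy df-master in the same room; otherwise use a backup registry."""
        for master in self.df_masters.get(room, []):
            if master in healthy:
                return ("p2p", master)  # P2P stays inside the room: no cross-room traffic
        return ("mirror", next(self.mirror_cycle[room]))
```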

We plan to contribute the interface features to the CNCF Dragonfly community to further enrich its ecosystem, and we hope more people will participate and contribute to the community's growth.

About the authors:

Chen Yuanzheng, Zhejiang Mobile Cloud Computing architect

Wang Yexin, Zhejiang Mobile Cloud Computing architect

Dragonfly community sharing

Tai Yun, a Dragonfly community contributor, said at a Dragonfly Meetup:

"Dragonfly is now a CNCF Sandbox project with about 2,700 stars in total. Many enterprise users rely on Dragonfly to solve the problems they encounter in image and file distribution. We will keep improving Dragonfly to provide richer, more powerful, and simpler distribution tools for cloud-native applications, and we look forward to working with everyone to make Dragonfly a CNCF graduated project as soon as possible."

Project address

https://github.com/dragonflyoss/Dragonfly

Dragonfly Roadmap
