Running 300,000 container instances per week: Netflix's containerization practice

2025-01-17 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

Who is Netflix?

Netflix is the largest online video provider in Europe and the United States, surpassing YouTube. Every day, more than 100 million members in more than 190 countries watch 120 million hours of movies, TV series and documentaries on Netflix. Netflix has also produced popular series such as House of Cards.

To cope with its huge concurrent traffic, Netflix spent 7 years evolving its website architecture from a traditional monolithic application into an industry-leading microservice architecture. Currently, 500 microservices run on Netflix's cloud platform, and 1,000 changes are deployed to production every day. The production environment is deployed across 3 Amazon Regions and 9 Availability Zones to provide stable online video services for users around the world.

Why does Netflix use container technology?

Netflix already runs tens of thousands of virtual machines on Amazon EC2. Its virtual machine fleet is very stable, its applications have been converted to cloud-native applications, and the fleet is easy to scale. So why does it need container technology?

Container technology has accelerated the pace of innovation at Netflix. As the author mentioned in a previous article, the culture of the Netflix R&D team puts innovation first and product stability second. Netflix therefore builds its applications to be highly fault-tolerant, so that even if an innovative application fails, the scope of the failure is kept to a minimum and developers are more willing to experiment. Chaos Monkey, a tool released by the Netflix team, is built for exactly this kind of failure testing: it deliberately terminates instances in production.

Containers also make it easier to manage applications and their dependencies: packaging everything into a container simplifies both management and release. When a container in production has a problem, developers can pull the image and debug it locally at any time, which lets them reproduce the problem more quickly.

Container use cases

1. Video media transcoding

In the past, transcoding a large batch of streaming video on virtual machines took one month. With highly concurrent transcoding in containers, it takes only one week.

2. Unified code building, packaging and testing system

Previously, when the Netflix team released a package, it ran only the test cases of the package itself and did not run integration tests for upstream packages (UpStream). As a result, releasing a package often broke the tests of upstream packages, and upstream developers had to debug repeatedly to fix the problem. With containers, Netflix runs the test cases of the upstream packages in containers, and if an upstream test case fails, the package release fails too. This is called Fail Fast.
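The Fail Fast gate described above can be sketched as follows. This is a minimal illustration, not Netflix's build system: the `release_package` function, the `ReleaseResult` type and the suite names are hypothetical, and each test suite is modeled as a plain callable rather than a real containerized test run.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ReleaseResult:
    published: bool
    failed_suites: List[str]


def release_package(
    own_tests: Callable[[], bool],
    upstream_tests: Dict[str, Callable[[], bool]],
) -> ReleaseResult:
    """Publish only if the package's own tests and every upstream suite pass."""
    if not own_tests():
        return ReleaseResult(published=False, failed_suites=["self"])
    # In a real pipeline each suite would run in its own container;
    # here a suite is just a callable returning True on success.
    failed = [name for name, run in upstream_tests.items() if not run()]
    return ReleaseResult(published=not failed, failed_suites=failed)


# Hypothetical suites: the package's own tests pass, one upstream suite fails.
result = release_package(
    own_tests=lambda: True,
    upstream_tests={"player-ui": lambda: True, "billing": lambda: False},
)
print(result)
```

The release is blocked as soon as any upstream suite reports a failure, so the breakage surfaces at release time instead of later in the upstream team's own builds.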

3. Developers do not need to care about the environment

Node.js developers, for example, neither know nor want to know what operating system their code runs on. Containers let these developers focus on development and reduce the cost of operating and maintaining applications.

Titus, Netflix's container management platform

Netflix did not start with Docker. It first used cgroups directly, then adopted Mesos for container management, and has now launched Titus, its self-developed container management platform.

There are already many container management platforms on the market, so why develop another one? Most platforms on the market focus on data center deployments and hybrid cloud support, and none of the existing solutions could handle container usage at Netflix's scale.

Titus was originally used to manage the large number of Batch tasks inside Netflix, including video transcoding, watermarking, and generating daily CDN traffic reports. These tasks share some characteristics: they run for a long time, they are compute-intensive, and they do not depend on the platform, so they are well suited to running in containers.

Later, Titus began to take over Services as well. Services are far more demanding than Batch tasks: they need real-time scaling and uninterrupted operation, they carry more state to manage, and they are harder to upgrade. In response, Titus made many improvements to its container management.

Network management

Titus runs Docker containers inside VPC networks. The Titus executor creates a network namespace, then creates and starts a pod root container, similar to a Pod in Kubernetes (K8S). It then sets up the veth pair, routing rules, iptables rules, and so on.
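The namespace, veth and routing steps can be illustrated with standard Linux tooling. The sketch below only builds the `ip` and `iptables` command lines and does not execute them; the function name, interface naming scheme and addresses are assumptions made for illustration, and the real Titus executor is not implemented this way as far as the source states.

```python
from typing import List


def pod_network_commands(
    pod: str, container_ip: str, gateway_ip: str, prefix_len: int = 24
) -> List[List[str]]:
    """Return the iproute2/iptables commands for one pod-style namespace."""
    # Linux interface names are limited to 15 characters, so keep `pod` short.
    host_if = f"veth-{pod}-h"
    pod_if = f"veth-{pod}-c"
    in_ns = ["ip", "netns", "exec", pod]
    return [
        ["ip", "netns", "add", pod],
        ["ip", "link", "add", host_if, "type", "veth", "peer", "name", pod_if],
        ["ip", "link", "set", pod_if, "netns", pod],
        # Host side of the veth pair acts as the gateway for the namespace.
        ["ip", "addr", "add", f"{gateway_ip}/{prefix_len}", "dev", host_if],
        ["ip", "link", "set", host_if, "up"],
        in_ns + ["ip", "addr", "add", f"{container_ip}/{prefix_len}", "dev", pod_if],
        in_ns + ["ip", "link", "set", pod_if, "up"],
        in_ns + ["ip", "route", "add", "default", "via", gateway_ip],
        ["iptables", "-A", "FORWARD", "-i", host_if, "-j", "ACCEPT"],
    ]


# Hypothetical pod name and addresses; printed for review, not executed.
commands = pod_network_commands("demo", "10.0.0.2", "10.0.0.1")
for cmd in commands:
    print(" ".join(cmd))
```

Running the printed commands requires root privileges on a Linux host. Generating them as data first keeps the sequence easy to review and to test.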

Metadata Proxy

Metadata Proxy is a unified network management module that Titus implements on top of Amazon's Metadata Service. It addresses container security by centrally managing each container's "whoami" information, such as the container's IP, permissions, container host ID and domain name. The container itself cannot learn anything more about the host, which protects the container's network security. Together, the VPC networking and the Metadata Proxy make up Titus's container networking design, which the original article illustrates with a diagram.
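The idea behind the Metadata Proxy can be sketched as a lookup that answers metadata requests with container-scoped values and hides everything else. The paths are real EC2 instance metadata paths, but the `ContainerIdentity` type, the `resolve` function, the allowlist and the sample values are hypothetical and are not taken from Titus.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContainerIdentity:
    ip: str
    instance_id: str
    iam_role: str


_BASE = "/latest/meta-data/"


def resolve(identity: ContainerIdentity, path: str) -> Optional[str]:
    """Answer a metadata request with container-scoped values, or None if blocked."""
    # Hypothetical allowlist: only these paths are visible to the container.
    answers = {
        _BASE + "local-ipv4": identity.ip,
        _BASE + "instance-id": identity.instance_id,
        _BASE + "iam/security-credentials/": identity.iam_role,
    }
    return answers.get(path)


# Made-up identity for one container.
container = ContainerIdentity(
    ip="10.0.1.23", instance_id="titus-task-42", iam_role="transcoderRole"
)
print(resolve(container, "/latest/meta-data/local-ipv4"))
print(resolve(container, "/latest/user-data"))
```

Because every answer comes from the container's own identity record, a request never reaches the host's metadata, and any path outside the allowlist simply returns nothing.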

Integration with Spinnaker CI/CD

Titus is integrated with Spinnaker, Netflix's existing CI/CD tool. When you create a cluster in Spinnaker, it offers both AWS virtual machine clusters and Titus container clusters.

When a container image is pushed to the Artifactory Docker registry, its tag appears in real time in the Spinnaker deployment task, so the user can select which version of the Docker image to deploy.

Titus also integrates with Chaos Monkey. When you run destructive tests against production in Chaos Monkey and select Titus, containers running in production are randomly destroyed.

Fenzo task scheduling framework

Fenzo is Netflix's open-source task scheduling framework. It is extensible and manages the life cycle of all scheduled tasks.

Give Fenzo a list of tasks and a pool of available computing resources, and it decides on its own where to place the tasks. If the resources are too few to run all the tasks, Fenzo tells you how much more you need and how to scale up.
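The behavior described above (place what fits, report what is missing) can be illustrated with a toy first-fit scheduler. This is not Fenzo's API, which is a Java library; the `Host`, `Task` and `schedule` names are hypothetical, and only CPU is modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Host:
    name: str
    free_cpus: float


@dataclass
class Task:
    name: str
    cpus: float


def schedule(tasks: List[Task], hosts: List[Host]) -> Tuple[Dict[str, str], float]:
    """First-fit placement; returns (task -> host, CPUs requested by unplaced tasks)."""
    free = {h.name: h.free_cpus for h in hosts}
    placement: Dict[str, str] = {}
    shortfall = 0.0
    # Place large tasks first so they are not crowded out by small ones.
    for task in sorted(tasks, key=lambda t: t.cpus, reverse=True):
        target = next((n for n, cpus in free.items() if cpus >= task.cpus), None)
        if target is None:
            shortfall += task.cpus  # report it as capacity to add
        else:
            free[target] -= task.cpus
            placement[task.name] = target
    return placement, shortfall


# Made-up hosts and tasks.
hosts = [Host("a", 4), Host("b", 2)]
tasks = [Task("encode", 3), Task("report", 2), Task("watermark", 2)]
placement, shortfall = schedule(tasks, hosts)
print(placement, shortfall)
```

The shortfall here is simply the total CPU requested by tasks that could not be placed. A real scheduler would also weigh memory, network and placement constraints before recommending a scale-up.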

Integration of Titus and Fenzo

To maximize the utilization of computing resources, Netflix's production applications and Batch tasks share the same container resources. Some Batch tasks take up so much computing capacity that when a production application needs to scale out, it has to wait for the Batch tasks to finish before it can obtain resources.

To solve this problem, Fenzo divides container capacity into a core layer and a Flex layer. Applications in the core layer are guaranteed the capacity they need to scale out, while the Flex layer queues low-priority tasks for execution.
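A toy model of the two layers: core requests draw on capacity reserved for them, while Flex tasks wait in a FIFO queue for Flex capacity. The `TieredCluster` class and its method names are hypothetical and greatly simplified compared with what Fenzo and Titus actually do.

```python
from collections import deque
from typing import Deque, List, Tuple


class TieredCluster:
    """Toy model: core requests use reserved capacity, Flex requests queue."""

    def __init__(self, core_cpus: int, flex_cpus: int) -> None:
        self.core_free = core_cpus
        self.flex_free = flex_cpus
        self.flex_queue: Deque[Tuple[str, int]] = deque()

    def submit_core(self, name: str, cpus: int) -> bool:
        """Start a core-layer application at once if reserved capacity allows."""
        if cpus > self.core_free:
            return False
        self.core_free -= cpus
        return True

    def submit_flex(self, name: str, cpus: int) -> None:
        """Queue a low-priority task; it starts only when run_flex is called."""
        self.flex_queue.append((name, cpus))

    def run_flex(self) -> List[str]:
        """Start queued Flex tasks in FIFO order while Flex capacity lasts."""
        started = []
        while self.flex_queue and self.flex_queue[0][1] <= self.flex_free:
            name, cpus = self.flex_queue.popleft()
            self.flex_free -= cpus
            started.append(name)
        return started


# Made-up capacities and task sizes.
cluster = TieredCluster(core_cpus=8, flex_cpus=4)
cluster.submit_flex("transcode-1", 3)
cluster.submit_flex("transcode-2", 3)
started = cluster.run_flex()
core_ok = cluster.submit_core("api", 6)
print(started, core_ok)
```

Because Batch work can never consume the reserved core capacity, the production application scales out immediately even while Flex tasks are still queued.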

Current state of Titus

Previously, Titus managed containers on top of Mesos inside Netflix, but ECS has now replaced Mesos as the container orchestration platform.

Titus handles both outbound and inbound requests. Outbound requests start and stop containers in ECS. When a container stops, CloudWatch detects the change in ECS and notifies Titus so that it can update the task's status.

In the future, Titus will focus on:

1. Automatic scaling of services and container load balancing across data centers.

2. SLA guarantees for core-layer tasks: when the computing resources of the core layer reach their limit, Titus will reclaim resources from the Flex layer to preserve the SLA of core-layer tasks.

3. Better support for dynamic scaling.
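The planned reclaiming of Flex resources for core-layer tasks could look roughly like the sketch below. The source describes only the goal, so the `preempt_for_core` function and its largest-first eviction policy are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple


def preempt_for_core(
    needed_cpus: int, core_free: int, flex_running: Dict[str, int]
) -> Tuple[bool, List[str]]:
    """Pick Flex tasks to evict so a core request of needed_cpus can start."""
    deficit = needed_cpus - core_free
    evicted: List[str] = []
    # Assumed policy: evict the largest Flex tasks first so fewer are interrupted.
    for name, cpus in sorted(flex_running.items(), key=lambda kv: kv[1], reverse=True):
        if deficit <= 0:
            break
        evicted.append(name)
        deficit -= cpus
    if deficit > 0:
        return False, []  # even evicting everything would not be enough
    return True, evicted


# Made-up numbers: the core request needs 6 CPUs, only 2 are free.
ok, evicted = preempt_for_core(
    needed_cpus=6,
    core_free=2,
    flex_running={"report": 1, "transcode": 4, "watermark": 2},
)
print(ok, evicted)
```

When the request cannot be satisfied even by evicting every Flex task, the function evicts nothing, so Flex work is never interrupted in vain.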

Author: Wang Qing, currently the chief architect of JFrog China. She has worked in research and development and in architecture at IBM, HPE, iQIYI, Sina, VIPKID and other companies. She is an Internet veteran with more than 10 years of development experience, focusing on software lifecycle management, microservice architecture, cloud-native applications, containerization and related fields.

Welcome to reprint, but please indicate the author and source. Thank you!
