

How to Implement a Flexible K8s Infrastructure


This article describes in detail how to implement a flexible K8s infrastructure; I hope you find it helpful.

Kubernetes is the most popular open source container orchestration platform, and it has become the first choice of many enterprises for building their infrastructure. In this article, we will explore the best ways to build infrastructure for your use cases and the decisions you may have to make given your constraints.

Architecture design

Your architecture should be designed largely around your use cases, so you need to be very careful in the design process to ensure that the infrastructure supports them, and to enlist the help of external professional teams if necessary. It's important to head in the right direction at the start of architectural design, but that doesn't mean mistakes won't happen. New technologies and research emerge every day, change has become the norm, and your architectural thinking can quickly become outdated.

This is why I strongly recommend that you adopt the principle of Architect for Change: make your architecture modular, so that you have the flexibility to change it internally when needed in the future.

Let's take a look at how to achieve these architectural goals, keeping the client-server model in mind.

Entry point: DNS

In any typical infrastructure, cloud native or not, a request must first be resolved by a DNS server, which returns the server's IP address. How you set up your DNS should be based on the availability you need; if you require higher availability, you may want to distribute your servers across multiple regions or cloud providers, depending on the level of availability you want to achieve.

Content Delivery Network (CDN)

In some cases, you may need to serve users with as little latency as possible while also reducing the load on your servers. This is where a content delivery network (CDN) plays an important role. Do your clients often request a set of static assets from the server? Do you want to speed up content delivery to users while reducing server load? If so, serving those static assets from an edge CDN can reduce both user latency and server load.

Is all your content dynamic? Can you serve users slightly stale content to reduce complexity? Or does your application receive very little traffic? In these cases, a CDN may not make much sense, and you can send all traffic directly to the global load balancer. Note, however, that a CDN does have the advantage of distributing traffic, which helps when your servers come under a DDoS attack.

CDN providers include Cloudflare CDN, Fastly, Akamai CDN, and StackPath. Your cloud provider may also offer a CDN service, such as Google Cloud Platform's Cloud CDN, AWS's CloudFront, or Microsoft Azure's Azure CDN.

Load Balancer

Any request that cannot be served by your CDN is sent next to your load balancer, which can use regional IPs or global anycast IPs. In some cases, you can also use a load balancer to manage internal traffic.

In addition to routing and proxying traffic to the appropriate back-end services, a load balancer can also take on SSL termination, integration with the CDN, and even the management of some aspects of network traffic.

Although hardware load balancers exist, software load balancers offer greater flexibility, lower cost, and elastic scalability.

Similar to CDNs, your cloud provider should also be able to provide a load balancer (such as GCP's GLB, AWS's ELB, or Azure's ALB), but what's more interesting is that you can provision these load balancers directly from Kubernetes. For example, creating an Ingress in GKE also creates a GLB on the back end to receive traffic; other features, such as CDN and SSL redirection, can be enabled by configuring your Ingress. Visit the following link for details:

https://cloud.google.com/kubernetes-engine/docs/how-to/ingress-features
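For illustration, here is a minimal sketch of what such an Ingress might look like on GKE; the static IP name, host, and backend Service are hypothetical placeholders:

```yaml
# Minimal sketch of a GKE Ingress that provisions a global HTTP(S)
# load balancer behind the scenes. The static IP name, host, and
# backend Service below are placeholders for illustration.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  annotations:
    # Attach a pre-reserved global static IP (name is hypothetical).
    kubernetes.io/ingress.global-static-ip-name: "web-static-ip"
spec:
  rules:
    - host: app.example.com          # placeholder domain
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web            # assumes an existing Service
                port:
                  number: 80
```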

Although you will usually start small, load balancers allow you to scale gradually to very large, multi-region architectures.

Network and Security Architecture

The next thing to pay attention to is the network. If you want to improve security, you may need a private cluster. There, you can regulate inbound and outbound traffic, mask IP addresses behind NAT, isolate networks with multiple subnets across multiple VPCs, and so on.

How you set up the network usually depends on the degree of flexibility you are after and how you intend to achieve it. Setting up the right network is about reducing the attack surface as much as possible while keeping everything running normally.

Protecting your infrastructure with the right network setup usually also involves firewalls with the right rules and restrictions, limiting inbound and outbound traffic between your various back-end services.

In many cases, these private clusters can be protected by setting up a bastion host (also known as a jump host) and tunneling all cluster operations through it, because the bastion is the only machine you need to expose to the public network, and it is usually set up in the same network as the cluster.

Some cloud providers also offer custom solutions for zero-trust security. For example, GCP provides Identity-Aware Proxy (IAP) to its users, which can be used instead of a typical VPN implementation.

Once all of this is taken care of, the next step is to set up networking within the cluster itself according to your use case. This involves the following tasks:

Set up service discovery within the cluster (which can be handled by CoreDNS)

If necessary, set up a service mesh (such as Linkerd, Istio, or Consul)

Set up Ingress controllers and API gateways (for example, Nginx, Ambassador, Kong, or Gloo)

Set up a network plug-in using CNI to facilitate networking within the cluster

Set network policies to regulate communication between services (see the sketch after this list), and expose services using the various Service types as needed

Set up inter-service communication between different services using protocols and tools such as gRPC, Thrift, or HTTP

Set up A/B testing, which can be easier to implement if you use a service mesh like Istio or Linkerd
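As a concrete example of the network-policy item above, here is a minimal sketch of a Kubernetes NetworkPolicy, which takes effect only if your CNI plug-in enforces policies; the namespace, names, and labels are hypothetical:

```yaml
# Minimal sketch of a NetworkPolicy: only pods labeled app=frontend
# may reach pods labeled app=backend on TCP port 8080. The namespace
# and all names/labels are hypothetical.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```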

If you want to see some sample implementations, I suggest you take a look at this repo (https://github.com/terraform-google-modules/cloud-foundation-fabric), which helps users set up all these different network models in GCP, including hub-and-spoke via VPN, DNS and Google Private Access for internal use, shared VPC with GKE support, and more, all using Terraform. What's interesting about networking in the cloud is that it need not be limited to cloud providers in your region; it can span multiple providers across multiple regions as needed. This is where projects like Kubefed or Crossplane can help.

If you want to explore best practices for setting up VPCs, subnets, and the overall network, I suggest you visit the following page; the same concepts apply to any cloud provider you use:

https://cloud.google.com/solutions/best-practices-vpc-design

Kubernetes

If you use managed clusters such as GKE, EKS, or AKS, Kubernetes itself is managed for you, which reduces your operational complexity.

If you manage Kubernetes yourself, you need to take care of a lot of things, such as backing up and encrypting the etcd store, setting up networking between the nodes in the cluster, regularly patching your nodes with the latest operating system versions, and managing cluster upgrades to keep up with upstream Kubernetes releases. For this reason, self-managing Kubernetes is recommended only if you have a dedicated team to maintain it.

Site Reliability Engineering (SRE)

When you maintain a complex infrastructure, it is important to have the right observability stack in place, so that you can detect errors and anticipate possible changes before your users notice them, identify anomalies, and have the headroom to dig deeper into problems.

This requires agents that expose metrics for specific tools or applications to collect and analyze (following either a pull or a push model). If you use a service mesh with sidecars, the sidecars typically come with metrics of their own without requiring custom configuration.

In any scenario, you can use a tool like Prometheus as a time-series database to collect all of your metrics, together with exporters (and instrumentation such as OpenTelemetry) to expose metrics from applications and various tools. Alertmanager can then send notifications and alerts to multiple channels, and Grafana provides visual dashboards, giving you complete visibility over the entire infrastructure.

To sum up, this Prometheus-based observability solution is illustrated in the overview diagram at https://prometheus.io/docs/introduction/overview/.
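To make this concrete, here is a minimal sketch of a Prometheus configuration with one scrape job and an Alertmanager target; the job name, ports, and addresses are placeholders:

```yaml
# Minimal sketch of a prometheus.yml: a global scrape interval,
# one application scrape job, and an Alertmanager endpoint.
# Job names, ports, and addresses are placeholders.
global:
  scrape_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

scrape_configs:
  - job_name: "my-app"
    static_configs:
      - targets: ["my-app:8080"]   # app exposing /metrics
```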

With such a complex system, you also need a log aggregation system so that all logs flow to one place for debugging. Most companies tend to use an ELK or EFK stack, with Logstash or FluentD doing the log aggregation and filtering according to your constraints. There are also newer players in the logging space, such as Loki and Promtail.

A log aggregation system like FluentD simplifies your architecture, as the diagram at https://www.fluentd.org/architecture shows.
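If you go the Loki route instead, logs are shipped by Promtail; a minimal sketch of a Promtail configuration might look roughly like this, with the Loki URL and log paths as placeholders:

```yaml
# Minimal sketch of a Promtail config: tail local log files and
# push them to Loki. The Loki URL, paths, and labels are placeholders.
server:
  http_listen_port: 9080
positions:
  filename: /tmp/positions.yaml    # where Promtail tracks read offsets
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: system
    static_configs:
      - targets: [localhost]
        labels:
          job: varlogs
          __path__: /var/log/*.log
```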

But what if you want to trace requests across multiple microservices and tools? This is where distributed tracing comes into play, especially given the complexity of microservices. Tools like Zipkin and Jaeger have long been the pioneers in this field, and the newest entrant is Tempo.

Although log aggregation gathers information from a variety of sources, it does not necessarily capture the context of a request, which is where tracing really helps. Keep in mind, however, that adding tracing to your stack adds overhead to your requests, because the context must be propagated between services along with each request.

A typical distributed tracing architecture is shown at https://www.jaegertracing.io/docs/1.21/architecture/.

However, site reliability is not limited to monitoring, visualization, and alerting. You must be prepared to handle failures in any part of the system, backing up and testing failover regularly so that data loss is at least minimized. You can do this with a tool like Velero.

Velero helps you maintain regular backups of the various components in your cluster, including your workloads and storage, by leveraging the same Kubernetes API conventions you already use. Its backup controller backs up objects periodically and pushes them to a destination of your choice, at a frequency based on the schedule you set. Because almost all objects are backed up, this can be used for failover and migration.
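As a concrete illustration, a minimal sketch of a Velero Schedule that backs up one namespace daily might look like this; the namespace, name, and retention period are assumptions:

```yaml
# Minimal sketch of a Velero Schedule: back up the "prod" namespace
# every day at 02:00 and keep each backup for 30 days. The name,
# namespace, and TTL are illustrative assumptions.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-prod-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"        # cron: every day at 02:00
  template:
    includedNamespaces:
      - prod
    ttl: 720h0m0s              # retain each backup for 30 days
```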

Storage

There are many different storage options and file systems available, and these can vary greatly between cloud providers. This calls for a standard such as the Container Storage Interface (CSI), which lets volume plug-ins be developed and maintained outside the core, without becoming a bottleneck in Kubernetes itself.

The CSI architecture, which supports a variety of volume plug-ins, is illustrated at https://kubernetes.io/blog/2018/08/02/dynamically-expand-volume-with-csi-and-kubernetes/.

What about the clustering and scaling problems that come with distributed storage? A file system like Ceph has proven itself here, but given that Ceph was not built around Kubernetes and is somewhat difficult to deploy and manage, you can consider a project like Rook instead.

Although Rook is not coupled to Ceph and also supports other file systems, such as EdgeFS and NFS, Rook and Ceph CSI seem to be a match made in heaven. The architecture of Rook with Ceph is illustrated at https://rook.io/docs/rook/v1.5/ceph-storage.html.

As you can see, Rook takes care of the installation, configuration, and management of Ceph in the Kubernetes cluster, and automatically provisions storage according to the user's preferences, without exposing any of this complexity to applications.
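To illustrate how simple this is from the application's point of view, here is a minimal sketch of a PersistentVolumeClaim against a Rook-provisioned Ceph block StorageClass; the class name rook-ceph-block follows the Rook examples, and the name and size are placeholders:

```yaml
# Minimal sketch of a PVC backed by a Rook-provisioned Ceph block
# StorageClass. "rook-ceph-block" follows the Rook examples; your
# cluster's StorageClass name may differ.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: rook-ceph-block
  resources:
    requests:
      storage: 10Gi
```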

Image Registry

An image registry provides a user interface where you can manage user accounts, push and pull images, manage quotas, receive event notifications through webhooks, scan images for vulnerabilities, sign pushed images, and replicate images across multiple registries.

If you use a cloud provider, it is likely that it already offers an image registry as a service (such as GCR, ECR, or ACR), which removes a lot of the complexity. If your cloud provider does not offer one, you can also choose a third-party registry such as Docker Hub or Quay.

But what if you want to host your own registry?

You may want to host one yourself if you plan to deploy a registry within the enterprise, want more control over it, or want to reduce the cost of operations such as vulnerability scanning.

If that's the case, a private registry like Harbor will help. The Harbor architecture is illustrated at https://goharbor.io/docs/1.10/install-config/harbor-ha-helm/.

Harbor is an OCI-compliant registry made up of various open source components, including the Docker registry v2, the Harbor UI, Clair, and Notary.

CI/CD architecture

Kubernetes can host workloads of any size, but this also calls for a standard way of deploying applications and a streamlined CI/CD workflow.

Third-party services such as Travis CI, Circle CI, GitLab CI, and GitHub Actions include their own CI runners; you only need to define the steps of the pipeline you want to build. This usually includes building the image, scanning it for possible vulnerabilities, running tests, pushing the image to the registry, and in some cases deploying a preview environment for approval.
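As a sketch of such a pipeline, here is roughly what a minimal GitHub Actions workflow might look like; the registry, image name, and test command are placeholders, and the vulnerability-scan step is omitted:

```yaml
# Minimal sketch of a CI pipeline as a GitHub Actions workflow:
# run tests, build an image, and push it to a registry. The registry,
# image name, and test command are placeholders.
name: ci
on:
  push:
    branches: [main]

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: make test                 # placeholder test command
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          push: true
          tags: ghcr.io/example/my-app:${{ github.sha }}
```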

If you manage your own CI runners, the steps usually remain the same, but you need to configure the runners, inside or outside the cluster, with the appropriate permissions to push assets to the image registry.

We have now walked through the architecture of a Kubernetes-based cloud native infrastructure. As we have seen, different tools solve different infrastructure problems. Like Lego bricks, each focuses on a specific problem at hand, abstracting away a lot of complexity for you.

This allows you to get started with Kubernetes gradually, using only the tools you need from the stack, depending on your use case.

That's all for how to implement a flexible K8s infrastructure. I hope the content above has been helpful and that you've learned something from it. If you think the article is good, you can share it for more people to see.
