In this issue, the editor introduces custom elastic scaling in Fluid. The article is rich in content and analyzed from a professional point of view; I hope you gain something from reading it.
Introduction: Auto scaling is one of the core capabilities of Kubernetes, but it has traditionally revolved around stateless application workloads. Fluid extends elastic scaling to the distributed cache: based on Runtime performance metrics such as total cache capacity and cache utilization, it can expand or shrink the data cache on demand by scaling the Runtime's own resources.
Background
As more and more data-intensive applications such as big data and AI are deployed and run in Kubernetes environments, the mismatch between the design assumptions of data-intensive computing frameworks and cloud-native elastic application orchestration has led to bottlenecks in data access and computation. The cloud-native data orchestration engine Fluid accelerates data access for applications through a dataset abstraction, distributed caching technology, and scheduler integration.
While Kubernetes auto scaling has traditionally targeted stateless workloads, Fluid brings elastic scaling to the distributed cache itself: using Runtime performance metrics such as cache capacity and cache utilization, it expands or shrinks the data cache on demand by scaling the Runtime's resources.
This capability is especially important for big data applications in Internet scenarios, because most of them are implemented as end-to-end pipelines. Such a pipeline typically consists of the following steps:
Data extraction: preprocess the raw data with big data technologies such as Spark and MapReduce.
Model training: train a machine learning model on the feature data generated in the first stage, producing the corresponding model.
Model evaluation: evaluate and test the model produced in the second stage against a test set or validation set.
Model inference: the model validated in the third stage is finally pushed online to serve inference requests for the business.
As you can see, an end-to-end pipeline contains many different types of computing tasks, and in practice each task is handled by a suitable specialized system (TensorFlow, PyTorch, Spark, Presto). These systems are independent of each other, however, and usually rely on an external file system to pass data from one stage to the next. Frequently exchanging data through a file system incurs significant I/O overhead, which often becomes the bottleneck of the whole workflow.
Fluid is a good fit for this scenario. A user can create a Dataset object, which caches data on Kubernetes compute nodes and serves as the medium for data exchange, avoiding remote writes and reads and improving the efficiency of data use. The problem here is resource estimation and reservation for the temporary data cache: before data is produced and consumed, its exact volume is hard to estimate; too high an estimate wastes reserved resources, while too low an estimate raises the chance of data-write failures. Scaling capacity up and down on demand is far more user-friendly. We want an effect similar to the page cache: the layer is transparent to the end user, but the cache acceleration it brings is real.
We introduce cache auto scaling by combining a custom HPA mechanism with Fluid. The scaling condition is that scale-up is triggered once the cached data reaches a certain proportion of the cache space. For example, if the trigger condition is set to more than 75% of the cache space in use and the total cache space is 10 GB, then once the cached data fills 8 GB (above the 7.5 GB threshold), the expansion mechanism is triggered.
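To make this concrete, the sketch below shows what such an HPA object could look like. It is a minimal illustration, not a manifest from this article: it assumes a Dataset and AlluxioRuntime both named spark, and the capacity_used_rate custom metric exposed through the adapter configured later in this article. Adjust names and thresholds to your environment.

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: spark
spec:
  scaleTargetRef:
    apiVersion: data.fluid.io/v1alpha1
    kind: AlluxioRuntime   # the Runtime whose replicas provide cache capacity
    name: spark
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Object
    object:
      metric:
        name: capacity_used_rate   # percentage of cache space in use
      describedObject:
        apiVersion: data.fluid.io/v1alpha1
        kind: Dataset
        name: spark
      target:
        type: Value
        value: "75"                # scale up once more than 75% of the cache is used

Scaling a Runtime this way relies on the AlluxioRuntime CRD exposing the scale subresource, which is what lets the HPA adjust its replica count like any other workload.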
Let's use an example to experience Fluid's automatic cache scale-up and scale-down capability.
Prerequisites
It is recommended to use Kubernetes 1.18 or above. Before 1.18, the HPA scaling policy could not be customized and was hard-coded; from 1.18 on, users can define their own scaling policies, such as the cooldown time after a scale-up.
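As an illustration of what 1.18 makes possible, the fragment below is a hedged sketch of the behavior section that could be attached to the HPA spec shown earlier; the concrete values are examples only, not settings from this article:

behavior:
  scaleUp:
    policies:
    - type: Pods
      periodSeconds: 600   # after a scale-up, wait 10 minutes before scaling up again
      value: 2             # add at most 2 replicas per scale-up
  scaleDown:
    selectPolicy: Disabled # in this sketch, cache shrink is left to manual control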
Specific steps

1. Install the jq tool to make it easy to parse JSON. In this example the operating system is CentOS, and jq can be installed through yum:

$ yum install -y jq

2. Download and install the latest version of Fluid:

$ git clone https://github.com/fluid-cloudnative/fluid.git
$ cd fluid/charts
$ kubectl create ns fluid-system
$ helm install fluid fluid

3. Deploy or configure Prometheus.
Here, the metrics exposed by the AlluxioRuntime cache engine are collected through Prometheus. If there is no Prometheus in the cluster:
$ cd fluid
$ kubectl apply -f integration/prometheus/prometheus.yaml
If there is already a Prometheus in the cluster, you can add the following to its configuration file:
scrape_configs:
  - job_name: 'alluxio runtime'
    metrics_path: /metrics/prometheus
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_monitor]
        regex: alluxio_runtime_metrics
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: web
        action: keep
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_label_release]
        target_label: fluid_runtime
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_name]
        target_label: pod
        replacement: $1
        action: replace

4. Verify that Prometheus is installed successfully:

$ kubectl get ep -n kube-system prometheus-svc
NAME             ENDPOINTS        AGE
prometheus-svc   10.76.0.2:9090   6m49s

$ kubectl get svc -n kube-system prometheus-svc
NAME             TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
prometheus-svc   NodePort   172.16.135.24   <none>        9090:32114/TCP   2m7s
If you want to visualize the monitoring metrics, you can install Grafana. For more information, please see the documentation.
5. Deploy metrics server.
Check whether the cluster includes metrics-server: if executing kubectl top node produces correct memory and CPU output, the cluster's metrics server is configured correctly.
$ kubectl top node
NAME            CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
192.168.1.204   93m          2%     1455Mi          10%
192.168.1.205   125m         3%     1925Mi          13%
192.168.1.206   96m          2%     1689Mi          11%
Otherwise, execute the following command manually:
$ kubectl create -f integration/metrics-server

6. Deploy the custom-metrics-api component.
Scaling based on custom metrics requires two components:
The first component collects metrics from the application and stores them in the Prometheus time series database.
The second component extends the Kubernetes custom metrics API with the collected metrics; this is k8s-prometheus-adapter.
The first component was deployed in step 3; the second component is deployed below.
If you have already configured custom-metrics-api, add the dataset-related rule to the adapter's ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: monitoring
data:
  config.yaml: |
    rules:
    - seriesQuery: '{__name__=~"Cluster_(CapacityTotal|CapacityUsed)",fluid_runtime!="",instance!="",job="alluxio runtime",namespace!="",pod!=""}'
      seriesFilters:
      - is: ^Cluster_(CapacityTotal|CapacityUsed)$
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pods
          fluid_runtime:
            resource: datasets
      name:
        matches: "^(.*)"
        as: "capacity_used_rate"
      metricsQuery: ceil(Cluster_CapacityUsed{<<.LabelMatchers>>}*100/(Cluster_CapacityTotal{<<.LabelMatchers>>}))
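Note that prometheus-adapter reads this ConfigMap at startup, so after editing it you generally need to restart the adapter pod for the new rule to take effect. A hedged example, assuming the adapter Deployment is named custom-metrics-apiserver and lives in the monitoring namespace:

$ kubectl rollout restart deployment/custom-metrics-apiserver -n monitoring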
Otherwise, execute the following commands manually:

$ kubectl create -f integration/custom-metrics-api/namespace.yaml
$ kubectl create -f integration/custom-metrics-api
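As a quick sanity check (an extra step suggested here, not part of the original walkthrough), you can confirm that the adapter has registered the custom metrics API as an aggregated API service:

$ kubectl get apiservice v1beta1.custom.metrics.k8s.io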
Note: because custom-metrics-api connects to the access address of Prometheus in the cluster, please replace the prometheus url with the Prometheus address you actually use.
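For reference, prometheus-adapter takes this address through its --prometheus-url startup flag in the adapter Deployment. The value below is only an illustrative placeholder for a Prometheus service reachable inside the cluster:

--prometheus-url=http://prometheus-svc.kube-system.svc.cluster.local:9090/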
Check the custom metrics:
$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "pods/capacity_used_rate",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": ["get"]
    },
    {
      "name": "datasets.data.fluid.io/capacity_used_rate",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": ["get"]
    },
    {
      "name": "namespaces/capacity_used_rate",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": ["get"]
    }
  ]
}

7. Submit the Dataset used for the test.

$ cat
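As an illustrative sketch of what such a test Dataset and its cache Runtime might look like (the spark name, the Apache mirror mount point, and the deliberately small 1Gi memory quota are assumptions for this sketch, not values from this article):

apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: spark
spec:
  mounts:
    - mountPoint: https://mirrors.bit.edu.cn/apache/spark/
      name: spark
---
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: spark
spec:
  replicas: 1
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 1Gi        # kept small so the cache fills up and scale-up can trigger
        high: "0.99"
        low: "0.7"

Once the dataset is bound, the per-dataset metric can be read back through the custom metrics API registered above, for example (namespace and dataset name follow the assumptions of this sketch):

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/datasets.data.fluid.io/spark/capacity_used_rate" | jq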