
How to configure TensorFlow Serving in Kubernetes


This article covers how to configure TensorFlow Serving in Kubernetes. Many people run into trouble with this in practice, so let's work through the relevant pieces together; I hope you read it carefully and get something useful out of it.

About TensorFlow Serving

[Figure: TensorFlow Serving architecture diagram]

For the basic concepts of TensorFlow Serving, please refer to the official documentation; it explains them better than any translation of mine would.

Here are the points I think are most important:

TensorFlow Serving can serve multiple versions of a model simultaneously, configured through a Model Version Policy

Only the latest version of a model is loaded by default

Automatic model discovery and loading from the file system

Low-latency request handling

Stateless, so it supports scale-out

Different model versions can be A/B tested

Supports scanning and loading TensorFlow models from the local file system

Supports scanning and loading TensorFlow models from HDFS

Provides a gRPC interface for client calls

TensorFlow Serving configuration

When I rummaged through the official TensorFlow Serving documentation, I still could not find a complete model config reference, which was frustrating. That is understandable, though: development moves fast and the docs cannot keep up, so the only option is to dig into the code.

In the main() function of model_servers we can see the complete set of tensorflow_model_server configuration options and their descriptions:

tensorflow_serving/model_servers/main.cc#L314

int main(int argc, char** argv) {
  ...
  std::vector<tensorflow::Flag> flag_list = {
      tensorflow::Flag("port", &port, "port to listen on"),
      tensorflow::Flag("enable_batching", &enable_batching, "enable batching"),
      tensorflow::Flag("batching_parameters_file", &batching_parameters_file,
                       "If non-empty, read an ascii BatchingParameters "
                       "protobuf from the supplied file name and use the "
                       "contained values instead of the defaults."),
      tensorflow::Flag("model_config_file", &model_config_file,
                       "If non-empty, read an ascii ModelServerConfig "
                       "protobuf from the supplied file name, and serve the "
                       "models in that file. This config file can be used to "
                       "specify multiple models to serve and other advanced "
                       "parameters including non-default version policy. (If "
                       "used, --model_name, --model_base_path are ignored.)"),
      tensorflow::Flag("model_name", &model_name,
                       "name of model (ignored "
                       "if --model_config_file flag is set"),
      tensorflow::Flag("model_base_path", &model_base_path,
                       "path to export (ignored if --model_config_file flag "
                       "is set, otherwise required)"),
      tensorflow::Flag("file_system_poll_wait_seconds",
                       &file_system_poll_wait_seconds,
                       "interval in seconds between each poll of the file "
                       "system for new model version"),
      tensorflow::Flag("tensorflow_session_parallelism",
                       &tensorflow_session_parallelism,
                       "Number of threads to use for running a "
                       "Tensorflow session. Auto-configured by default. "
                       "Note that this option is ignored if "
                       "--platform_config_file is non-empty."),
      tensorflow::Flag("platform_config_file", &platform_config_file,
                       "If non-empty, read an ascii PlatformConfigMap protobuf "
                       "from the supplied file name, and use that platform "
                       "config instead of the Tensorflow platform. (If used, "
                       "--enable_batching is ignored.)")};
  ...
}

So the per-model version configuration all goes through --model_config_file. Here is the complete structure of the model config:

tensorflow_serving/config/model_server_config.proto#L55

// Common configuration for loading a model being served.
message ModelConfig {
  // Name of the model.
  string name = 1;

  // Base path to the model, excluding the version directory.
  // E.g. for a model at /foo/bar/my_model/123, where 123 is the version, the
  // base path is /foo/bar/my_model.
  //
  // (This can be changed once a model is in serving, *if* the underlying data
  // remains the same. Otherwise there are no guarantees about whether the old
  // or new data will be used for model versions currently loaded.)
  string base_path = 2;

  // Type of model.
  // TODO(b/31336131): DEPRECATED. Please use 'model_platform' instead.
  ModelType model_type = 3 [deprecated = true];

  // Type of model (e.g. "tensorflow").
  // (This cannot be changed once a model is in serving.)
  string model_platform = 4;

  reserved 5;

  // Version policy for the model indicating how many versions of the model to
  // be served at the same time.
  // The default option is to serve only the latest version of the model.
  // (This can be changed once a model is in serving.)
  FileSystemStoragePathSourceConfig.ServableVersionPolicy model_version_policy = 7;

  // Configures logging requests and responses, to the model.
  // (This can be changed once a model is in serving.)
  LoggingConfig logging_config = 6;
}

The field we are looking for is model_version_policy, which is defined as follows:

tensorflow_serving/sources/storage_path/file_system_storage_path_source.proto

message ServableVersionPolicy {
  // Serve the latest versions (i.e. the ones with the highest version
  // numbers), among those found on disk.
  //
  // This is the default policy, with the default number of versions as 1.
  message Latest {
    // Number of latest versions to serve. (The default is 1.)
    uint32 num_versions = 1;
  }

  // Serve all versions found on disk.
  message All {
  }

  // Serve a specific version (or set of versions).
  //
  // This policy is useful for rolling back to a specific version, or for
  // canarying a specific version while still serving a separate stable
  // version.
  message Specific {
    // The version numbers to serve.
    repeated int64 versions = 1;
  }
}

Therefore, model_version_policy currently supports three options:

All: {} loads all discovered versions of the model

Latest: {num_versions: n} loads only the latest n versions; this is the default

Specific: {versions: m} loads only the specified versions, which is typically used for testing

So, when starting the server with tensorflow_model_server --port=9000 --model_config_file=..., a complete model_config_file can look like this:

model_config_list: {
  config: {
    name: "mnist",
    base_path: "/tmp/monitored/mnist_model",
    model_platform: "tensorflow",
    model_version_policy: {
      all: {}
    }
  },
  config: {
    name: "inception",
    base_path: "/tmp/monitored/inception_model",
    model_platform: "tensorflow",
    model_version_policy: {
      latest: {
        num_versions: 2
      }
    }
  },
  config: {
    name: "mxnet",
    base_path: "/tmp/monitored/mxnet_model",
    model_platform: "tensorflow",
    model_version_policy: {
      specific: {
        versions: 1
      }
    }
  }
}

TensorFlow Serving compilation

The compilation and installation of TensorFlow Serving is already described clearly in the setup documentation on GitHub. I only want to emphasize one point here, and it is very important, namely this passage from the docs:

Optimized build

It's possible to compile using some platform specific instruction sets (e.g. AVX) that can significantly improve performance. Wherever you see 'bazel build' in the documentation, you can add the flags --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-O3 (or some subset of these flags). For example:

bazel build --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-O3 tensorflow_serving/...

Note: These instruction sets are not available on all machines, especially with older processors, so it may not work with all flags. You can try some subset of them, or revert to just the basic '-c opt' which is guaranteed to work on all machines.

This really matters. At first we compiled without the corresponding copt options, and testing showed that the resulting tensorflow_model_server performed very poorly (at least it did not meet our requirements): concurrent client requests against TensorFlow Serving had very high latency, with essentially every request taking more than 100ms. After adding these copt options, the same concurrency test on the same model showed 99.987% of requests completing within 50ms.

As for --copt=-O2 versus -O3 and what they mean, see the gcc documentation on optimization levels; I will not go into it here, because frankly I do not fully understand it either.

So should everyone compile with exactly the copt options listed in the official docs? No! It depends on the CPU of the server that will run TensorFlow Serving. Check /proc/cpuinfo to see which instruction sets the CPU supports, and pick your copt flags accordingly.
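For example, here is a minimal sketch in Python (my own illustration, not part of TensorFlow Serving) that parses /proc/cpuinfo and reports which of the instruction sets behind the flags above are available on the current machine:

# check_cpu_flags.py -- illustrative helper, not part of TensorFlow Serving.
# Reads /proc/cpuinfo (Linux only) and reports which of the instruction sets
# used by the optimized-build flags are supported by this CPU.

WANTED = {
    "sse4_1": "--copt=-msse4.1",
    "sse4_2": "--copt=-msse4.2",
    "avx":    "--copt=-mavx",
    "avx2":   "--copt=-mavx2",
    "fma":    "--copt=-mfma",
}

def cpu_flags(path="/proc/cpuinfo"):
    """Return the set of CPU feature flags reported by the kernel."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                # Line looks like: "flags : fpu vme ... sse4_1 sse4_2 avx avx2 fma ..."
                return set(line.split(":", 1)[1].split())
    return set()

if __name__ == "__main__":
    supported = cpu_flags()
    for feature, copt in WANTED.items():
        status = "supported    " if feature in supported else "NOT supported"
        print(f"{feature:<7} {status} {copt}")

Run it on the machine that will actually serve the models, and only add the copt flags whose instruction sets it reports as supported.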

Caveats

Since TensorFlow Serving can serve multiple versions of a model at the same time, it is recommended that clients explicitly specify both the model name and the version in their gRPC calls: different versions are effectively different models, and their predictions may differ substantially. A minimal client sketch follows.
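Here is such a sketch, assuming a recent tensorflow-serving-api Python package (the one that ships prediction_service_pb2_grpc), a server listening on localhost:9000, and a hypothetical model named "mnist" exported with a "predict_images" signature and an "images" input tensor; the names, port and input shape are placeholders for your own setup:

# predict_client.py -- illustrative sketch; model name, version, signature and
# input tensor name are placeholders for your own model.
import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

channel = grpc.insecure_channel("localhost:9000")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "mnist"          # which served model to call
request.model_spec.version.value = 2       # pin an explicit version instead of "latest"
request.model_spec.signature_name = "predict_images"

image = np.random.rand(1, 784).astype(np.float32)  # stand-in for real input data
request.inputs["images"].CopyFrom(tf.make_tensor_proto(image, shape=image.shape))

response = stub.Predict(request, timeout=10.0)     # 10 second deadline
print(response.outputs)

Pinning model_spec.version explicitly means that a newly discovered "latest" version cannot silently change the predictions you get back while you are comparing results.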

When copying a newly trained model into the model base path, pack it into a tar archive first whenever possible, copy the archive over, and only then unpack it. Models are large and copying takes time; if you copy the files directly, TensorFlow Serving may notice the version while the exported model file has arrived but its meta file has not, fail to load it, and then never retry that version. A sketch of a safe publishing flow follows.
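Below is a rough Python sketch of a variation on that idea: instead of unpacking inside base_path, unpack the archive into a staging directory next to it and rename the finished version directory into place. The paths, the version number and the archive layout (one version's model files at the archive root) are assumptions for illustration only:

# deploy_model_version.py -- illustrative sketch of publishing a model version
# without ever exposing half-copied files to TensorFlow Serving.
# Paths, version number and archive layout are placeholders.
import os
import shutil
import tarfile
import tempfile

BASE_PATH = "/tmp/monitored/mnist_model"          # model base_path watched by TF Serving
VERSION = "3"                                     # new version directory to publish
ARCHIVE = "/tmp/incoming/mnist_model_v3.tar.gz"   # tarball already copied to this host

def publish(archive, base_path, version):
    # Unpack into a staging directory on the same filesystem as base_path,
    # so that the final rename is a single step.
    staging = tempfile.mkdtemp(dir=os.path.dirname(base_path))
    try:
        with tarfile.open(archive) as tar:
            tar.extractall(staging)               # archive holds one version's files at its root
        os.chmod(staging, 0o755)                  # mkdtemp creates 0700; relax so the serving user can read
        target = os.path.join(base_path, version)
        os.rename(staging, target)                # TF Serving only now sees the complete version
    except Exception:
        shutil.rmtree(staging, ignore_errors=True)
        raise

if __name__ == "__main__":
    publish(ARCHIVE, BASE_PATH, VERSION)

Because the version directory only appears under base_path once its contents are complete, the file-system poller never finds a model with missing meta files.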

If you use protobuf version
