What is the implementation principle of the .NET Core distributed link tracing framework? 04/30 Update SLTechnology News&Howtos


2025-04-30 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains in detail the implementation principles of a .NET Core distributed tracing framework, with the steps laid out clearly. Hopefully it helps answer your questions on the topic.

Distributed tracing

What is distributed tracing

Distributed system

When we search with Google or Baidu, the query service distributes the keywords to multiple query servers. Each server searches its own index, so the search engine can return a large number of accurate results in a short time. Meanwhile, based on the keywords, the advertising subsystem pushes relevant advertisements and obtains the website weight from the bid-ranking subsystem. A single search may involve thousands of servers and pass through many different systems.

Multiple computers form a huge system through the network, which is a distributed system.

In microservice or cloud-native development, a distributed system is generally understood as components connected through various middleware or a service mesh, which provide shared resources, functionality (APIs and so on), files, and more, so that the whole network works like a single computer.

Distributed tracing

In a distributed system, a request from a user is distributed to multiple subsystems, processed by different services, and the result is returned to the user. The time for a user to make a request and get a result is a request cycle.

When we shop, we only need a very simple process:

Get coupons -> place order -> pay -> wait for delivery

In the back-end system, however, each step requires multiple subsystems to cooperate and follows a strict process. For example, when an order is placed, the system must check whether the user has a coupon, whether the coupon applies to the current product, whether the order meets the conditions for using it, and so on.

The following figure shows the flow of the system processing the request after a user request.

There are many arrows in the figure that point to the service / subsystem to flow through next, and these arrows make up the link network.

In a complex distributed system, poor performance in any one subsystem affects the whole request cycle. With the figure above in mind, consider the following questions:

1. Services may be added, removed, or upgraded every day. When an error occurs in the system, how do we locate the problem?

2. When a user's request gets a slow response, how do we locate the problem?

3. Services may be written in different programming languages. Do the approaches used for 1 and 2 work for all of them?

What is distributed tracing used for?

With the rise of microservices and cloud-native development, more and more applications are built as distributed systems. But once a large application is split into microservices, the dependencies and calls between services become increasingly complex. These services are developed by different teams in different languages and deployed on different machines, and the interfaces they expose may differ (gRPC, RESTful API, and so on).

To maintain these services, the idea of observability has emerged in the software field. It divides the maintenance of microservices into three parts:

Metrics: used for monitoring and alerting

Tracing (distributed tracing): records all trace information in the system

Logging: records the discrete events within each service

These three parts are not independent. For example, Metrics can monitor whether the Tracing and Logging services are running properly, and the Tracing and Metrics services themselves generate logs while running.

In recent years APM systems have emerged. APM stands for application performance management; such systems are used for software performance monitoring and analysis. APM is a kind of Metrics, but there is now a trend toward integrating Tracing as well.

Back to the point: what is a distributed tracing system (Tracing) used for? Taking Jaeger as an example, it provides:

Distributed trace context propagation

Distributed transaction monitoring

Service dependency analysis

Display of cross-process call chains

Problem location

Performance optimization

Jaeger needs to be combined with a backend to analyze the results. Jaeger has its own Jaeger UI, but its functionality is limited, so it also relies on a Metrics framework to visualize the results and to define custom monitoring and alerting rules. Naturally, then, Metrics systems end up doing some of the work of Tracing too.

Dapper

Dapper is a distributed tracing system used internally by Google; it is not open source.

Dapper user interface:

Implementation of a distributed tracing system

The following figure shows a request initiated by user X passing through multiple services in a distributed system. A, B, C, D, and E represent different subsystems or processes.

In this figure, A is the front end, B and C are the middle tier, and D and E are the back ends of C. These subsystems are connected through an RPC protocol such as gRPC.

A simple and practical distributed tracing implementation collects a trace identifier (message identifier) and a timestamp (timestamped event) for every request and response on each server.

A tracing system for distributed services needs to record information about all the work done in the system on behalf of a specific request. User requests can run in parallel, many actions may be processed at the same time, and a single request passes through multiple services, so trace information of all kinds is generated continuously. The trace information that one request generates in different services must be associated.

To associate all record entries with a given initiator X and record all the information, there are currently two approaches: black-box monitoring and annotation-based monitoring.

Black-box scheme:

Assume there is no additional information beyond the records described above, and use statistical regression techniques to infer the association between records and requests.

Annotation-based scheme:

Rely on the application or middleware to explicitly tag every record with a global ID that links it to the initiator's request.
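The annotation-based idea can be illustrated with a minimal C# sketch. The `TraceRecord` type, the service names, and the ids below are hypothetical; the point is only that an explicit global id lets interleaved records from concurrent requests be regrouped by the request that produced them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record emitted by a service; TraceId is the explicit global id.
public sealed class TraceRecord
{
    public string TraceId { get; }
    public string Service { get; }
    public long TimestampMicros { get; }

    public TraceRecord(string traceId, string service, long timestampMicros)
    {
        TraceId = traceId;
        Service = service;
        TimestampMicros = timestampMicros;
    }
}

public static class AnnotationCorrelation
{
    // Collects the services touched by each request, in time order.
    public static Dictionary<string, List<string>> GroupByTrace(IEnumerable<TraceRecord> records)
    {
        return records
            .GroupBy(r => r.TraceId)
            .ToDictionary(
                g => g.Key,
                g => g.OrderBy(r => r.TimestampMicros).Select(r => r.Service).ToList());
    }

    public static void Main()
    {
        // Records from two concurrent requests arrive interleaved.
        var records = new List<TraceRecord>
        {
            new TraceRecord("x1", "A", 100),
            new TraceRecord("x2", "A", 110),
            new TraceRecord("x1", "C", 130),
            new TraceRecord("x2", "B", 140),
            new TraceRecord("x1", "D", 150),
        };

        foreach (var pair in GroupByTrace(records))
        {
            Console.WriteLine(pair.Key + ": " + string.Join(" -> ", pair.Value));
        }
    }
}
```

With the sample data, request x1 maps to A -> C -> D and request x2 to A -> B, even though their records were mixed together on arrival.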

Advantages and disadvantages:

Although black-box schemes are more portable than annotation-based schemes, they need more data to achieve sufficient accuracy because they rely on statistical inference. The main drawback of the annotation-based approach is, obviously, that programs must be instrumented. The Dapper paper notes that in Google's production environment, because all applications use the same threading model, control flow, and RPC system, instrumentation could be restricted to a small set of common libraries, making the monitoring system effectively transparent to application developers.

Dapper is an annotation-based scheme. Next we introduce some of its concepts.

Trace Tree and span

Formally, the Dapper trace model uses trees, spans, and annotations.

In the earlier figure, the whole request network forms a tree, with the user request as the root node. In Dapper's trace tree, the tree node is the basic unit of the whole architecture.

A tree node is called a span. When a node receives a request and finishes handling it, that unit of work is a span, and the span records the information generated along the way. Each node generates a unique span id for every request it processes. In a chain such as A -> C -> D, consecutive spans have a parent-child relationship, so a span stores not only its own span id but also the id of its parent span; a span with no parent id is the root span. Span id generation must be fast and must express chronological order clearly, which is covered later in the discussion of Jaeger.

An annotation attaches extra trace detail to a span, which helps us monitor the system's behavior or debug problems. An annotation can contain arbitrary content.
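As a rough sketch of the trace tree described above, the following C# links spans by parent id and prints the tree. `SimpleSpan` and `TraceTree` are illustrative names, not Dapper's actual types, and the sketch assumes each span stores the id of its parent, with the root span having none.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical, simplified span: its own id, its parent's id, and annotations.
public sealed class SimpleSpan
{
    public string Name { get; }
    public long SpanId { get; }
    public long? ParentId { get; }   // null marks the root span
    public List<string> Annotations { get; } = new List<string>();

    public SimpleSpan(string name, long spanId, long? parentId)
    {
        Name = name;
        SpanId = spanId;
        ParentId = parentId;
    }
}

public static class TraceTree
{
    // Renders the tree by following parent ids, one indented line per span.
    public static string Render(IReadOnlyCollection<SimpleSpan> spans)
    {
        var childrenByParent = spans
            .Where(s => s.ParentId.HasValue)
            .ToLookup(s => s.ParentId.Value);
        var root = spans.Single(s => !s.ParentId.HasValue);

        var output = new StringBuilder();
        Append(root, 0, childrenByParent, output);
        return output.ToString();
    }

    private static void Append(SimpleSpan span, int depth,
        ILookup<long, SimpleSpan> childrenByParent, StringBuilder output)
    {
        output.Append(new string(' ', depth * 2)).Append(span.Name);
        if (span.Annotations.Count > 0)
        {
            output.Append(" [").Append(string.Join("; ", span.Annotations)).Append(']');
        }
        output.Append('\n');

        // A lookup returns an empty sequence for spans without children.
        foreach (var child in childrenByParent[span.SpanId])
        {
            Append(child, depth + 1, childrenByParent, output);
        }
    }

    public static void Main()
    {
        var c = new SimpleSpan("C", 3, 1);
        c.Annotations.Add("cache miss");
        var spans = new List<SimpleSpan>
        {
            new SimpleSpan("A", 1, null),
            new SimpleSpan("B", 2, 1),
            c,
            new SimpleSpan("D", 4, 3),
            new SimpleSpan("E", 5, 3),
        };
        Console.Write(Render(spans));
    }
}
```

Running `Main` prints A at the top level, B and C indented beneath it, and D and E beneath C, with C's annotation shown in brackets.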

That is a brief introduction to distributed tracing and Dapper, but it is not enough to rigorously explain the concepts involved. Readers are encouraged to read the Dapper paper when they have time.

Implementing Dapper also requires code instrumentation, sampling, trace collection, and so on. These are not covered in detail here and will be described later; readers can also consult the paper.

Jaeger and OpenTracing

OpenTracing

OpenTracing is a set of vendor-neutral APIs and instrumentation for distributed tracing, independent of any particular distributed system. Besides providing a unified standard API, it also offers various tools to help developers and service providers build their programs.

OpenTracing provides SDKs implementing the standard API for these languages: Go, JavaScript, Java, Python, Ruby, PHP, Objective-C, C++, C#.

Of course, we can also write our own SDK based on the communication protocol.

Next we will go through the concepts in OpenTracing one by one. Since Jaeger is arguably the best implementation of OpenTracing, the rest of this article treats Jaeger and OpenTracing as interchangeable and does not strictly distinguish the two.

Jaeger structure

First, the Jaeger part: this covers code instrumentation and related steps, which run inside the distributed system. When a trace completes, its data is pushed through jaeger-agent to jaeger-collector. Jaeger-collector handles trace information pushed from all sources and stores it in a back end such as Elasticsearch (ES) or a database. Jaeger-UI lets users view the analyzed trace information in a web interface.

The OpenTracing API is wrapped in an SDK (jaeger-client) for each programming language, for example a .dll in C# or a .jar in Java. Application code is instrumented by calling this API.

Jaeger-agent is a network daemon that listens for span data on a UDP port and sends it to the collector in batches.

OpenTracing data model

In OpenTracing, trace information is built on two core concepts, Trace and Span. They store trace information in a defined structure, so they form the core of the OpenTracing data model.

A Trace is one complete trace and consists of multiple Spans. The following figure is an example of a Trace consisting of eight Spans.

Tracing:

A Trace can be thought of as a directed acyclic graph (DAG) of Spans.

In other words, a Trace is a directed acyclic graph made up of multiple Spans.

In the example above, a Trace passes through eight services. A -> C -> F -> G is in strict order, but in terms of time, B and C can run in parallel. To show the temporal relationships between these Spans accurately, we can use the following figure:

Note that A -> C -> F does not mean that A finishes executing before C starts. It means that A depends on C, and C depends on F. When the work A delegated to C completes, control returns to A, which then continues executing. That is why A has the longest span in the figure above.
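The nesting described above can be checked with a small sketch. The timings are made-up numbers in microseconds and `TimedSpan` is a hypothetical type; it only shows that when a caller waits for its callee, the caller's interval encloses the callee's.

```csharp
using System;

// Hypothetical timed span: start and duration in microseconds.
public readonly struct TimedSpan
{
    public string Name { get; }
    public long Start { get; }
    public long Duration { get; }
    public long End => Start + Duration;

    public TimedSpan(string name, long start, long duration)
    {
        Name = name;
        Start = start;
        Duration = duration;
    }

    // True when 'inner' starts and ends within this span.
    public bool Encloses(TimedSpan inner) => Start <= inner.Start && inner.End <= End;
}

public static class SpanTiming
{
    public static void Main()
    {
        // A calls C, which calls F; each caller waits for its callee to return.
        var a = new TimedSpan("A", 0, 100);
        var c = new TimedSpan("C", 20, 60);
        var f = new TimedSpan("F", 30, 25);

        Console.WriteLine(a.Encloses(c)); // True
        Console.WriteLine(c.Encloses(f)); // True
        Console.WriteLine(f.Encloses(a)); // False
    }
}
```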

Span format

To go further, you must first understand Span. Please compare the following figure carefully with the JSON:

JSON address: https://github.com/whuanle/DistributedTracing/issues/1

The rest of this article refers to this figure and JSON when explaining Span.

Trace

A simplified Trace is as follows:

Note: field names differ between programming languages, and so do the gRPC and RESTful API formats.

{
    "traceID": "790e003e22209ca4",
    "spans": [...],
    "processes": {...}
}

As mentioned earlier, a Trace in OpenTracing is a directed acyclic graph, so a Trace must have one and only one starting point.

This starting point creates a Trace object and initializes its trace id and process. In this article's scheme, the trace id is a 32-character string derived from a timestamp, and process describes the host on which the starting process runs.

Here is a brief look at how the trace id is generated. The trace id has 32 characters, but only 16 are actually used, so the following explanation works with 16 characters.

First, get the current timestamp, for example 1611467737781059. It has 16 digits and is in microseconds, representing 2021-01-24 13:55:37 (UTC+8). The sub-second part is omitted from that rendering, but the value still identifies the moment precisely.

In C #, the code that converts the current time to this timestamp:

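public static long ToTimestamp(DateTime dateTime)
{
    // Pass a UTC time (for example DateTime.UtcNow) to get microseconds since the Unix epoch.
    DateTime dt1970 = new DateTime(1970, 1, 1, 0, 0, 0, 0);
    // One tick is 100 ns, so dividing by 10 yields microseconds.
    return (dateTime.Ticks - dt1970.Ticks) / 10;
}
// result: 1611467737781059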

Generating a Guid or storing the id directly as a string costs extra time and memory, whereas a long can hold the timestamp exactly and saves memory.
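As a quick sanity check of this conversion, the sketch below adds the inverse direction and round-trips the sample value. `FromTimestamp` is a helper introduced here for illustration; it assumes the timestamp counts microseconds since the Unix epoch in UTC.

```csharp
using System;
using System.Globalization;

public static class MicrosecondTimestamp
{
    private static readonly DateTime UnixEpoch =
        new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Microseconds since the Unix epoch; one DateTime tick is 100 ns.
    public static long ToTimestamp(DateTime utc) => (utc.Ticks - UnixEpoch.Ticks) / 10;

    // Inverse conversion, useful for checking a stored value by eye.
    public static DateTime FromTimestamp(long micros) => UnixEpoch.AddTicks(micros * 10);

    public static void Main()
    {
        DateTime utc = FromTimestamp(1611467737781059);
        Console.WriteLine(utc.ToString("yyyy-MM-dd HH:mm:ss.ffffff", CultureInfo.InvariantCulture));
        // 2021-01-24 05:55:37.781059
        Console.WriteLine(ToTimestamp(utc));
        // 1611467737781059
    }
}
```

The printed time is in UTC; 05:55:37 UTC corresponds to 13:55:37 in UTC+8.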

Once you have this timestamp, it must be converted to byte data before it is transferred to Jaeger Collector. The reason for this requirement is not clear to the author; just transfer it in the required form.

Convert the long to a byte array:

var bytes = BitConverter.GetBytes(time);
// Use big-endian (network) byte order: reverse the array on little-endian machines.
if (BitConverter.IsLittleEndian)
{
    Array.Reverse(bytes);
}

A long occupies 8 bytes, and the value of each byte is as follows:

0x00 0x05 0xb9 0x9f 0x12 0x13 0xd3 0x43

The bytes are then transferred to Jaeger Collector. Given this binary data, how is it expressed as a trace id string?

You can first restore the long and then output it as a hexadecimal string:

Converting to a string (this is C#):

Console.WriteLine(time.ToString("x16"));

Results:

0005b99f1213d343
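The byte and string steps above can be combined into one sketch. It assumes .NET Core 2.1 or later and uses `BinaryPrimitives` to write big-endian bytes directly instead of reversing the array; the class and method names are illustrative and are not part of any Jaeger client.

```csharp
using System;
using System.Buffers.Binary;
using System.Linq;

public static class TraceIdEncoding
{
    // Encodes the value as 8 bytes in big-endian (network) order.
    public static byte[] ToBigEndianBytes(long value)
    {
        var bytes = new byte[8];
        BinaryPrimitives.WriteInt64BigEndian(bytes, value);
        return bytes;
    }

    // Restores the long and formats it as 16 lowercase hex digits.
    public static string ToHexId(byte[] bigEndianBytes)
    {
        long value = BinaryPrimitives.ReadInt64BigEndian(bigEndianBytes);
        return value.ToString("x16");
    }

    public static void Main()
    {
        byte[] bytes = ToBigEndianBytes(1611467737781059);
        Console.WriteLine(string.Join(" ", bytes.Select(b => "0x" + b.ToString("x2"))));
        // 0x00 0x05 0xb9 0x9f 0x12 0x13 0xd3 0x43
        Console.WriteLine(ToHexId(bytes));
        // 0005b99f1213d343
    }
}
```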

Span ids work the same way. Because each id is tied to a timestamp, each id is unique in time, and so is the resulting string.

That covers the trace id in a trace. The trace process is the information about the machine that initiated the request, stored as key-value entries in the following format:

{"key": "hostname", "type": "string", "value": "Your-PC"},
{"key": "ip", "type": "string", "value": "172.6.6.6"},
{"key": "jaeger.version", "type": "string", "value": "CSharp-0.4.2.0"}
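A minimal sketch of building such process entries follows, assuming .NET Core 3.0 or later for `System.Text.Json`. `ProcessTag` and `ProcessInfo` are hypothetical names, the ip entry is omitted for brevity, and the version string is simply the one from the example above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical tag type mirroring the key/type/value triples shown above.
public sealed class ProcessTag
{
    public string Key { get; set; }
    public string Type { get; set; }
    public string Value { get; set; }
}

public static class ProcessInfo
{
    // Builds string-typed tags describing the current host.
    public static List<ProcessTag> Describe(string clientVersion)
    {
        return new List<ProcessTag>
        {
            new ProcessTag { Key = "hostname", Type = "string", Value = Environment.MachineName },
            new ProcessTag { Key = "jaeger.version", Type = "string", Value = clientVersion },
        };
    }

    // Serializes the tags with camelCase property names (key, type, value).
    public static string ToJson(List<ProcessTag> tags)
    {
        var options = new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.CamelCase };
        return JsonSerializer.Serialize(tags, options);
    }

    public static void Main()
    {
        Console.WriteLine(ToJson(Describe("CSharp-0.4.2.0")));
    }
}
```

The hostname value depends on the machine running the code, so the output differs from host to host.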

That covers the trace id and process in a Trace. Next, let's look at the Trace's spans.

Span

A Span consists of the following information:

An operation name: required

A start timestamp: required

A finish timestamp: required

Span Tags: key-value pairs that label the request, optional

Span Logs: key-value pairs recording simple, structured logs; keys must be strings, optional

SpanContext: the span context, passed between Spans to establish relationships

References: other Spans that this Span references

If spans have a parent-child relationship, SpanContext can be used to bind it. There are two kinds of parent-child reference: ChildOf and FollowsFrom. ChildOf means the parent Span depends on the child Span to some extent, while FollowsFrom means the parent Span does not depend on the result of the child Span at all.
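The field list and the two reference types can be sketched as a small data model. All type names here (`SpanModel`, `SpanContextModel`, `SpanReference`) are hypothetical and are not the OpenTracing C# API; Span Logs are left out to keep the sketch short, and the operation names and timings are invented.

```csharp
using System;
using System.Collections.Generic;

// The two reference types described above.
public enum ReferenceType { ChildOf, FollowsFrom }

// Hypothetical minimal context: just the ids needed to link spans.
public sealed class SpanContextModel
{
    public string TraceId { get; }
    public string SpanId { get; }

    public SpanContextModel(string traceId, string spanId)
    {
        TraceId = traceId;
        SpanId = spanId;
    }
}

public sealed class SpanReference
{
    public ReferenceType Type { get; }
    public SpanContextModel Referenced { get; }

    public SpanReference(ReferenceType type, SpanContextModel referenced)
    {
        Type = type;
        Referenced = referenced;
    }
}

// Hypothetical span carrying the required and optional fields listed above.
public sealed class SpanModel
{
    public string OperationName { get; }
    public long StartMicros { get; }
    public long? FinishMicros { get; private set; }
    public SpanContextModel Context { get; }
    public Dictionary<string, string> Tags { get; } = new Dictionary<string, string>();
    public List<SpanReference> References { get; } = new List<SpanReference>();

    public SpanModel(string operationName, long startMicros, SpanContextModel context)
    {
        OperationName = operationName;
        StartMicros = startMicros;
        Context = context;
    }

    public void Finish(long finishMicros) => FinishMicros = finishMicros;
}

public static class SpanModelDemo
{
    public static void Main()
    {
        var parent = new SpanModel("place-order", 1000, new SpanContextModel("t1", "s1"));

        // The parent waits for the coupon check, so the child uses a ChildOf reference.
        var child = new SpanModel("check-coupon", 1100, new SpanContextModel("t1", "s2"));
        child.References.Add(new SpanReference(ReferenceType.ChildOf, parent.Context));
        child.Tags["coupon.valid"] = "true";
        child.Finish(1350);
        parent.Finish(1400);

        Console.WriteLine(child.References[0].Type);               // ChildOf
        Console.WriteLine(child.FinishMicros - child.StartMicros); // 250
    }
}
```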

The simplified information for a Span is as follows (ignore the letter case of the field names):

{
    "traceID": "790e003e22209ca4",
    "spanID": "4b73f8e8e77fe9dc",
    "flags": 1,
    "operationName": "print-hello",
    "references": [],
    "startTime": 1611318628515966,
    "duration": 259,
    "tags": [
        {"key": "internal.span.format", "type": "string", "value": "proto"}
    ],
    "logs": [
        {
            "timestamp": 1611318628516206,
            "fields": [
                {"key": "event", "type": "string", "value": "WriteLine"}
            ]
        }
    ]
}

OpenTracing API

In the OpenTracing API, there are three main objects:

Tracer

Span

SpanContext

A Tracer creates Spans and knows how to Inject (serialize) and Extract (deserialize) their metadata across process boundaries. It can:

Start a new Span

Inject a SpanContext into a carrier

Extract a SpanContext from a carrier

A Tracer is created by the starting process. That process then initiates a request, and each action produces a Span. If Spans have a parent-child relationship, the Tracer associates them. When the request completes, the Tracer pushes the trace information to Jaeger-Collector.

SpanContext carries information between Spans; it contains the trace id, span id, and other basic information.

Let's continue with the following figure as an example.

A creates a Tracer, then creates a Span to represent itself (A), and creates two more Spans to represent B and C. It then passes some information to B and C through SpanContext. After receiving the message from A, B and C each create a Tracer as well, which is used to call Tracer.extract(...). B has no downstream calls and can return its result directly, while C's Tracer goes on to create two Spans and passes SpanContext to D and E.
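The flow above can be sketched without any tracing library. `MiniTracer` and `PropagatedContext` are hypothetical types, not the OpenTracing or Jaeger client API; the carrier is a plain dictionary standing in for HTTP headers, and the `trace-id:span-id:parent-span-id:flags` value is modeled on Jaeger's `uber-trace-id` header. For brevity each callee starts its own span as a child of the caller's span.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical context: the ids that cross a process boundary.
public sealed class PropagatedContext
{
    public string TraceId { get; }
    public string SpanId { get; }
    public string ParentSpanId { get; }

    public PropagatedContext(string traceId, string spanId, string parentSpanId)
    {
        TraceId = traceId;
        SpanId = spanId;
        ParentSpanId = parentSpanId;
    }
}

// Hypothetical tracer showing only StartSpan, Inject and Extract.
public sealed class MiniTracer
{
    private const string HeaderName = "uber-trace-id";
    private readonly string _service;
    private int _nextId;

    public MiniTracer(string service)
    {
        _service = service;
    }

    // Starts a root span when parent is null, otherwise a child in the same trace.
    public PropagatedContext StartSpan(PropagatedContext parent)
    {
        string spanId = _service + "-" + (++_nextId);
        return parent == null
            ? new PropagatedContext(spanId, spanId, "0")
            : new PropagatedContext(parent.TraceId, spanId, parent.SpanId);
    }

    // Serializes the context into a carrier such as a set of HTTP headers.
    public void Inject(PropagatedContext context, IDictionary<string, string> carrier)
    {
        carrier[HeaderName] =
            context.TraceId + ":" + context.SpanId + ":" + context.ParentSpanId + ":1";
    }

    // Rebuilds the caller's context from the carrier on the receiving side.
    public PropagatedContext Extract(IDictionary<string, string> carrier)
    {
        string[] parts = carrier[HeaderName].Split(':');
        return new PropagatedContext(parts[0], parts[1], parts[2]);
    }
}

public static class PropagationDemo
{
    // Simulates one hop: the caller injects, the callee extracts and starts its own span.
    public static PropagatedContext Call(
        MiniTracer caller, PropagatedContext callerSpan, MiniTracer callee)
    {
        var headers = new Dictionary<string, string>();
        caller.Inject(callerSpan, headers);
        return callee.StartSpan(callee.Extract(headers));
    }

    public static void Main()
    {
        var a = new MiniTracer("A");
        var c = new MiniTracer("C");
        var d = new MiniTracer("D");

        PropagatedContext spanA = a.StartSpan(null);
        PropagatedContext spanC = Call(a, spanA, c);
        PropagatedContext spanD = Call(c, spanC, d);

        Console.WriteLine(spanD.TraceId);      // A-1
        Console.WriteLine(spanD.ParentSpanId); // C-1
    }
}
```

Every span in the chain keeps the trace id created by A, while the parent span id changes at each hop; that pair of facts is what lets a collector reassemble the tree.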

