What are the control flow and optimizer in Tensorflow?

This article introduces control flow and the optimizer in TensorFlow. The content is fairly detailed; interested readers can use it as a reference, and I hope you find it helpful.

Control flow

If you know a little about TensorFlow, you know that the graph is its most basic structure: all of TensorFlow's computation is expressed as graphs. The nodes of a graph represent basic mathematical operations such as addition, convolution, and pooling. Each node is described with protobuf, including the node's name, op, inputs, and so on; the details can be found in the node_def.proto file in the TensorFlow source. The op corresponding to a node is implemented in C++. The edges of the graph represent the direction of data flow and the dependencies between nodes; for example, A -> B means that B must execute after A has finished. The Inception network, for instance, is expressed as such a graph.
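
As a minimal sketch (TensorFlow 2.x, building a graph explicitly with tf.Graph; the node names are chosen for this example), a tiny two-node graph and its NodeDef descriptions can be built and inspected like this:

import tensorflow as tf

# A minimal sketch of building a graph: two ops, where B depends on A.
g = tf.Graph()
with g.as_default():
    a = tf.constant([1.0, -2.0], name="A")   # node A: a constant op
    b = tf.nn.relu(a, name="B")              # node B: must run after A finishes

# Each node is stored as a protobuf NodeDef carrying its name, op and inputs.
for node in g.as_graph_def().node:
    print(node.name, node.op, list(node.input))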

Once we understand some of TensorFlow's basic ops, a natural question arises: how does TensorFlow implement branching and loops? How are statements such as tf.cond and tf.while_loop expressed at the lower level? TensorFlow defines a small set of basic control primitives which, combined in certain ways, can implement the constructs of a high-level control language, such as a statement like a = p ? C : D.

The design principle of TensorFlow's control flow is to introduce the smallest number of control modules that can express many complex and widely used control patterns. These control modules must also accommodate concurrent and distributed execution, and support automatic differentiation. In TensorFlow, a compute node executes inside an execution frame (analogous to a process's stack frame). The control flow primitives are responsible for creating and managing execution frames. Intuitively, the TF runtime sets up execution frames one after another and executes all the compute nodes belonging to a frame inside it. Execution frames can be nested (parent-child relationships), and compute nodes that belong to different execution frames and have no dependencies can run in parallel. Here are the five most basic control primitives.

1 switch

Based on the control condition p, the Switch operator forwards the input data d to one of its two outputs.

2 merge

The Merge operator passes an available input through to its output; Merge can execute as soon as any of its inputs becomes available.

3 enter

The Enter operator passes its input into the execution frame identified by a unique frame name. It is used to pass a tensor from an execution frame into a child execution frame.

4 exit

The Exit operator passes data from a child execution frame back to its parent execution frame.

5 nextIteration

The NextIteration operator passes its input to the next iteration of the current execution frame. The TensorFlow runtime keeps track of the iterations in an execution frame at all times, and any op executing in a frame is identified by a unique iteration ID. A hand-wired sketch using these primitives follows.
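
As a rough illustration (TensorFlow 2.x, graph mode via tf.compat.v1 and the low-level tf.raw_ops interface; not how user code is normally written), Switch and Merge can be wired by hand to build a small conditional:

import tensorflow as tf

# A rough sketch: wire Switch and Merge by hand in graph mode. Switch routes d to
# its true or false output depending on pred; the untaken branch becomes dead,
# and Merge forwards whichever branch actually produced a value.
tf.compat.v1.disable_eager_execution()
d = tf.constant(3.0)
pred = tf.constant(True)
out_false, out_true = tf.raw_ops.Switch(data=d, pred=pred)
branch_true = out_true * 2.0        # executes only when pred is True
branch_false = out_false + 10.0     # executes only when pred is False
merged, _ = tf.raw_ops.Merge(inputs=[branch_false, branch_true])

with tf.compat.v1.Session() as sess:
    print(sess.run(merged))         # 6.0, since pred is True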

Now let's look at how these primitives implement conditionals and loops.

The conditional cond(pred, fn1, fn2) is implemented in TensorFlow roughly as follows.

First, a conditional control context is created, and it invokes the two branch subgraphs; which one actually runs is determined by the condition pred. Finally, the results of the two branch subgraphs are passed on to the downstream graph through a Merge node. The Merge node guarantees that as soon as either branch produces a result, it is forwarded immediately for subsequent computation.
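
To see this concretely, here is a small sketch (graph mode with the classic v1 control flow explicitly selected via tf.compat.v1.disable_control_flow_v2; newer TensorFlow versions otherwise build a functional If op that is lowered to these primitives later): after building a tf.cond, the GraphDef contains the Switch and Merge nodes described above.

import tensorflow as tf

# Sketch: with classic (v1) control flow, tf.cond is built from Switch and Merge.
tf.compat.v1.disable_eager_execution()
tf.compat.v1.disable_control_flow_v2()
g = tf.Graph()
with g.as_default():
    pred = tf.compat.v1.placeholder(tf.bool, name="pred")
    x = tf.constant(1.0)
    y = tf.cond(pred, lambda: x * 2.0, lambda: x + 10.0)

print(sorted({n.op for n in g.as_graph_def().node}))   # includes 'Switch' and 'Merge'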

Loop statements are implemented in TensorFlow along the following lines.

First, a loop control context is created. Then Enter and Merge nodes are created to import the loop variables: the Enter node directs them into the loop body's execution frame, identified by its frame name, and Merge passes the loop variables to the subgraph that evaluates the loop condition. A Switch node then selects a branch based on the result of the condition. The results computed inside the loop body must be fed around for further iterations, so they flow into a NextIteration node; the false output of Switch terminates the loop, so it flows into an Exit node, which outputs the final result.
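
Analogously (same assumptions as the previous sketch), the GraphDef of a tf.while_loop built with the classic control flow contains all five primitives:

import tensorflow as tf

# Sketch: a v1-style while loop is built from Enter/Merge/Switch/NextIteration/Exit.
tf.compat.v1.disable_eager_execution()
tf.compat.v1.disable_control_flow_v2()
g = tf.Graph()
with g.as_default():
    i = tf.constant(0)
    out = tf.while_loop(lambda i: i < 10, lambda i: i + 1, [i])

ops = {n.op for n in g.as_graph_def().node}
print(ops & {"Enter", "Merge", "Switch", "NextIteration", "Exit"})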

With these control nodes, TensorFlow can split a graph into multiple subgraphs and deploy them on multiple execution devices. Where two subgraphs are cut apart, Send and Receive nodes are added for data communication between the devices. TensorFlow places no restrictions on how nodes are assigned, as long as each node can execute on its device. Without these control nodes, a node in a graph could only execute once; with them, the graph can be computed in more flexible ways: a node can execute many times inside a loop, and nodes can be assigned to different devices for execution.
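
As a minimal sketch of device assignment (the device strings are illustrative and depend on the available hardware; the Send/Recv insertion itself is done by the runtime, not by user code):

import tensorflow as tf

# Sketch: assign ops to devices; at the cut edge between the two device subgraphs
# the runtime inserts a Send/Recv pair automatically.
tf.config.set_soft_device_placement(True)    # fall back gracefully if no GPU exists
with tf.device("/CPU:0"):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
with tf.device("/GPU:0"):
    b = tf.matmul(a, a)                      # the edge a -> matmul crosses devices
print(b)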

TensorFlow supports automatic differentiation. Once the user has built the computation graph and defined a loss function, TensorFlow constructs the backpropagation graph from the structure of the forward graph: for a given compute node, its differential is obtained by mapping it to the corresponding gradient formula, which gives the node that represents its backpropagation. For the control nodes, the backpropagation counterpart of Switch is Merge (for cond) or NextIteration followed by Merge (for while_loop); the counterpart of Merge is Switch; the counterpart of NextIteration is Identity; the counterpart of Enter is Exit; and the counterpart of Exit is Enter. With these correspondences, the backpropagation graph can be derived automatically, the gradients can be obtained, and the computation can be scheduled across multiple devices.

For example, for a cond conditional that is not nested inside a loop, the forward graph maps onto the backpropagation graph accordingly: the Switch in the forward graph corresponds to a Merge in the gradient graph, and the Merge corresponds to a Switch.
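
From the user's point of view, gradients simply flow through control flow; a minimal sketch (eager mode with tf.GradientTape, values chosen for illustration):

import tensorflow as tf

# Sketch: differentiate through a conditional; TensorFlow builds the backward
# graph using the correspondences described above.
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = tf.cond(x > 0.0, lambda: x * x, lambda: -x)   # y = x^2 when x > 0
print(tape.gradient(y, x))                            # 6.0, i.e. dy/dx = 2x at x = 3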

Optimizer

The optimizer works on top of the original computation graph to make the computation run more efficiently on hardware. Optimization has several main goals: simplifying the graph structure, reducing peak hardware memory usage, and performing hardware-friendly graph transformations. There are many graph optimization methods; some are hardware-independent and others depend on hardware implementation details. High-level optimization simplifies the graph in a way that is transparent to the hardware: simplification removes redundant computation, for example through constant folding and the removal of redundant control nodes. Other passes simplify expressions using the associative and distributive laws. For example:

1) Graph simplification removes redundant computation by replacing a subgraph with its equivalent final result. Consider, for example, the process of constructing a tensor: the shape-creation and data-creation ops can be merged and replaced directly with a constant, which removes the shape-creation step.
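
A conceptual sketch of this rewrite (the optimizer performs it on the graph; the code below only shows the equivalence it relies on):

import tensorflow as tf

# Building a tensor from a computed shape plus a fill value...
shape = tf.stack([2, 3])                  # shape-creation subgraph
before = tf.fill(shape, 7.0)              # data creation driven by that shape
# ...is equivalent to a single constant node, which is what the optimizer substitutes.
after = tf.constant(7.0, shape=[2, 3])
print(bool(tf.reduce_all(before == after)))   # True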

2) Constant folding replaces an expression over two or more constants with a single constant, which requires the optimizer to perform some computation itself. For example:
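
A conceptual sketch (inside a tf.function, Grappler's constant-folding pass can evaluate the constant subexpression once at optimization time rather than on every call):

import tensorflow as tf

# Sketch: the constant subexpression below is foldable into the single constant 10.0.
@tf.function
def f(x):
    c = tf.constant(2.0) * tf.constant(3.0) + tf.constant(4.0)   # foldable to 10.0
    return x * c

print(f(tf.constant(1.0)))   # 10.0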

3) Algebraic optimization uses arithmetic properties to perform equivalent transformations. For example:

AddN corresponds to a parallel compute unit supported by hardware that can sum multiple inputs at once, so three consecutive additions can be replaced with a single parallel addition.

The second transformation uses the distributive and associative laws to factor the common multiplier out of three products. The last one applies an equivalent logical transformation, thereby reducing the number of compute nodes.

For matrix + scalar, the scalar must first be broadcast and then added; after the transformation, the number of broadcasts is reduced.

These transformations eliminate redundant computation; a sketch of a few such equivalences follows.
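
The code below is a conceptual sketch of these rewrites (the optimizer applies them on the graph; the expressions here only demonstrate the equivalences, and the values are illustrative):

import tensorflow as tf

a, b, c, x = (tf.constant(v) for v in (1.0, 2.0, 3.0, 4.0))

# Three chained Add nodes -> one AddN node (a single multi-input parallel addition).
y1, y1_opt = a + b + c, tf.add_n([a, b, c])

# Distributive/associative laws: factor out the common multiplier x.
y2, y2_opt = a * x + b * x + c * x, (a + b + c) * x

# Matrix + scalar: combining the scalars first reduces the number of broadcasts.
m = tf.ones([2, 2])
y3, y3_opt = (m + 1.0) + (m + 2.0), (m + m) + 3.0

print(float(y1) == float(y1_opt), float(y2) == float(y2_opt),
      bool(tf.reduce_all(y3 == y3_opt)))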

4) Op fusion merges several compute nodes into a single node. This is hardware-dependent: for example, if a hardware compute unit can perform conv + batch_norm in one pass, the two ops can be fused and no separate compute unit is needed. conv + batch_norm is a common fusion pattern; a sketch of how it works follows.
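
As a sketch of the conv + batch_norm case (shapes and values are illustrative), the batch-norm scale and shift can be folded into the convolution's weights and bias, so a single convolution does the work of both ops:

import tensorflow as tf

# Fold batch norm parameters (gamma, beta, mean, var) into the conv kernel and bias.
gamma, beta = tf.constant([2.0]), tf.constant([0.5])
mean, var, eps = tf.constant([1.0]), tf.constant([4.0]), 1e-5
w = tf.random.normal([3, 3, 1, 1])                   # conv kernel [kh, kw, in, out]
x = tf.random.normal([1, 8, 8, 1])                   # input [batch, h, w, channels]

scale = gamma / tf.sqrt(var + eps)                   # per-output-channel factor
w_fused = w * scale                                  # fold the scale into the kernel
b_fused = beta - mean * scale                        # fold the mean/shift into a bias

y_ref = (tf.nn.conv2d(x, w, 1, "SAME") - mean) * scale + beta   # conv followed by BN
y_fused = tf.nn.conv2d(x, w_fused, 1, "SAME") + b_fused         # single fused conv
print(float(tf.reduce_max(tf.abs(y_ref - y_fused))))            # ~0: results agree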

5) Storage optimization aims to reduce the frequency of off-chip memory accesses, which improves the efficiency of data movement and reduces the time spent waiting for data to load.

So much for what control flow and the optimizer refer to in TensorFlow. I hope the content above is helpful to you; if you found the article useful, feel free to share it so more people can see it.
