

DeepMind releases a new model design tool Tracr: building models in reverse from interpretable logic

2025-01-31 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)11/24 Report--

Original title: "Think like a Transformer!" DeepMind releases a new model-design tool, Tracr: building models in reverse from interpretable logic.

Code is compiled directly into a Transformer model, and running experiments has never been so easy!

"Interpretability" has long been a hard problem in deep learning: if users cannot understand how a model operates, they cannot safely apply it in real-world settings.

Recently, researchers from ETH Zurich and DeepMind proposed a new model-construction tool, Tracr. A person writes code implementing a known mechanism for a given task, and Tracr compiles it into model weights, making the model much easier to interpret.

Paper link: https://arxiv.org/pdf/2301.05062.pdf

Code link: https://github.com/deepmind/tracr

The input to Tracr is code written in the domain-specific language RASP, and the output is the weights of a standard decoder-only, GPT-like Transformer.

In the experiments, the researchers used Tracr to create a series of ground-truth Transformers, implementing programs that compute token frequencies, sort sequences, and check Dyck-n parenthesis balance.
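
To make the tasks concrete, here is a plain-Python reference for one of them, the Dyck-n balance check. This is only an illustration of what the compiled program computes; the function name and bracket pairs are our own choices, not Tracr's API.

```python
def dyck_n_check(tokens, pairs=(("(", ")"), ("[", "]"))):
    """Return True iff the token sequence is a balanced Dyck-n word."""
    opener_of = {close: open_ for open_, close in pairs}
    openers = {open_ for open_, _ in pairs}
    stack = []
    for tok in tokens:
        if tok in openers:
            stack.append(tok)            # remember the unmatched opener
        elif tok in opener_of:
            # a closer must match the most recent unmatched opener
            if not stack or stack.pop() != opener_of[tok]:
                return False
    return not stack                     # balanced iff nothing left open

print(dyck_n_check(list("([()])")))  # True
print(dyck_n_check(list("([)]")))    # False
```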

How can a model be explained? Interpretability is an important means of understanding machine-learning models, but because how models actually operate remains unclear, most current research results are hard to evaluate.

One line of work, mechanistic interpretability, attempts to reverse-engineer neural networks to give a mechanistic explanation of the algorithm a model implements. It has made progress on a range of tasks, including convolutional neural networks for image classification and Transformer language models.

However, the approach still faces problems: relevant tools are lacking, explanations of model mechanisms remain shallow, and researchers must come up with explanations creatively.

The standard way to evaluate a mechanistic explanation is to combine evidence from many ad-hoc experiments. Because this is expensive, many methods can only be evaluated on toy models, or on a small number of unimportant circuits in real models.

Tracr's solution tackles the lack of ground-truth mechanistic explanations directly, by "compiling" human-readable code into the weights of a neural network.

In other words, Tracr actually acts like a compiler.

There are three main components involved in Tracr:

1. RASP code. RASP (Restricted Access Sequence Processing Language) is a language proposed in 2021 to express Transformer computation. It serves as a computational model for describing Transformers and comes with an interpreter for running RASP code.

You can think of the RASP program as a computational graph, and each node on the graph takes a specific value according to a given input token sequence.

The RASP language includes several basic node types: sequence operations (s-ops), including the built-in tokens and indices sequences that return the input values and their positions; elementwise operations; and select-aggregate operations.

In most cases, RASP operations map directly onto components of the Transformer model, including the embedding, MLP layers, and attention layers.
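
The semantics of these node types can be sketched in a few lines of pure Python. This is an illustrative mini-interpreter with our own function names, not Tracr's actual rasp module; it shows how select-aggregate behaves like attention, using the paper's token-frequency task as the example.

```python
# Minimal RASP-style s-op semantics (illustrative only).

def tokens(seq):             # built-in s-op: the input tokens
    return list(seq)

def indices(seq):            # built-in s-op: position of each token
    return list(range(len(seq)))

def elementwise(f, sop):     # elementwise operation: apply f per position
    return [f(x) for x in sop]

def select(keys, queries, predicate):
    # Selector: a boolean attention-pattern matrix, one row per query.
    return [[predicate(k, q) for k in keys] for q in queries]

def aggregate(selector, values):
    # Average the selected values at each position.
    out = []
    for row in selector:
        chosen = [v for sel, v in zip(row, values) if sel]
        out.append(sum(chosen) / len(chosen) if chosen else 0)
    return out

# Token frequency: each position selects all equal tokens,
# and the row sums of the selector give the counts.
seq = list("aabca")
same = select(tokens(seq), tokens(seq), lambda k, q: k == q)
freq = [sum(row) for row in same]
print(freq)  # [3, 3, 1, 1, 3]
```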

2. Modifications to the RASP language. Although RASP operations can be mapped directly onto Transformers, the language still needs some modification before it can be translated into model weights.

3. craft, the Transformer "assembly language". If RASP is the high-level language being compiled, craft is the assembly language: it provides more abstraction than operating on raw weight matrices.

craft can represent vector spaces with labeled basis dimensions and the operations on them, and the basis-direction labels can be used to define projections and other linear operations. Importantly, craft abstracts away the padding bookkeeping needed when tracking weight matrices by hand.

Tracr: compiling code into Transformer weights. Tracr is written in Python, with the RASP implementation embedded in Python, so RASP programs can be written directly as Python code and variable encodings can easily be annotated.

In Tracr, a RASP program is a data structure built step by step by passing dependencies into each operation, with some basic simplifications applied to the program along the way.

Tracr's process of translating RASP programs into Transformer weights consists of six steps:

1. Build a computational graph. Trace the whole program to create a directed graph representing the computation. For the output s-op, the graph includes source nodes representing tokens and indices and a sink node representing the output s-op.

2. Infer s-op values. For each s-op, Tracr must decide how to embed it in the residual stream; to use a categorical encoding, it needs to know which values the s-op can take.

Because the computation is deterministic, every node has a finite set of output values given a finite input vocabulary and context size.

So the main work of the second step is to traverse the graph and annotate each node with its possible outputs; the annotation uses a simple heuristic that is guaranteed to find a superset of each s-op's value set.
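
This value-set propagation can be sketched as a pass over the graph in dependency order. The function name and graph encoding below are our own illustration (exact enumeration rather than Tracr's heuristic), assuming an acyclic graph:

```python
# Illustrative sketch of step 2: propagate finite value sets through
# a small computation DAG until every node is annotated.
def infer_value_sets(graph, input_values):
    # graph: node -> (fn, [parent nodes]); inputs appear only in input_values.
    value_sets = dict(input_values)   # node -> set of possible values
    resolved = set(value_sets)
    while len(resolved) < len(graph) + len(input_values):
        for node, (fn, parents) in graph.items():
            if node in resolved or not all(p in resolved for p in parents):
                continue
            # enumerate every combination of parent values
            combos = [[]]
            for p in parents:
                combos = [c + [v] for c in combos for v in value_sets[p]]
            value_sets[node] = {fn(*c) for c in combos}
            resolved.add(node)
    return value_sets

vs = infer_value_sets(
    {"double": (lambda x: 2 * x, ["x"]),
     "parity": (lambda d: d % 2, ["double"])},
    {"x": {0, 1, 2}})
print(sorted(vs["double"]))  # [0, 2, 4]
print(sorted(vs["parity"]))  # [0]
```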

3. Translate s-ops independently. Each node in the computational graph is considered independently and converted into a craft component: elementwise operations are translated into MLP blocks, and select-aggregate operations into attention blocks.

A library of manually designed MLP and attention modules approximates arbitrary functions with numerical or categorical inputs and outputs: MLPs with categorical inputs and outputs act as lookup tables, while MLPs with numerical inputs and outputs use an explicit construction based on the universal function approximation theorem.
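
The lookup-table case is easy to sketch: with a one-hot categorical encoding, a single linear layer whose rows are the table entries acts as the lookup. The names and the example table below are our own illustration, not Tracr's actual construction:

```python
# Illustrative: a linear layer as a lookup table over one-hot input.
def one_hot(value, vocab):
    return [1.0 if v == value else 0.0 for v in vocab]

def lookup_mlp(x_onehot, table):
    # One row of `table` per vocabulary item; a one-hot input
    # selects exactly one row of the "weight matrix".
    dim = len(table[0])
    return [sum(x * row[j] for x, row in zip(x_onehot, table))
            for j in range(dim)]

vocab = ["a", "b", "c"]
table = [[1.0, 0.0],   # output vector for "a"
         [0.0, 1.0],   # output vector for "b"
         [0.5, 0.5]]   # output vector for "c"
print(lookup_mlp(one_hot("b", vocab), table))  # [0.0, 1.0]
```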

For the attention layer, the selector is translated into the W_QK operator and the corresponding aggregate operation into the W_OV operator.

Currently, only attention with categorical inputs is supported.

4. Assign components to layers. To build a Transformer model, all the craft components in the computational graph must be assigned to layers.

Ideally, the goal is to find the smallest model that performs the required computation. In general this is a combinatorial optimization problem with several constraints: the Transformer architecture alternates attention and MLP layers, and all interdependent computations must occur in the correct order.

In practice, Tracr solves this problem heuristically.

First, the longest path from the input to a given node is computed; this path length bounds the layer the node can be assigned to. Additional heuristics are then applied to merge layers containing blocks that can be computed in parallel.

This method returns a correct, though sometimes suboptimal, layer allocation.
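
The longest-path part of the heuristic can be sketched as a recursive depth computation over the dependency graph. The function name and the tiny example graph are our own illustration:

```python
# Illustrative sketch of the layer-allocation heuristic: the longest
# path from the inputs gives the earliest layer a node can occupy.
def allocate_layers(deps):
    # deps: node -> list of nodes it depends on; inputs map to [].
    layer = {}
    def depth(node):
        if node not in layer:
            parents = deps[node]
            layer[node] = 0 if not parents else 1 + max(depth(p) for p in parents)
        return layer[node]
    for node in deps:
        depth(node)
    return layer

deps = {"tokens": [], "same": ["tokens"], "freq": ["same"],
        "is_vowel": ["tokens"]}
print(allocate_layers(deps))
# {'tokens': 0, 'same': 1, 'freq': 2, 'is_vowel': 1}
```

Note that "same" and "is_vowel" land in the same layer depth, which is exactly the parallelism a later heuristic can exploit to pack them into one block of layers.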

5. Construct a craft model. The residual stream space is constructed as the direct sum of the input and output spaces of the model components.

In other words, each s-op is embedded in its own orthogonal subspace, which is reserved for its exclusive use throughout the network.
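
Concretely, this direct-sum layout just assigns each s-op a disjoint block of residual dimensions. The following is an illustrative sketch (names and dimension counts are our own), showing that writing one s-op never clobbers another:

```python
# Illustrative residual-stream layout: each s-op owns a disjoint
# block of dimensions (a direct sum of orthogonal subspaces).
def allocate_residual(sop_dims):
    offset, spans = 0, {}
    for name, d in sop_dims.items():
        spans[name] = (offset, offset + d)   # half-open index range
        offset += d
    return spans, offset                     # offset == total width

def write(stream, spans, name, vec):
    lo, hi = spans[name]
    stream[lo:hi] = vec                      # touches only this s-op's block
    return stream

spans, width = allocate_residual({"tokens": 3, "indices": 1, "freq": 1})
stream = write([0.0] * width, spans, "tokens", [0.0, 1.0, 0.0])
print(spans["freq"], width)  # (4, 5) 5
```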

Then the computational graph is traversed in the order determined by the layer allocation, and the components are stacked to obtain a complete Transformer represented in craft.

6. Assemble the Transformer weights. Finally, the craft representation of the model is converted into concrete model weights.

First, parallel MLP layers are merged into a single layer, and parallel attention heads are merged into a single layer. In each attention layer, the W_QK and W_OV matrices are split into the Wq, Wk, Wo, and Wv weight matrices.
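
The split works because W_QK is a low-rank product: W_QK = Wq Wk^T, so attention scores computed with the combined operator equal dot products of projected queries and keys. A small pure-Python illustration (matrices chosen arbitrarily):

```python
# Illustrative: splitting a combined W_QK into per-head Wq and Wk.
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

Wq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # d_model x d_head
Wk = [[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]]   # d_model x d_head
W_QK = matmul(Wq, transpose(Wk))             # d_model x d_model
# score(q, k) = q W_QK k^T  ==  (q Wq) . (k Wk)
print(W_QK[1])  # [0.0, 2.0, 0.0]
```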

Then all weights are reshaped and connected to the Transformer architecture, and the model configuration (depth, layer width, residual stream size, and so on) is inferred to accommodate the components created.

By re-implementing only step 6, Tracr can be extended directly to support any other Transformer implementation.

Applying Tracr in interpretability research. Compiled models can accelerate controlled experiments that test specific hypotheses about the computational structure of Transformers; in this way, Tracr also serves as an experimental platform for interpretability research.

The researchers wrote RASP programs for examples such as token counting and sorting.

Compiled models as test cases for interpretability tools. Models compiled from known programs naturally serve as ground truth for testing "interpretation faithfulness", and provide a way to falsify explanations produced by interpretability techniques.

Ultimately, such models can be used to build a library of test cases for interpretability tools, which in turn enables quantitative evaluation metrics.

Replacing model components. Another way to assess an understanding of how a model works is to replace parts of the model with hand-coded components.

For example, some researchers have tested their understanding of how a Transformer implements modular addition by replacing model components with their own idealized implementations. The results show that this can even improve downstream performance, which is strong evidence that the proposed explanation is correct.

Although Tracr compiles an algorithm into a complete Transformer model, it could also be adapted, by modifying the code, to compile only part of a trained model, making it easier to evaluate one's understanding of large models.

Beyond evaluation, compiled models can also serve as a test bed for studying circuit-level phenomena and for developing new techniques to interpret Transformer models.

Reference:

https://arxiv.org/pdf/2301.05062.pdf

This article comes from the WeChat official account: Xin Zhiyuan (ID: AI_era)

