How to use PyTorch to train Leela Zero neural network

2025-01-17 Update From: SLTechnology News&Howtos


Lately, I've been looking for ways to speed up my research and manage my experiments, especially around writing training pipelines and managing experiment configurations, and I discovered two projects: PyTorch Lightning and Hydra. PyTorch Lightning helps you write training pipelines quickly, while Hydra helps you manage configurations efficiently.

PyTorch Lightning:https://github.com/PyTorchLightning/pytorch-lightning

Hydra:https://hydra.cc/

To practice using them, I decided to write a training pipeline for Leela Zero (https://github.com/leela-zero/leela-zero). I chose it because it is a wide-ranging project that involves training a large network on a large data set with multiple GPUs, which makes it an interesting technical challenge. I had also previously implemented a smaller version of AlphaGo for chess (https://medium.com/@peterkeunwoo/beating-my-brother-in-chess-cb17739ffe2), so I thought this would be a fun side project.

In this blog, I will explain the main details of this project so that you can easily understand the work I have done. You can read my code here: https://github.com/yukw777/leela-zero-pytorch

Leela Zero

The first step was to figure out how Leela Zero works internally. I relied heavily on Leela Zero's documentation and its Tensorflow training pipeline.

Neural network structure

Leela Zero's neural network consists of a ResNet "tower" with two "heads": a policy head and a value head, as described in the AlphaGo Zero paper (https://deepmind.com/blog/article/alphago-zero-starting-scratch). The first convolution of each head uses 1x1 filters; all other convolutions use 3x3 filters. Game and board features are encoded as a tensor of shape [batch size, board width, board height, number of features] and first fed through the residual tower. The tower extracts abstract features and passes them to each head: the policy head computes the probability distribution over the next move, and the value head computes the value of the position, predicting the winner of the game.

You can find the details of the network implementation in the repository linked above.
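For illustration, here is a minimal PyTorch sketch of this architecture. Layer names and sizes are my own, not the project's actual code, and the sketch uses PyTorch's NCHW tensor layout:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResBlock(nn.Module):
    """One residual block: two 3x3 convs with batch norm and a skip connection."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        y = F.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return F.relu(x + y)


class Network(nn.Module):
    """ResNet tower with a policy head and a value head (illustrative sizes)."""

    def __init__(self, board_size=19, in_channels=18, channels=64, n_res=4):
        super().__init__()
        self.input_conv = nn.Conv2d(in_channels, channels, 3, padding=1, bias=False)
        self.input_bn = nn.BatchNorm2d(channels)
        self.tower = nn.Sequential(*[ResBlock(channels) for _ in range(n_res)])
        # Policy head: 1x1 conv, then a linear layer over all moves + pass.
        self.policy_conv = nn.Conv2d(channels, 2, 1, bias=False)
        self.policy_bn = nn.BatchNorm2d(2)
        self.policy_fc = nn.Linear(2 * board_size**2, board_size**2 + 1)
        # Value head: 1x1 conv, two linear layers, tanh output in [-1, 1].
        self.value_conv = nn.Conv2d(channels, 1, 1, bias=False)
        self.value_bn = nn.BatchNorm2d(1)
        self.value_fc1 = nn.Linear(board_size**2, 256)
        self.value_fc2 = nn.Linear(256, 1)

    def forward(self, x):
        x = F.relu(self.input_bn(self.input_conv(x)))
        x = self.tower(x)
        pol = F.relu(self.policy_bn(self.policy_conv(x)))
        pol = self.policy_fc(pol.flatten(1))  # move logits
        val = F.relu(self.value_bn(self.value_conv(x)))
        val = torch.tanh(self.value_fc2(F.relu(self.value_fc1(val.flatten(1)))))
        return pol, val
```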

Weight format

Leela Zero uses a simple text file to save and load network weights. Each line in the file holds a series of numbers that represent the weights of one layer of the network: first the residual tower, then the policy head, then the value head.

Convolutional layers have 2 weight rows:

Convolution weights with shape [output, input, filter size, filter size]

Channel biases

Batch norm layers have 2 weight rows:

Batch norm means

Batch norm variances

Inner product (fully connected) layers have 2 weight rows:

Layer weights with shape [output, input]

Output biases
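A round trip through this one-row-per-tensor format can be sketched as follows. The helper names are mine; the real loader also handles the version line and the ordering of layers:

```python
import torch


def tensor_to_line(t: torch.Tensor) -> str:
    """Flatten a weight tensor into one line of space-separated numbers."""
    return " ".join(f"{v:.6g}" for v in t.flatten().tolist())


def line_to_tensor(line: str, shape) -> torch.Tensor:
    """Parse one line of the weight file back into a tensor of the given shape."""
    return torch.tensor([float(v) for v in line.split()]).view(shape)


# Example: a conv layer's weights in [output, input, filter, filter] layout.
conv_weight = torch.randn(8, 4, 3, 3)
line = tensor_to_line(conv_weight)
restored = line_to_tensor(line, conv_weight.shape)
```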

I wrote unit tests to make sure my weight files were correct. Another simple sanity check I used was to count the number of layers after loading my weight file and compare it with what Leela Zero expects. The number of layers is:

n_layers = 1 (version number) + 2 (input convolution) + 2 (input batch norm) + n_res (number of residual blocks) * 8 (first conv + first batch norm + second conv + second batch norm) + 2 (policy head convolution) + 2 (policy head batch norm) + 2 (policy head linear) + 2 (value head convolution) + 2 (value head batch norm) + 2 (value head first linear) + 2 (value head second linear)
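This count can be written as a small helper that mirrors the formula above (the function name is mine):

```python
def n_layers(n_res: int) -> int:
    """Number of weight-file rows for a network with n_res residual blocks."""
    return (
        1                # version number
        + 2 + 2          # input convolution + input batch norm
        + n_res * 8      # per residual block: 2 convs + 2 batch norms
        + 2 + 2 + 2      # policy head: conv + batch norm + linear
        + 2 + 2 + 2 + 2  # value head: conv + batch norm + 2 linears
    )
```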

So far this seems simple, but there is one implementation detail to watch out for: Leela Zero actually uses the bias of each convolutional layer to encode the learnable parameters (gamma and beta) of the following batch norm layer. This was done so that the weight file format, which has only one row for layer weights and one row for biases, did not have to change when batch norm layers were added.

Currently, Leela Zero only uses the beta term of the batch norm layer and sets gamma to 1. So how can a convolution bias produce the same result as applying the learnable beta in the batch norm layer? Let's first look at the batch norm equation:

y = gamma * (x - mean) / sqrt(var + eps) + beta

Since Leela Zero sets gamma to 1, the equation is:

y = (x - mean) / sqrt(var + eps) + beta

Now, let x_conv be the output of the convolutional layer without bias. We want to add a bias to x_conv such that running it through the batch norm layer without beta gives the same result as running x_conv through the batch norm equation with only beta:

(x_conv + bias - mean) / sqrt(var + eps) = (x_conv - mean) / sqrt(var + eps) + beta

x_conv + bias - mean = x_conv - mean + beta * sqrt(var + eps)

bias = beta * sqrt(var + eps)

So if we set the convolution bias to beta * sqrt(var + eps) in the weight file, we get the expected output. This is exactly what Leela Zero does.
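The identity is easy to verify numerically, using the standard batch norm convention of adding eps to the variance (the values below are arbitrary):

```python
import torch

eps = 1e-5
x_conv = torch.randn(1000)  # conv output without bias
mean, var = torch.tensor(0.3), torch.tensor(2.0)
beta = torch.tensor(0.7)

# Batch norm (gamma = 1) applied with an explicit beta term...
with_beta = (x_conv - mean) / torch.sqrt(var + eps) + beta

# ...versus folding bias = beta * sqrt(var + eps) into the conv output
# and applying batch norm without any beta term.
bias = beta * torch.sqrt(var + eps)
folded = (x_conv + bias - mean) / torch.sqrt(var + eps)
```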

So how do we implement this? In Tensorflow, you can tell a batch norm layer to ignore the gamma term by passing scale=False, i.e. tf.layers.batch_normalization(scale=False), and be done with it.

Unfortunately, in PyTorch you cannot set a batch norm layer to ignore only gamma; you can only ignore both gamma and beta by setting the affine parameter to False: BatchNorm2d(out_channels, affine=False). So I set the batch norm layer to ignore both, and then simply added a tensor afterwards to represent beta. The convolution biases in the weight file are then computed with the formula bias = beta * sqrt(var + eps).
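A sketch of this workaround, with illustrative module and attribute names rather than the project's actual code:

```python
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """Conv -> BatchNorm2d(affine=False) -> explicit learnable beta."""

    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self.conv = nn.Conv2d(
            in_channels, out_channels, kernel_size,
            padding=kernel_size // 2, bias=False,
        )
        # affine=False drops both gamma and beta from the batch norm.
        self.bn = nn.BatchNorm2d(out_channels, affine=False)
        # A separate per-channel beta, added back after normalization.
        self.beta = nn.Parameter(torch.zeros(out_channels))

    def forward(self, x):
        # beta is broadcast over the batch and spatial dimensions.
        return self.bn(self.conv(x)) + self.beta.view(1, -1, 1, 1)
```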

Training pipeline

After figuring out the details of Leela Zero's neural network, it was time to tackle the training pipeline. As I mentioned, I wanted to practice using two tools, PyTorch Lightning and Hydra, to speed up writing training pipelines and to manage experiment configurations efficiently. Let's look at how I used them.

PyTorch Lightning

Writing training pipelines has been my least favorite part of research: it involves a lot of repetitive boilerplate code and is hard to debug. Because of this, PyTorch Lightning was a breath of fresh air. It is a lightweight library without heavy abstractions on top of PyTorch that handles most of the boilerplate in a training pipeline. It lets you focus on the more interesting parts, such as the model architecture, and makes your research code more modular and easier to debug. On top of that, it supports multi-GPU and TPU training out of the box!

To use PyTorch Lightning as my training pipeline, most of the coding I had to do was writing a class, which I called NetworkLightningModule, that inherits from LightningModule and specifies the details of the training pipeline, which is then handed to the Trainer. For details on how to write your own LightningModule, refer to PyTorch Lightning's official documentation.

Hydra

The other part I wanted to work on was experiment management. When you do research, you inevitably run lots of variations of your experiments to test your hypotheses, so it is crucial to track them in a scalable way. So far I had relied on configuration files to manage my experiment variations, but flat configuration files quickly become unmanageable. Templates are one solution to this problem. However, I found that templates also end up messy, because when you overlay multiple layers of value files to render the final configuration, it is hard to track which value came from which file.

Hydra, on the other hand, is a composition-based configuration system: instead of rendering a final configuration from separate templates and value files, you compose multiple smaller configuration files to form the final configuration. This is not as flexible as template-based systems, but I find that a composition-based system strikes a good balance between flexibility and maintainability, and Hydra is one such system tailored specifically for research scripts. Its invocation is a bit awkward, since it requires being used as the main entry point of your script, but I actually think this design makes it easy to integrate with training scripts. It also lets you override parts of the configuration from the command line, which is handy for running different variations of an experiment. I mainly used Hydra to manage configurations for network architectures and training pipelines at different scales.

Evaluation

To evaluate my trained networks, I used GoMill (https://github.com/mattheww/gomill) to run Go tournaments. It is a library for running matches between Go Text Protocol (GTP) engines, of which Leela Zero is one. You can find the tournament configuration I used here (https://github.com/yukw777/leela-zero-pytorch/blob/master/eval/bg-vs-sm.ctl).
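For reference, a gomill playoff control file looks roughly like this (control files use Python syntax, and Player and Matchup are provided by gomill's control-file environment; the engine commands, weight files, and game counts below are illustrative):

```python
competition_type = "playoff"
board_size = 9
komi = 7.5

players = {
    "baseline": Player("leelaz --gtp --weights baseline.txt --noponder"),
    "trained": Player("leelaz --gtp --weights trained.txt --noponder"),
}

matchups = [
    Matchup("baseline", "trained", alternating=True, number_of_games=100),
]
```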

Conclusion

By using PyTorch Lightning and Hydra, I was able to greatly speed up writing my training pipeline and manage my experiment configurations effectively. I hope this project and blog post help you with your research as well. You can view the code here: https://github.com/yukw777/leela-zero-pytorch

Original link: towardsdatascience.com/training-neural-networks-for-leela-zero-using-pytorch-and-pytorch-lightning-bbf588683065
