How to Use Occlusion Analysis to Interpret DNN Models
There are many methods for interpreting deep neural networks, each with its own advantages and disadvantages. In most cases we are interested in local explanations, that is, explanations of the network's output for a particular input, because DNNs are usually too complex to be interpreted globally (independently of the input).
Generally speaking, all local interpretation methods share a common goal: to faithfully (that is, accurately) represent the function f being explained (such as a DNN), at least partially explaining the relationship between its inputs and outputs.
Of course, such an explanation must also be understandable to be useful. The simplest way to achieve this is to assign an importance score to each input dimension, that is, to create an attribution map. Attribution methods distribute the model's output over the dimensions of a given input.
In this short article, I will introduce a basic attribution technique: occlusion analysis. The basic idea is simple: for each input dimension i of an input x, we evaluate the model without that dimension and observe how the output changes. In particular, if ||f(x) - f(x_without_i)|| is large, the dimension must be important, because deleting it changes the output.
Occlusion analysis computes the importance of each patch by observing how the model output y changes after the patch is removed. The individual results can then be combined into a single attribution map.
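As an illustration, here is a minimal sketch of patch-based occlusion analysis (my own example, not code from the article), assuming a PyTorch classifier that maps a (C, H, W) image tensor to class logits:

```python
# Minimal sketch of patch-based occlusion analysis.
# Assumptions: `model` is a PyTorch module returning class logits for a
# (1, C, H, W) tensor; `baseline` is the value used to "delete" a patch.
import torch

def occlusion_map(model, image, target_class, patch_size=8, baseline=0.0):
    """Return a (H//patch_size, W//patch_size) map of importance scores."""
    model.eval()
    with torch.no_grad():
        base_score = model(image.unsqueeze(0))[0, target_class].item()
        _, h, w = image.shape
        heatmap = torch.zeros(h // patch_size, w // patch_size)
        for i in range(0, h - patch_size + 1, patch_size):
            for j in range(0, w - patch_size + 1, patch_size):
                occluded = image.clone()
                occluded[:, i:i + patch_size, j:j + patch_size] = baseline
                score = model(occluded.unsqueeze(0))[0, target_class].item()
                # A large drop means the occluded patch was important.
                heatmap[i // patch_size, j // patch_size] = base_score - score
    return heatmap
```

The returned heatmap can then be upsampled to the input resolution and overlaid on the image as the attribution map.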
Advantages of occlusion analysis
If the dimensions are independent, occlusion analysis is perfectly faithful, because you measure exactly the marginal effect of each dimension.
Unfortunately, in most cases, such as image data, this is not true. Here it is advisable to remove entire patches instead of single pixels. The idea is that the information in a single pixel can usually be reconstructed from its neighboring pixels. So if you have an image of a cat, deleting one cat pixel will never have much impact on the output, while deleting a patch that covers an ear may cause a significant drop in the model's prediction for the cat class.
Another advantage of occlusion analysis is that it is a post-hoc method. This means it can be used to explain any (trained) model without retraining. The model can even be a non-differentiable black box: as long as you can feed in inputs and receive outputs, occlusion analysis can be applied.
Compared with gradient-based interpretation methods, another advantage of occlusion analysis is that it can even handle locally flat functions with zero or only small gradients.
Some issues
But what does it actually mean to delete a dimension? After all, our model always expects inputs of the same size. Deleting a dimension means setting it to a value that carries zero information. What that value is depends on the dataset: for image data we usually use the mean RGB value, while for other data types the dimension is usually set to 0. We will look at further considerations below.
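For example, a per-channel mean baseline could be computed roughly like this (a sketch; `train_images` is an assumed tensor holding the training set, not something defined in the article):

```python
# Sketch of choosing the "zero-information" baseline value for image data.
# Assumption: `train_images` is a tensor of shape (N, C, H, W).
import torch

# Per-channel mean over the training set, used to replace occluded pixels;
# for non-image data a plain 0.0 is the usual choice.
mean_rgb = train_images.mean(dim=(0, 2, 3))   # shape (C,)
baseline = mean_rgb.view(-1, 1, 1)            # broadcastable over H and W
```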
As you may have guessed, occlusion analysis comes with a big caveat: we must feed every occluded input into the model and evaluate it. If your input has many dimensions, for example an image of 256x256 pixels, you must run the model 256x256 = 65,536 (!) times to get a complete analysis. In most cases this is very expensive, especially if you want to run the analysis over an entire dataset.
One way to reduce the computational cost is to group several features together and delete them jointly (for example, an 8x8 square in an image). This only makes sense for data types whose dimensions are so interdependent that they semantically belong together.
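To make the savings concrete, here is a rough back-of-the-envelope comparison (my own arithmetic, not from the article):

```python
# Number of forward passes needed for a 256x256 image.
pixelwise = 256 * 256                 # one pass per occluded pixel -> 65,536
patchwise = (256 // 8) * (256 // 8)   # one pass per 8x8 patch      ->  1,024
print(pixelwise, patchwise)           # a 64x reduction in model evaluations
```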
Distribution Shift
Occlusion analysis has another problem that is rarely discussed: distribution shift. If we think about it carefully, the output change we observe in the analysis has a second cause besides the deletion of information: the perturbed input no longer lies in the data distribution on which our model was trained.
In machine learning we usually assume that the model is evaluated on data from the same distribution as the training samples. If this is not the case (for example, because we removed pixels), the model output may simply be wrong. While the effect of removing a single pixel is usually negligible, removing a whole patch moves the input further from the training-data manifold and therefore has a larger effect on the output.
There are, however, ways to mitigate the problem. The basic idea is to delete information while still staying close to the data distribution. This means using more sophisticated information-removal techniques so that the image still looks like a natural image.
One way is to blur the patch you want to "delete". It is not the most effective method, but it should at least remove fine-grained texture information, and it is easy to implement.
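A minimal sketch of this blurring approach with OpenCV might look as follows (assuming `img` is an H x W x 3 uint8 image and `y`, `x`, `p` describe the patch; these names are illustrative):

```python
# Sketch of "deleting" a patch by blurring it with OpenCV.
import cv2

def blur_patch(img, y, x, p, ksize=15):
    out = img.copy()
    patch = out[y:y + p, x:x + p]
    # A heavy Gaussian blur removes fine-grained texture inside the patch
    # while keeping the image visually close to a natural image.
    out[y:y + p, x:x + p] = cv2.GaussianBlur(patch, (ksize, ksize), 0)
    return out
```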
A better approach is to use an inpainting algorithm: simply let another model guess (that is, inpaint) the content of the missing part. No information is actually added, because the inpainting depends only on the remaining pixels of the image, but the result still looks like a normal image and is therefore closer to the training data. You can use sophisticated algorithms such as the one designed by Yu et al., or readily available libraries such as OpenCV.
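For example, OpenCV's built-in inpainting can be used roughly as follows (a sketch under the same image assumptions as above):

```python
# Sketch of "deleting" a patch by inpainting it with OpenCV.
# Assumption: `img` is an H x W x 3 uint8 image.
import cv2
import numpy as np

def inpaint_patch(img, y, x, p):
    mask = np.zeros(img.shape[:2], dtype=np.uint8)
    mask[y:y + p, x:x + p] = 255   # mark the patch to be filled in
    # cv2.inpaint reconstructs the masked region from the surrounding pixels,
    # keeping the occluded image closer to the training distribution.
    return cv2.inpaint(img, mask, 3, cv2.INPAINT_TELEA)
```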
The problems with using an inpainting algorithm are:
1) it makes the process computationally more expensive,
2) you must run the inpainting model first, and
3) if you are not using a standard benchmark dataset, you may have to retrain it.
Because of its computational cost, occlusion analysis is certainly not the right tool for every situation, but it definitely has its uses. Occlusion analysis can be a great choice if your inputs are small, or if you just want something that is easy to implement and faithful (just pay attention to the patch size). A closely related but more sophisticated method is Shapley values; unfortunately, they are even more expensive to compute. If you are working with a differentiable model, the next simplest option is a gradient-based interpretation method.