
Interpretation of a KDD 2019 paper: model interpretability under multi-class classification


Recently, the paper "Axiomatic Interpretability for Multiclass Additive Models", written by Alibaba research intern Zhang Xuezhou and Ant Financial Services Group senior algorithm expert Lou Yin, was accepted by KDD 2019, the top global data mining conference. This article is a detailed interpretation of that paper. Paper address: https://www.kdd.org/kdd2019/a...

Preface

Model interpretability is an important topic in machine learning. The object of study here is the Generalized Additive Model (GAM). GAMs have been widely used in medicine and other scenarios that demand high interpretability [1].

As a fully white-box model, a GAM offers stronger representational power than a (generalized) linear model (GLM): a GAM applies nonlinear transformations to single features and to pairwise feature interactions. A GAM with pairwise interactions is often called a GA2M. The GA2M model can be written as:

g(E[y]) = \beta_0 + \sum_i f_i(x_i) + \sum_{i \neq j} f_{ij}(x_i, x_j)

where g is the link function, and the f_i and f_{ij} are the shape functions the model must learn. Because the f_i and f_{ij} are low-dimensional functions, every component of the model can be visualized, making it easy for modelers to understand how each feature affects the final prediction. For example, in [1], the effect of age on pneumonia mortality risk can be shown in a single plot.
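To make the formula concrete, here is a minimal sketch of a GA2M predictor in Python. The shape functions and parameter values are hypothetical placeholders, not from the paper; the sketch only illustrates how the additive score is assembled and passed through an (assumed logistic) inverse link.

```python
# Minimal GA2M sketch (hypothetical shape functions, logistic link assumed):
# g(E[y]) = beta0 + sum_i f_i(x_i) + sum_{i,j} f_ij(x_i, x_j)
import numpy as np

def ga2m_score(x, beta0, single_fns, pair_fns):
    """x: 1-D feature vector; single_fns: {i: f_i}; pair_fns: {(i, j): f_ij}.
    Returns the additive score before the inverse link is applied."""
    score = beta0
    for i, f in single_fns.items():
        score += f(x[i])                    # single-feature shape functions
    for (i, j), f in pair_fns.items():
        score += f(x[i], x[j])              # pairwise interaction terms
    return score

# Hypothetical example: two features and one interaction.
single = {0: lambda a: 0.8 * np.log1p(a),   # nonlinear effect of feature 0
          1: lambda b: -0.5 * b}            # linear effect of feature 1
pairs = {(0, 1): lambda a, b: 0.1 * a * b}
z = ga2m_score(np.array([2.0, 1.5]), -1.0, single, pairs)
prob = 1.0 / (1.0 + np.exp(-z))             # inverse logistic link
print(prob)
```

Because each f_i depends on one feature only, plotting f_i over its feature's range is exactly the visualization the paper discusses.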

Because a GAM transforms features nonlinearly, it can often provide much stronger modeling power than a linear model. In several studies, the accuracy of GAMs comes close to that of Boosted Trees or Random Forests [1,2,3].

The Contradiction Between Visualization and the Model's Prediction Mechanism

The paper first discusses, for multi-class problems, the contradiction between the visualizations of traditional interpretable models (such as logistic regression and SVM) and the model's actual prediction mechanism. Reading these raw visualizations directly may lead modelers to misinterpret how the model predicts. In the paper's Figure 1, the left panel shows the shape function of age under a multi-class GAM; at first glance it suggests that the risk of Diabetes I increases with age. But looking at the actual predicted probability (right panel), the risk of Diabetes I in fact decreases with age.
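The following toy sketch (made-up numbers, not the paper's data) shows how this mismatch arises: under a softmax, class 0's shape function rises with age, yet its predicted probability falls, because the competing class's score rises faster.

```python
# Toy illustration of the shape-function vs. probability mismatch.
import numpy as np

ages = np.array([20, 40, 60, 80])
f0 = np.array([0.0, 0.5, 1.0, 1.5])   # class 0: curve looks "increasing risk"
f1 = np.array([0.0, 1.5, 3.0, 4.5])   # competing class: increases much faster

logits = np.stack([f0, f1], axis=1)
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(probs[:, 0])  # P(class 0) falls with age despite the rising f_0 curve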

To solve this problem, the paper proposes a post-processing method, Additive Post-Processing for Interpretability (API), which can be applied to a GAM trained by any algorithm. Without changing the model's predictions, API makes the visualizations of the processed model consistent with its prediction mechanism, so that modelers can safely observe and understand the model through traditional visualization methods without being misled by erroneous visual information.

Making Multi-class Models Interpretable

The design philosophy of API comes from two axioms of interpretability (Axioms of Interpretability) distilled from long experience with GAMs. We want a GAM to have the following two properties:

The shape of any shape function f_ik (for feature i and class k) must be consistent with the actual trend of the predicted probability. That is, we do not want to see a shape function that is increasing while the predicted probability is in fact decreasing.

Shape functions should avoid any unnecessary non-smoothness. A non-smooth shape function makes it difficult for modelers to read the model's predictive trend.

Now that we know which properties the desired model must satisfy, how do we find such a model without changing the original model's predictions? The key is an important property of the softmax function.

For a softmax function, adding the same function to every input yields a model completely equivalent to the original: the two models produce identical predictions in all cases. Based on this property, we can design a function g such that the model obtained after adding g satisfies the properties we want.
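The shift-invariance of softmax is easy to verify numerically; the short check below demonstrates the property API relies on.

```python
# Checking the softmax property used by API: adding the same offset g to
# every class's input leaves the output distribution unchanged.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())        # subtract max for numerical stability
    return e / e.sum()

z = np.array([1.2, -0.7, 0.3])     # per-class scores for one example
g = 5.0                            # any common shift (in API, g is a function of x)
print(np.allclose(softmax(z), softmax(z + g)))  # True: equivalent models
```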

The paper proves mathematically that the resulting optimization problem (choosing g so that the two axioms hold) always has a unique global optimum, and it gives this solution in closed form. The post-processing method built on this result consumes almost no computing resources, yet transforms a misleading GAM into an interpretable model that can be safely read.
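As a rough illustration of the mechanism only, not the paper's closed-form solution, the sketch below re-centers each feature's per-class shape values by their cross-class mean. By the shift-invariance demonstrated above, predictions are untouched, while the apparent trend of each curve changes to match its probability.

```python
# Illustrative post-processing in the spirit of API (NOT the paper's exact
# solution): subtract the cross-class mean at every grid point of a feature.
import numpy as np

def softmax_rows(Z):
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

# shape_vals[v, k] = f_k evaluated at the v-th grid point of one feature
shape_vals = np.array([[0.0, 0.0],
                       [0.5, 1.5],
                       [1.0, 3.0],
                       [1.5, 4.5]])

offset = shape_vals.mean(axis=1, keepdims=True)   # common per-point shift
centered = shape_vals - offset

# Predictions are identical, but class 0's curve now decreases,
# matching its actual predicted probability trend.
assert np.allclose(softmax_rows(shape_vals), softmax_rows(centered))
print(centered[:, 0])
```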

On a dataset for predicting the cause of infant death (a 12-class classification problem), we used API to process the shape functions so that they truly reflect the trends of the predicted probabilities. Before API, the model's visualizations suggested that every cause of death was negatively correlated with infant weight and Apgar score. After applying API, we found that the relationships differ across causes of death: some are positively correlated, some are negatively correlated, and others show the highest mortality when the infant's weight and Apgar score reach certain values. API thus enables medical staff to obtain more accurate predictive information from the model.

Summary

In many mission-critical scenarios (medical, financial, etc.), model interpretability is often more important than the model's accuracy itself. As an accurate and fully white-box model, the generalized additive model can be expected to land in many more application scenarios.

References

[1] Caruana et al. Intelligible Models for Healthcare: Predicting Pneumonia Risk and Hospital 30-day Readmission. In KDD 2015.

[2] Lou et al. Intelligible Models for Classification and Regression. In KDD 2012.

[3] Lou et al. Accurate Intelligible Models with Pairwise Interactions. In KDD 2013.
