How to install and use CatBoost

2025-01-16 Update. From: SLTechnology News&Howtos (Shulou.com)

Introduction

CatBoost not only builds one of the most accurate models on whatever dataset you give it, with minimal data preparation; it also provides some of the best open-source interpretation tools available today, as well as a way to put your model into production quickly.

CatBoost is revolutionizing machine learning, and learning to use it is a great way to improve your skills. More interestingly, CatBoost poses a threat to the status quo of data scientists like myself, who enjoy a position where building a highly accurate model from a given dataset is supposedly tedious. CatBoost is changing that: it makes highly accurate models accessible to everyone.

Highly accurate models, installed in no time

Have you ever tried installing XGBoost on your laptop? Then you know how painful it can be. Installing and running CatBoost, on the other hand, is a piece of cake:

pip install catboost

And that's it: CatBoost is installed.

Data preparation

Unlike most machine learning models currently available, CatBoost requires minimal data preparation. It can handle:

Missing values in numerical variables

Non-encoded categorical variables

Note: for categorical variables, missing values must be handled beforehand. Replace them with a new category such as "missing", or with the most frequent category.

For GPU users, it can also handle text variables.

Unfortunately, I couldn't test this feature because I'm working on a laptop without a GPU.

Build a model

As with XGBoost, you get the familiar sklearn syntax, plus some additional CatBoost-specific features.

from catboost import CatBoostClassifier  # or CatBoostRegressor

model_cb = CatBoostClassifier()
model_cb.fit(X_train, y_train)

Or, if you want a visual interface showing how the model is learning and whether it is starting to overfit, use plot=True and pass your test set in the eval_set parameter:

from catboost import CatBoostClassifier  # or CatBoostRegressor

model_cb = CatBoostClassifier()
model_cb.fit(X_train, y_train, plot=True, eval_set=(X_test, y_test))

Note that you can display several metrics at the same time, including more human-friendly ones such as Accuracy or Precision. The supported metrics are listed here: https://catboost.ai/docs/concepts/loss-functions-classification.html.

See the following example:

You can even use cross-validation to observe the mean and standard deviation of the model's accuracy across the different folds.

Fine-tuning

Fine-tuning CatBoost is very similar to fine-tuning XGBoost. To tune the model properly, first set early_stopping_rounds (for example 10 or 50), then start adjusting the model's parameters.

Training speed without GPU

According to CatBoost's own benchmark, CatBoost trains faster than XGBoost and at a speed relatively similar to LightGBM, which is well known for its fast training.

Training speed with GPU

But with a GPU, the real magic happens.

Even with a relatively old GPU such as the K40 (released in 2013), training is at least four times faster, while newer GPUs can be up to 40 times faster.

Interpretation of the model

The authors of CatBoost understood that this is not just a game of accuracy: why would anyone use CatBoost when XGBoost and LightGBM are already available? So, when it comes to interpretability, CatBoost provides out-of-the-box tools.

Feature importance

CatBoost provides three different methods: PredictionValuesChange, LossFunctionChange, and InternalFeatureImportance. Here is the detailed documentation: https://catboost.ai/docs/concepts/fstr.html

Local interpretability

For local interpretability, CatBoost comes with SHAP, which is often considered the only reliable method.

from catboost import Pool

shap_values = model.get_feature_importance(Pool(X, y), type='ShapValues')

There is also an official tutorial showing how to use SHAP values for local interpretability and to obtain feature importances: https://github.com/catboost/tutorials/blob/master/model_analysis/shap_values_tutorial.ipynb

Marginal effect

This is by far my favorite. With high accuracy becoming commoditized (especially with the rise of AutoML), it is increasingly important to understand these highly accurate models at a deeper level.

From experience, the following chart has become a standard in model analysis, and CatBoost provides it directly in its package.

On this chart, you can see:

In green, the data distribution.

In blue, the average target value for each bin.

In orange, the average prediction for each bin.

In red, the partial dependence.

Using a CatBoost model in production

Putting your model into production is very easy. Here is how to export a CatBoost model.

Use the .save_model() method; its help documentation lists the available export formats.

Python and C++ export

model_cb.save_model('model_CatBoost.py', format='python', pool=X_train)

After running it, you will have a generated .py file in your repo.

At this point, the model is ready for production! You don't need to set up a specific environment on the machine to score new data: all you need is Python 3!

Binary file output

Binary output is by far the fastest option for scoring new data. To use it, change the export code so that it writes a .cbm file.

To reload the model, use the following code:

from catboost import CatBoost

model = CatBoost()
model.load_model('filename', format='cbm')

Other useful tips

verbose=50

Most models have a verbose option that shows how training is progressing. CatBoost has one too, but it is slightly better than the others. For example, verbose=50 shows the training error every 50 iterations instead of at every iteration, which can get annoying when there are many iterations.

The same model trained with verbose=10 is much easier to inspect.

Note that the remaining time is also displayed.

Model comparison

Fine-tuning a model takes time, and you will often end up with several good parameter sets. To improve your results further, you can train models with these different parameter sets and compare them, which helps you decide on the final set of parameters.
