How to use modin, a pandas acceleration tool that works on every platform

2025-01-18 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/01 Report--

This article shows how to use modin, a tool that speeds up pandas computations and works on all major platforms. Many readers may not be familiar with it, so it is shared here for reference.

1 Introduction

Modin is a Python library that parallelizes pandas computations across multiple CPU cores while requiring as few code changes as possible. With a series of recent updates, modin now supports Windows through its Dask backend, so changing a single line of code can bring a considerable speedup to some pandas functions on all platforms.

Figure 1

2 Accelerating pandas operations with modin

Modin supports Windows, Linux and macOS. On Linux and macOS it can run on either of two parallel computing frameworks, Ray or Dask. On Windows it supports only Dask as the computing backend (at the time of writing, Ray had no Windows release). Installation is simple. Each of the following three commands installs modin with a different set of backends:

    pip install "modin[dask]"  # install the Dask backend
    pip install "modin[ray]"   # install the Ray backend (not supported on Windows)
    pip install "modin[all]"   # recommended: automatically install all backends supported by the current system

This article demonstrates modin on Windows 10, so the command used is:

    pip install "modin[all]"
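After installing, it can be useful to confirm which backends actually ended up in the environment. The following is a minimal sketch that relies only on the standard library; the helper name installed_backends is made up for illustration and is not part of modin.

```python
from importlib.util import find_spec


def installed_backends():
    """Report which of modin and its optional backends can be found (hypothetical helper)."""
    # find_spec returns None when a top-level package is not installed.
    return {name: find_spec(name) is not None for name in ("modin", "dask", "ray")}


if __name__ == "__main__":
    for package, present in installed_backends().items():
        print(f"{package}: {'installed' if present else 'missing'}")
```

On the Windows setup described in this article, one would expect modin and dask to be reported as installed and ray as missing.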

Once modin and Dask are installed, using modin only requires changing the familiar import pandas as pd to import modin.pandas as pd. The comparisons below look at how pandas and modin perform on some common operations.

First, we use pandas and modin to read a 1.1 GB CSV file, esea_master_dmg_demos.part1.csv, from Kaggle (https://www.kaggle.com/skihikingkevin/csgo-matchmaking-damage/data). It records player behavior data from the popular game CS:GO. The file is too large to include here, so please download it yourself:

Figure 2

To tell the two libraries apart, modin.pandas is temporarily imported under the name mpd:

Figure 3
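Since the original screenshot is not reproduced here, the side-by-side import can be sketched roughly as follows. This is an assumption-laden sketch rather than the author's exact cell: it pins the engine through the MODIN_ENGINE environment variable, which has to be set before modin is first imported, and the helper import_both is a hypothetical name.

```python
import os

# Must be set before modin is imported for the first time.
os.environ["MODIN_ENGINE"] = "dask"


def import_both():
    """Return (pandas, modin.pandas) so the two can be compared side by side."""
    import pandas as pd
    import modin.pandas as mpd
    return pd, mpd


if __name__ == "__main__":
    # Dask starts worker processes, so on Windows it is safer to keep
    # modin work behind this guard when running as a script.
    try:
        pd, mpd = import_both()
        print("pandas", pd.__version__, "| modin engine:", os.environ["MODIN_ENGINE"])
    except ImportError as exc:
        print("modin or pandas is not installed:", exc)
```

In a notebook the two imports can simply be written at the top of a cell; the function wrapper is only there so the sketch degrades gracefully when a package is missing.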

Because this is a Windows machine, the computing backend in use is Dask. First, compare the time each library takes to read the file and preview it:

Figure 4

Using a Jupyter Notebook extension that records cell execution time, we can see that native pandas takes 14.8 seconds, while modin takes only 5.32 seconds. Next, try the concat operation:

Figure 5

pandas takes 8.78 seconds to complete the task, while modin finishes in only 0.174 seconds, a striking gain in efficiency. Next, run the common task of checking each column for missing values:

Figure 6
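The three comparisons in this section (reading the file, concat, and the missing-value check) could be reproduced outside a notebook with a small timing helper. The sketch below is only illustrative: the helper timed is a hypothetical name, the number of frames passed to concat is a guess because the article does not state it, and measured times will differ from machine to machine. Modin may also run some work asynchronously, so very small numbers should be read with care.

```python
import time


def timed(label, func, *args, **kwargs):
    """Call func(*args, **kwargs), print the wall-clock time, return (result, seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f} s")
    return result, elapsed


if __name__ == "__main__":
    import os

    csv_path = "esea_master_dmg_demos.part1.csv"  # file named in the article
    try:
        import pandas as pd
        import modin.pandas as mpd
    except ImportError:
        pd = mpd = None

    # Only run the comparison when both libraries and the data file are present.
    if pd is not None and os.path.exists(csv_path):
        for name, lib in (("pandas", pd), ("modin", mpd)):
            df, _ = timed(f"{name} read_csv", lib.read_csv, csv_path)
            # Assumption: five copies; the article does not say how many were concatenated.
            timed(f"{name} concat", lib.concat, [df] * 5)
            timed(f"{name} isna", lambda d=df: d.isna().sum())
```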

For the missing-value check, the gap is not as dramatic as with concat, but it is still considerable. That said, modin is a tool under rapid, iterative development, and its parallelization of pandas does not yet cover every function. Group-by aggregation is one example.

For such functions, modin checks at execution time whether it supports them. For those it does not yet support, modin automatically falls back to the single-core pandas backend. Because modin organizes data differently from pandas, the data must first be converted:

Figure 7

In this case, modin is much slower than pandas:

Figure 8

My view, therefore, is that when working with large datasets, modin can replace pandas in some scenarios, namely for those pandas functions that have already been reliably parallelized. The lists of supported and unsupported functions are available on the official documentation site (https://modin.readthedocs.io/en/latest/supported_apis/index.html). Because modin is still developing rapidly, many features that are not currently supported may be added in the near future:

Figure 9
