
What are the ways in which Pandas encodes data

2025-07-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article introduces the ways to encode data with Pandas and walks through them with a practical example. The methods are simple, fast, and practical, and should help you solve this kind of problem.

Recently I came across a question about this on Zhihu.

For ease of understanding, let's work with a sample DataFrame that has a Sex column, a Course Name column, and a Score column.
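The original table is not reproduced here, so the following is a minimal stand-in. It assumes only the three column names that the rest of the article's code relies on; the rows themselves are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: only the column names come from the article,
# the values are invented so the later snippets have something to run on.
df = pd.DataFrame({
    "Sex": ["Male", "Female", "Female", "Male", "Male", "Female"],
    "Course Name": ["Python", "Java", "Python", "C", "Java", "SQL"],
    "Score": [95, 82, 90, 58, 74, 66],
})

print(df)
```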

Numerical data

Let's first discuss converting continuous data: add a label column based on the value of the Score column. A score above 90 is marked A, a score from 80 to 90 is marked B, and so on.

Custom function + loop traversal

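The most basic approach is to write a function yourself and apply it in a loop: a def plus a for.

    df1 = df.copy()

    def myfun(x):
        # Branches are checked top-down, so each one only needs its lower bound
        if x > 90:
            return 'A'
        elif x >= 80:
            return 'B'
        elif x >= 70:
            return 'C'
        elif x >= 60:
            return 'D'
        else:
            return 'E'

    # Label every score with an explicit loop
    labels = []
    for score in df1['Score']:
        labels.append(myfun(score))
    df1['Score_Label'] = labels

The same logic can also be written as one nested conditional expression in an anonymous function:

    df1['Score_Label'] = df1['Score'].apply(lambda x: 'A' if x > 90 else ('B' if 90 >= x >= 80 else ('C' if 80 > x >= 70 else ('D' if 70 > x >= 60 else 'E'))))

The result is the same as above, but a nested expression like this is hard to read and is unlikely to go over well with whoever maintains the code.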
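As a quick check of the grading rule, here is a small sketch that runs a named function and a nested conditional expression over the boundary values. The helper name `grade` and the scores are hypothetical; the thresholds follow the A to E scheme described above, with exactly 90 assumed to fall in B.

```python
import pandas as pd

def grade(x):
    """Map a numeric score to a letter; branches are checked top-down."""
    if x > 90:
        return "A"
    elif x >= 80:
        return "B"
    elif x >= 70:
        return "C"
    elif x >= 60:
        return "D"
    return "E"

# Hypothetical scores chosen to sit on every boundary
scores = pd.Series([95, 90, 80, 70, 60, 59])

# Explicit loop
by_loop = []
for s in scores:
    by_loop.append(grade(s))

# Nested conditional expression inside a lambda
by_lambda = scores.apply(
    lambda x: "A" if x > 90 else ("B" if x >= 80 else ("C" if x >= 70 else ("D" if x >= 60 else "E")))
).tolist()

print(by_loop)
print(by_loop == by_lambda)
```

Testing the edges matters here: a condition written as `x >= 80 and x < 90` silently drops a score of exactly 90 into the final else branch.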

Use pd.cut

Now let's move on to more advanced pandas functions. To encode Score again, use pd.cut and specify the bin edges; it groups the values for you directly.

    df4 = df.copy()
    bins = [0, 59, 70, 80, 100]
    df4['Score_Label'] = pd.cut(df4['Score'], bins)

You can also use the labels parameter directly to change the name of the corresponding group, isn't it much more convenient?

    df4['Score_Label_new'] = pd.cut(df4['Score'], bins, labels=['low', 'middle', 'good', 'perfect'])
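To see where boundary values land, here is a minimal sketch using the same bins and labels on a few made-up scores. By default pd.cut builds intervals that are open on the left and closed on the right, so a score of exactly 0 falls outside the first bin unless include_lowest=True is passed.

```python
import pandas as pd

bins = [0, 59, 70, 80, 100]
labels = ["low", "middle", "good", "perfect"]

# Hypothetical scores sitting on or next to the bin edges
scores = pd.Series([59, 60, 70, 71, 80, 95])
labeled = pd.cut(scores, bins, labels=labels)
print(labeled.tolist())

# The default intervals are (0, 59], (59, 70], ... so 0 itself is not covered
zero_default = pd.cut(pd.Series([0]), bins, labels=labels)
zero_included = pd.cut(pd.Series([0]), bins, labels=labels, include_lowest=True)
print(zero_default.isna().tolist(), zero_included.tolist())
```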

Using sklearn binarization

Since this is related to machine learning, sklearn naturally has something to offer. If you need to add a new column indicating whether a score passes, you can use the Binarizer class. The code is simple and easy to follow.

    import numpy as np
    from sklearn.preprocessing import Binarizer

    df5 = df.copy()
    binerize = Binarizer(threshold=60)  # values above 60 become 1, the rest 0
    trans = binerize.fit_transform(np.array(df5['Score']).reshape(-1, 1))
    df5['Score_Label'] = trans.ravel()
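The threshold comparison is strict, which is easy to miss. A small sketch on made-up scores, assuming scikit-learn is installed; the column name `Passed` is hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import Binarizer

# Hypothetical scores, including one exactly on the threshold
scores = pd.DataFrame({"Score": [95, 60, 58, 74]})

binarizer = Binarizer(threshold=60)
# Binarizer expects a 2-D array, hence reshape(-1, 1)
flags = binarizer.fit_transform(scores["Score"].to_numpy().reshape(-1, 1))

# Flatten back to one dimension before storing it as a column
scores["Passed"] = flags.ravel()
print(scores)
```

Only values strictly greater than the threshold become 1, so a score of exactly 60 is encoded as 0 here. If 60 should count as a pass, a threshold of 59 would be one way to express that for integer scores.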

Text-based data

Converting and labeling text data is even more common. For example, add a new column that encodes Male and Female as 0 and 1, respectively.

Use replace

Let's start with replace. Note that the custom-function approaches described above still work here too.

    df6 = df.copy()
    df6['Sex_Label'] = df6['Sex'].replace(['Male', 'Female'], [0, 1])
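A minimal sketch of the list form of replace on a made-up Series. The two lists are matched by position, so 'Male' becomes 0 and 'Female' becomes 1; values that appear in neither list would be left unchanged.

```python
import pandas as pd

# Hypothetical gender column
sex = pd.Series(["Male", "Female", "Female", "Male"])

# Values in the first list are replaced by the value at the same
# position in the second list
sex_label = sex.replace(["Male", "Female"], [0, 1])
print(sex_label.tolist())
```

Depending on the pandas version, the resulting dtype may be integer or object, so an explicit astype(int) afterwards is a reasonable safeguard when every value has been replaced.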

The operation above is for gender. Since there are only two values, you can assign 0 and 1 by hand. If there are many categories, you can use value_counts() to assign the labels automatically, for example to encode the Course Name column.

    df6 = df.copy()
    value = df6['Course Name'].value_counts()
    # Most frequent course gets 0, the next one 1, and so on
    value_map = dict((v, i) for i, v in enumerate(value.index))
    df6['Course Name_Label'] = df6.replace({'Course Name': value_map})['Course Name']
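The mapping step can be sketched on its own. The course list below is made up, with distinct frequencies so the ordering is unambiguous; value_counts() sorts by descending count, so the most frequent category receives code 0.

```python
import pandas as pd

# Hypothetical course column: Python x3, Java x2, SQL x1
courses = pd.Series(["Python", "Java", "Python", "SQL", "Python", "Java"])

counts = courses.value_counts()

# enumerate yields (position, value) pairs, so unpack in that order
value_map = {name: code for code, name in enumerate(counts.index)}
print(value_map)

labels = courses.replace(value_map)
print(labels.tolist())
```

When two categories have the same count, their relative order in value_counts() should not be relied on, so the codes for tied categories may differ between runs or versions.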

Use map

It bears repeating: whenever you need to derive a new column, map should come to mind.

    df7 = df.copy()
    Map = {elem: index for index, elem in enumerate(set(df["Course Name"]))}
    df7['Course Name_Label'] = df7['Course Name'].map(Map)
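Because a set has no defined order, the codes built from it can change between runs. The sketch below swaps in sorted() to make the mapping reproducible; that is an assumption for illustration, not part of the original method. It also shows that map returns NaN for values missing from the dictionary.

```python
import pandas as pd

# Hypothetical course column
courses = pd.Series(["Python", "Java", "Python", "SQL"])

# sorted() gives a stable order: Java -> 0, Python -> 1, SQL -> 2
mapping = {name: code for code, name in enumerate(sorted(set(courses)))}
codes = courses.map(mapping)
print(codes.tolist())

# A value with no entry in the dictionary becomes NaN rather than raising
unseen = pd.Series(["Go"]).map(mapping)
print(unseen.isna().tolist())
```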

Use astype

Many people may not know this method. It also answers the Zhihu question mentioned above, which can be solved in a great many ways.

    df8 = df.copy()
    value = df8['Course Name'].astype('category')
    df8['Course Name_Label'] = value.cat.codes
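A small sketch of what the category conversion produces, on made-up data. The categories are inferred in sorted order, the codes index into them, and a missing value is encoded as -1.

```python
import pandas as pd

# Hypothetical course column
courses = pd.Series(["Python", "Java", "Python", "SQL"])

as_cat = courses.astype("category")
categories = list(as_cat.cat.categories)
codes = as_cat.cat.codes.tolist()
print(categories, codes)

# Missing values have no category and receive the code -1
with_missing = pd.Series(["Python", None]).astype("category").cat.codes.tolist()
print(with_missing)
```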

Use sklearn

As with numerical data, sklearn also has a way to encode categorical data: LabelEncoder, a classic operation in machine learning.

    from sklearn.preprocessing import LabelEncoder

    df9 = df.copy()
    le = LabelEncoder()
    le.fit(df9['Sex'])
    df9['Sex_Label'] = le.transform(df9['Sex'])
    le.fit(df9['Course Name'])  # refitting replaces the classes learned from Sex
    df9['Course Name_Label'] = le.transform(df9['Course Name'])
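A minimal sketch of LabelEncoder on a single made-up column, assuming scikit-learn is installed. After fitting, the learned classes are available in classes_ (in sorted order), and inverse_transform maps codes back to the original strings.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical course column
courses = pd.Series(["Python", "Java", "Python", "SQL"])

le = LabelEncoder()
codes = le.fit_transform(courses)
print(list(le.classes_), codes.tolist())

# Decode the integer codes back to the original labels
decoded = list(le.inverse_transform(codes))
print(decoded)
```

Refitting the same encoder on another column discards the previous classes, so keeping one encoder per column is the safer choice whenever the codes need to be decoded later.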

It is also possible to convert two columns at once.

    from sklearn.preprocessing import OrdinalEncoder

    df9 = df.copy()
    le = OrdinalEncoder()
    le.fit(df9[['Sex', 'Course Name']])
    df9[['Sex_Label', 'Course Name_Label']] = le.transform(df9[['Sex', 'Course Name']])

Use factorize

Finally, here is a niche but handy pandas method. Note that in the methods above, the automatically generated Course Name_Label column maps each course to one code, but because the codes are generated automatically to avoid writing custom functions or dictionaries, they generally do not follow the order in which the values appear.

If we want the codes to follow that order, for example Python as 0 and Java as 1, is there an elegant way other than specifying the mapping by hand? Use factorize, which assigns codes in order of first appearance.

    df10 = df.copy()
    df10['Course Name_Label'] = pd.factorize(df10['Course Name'])[0]
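pd.factorize returns a pair: the integer codes and the unique values they refer to. A short sketch on made-up data, also showing the sort=True option for comparison:

```python
import pandas as pd

# Hypothetical course column; Python appears first, then Java, then SQL
courses = pd.Series(["Python", "Java", "Python", "SQL"])

# Codes follow the order of first appearance
codes, uniques = pd.factorize(courses)
print(codes.tolist(), list(uniques))

# With sort=True the unique values are sorted before codes are assigned
sorted_codes, sorted_uniques = pd.factorize(courses, sort=True)
print(sorted_codes.tolist(), list(sorted_uniques))
```

Keeping the second element of the pair is useful when the codes need to be translated back, since uniques[code] recovers the original value.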

Combined with an anonymous function, we can encode several columns in order of appearance at once.

    df10 = df.copy()
    cat_columns = df10.select_dtypes(['object']).columns
    df10[['Sex_Label', 'Course Name_Label']] = df10[cat_columns].apply(lambda x: pd.factorize(x)[0])
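The multi-column version can be sketched as follows. The data is made up, and the text columns are named explicitly here instead of being selected by dtype, which is an assumption made so the example does not depend on how a given pandas version stores strings.

```python
import pandas as pd

# Hypothetical data with two text columns and one numeric column
df = pd.DataFrame({
    "Sex": ["Male", "Female", "Female", "Male", "Female"],
    "Course Name": ["Python", "Java", "Python", "SQL", "Java"],
    "Score": [95, 82, 90, 58, 74],
})

# Columns to encode, listed by hand for this sketch
cat_columns = ["Sex", "Course Name"]

# Factorize each column independently; [0] keeps only the codes
encoded = df[cat_columns].apply(lambda col: pd.factorize(col)[0])

# Attach the codes as new columns named <original>_Label
df = df.join(encoded.add_suffix("_Label"))
print(df)
```

Each column gets its own code space, so 0 in Sex_Label and 0 in Course Name_Label are unrelated.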
