What are the operating methods of DataFrame

2025-01-16 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/01 Report--

This article introduces the operating methods of the pandas DataFrame. Many people have questions about these methods in day-to-day work, so the editor consulted a range of materials and compiled the ones that are simple and easy to use. Hopefully it answers the question "what are the operating methods of DataFrame?"

Pandas provides a wide variety of DataFrame operations, but many of them are complex and seem unapproachable. This article presents a set of basic DataFrame operating methods that cover almost all the operations data scientists need to know. Each method comes with an explanation, code, and a tip for remembering it.

Pivot

Pivot creates a new table by projecting existing columns of the data onto the elements of the new table: its index, its columns, and its values. The columns of the initial DataFrame chosen as the new index and the new columns appear as their unique values, and each combination of those two columns holds one value. This means that pivot cannot handle duplicates: if the same index/column pair occurs twice, pivot raises an error.

Pivoting a DataFrame named df takes a single call to df.pivot, passing the index, columns, and values arguments.
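A minimal sketch of a pivot. The sample data and the column names "foo" and "baz" are assumptions made for illustration (only "bar" is mentioned in the text); the second half shows the duplicate-value limitation described above.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "foo": ["one", "one", "two", "two"],
    "bar": ["A", "B", "A", "B"],
    "baz": [1, 2, 3, 4],
})

# Unique values of "foo" become the index, unique values of "bar" become
# the columns, and "baz" fills the cells.
pivoted = df.pivot(index="foo", columns="bar", values="baz")
print(pivoted)

# A repeated (foo, bar) pair makes the target cell ambiguous, so pivot raises.
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)
pivot_failed = False
try:
    dup.pivot(index="foo", columns="bar", values="baz")
except ValueError as exc:
    pivot_failed = True
    print("pivot failed:", exc)
```

When duplicates are expected, pandas also offers pivot_table, which aggregates them instead of raising.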

Remember: outside the world of data processing, a pivot is a turn around some object. In sports, people "pivot" around a foot, and pivoting in pandas is similar. The state of the original DataFrame turns into a new one around its central elements. Some elements are literally rotated or transposed (for example, the column "bar"), so the choice of columns matters.

Melt

Melt can be thought of as an "unpivot" because it converts matrix-based data (with two dimensions) into list-based data (columns represent values and rows represent unique data points), while pivot does the opposite. Consider a two-dimensional matrix with one dimension "B" and "C" (column names) and another dimension "a", "b", and "c" (row index).

We select an ID column, which serves as one dimension, and the column or columns that contain values. The value columns are converted into two columns: one for the variable (the name of the value column) and one for the value (the number stored under that variable).

The result is every combination of the values of the ID column (a, b, c) and the value columns (B, C), together with the corresponding value, organized in a list format.

A DataFrame df can be melted by calling df.melt with the id_vars and value_vars arguments.
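A minimal sketch of a melt, assuming the ID values a, b, c live in a column named "A" (that column name and the numbers are assumptions; "B" and "C" follow the description above).

```python
import pandas as pd

# Hypothetical sample data: "A" is the ID column, "B" and "C" hold values.
df = pd.DataFrame({
    "A": ["a", "b", "c"],
    "B": [1, 3, 5],
    "C": [2, 4, 6],
})

# Each (ID, value column) pair becomes one row with the columns
# "A", "variable" (the former column name) and "value".
melted = df.melt(id_vars=["A"], value_vars=["B", "C"])
print(melted)
```

The three-row, two-value-column matrix becomes six rows, one per combination of ID and value column.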

Remember: to melt something, like a candle, is to turn a solid composite object into several smaller individual elements (wax droplets). Melting a two-dimensional DataFrame breaks up its solid structure and records the pieces as entries in a list.

Explode

Explode is a useful way to get rid of lists inside the data. When a column is exploded, every element of each list in it becomes a new row under the same index (to get a fresh index instead, simply call .reset_index() afterwards). Non-list items such as strings or numbers are not affected, and empty lists become NaN values (you can clear them with .dropna()).

Exploding column "A" of a DataFrame df is very simple: df.explode('A').
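A minimal sketch of explode, including the index repetition and the NaN produced by an empty list. The sample data and the column "B" are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data: column "A" mixes lists, a string and an empty list.
df = pd.DataFrame({
    "A": [[1, 2, 3], "foo", [], [4, 5]],
    "B": [10, 20, 30, 40],
})

# Each list element gets its own row; the original index label is repeated.
exploded = df.explode("A")
print(exploded)

# Drop the NaN left by the empty list and rebuild a clean 0..n-1 index.
cleaned = exploded.dropna(subset=["A"]).reset_index(drop=True)
print(cleaned)
```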

Remember: exploding something releases all of its internal contents. Exploding a list separates its elements.

Stack

Stack takes a DataFrame of any size and "stacks" the columns as a sub-level of the existing index. The result therefore has a single column of values and a two-level index (for a DataFrame with ordinary single-level columns, pandas returns it as a Series).

Stacking a table named df is as simple as df.stack().

To access the height value of the dog in the stacked result, you only need to call index-based retrieval twice, such as df.loc['dog'].loc['height'].
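A minimal sketch of stack and the two-step lookup described above. The animals and measurements are made-up sample data; "dog" and "height" follow the text.

```python
import pandas as pd

# Hypothetical sample table: one row per animal, one column per measurement.
df = pd.DataFrame(
    {"weight": [4.0, 30.0], "height": [25.0, 60.0]},
    index=["cat", "dog"],
)

# The column labels move into a second index level.
stacked = df.stack()
print(stacked)

# Two chained lookups: first the animal, then the measurement.
dog_height = stacked.loc["dog"].loc["height"]

# The same value can be fetched in one step with a tuple key.
dog_height_tuple = stacked.loc[("dog", "height")]
```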

Remember: stack takes the two-dimensionality of the table and stacks the columns into a multi-level index.

Unstack

Unstack takes a multi-index DataFrame and unstacks it, converting the index of the specified level into the columns of a new DataFrame with the corresponding values. Calling stack on a table and then unstack on the result does not change the table (apart from a possible "0" column label).

The parameter of unstack is level. When indexing a list, an index of -1 returns the last element, and levels work the same way: level -1 means that the last index level (the rightmost one) is unstacked. As another example, when level is set to 0 (the first index level), the values in that level become the columns, and the following index level (the second one) becomes the index of the transformed DataFrame.

Unstacking is performed in the same way as stacking, but with the level parameter: df.unstack(level=-1).
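A minimal sketch contrasting level=-1 with level=0, reusing a made-up animal table (the data is an assumption for illustration).

```python
import pandas as pd

# Hypothetical sample table, stacked first to obtain a two-level index.
df = pd.DataFrame(
    {"weight": [4.0, 30.0], "height": [25.0, 60.0]},
    index=["cat", "dog"],
)
stacked = df.stack()

# level=-1 (the default): the innermost level (measurements) becomes columns,
# which restores the layout of the original table.
by_last = stacked.unstack(level=-1)
print(by_last)

# level=0: the outermost level (animals) becomes columns, and the
# measurements remain as the index.
by_first = stacked.unstack(level=0)
print(by_first)
```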

Merge

To merge two DataFrames is to combine them column-wise (horizontally) on a shared key. The key allows the tables to be combined even if they are sorted differently. By default, the merged DataFrame adds the suffixes _x and _y to value columns whose names appear in both tables.

To merge two DataFrames df1 and df2 (where df1 contains leftkey and df2 contains rightkey), call df1.merge(df2, left_on='leftkey', right_on='rightkey').

Merge is used here as a DataFrame method (pandas also provides an equivalent top-level pandas.merge function). With the method form, the DataFrame on which merge is called is always the "left table", and the DataFrame passed as an argument is the "right table" with the corresponding key.

By default, merge performs an inner join: if a key value in one DataFrame does not appear in the other, it is not included in the merged DataFrame. On the other hand, if a key appears more than once in a DataFrame, every combination of rows sharing that key is listed in the merged table. For example, if df1 has three rows with the key foo and df2 has two rows with the same key, the final DataFrame has six entries where leftkey = foo and rightkey = foo.
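A minimal sketch of the default inner merge, built to reproduce the three-by-two case described above. The column name "value" and the keys "bar" and "baz" are assumptions added to show the suffixes and the dropped keys.

```python
import pandas as pd

# Hypothetical sample data: "foo" appears three times on the left and twice
# on the right; "bar" and "baz" each exist on one side only.
df1 = pd.DataFrame({
    "leftkey": ["foo", "foo", "foo", "bar"],
    "value": [1, 2, 3, 4],
})
df2 = pd.DataFrame({
    "rightkey": ["foo", "foo", "baz"],
    "value": [10, 20, 30],
})

# Inner join by default: unmatched keys are dropped, and every left "foo"
# row pairs with every right "foo" row (3 x 2 = 6 rows).
merged = df1.merge(df2, left_on="leftkey", right_on="rightkey")
print(merged)
```

Because both tables have a column called "value", the result carries value_x (from df1) and value_y (from df2).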

Remember: merging DataFrames is like merging lanes while driving, which happens horizontally. Imagine that each column is a lane on the highway; to merge, the lanes must come together side by side.

Join

Join is often preferred over merge because it has a simpler syntax and offers a wide range of options for joining two DataFrames horizontally. A join is written as df1.join(df2, on=..., how=...).

Join matches rows on the index by default. If the on parameter is given, it names a column of the calling DataFrame that is matched against the index of the other DataFrame, so there is no pair of arguments like left_on and right_on as in merge. The how parameter is a string that selects the join method; the four main options are:

'left': includes all elements of df1, and includes elements of df2 only if their key is a key of df1. Where df2 has no match, the missing part of the joined DataFrame is filled with NaN.

'right': like 'left', but for the other DataFrame. Includes all elements of df2, and includes elements of df1 only if their key is a key of df2.

'outer': includes all elements from both DataFrames, even if a key does not exist in the other one. Missing elements are filled with NaN.

'inner': includes only elements whose key exists in both DataFrames (the intersection). This is the default for merge; join defaults to 'left'.
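The four options can be compared side by side. This is a minimal sketch with made-up data; moving the shared "key" column into the index with set_index is one way (chosen here for illustration) to let join match on it.

```python
import pandas as pd

# Hypothetical sample data: keys "b" and "c" are shared, "a" and "d" are not.
df1 = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
df2 = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [20, 30, 40]})

# join matches on the index, so put the key into the index on both sides.
left = df1.set_index("key")
right = df2.set_index("key")

# Run the same join with each method and keep the results for comparison.
results = {
    how: left.join(right, how=how)
    for how in ("left", "right", "outer", "inner")
}
for how, joined in results.items():
    print(how)
    print(joined)
```

The 'left' result keeps "a" with NaN in right_val, 'right' keeps "d" with NaN in left_val, 'outer' keeps both, and 'inner' keeps only "b" and "c".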

Remember: if you have used SQL, the word "join" should immediately bring to mind combining tables column-wise. If not, "join" and "merge" have very similar meanings by definition.

Concat

Merge and join work horizontally, whereas concatenation, or concat for short, connects DataFrames row-wise (vertically). For example, two DataFrames df1 and df2 with the same column names can be concatenated with pandas.concat([df1, df2]).

Although you can use concat for column-wise joins by setting the axis parameter to 1, it is usually easier to use join.

Note that concat is a pandas function, not a DataFrame method. It therefore accepts a list of the DataFrames to concatenate.

If a column is present in one DataFrame but not in another, it is included by default and the missing values are filled with NaN. To prevent this, add the parameter join='inner', which keeps only the columns common to both DataFrames.
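A minimal sketch of both behaviours, using two made-up DataFrames that share only column "A". The use of ignore_index=True in the second call is an extra choice made here to get a clean row index.

```python
import pandas as pd

# Hypothetical sample data: "A" is shared, "B" and "C" exist on one side only.
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [5, 6], "C": [7, 8]})

# Default (outer) behaviour: all columns are kept and gaps become NaN.
# The original row labels are preserved, so 0 and 1 each appear twice.
combined = pd.concat([df1, df2])
print(combined)

# join="inner" keeps only the shared column; ignore_index renumbers the rows.
common_only = pd.concat([df1, df2], join="inner", ignore_index=True)
print(common_only)
```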

Remember: lists and strings can be concatenated with additional items. Concatenation appends more elements to an existing body rather than adding new information about existing elements (as a column-wise join does). Because each index/row is a separate item, concatenation adds more items to the DataFrame, which can be thought of as a list of rows.

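Append is another way to combine two DataFrames, but it performs the same function as concat while being less efficient and less versatile. DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so pandas.concat is the method to use in current versions.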

This concludes the overview of the operating methods of DataFrame. Combining theory with practice is the best way to learn, so go and try them out.
