2025-01-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
How do you use a pandas MultiIndex (multi-level index)? Many newcomers find it confusing. This article summarizes how a MultiIndex is created and the common operations on it, in the hope that it helps you solve this problem.
1. MultiIndex
A MultiIndex is an index with multiple levels, somewhat similar to grouping data by index label. Because the outer levels sort the index entries into groups, a label from a higher level can be used to select and manipulate all the data of that group at once.
1.1. First: Multidimensional Array
When we create a Series or DataFrame, we can build a multi-level index by passing a multidimensional array (a list of arrays, one per level) to the index (or columns) parameter.
[The elements at the same position in each array together form one index value.]
A multi-level index can also be given names through its names attribute. The value is a one-dimensional array whose length must equal the number of levels, so that every level has a name.
1.2. Second: MultiIndex
We can create a MultiIndex object in advance with the construction methods of the MultiIndex class and then pass it as the index (or columns) argument of a Series or DataFrame. The names of the levels can be set at the same time through the names parameter.
from_arrays: Accepts a list of arrays, one per level. The first array gives the outermost (highest) level and the last array gives the innermost (lowest) level.
from_tuples: Accepts a list of tuples, each tuple giving one complete index entry in the form (high level, low level).
from_product: Accepts a list of iterables and builds the index from the Cartesian product of their elements.
from_product is simpler to use than the first two methods, but it has a limitation: it always generates every combination of the given values, so it cannot build an index in which some combinations are missing.
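To make the from_product limitation concrete, here is a minimal sketch. The labels and the level names "outer" and "inner" are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# from_product builds every (outer, inner) combination: 2 x 2 = 4 entries.
full = pd.MultiIndex.from_product([["A", "B"], ["a", "b"]],
                                  names=["outer", "inner"])
print(list(full))

# An index that omits ("B", "b") is not a Cartesian product,
# so it has to be built from explicit tuples instead.
partial = pd.MultiIndex.from_tuples([("A", "a"), ("A", "b"), ("B", "a")],
                                    names=["outer", "inner"])
print(list(partial))

# from_arrays describes the same index with one array per level.
same = pd.MultiIndex.from_arrays([["A", "A", "B"], ["a", "b", "a"]],
                                 names=["outer", "inner"])
print(partial.equals(same))
```

In short, from_product is convenient when every combination is wanted, while from_tuples and from_arrays are the options when the index is irregular.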
1.3. Creation examples:
import numpy as np
import pandas as pd
import warnings

warnings.filterwarnings("ignore")

# display() is provided by Jupyter/IPython; use print() in a plain script.

# A multi-level index is built by passing a list of arrays to the index
# (or columns) parameter, one array per level, outermost level first.
s = pd.Series([1, 2, 3, 4], index=[["A", "A", "B", "B"], ["a", "b", "c", "d"]])
# Each level of a multi-level index can have its own name.
s.index.names = ["index1", "index2"]
display(s)
display(s.loc["A"].loc["a"])

df = pd.DataFrame(np.arange(9).reshape(3, 3),
                  columns=[["X", "X", "Y"], ["x1", "x2", "y1"]],
                  index=[["A", "B", "B"], ["a", "a", "b"]])
display(df)
display(df.loc["B"])
display(df["X"])
display(df.loc["B"].loc["a"]["X"]["x1"])

# Created through the methods of the MultiIndex class.
# from_arrays: one list per level,
# [[labels of level 0], [labels of level 1], ..., [labels of level n]].
array1 = pd.MultiIndex.from_arrays([["A", "A", "B"], ["a", "b", "a"]])
df = pd.DataFrame(np.random.rand(3, 3), index=array1)
display(df)

# from_tuples: a list of tuples,
# [(high level, low level), (high level, low level), ...].
tuple1 = pd.MultiIndex.from_tuples([("A", "a"), ("A", "b"), ("B", "a")])
df2 = pd.DataFrame(np.random.random((3, 3)), index=tuple1)
display(df2)

# from_product: the Cartesian product of the given iterables.
product1 = pd.MultiIndex.from_product([["A", "B"], ["a", "b"]])
df3 = pd.DataFrame(np.random.random((4, 3)), index=product1)
display(df3)
2. Multi-level indexing operations
A multi-level index supports the same operations as a single-level index: selecting by a single index value, slicing, selecting with an array of index values, and so on. In addition, elements can be selected level by level.
Advantage of a multi-level index: a label from a higher level selects the data of the entire group under it. The general forms are:
s[operation]
s.loc[operation]
s.iloc[operation]
Here the operation can be a single index value, a slice, an array of index values, or a Boolean array.
2.1. Series Multi-level Index
The loc (label index) operation retrieves the set of values that belongs to a label of the multi-level index.
The iloc (position index) operation retrieves the element at the given position, regardless of whether the index is multi-level.
The behavior of s[operation] depends on the type of the key and is easy to misread, so it is recommended to use loc and iloc instead. Its rules are:
For a single index value on a single-level index, the key is first treated as a label; if no such label exists, it is treated as a position.
For a multi-level index, the selection is done by label.
For slices, the selection is by position if integers are given, otherwise by label.
For an array of index values, the selection is by position if the elements are integers, otherwise by label.
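The loc and iloc rules above can be sketched on the small Series from the creation example. The variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4],
              index=[["A", "A", "B", "B"], ["a", "b", "c", "d"]])

# loc with an outer label returns the whole group,
# indexed by the remaining inner level.
group_a = s.loc["A"]
print(group_a)

# The group can then be treated as a unit, for example summed.
group_total = s.loc["A"].sum()
print(group_total)

# loc with a full tuple returns a single value.
value = s.loc[("A", "b")]
print(value)

# iloc ignores the labels entirely and works by position.
first = s.iloc[0]
last_two = s.iloc[-2:]
print(first)
print(last_two)

# A Boolean mask works with loc as well.
big = s.loc[s > 2]
print(big)
```

Spelling out loc or iloc keeps it unambiguous whether a key is meant as a label or as a position.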
2.2. DataFrame Multi-level Indexing
The loc (label index) operation retrieves the set of rows that belongs to a label of the multi-level index.
The iloc (position index) operation retrieves the row at the given position, regardless of whether the index is multi-level.
The behavior of df[operation] also depends on the type of the key, so it is recommended to use loc and iloc instead. Its rules are:
For a single index value, the matching column is returned by label (several columns if the column index is multi-level).
For an array of index values, the matching columns are returned by label (several columns per label if the column index is multi-level).
For slices, rows are selected rather than columns: by label first, then by position.
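As a minimal sketch of these rules, the following reuses the 3 x 3 DataFrame from the creation example, which has a two-level index on both rows and columns. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3),
                  index=[["A", "B", "B"], ["a", "a", "b"]],
                  columns=[["X", "X", "Y"], ["x1", "x2", "y1"]])

rows_b = df.loc["B"]                     # all rows in the outer group "B"
row_ba = df.loc[("B", "a")]              # one row, returned as a Series
cols_x = df["X"]                         # all columns in the outer group "X"
cell = df.loc[("B", "a"), ("X", "x2")]   # one cell: row key, then column key
second_row = df.iloc[1]                  # by position, labels are ignored

print(rows_b)
print(row_ba)
print(cols_x)
print(cell)
print(second_row)
```

Passing the row key and the column key as two tuples to a single loc call reaches the same cell as a chain such as df.loc["B"].loc["a"]["X"]["x2"], in one lookup.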
2.3. Swap index
Signature: df.swaplevel(i=-2, j=-1, axis=0)
We can swap two index levels by calling the swaplevel method of the DataFrame object. By default this method swaps the innermost level (-1) with the second-innermost level (-2). We can also specify which levels to swap. Levels are numbered from 0, increasing from the outside in (or from top to bottom for columns), and a negative value -n refers to the n-th level counted from the inside. The levels can also be given by name.
df = pd.DataFrame(np.random.rand(4, 4),
                  index=[["A", "A", "B", "B"],
                         ["a1", "a1", "b1", "c1"],
                         ["a2", "b2", "c2", "c2"]])
df.index.names = ["layer1", "layer2", "layer3"]
display(df)
# Levels are numbered from the outside in: 0, 1, 2.
# Negative numbers count from the inside out: -1, -2, -3; -1 is the innermost level.
display(df.swaplevel())  # by default, swaps the innermost level with the second-innermost
display(df.swaplevel(0, 2))
# Levels can also be given by name instead of by number.
display(df.swaplevel("layer1", "layer3"))
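2.4. Index sorting
We can sort by the index using the sort_index method.
Signature: df.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True)
(Older pandas versions also listed a by parameter; it is no longer part of sort_index.)
level: Specifies which level to sort by. The value can be a level number, a level name, or a list of either. When it is omitted, sorting starts at the outermost level and continues through the inner levels. When it is given, the remaining levels are still used to break ties unless sort_remaining=False.
inplace: Whether to modify the object in place. Default is False, in which case a sorted copy is returned.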
display(df.sort_index())  # default: sort by all levels, outermost first
# Sort by a chosen level.
display(df.sort_index(level=1))
display(df.sort_index(level=2))
# Levels can also be given by name.
display(df.sort_index(level="layer1"))
display(df.sort_index(level="layer2"))
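Because the example above uses random data, the effect of each call is hard to see. The following sketch uses a small hand-made DataFrame instead; the tuples and the "value" column are illustrative assumptions.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("B", "x"), ("A", "y"), ("B", "w"), ("A", "z")],
    names=["layer1", "layer2"])
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=idx)

# Default: sort by layer1 first, then by layer2 within each group.
by_all = df.sort_index()
print(by_all)

# Sort by the inner level; its labels w, x, y, z decide the order.
by_inner = df.sort_index(level="layer2")
print(by_inner)

# Descending order on all levels.
descending = df.sort_index(ascending=False)
print(descending)

# None of the calls above changed df itself.
unchanged = df["value"].tolist()

# With inplace=True, df is modified and the call returns None.
result = df.sort_index(inplace=True)
print(result)
print(df)
```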
2.5. Stack
The stack method of the DataFrame object stacks the index, converting the columns at a given level into rows.
level: Specifies the level to convert, default is -1.
# stack: columns -> rows; unstack: rows -> columns
df.stack()
df.unstack()
2.6. Unstack
The unstack method of the DataFrame object unstacks the index, converting the rows at a specified level into columns.
level: Specifies the level to convert, default is -1.
fill_value: Specifies the value used to fill missing entries. By default they are left as NaN.
df = pd.DataFrame(np.random.rand(4, 4),
                  index=[["A", "B", "B", "A"],
                         ["b", "b", "a", "c"],
                         ["b2", "c2", "a2", "c2"]])
df.index.names = ["layer1", "layer2", "layer3"]
display(df)

# Unstack: combinations with no matching data are shown as NaN.
display(df.unstack())

# A fill value can be given to replace the NaN entries.
df.unstack(fill_value=11)

# unstack moves the innermost level by default; another level can be specified.
display(df.unstack(0))

# stack: columns -> rows; unstack: rows -> columns
df.stack()
# df.unstack()
# stack can also take a level, given by number or by name.
df.stack(0)
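The following sketch shows unstack and stack on a small deterministic DataFrame, so the NaN and the fill value are easy to spot. The index tuples and the column name "v" are illustrative assumptions. Depending on the pandas version, stack may emit a FutureWarning about its new implementation; the result here is the same either way because no missing values are involved in the stacking step.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("A", "a"), ("A", "b"), ("B", "a")], names=["outer", "inner"])
df = pd.DataFrame({"v": [1, 2, 3]}, index=idx)

# unstack moves the innermost row level ("inner") into the columns.
# There is no ("B", "b") row, so that cell becomes NaN.
wide = df.unstack()
print(wide)

# fill_value replaces the missing cell instead of leaving NaN.
filled = df.unstack(fill_value=0)
print(filled)

# stack is the reverse direction: the innermost column level goes back to the rows.
long = filled.stack()
print(long)
```

Note that the round trip does not return the original three rows: the filled cell comes back as an explicit ("B", "b") row with value 0.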
2.7. Set index
In a DataFrame, if we need to use one or more existing columns as the index, we can call the set_index method.
Signature: df.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
drop: Whether to remove the columns that become the new index from the data columns. Default is True.
append: Whether to append the new index levels to the existing index instead of replacing it. Default is False.
inplace: Whether to modify the object in place. Default is False.
df = pd.DataFrame({"pk": [1, 2, 3, 4],
                   "age": [15, 20, 17, 8],
                   "name": ["n1", "n2", "n3", "n4"]})
display(df)
# drop=False keeps "pk" as a regular column in addition to using it as the index.
df1 = df.set_index("pk", drop=False)
display(df1)
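Since this article is about multi-level indexes, it is worth noting that set_index also accepts a list of columns, which yields a MultiIndex. Here is a minimal sketch; the column names "city", "year" and "sales" are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"city": ["X", "X", "Y"],
                   "year": [2020, 2021, 2020],
                   "sales": [10, 20, 30]})

# A list of columns produces a two-level index (city, year).
multi = df.set_index(["city", "year"])
print(multi)

# drop=False keeps the column in the data as well as in the index.
kept = df.set_index("city", drop=False)
print(kept)

# append=True adds a level to the existing index instead of replacing it,
# so these two steps build the same (city, year) index as above.
appended = df.set_index("city").set_index("year", append=True)
print(appended)

# The new index can be used for label lookups right away.
print(multi.loc[("X", 2021), "sales"])
```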
2.8. Reset index
The index can be reset by calling reset_index on the DataFrame object. This is the inverse of set_index: index levels are turned back into regular columns.
Signature: df.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')
level: The index level or levels to reset. By default, all levels are reset. If all levels are reset, a default integer index (0, 1, 2, ...) is created.
drop: Whether to discard the reset index levels instead of inserting them as columns. Default is False.
inplace: Whether to modify the object in place. Default is False.
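df = pd.DataFrame({"pk": [1, 2, 3, 4],
                   "age": [15, 20, 17, 8],
                   "name": ["n1", "n2", "n3", "n4"]})
# display(df)
df1 = df.set_index("pk", drop=False)
# display(df1)
# df1 still has a "pk" column (drop=False above), so the index level is
# discarded with drop=True; inserting it as a column would raise an error
# because the column name already exists.
df2 = df1.reset_index(0, drop=True)
display(df2)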
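The level parameter matters most when the index is multi-level. The sketch below resets only one level of a two-level index; the level names "city" and "year" and the data are illustrative assumptions.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("X", 2020), ("X", 2021), ("Y", 2020)], names=["city", "year"])
df = pd.DataFrame({"sales": [10, 20, 30]}, index=idx)

# Reset one level: "year" becomes a column, "city" stays as the index.
only_year = df.reset_index(level="year")
print(only_year)

# Reset every level: both become columns and a 0..n-1 integer index is created.
flat = df.reset_index()
print(flat)

# drop=True discards the level instead of turning it into a column.
dropped = df.reset_index(level="year", drop=True)
print(dropped)
```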