2025-02-22 Update From: SLTechnology News&Howtos, Shulou (Shulou.com) 06/02 Report
This article explains in detail how to split the text in one Pandas column into multiple rows. I find the technique very practical, so I am sharing it here as a reference, and I hope you get something out of it.
In data processing, you often run into data like the following:
In one column, values that should have been spread across several rows are packed into a single cell, and for analysis they need to be split into separate rows.
In the figure above, the cells at index 4 and 5 of the "Country" column contain "UK/Australia" and "UK/Netherland".
This article introduces two ways to split cells that contain multiple values into multiple rows.
Load data
import pandas as pd

# Sample data: the cells at index 4 and 5 of 'Country' hold two values separated by '/'
df = pd.DataFrame({'Country': ['China', 'US', 'Japan', 'EU', 'UK/Australia', 'UK/Netherland'],
                   'Number': [100, 150, 120, 90, 30, 2],
                   'Value': [1, 2, 3, 4, 5, 6],
                   'label': list('abcdef')})
df

Out[2]:
         Country  Number  Value label
0          China     100      1     a
1             US     150      2     b
2          Japan     120      3     c
3             EU      90      4     d
4   UK/Australia      30      5     e
5  UK/Netherland       2      6     f
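Before splitting, it can help to check which rows actually hold several values and how many rows the result should have. The following is a minimal sketch that assumes "/" is the only separator used in the "Country" column; `has_many` and `n_values` are illustrative names, not part of the original article.

```python
import pandas as pd

# Same sample data as above.
df = pd.DataFrame({'Country': ['China', 'US', 'Japan', 'EU', 'UK/Australia', 'UK/Netherland'],
                   'Number': [100, 150, 120, 90, 30, 2],
                   'Value': [1, 2, 3, 4, 5, 6],
                   'label': list('abcdef')})

# True where the cell contains the separator, i.e. holds more than one value.
# regex=False treats '/' as a plain character rather than a pattern.
has_many = df['Country'].str.contains('/', regex=False)
print(df[has_many])

# Values per cell = number of separators + 1; the sum is the number of rows
# the split result should have.
n_values = df['Country'].str.count('/') + 1
print(n_values.sum())  # 8
```

Comparing this count with the length of the split result is a quick sanity check that no piece was lost or duplicated.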
1 Method-1
The method consists of the following steps:
Split the column that holds multiple values, reshape the pieces with the stack() method, and then drop the extra index level with reset_index() so that each piece keeps the row label it came from.
Remove the multi-valued column from the DataFrame with the drop() method.
Merge the two parts with the join() method.
# Split, stack, drop the inner index level, name the Series, then join it back
df.drop('Country', axis=1).join(
    df['Country'].str.split('/', expand=True)
    .stack()
    .reset_index(level=1, drop=True)
    .rename('Country'))

Out[3]:
   Number  Value label     Country
0     100      1     a       China
1     150      2     b          US
2     120      3     c       Japan
3      90      4     d          EU
4      30      5     e          UK
4      30      5     e   Australia
5       2      6     f          UK
5       2      6     f  Netherland
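The one-liner above can be wrapped into a reusable function. The sketch below is a hypothetical helper, `split_to_rows` (not a pandas function), that applies the same split, stack, reset_index, rename and join sequence to any column and separator. The extra `dropna()` is a safeguard added here as an assumption: newer pandas releases introduce a reworked `stack()` implementation (the `future_stack` option added in pandas 2.1) that may keep the padding produced by `expand=True`, and `dropna()` changes nothing when the padding has already been dropped.

```python
import pandas as pd


def split_to_rows(frame: pd.DataFrame, column: str, sep: str) -> pd.DataFrame:
    """Hypothetical helper (not part of pandas): Method-1 for any column/separator.

    Every piece of a multi-valued cell becomes its own row; the other
    columns are repeated for each piece and keep the original row label.
    """
    pieces = (
        frame[column]
        .str.split(sep, expand=True)        # one column per piece, short rows padded
        .stack()                            # long format: (row label, piece position)
        .dropna()                           # safeguard in case stack() keeps the padding
        .reset_index(level=1, drop=True)    # keep only the original row label
        .rename(column)                     # becomes the column name after the join
    )
    # A left join on the row label repeats the remaining columns once per piece.
    return frame.drop(column, axis=1).join(pieces)


df = pd.DataFrame({'Country': ['China', 'US', 'Japan', 'EU', 'UK/Australia', 'UK/Netherland'],
                   'Number': [100, 150, 120, 90, 30, 2],
                   'Value': [1, 2, 3, 4, 5, 6],
                   'label': list('abcdef')})

result = split_to_rows(df, 'Country', '/')
print(result)
```

Called as split_to_rows(df, 'Country', '/'), it gives the same rows and column order as Out[3].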
The process is described step by step as follows:
# Step 1: split each cell into columns, then stack them into one long Series
df['Country'].str.split('/', expand=True).stack()

Out[4]:
0  0         China
1  0            US
2  0         Japan
3  0            EU
4  0            UK
   1     Australia
5  0            UK
   1    Netherland
dtype: object

# Step 2: drop the inner index level (the position of each piece within its cell)
df['Country'].str.split('/', expand=True).stack().reset_index(level=1, drop=True)

Out[5]:
0         China
1            US
2         Japan
3            EU
4            UK
4     Australia
5            UK
5    Netherland
dtype: object

# Step 3: name the Series so that it becomes the 'Country' column after the join
df['Country'].str.split('/', expand=True).stack().reset_index(level=1, drop=True).rename('Country')

Out[6]:
0         China
1            US
2         Japan
3            EU
4            UK
4     Australia
5            UK
5    Netherland
Name: Country, dtype: object

# Step 4: remove the original multi-valued column
df.drop('Country', axis=1)

Out[7]:
   Number  Value label
0     100      1     a
1     150      2     b
2     120      3     c
3      90      4     d
4      30      5     e
5       2      6     f
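The reason stack() is needed becomes clearer from the intermediate result of split(expand=True), which Step 1 consumes directly. A minimal sketch, assuming the same "Country" values as above: the wide result has one column per piece and pads single-valued rows with a missing value, and stack() turns it into one entry per (row label, piece position) pair. The dropna() call is again a safeguard in case the installed pandas version keeps the padding when stacking.

```python
import pandas as pd

countries = pd.Series(['China', 'US', 'Japan', 'EU', 'UK/Australia', 'UK/Netherland'],
                      name='Country')

# expand=True returns a DataFrame with one column per piece; cells with a
# single value get a missing value in column 1.
wide = countries.str.split('/', expand=True)
print(wide.shape)  # (6, 2)

# stack() moves the column labels into an inner index level, giving one entry
# per (row label, piece position); dropna() removes any remaining padding.
long = wide.stack().dropna()
print(long.index.tolist())
print(long.tolist())
```

The outer index level is what Method-1 keeps (reset_index(level=1, drop=True)) and what Method-2 moves into the level_0 column.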
2 Method-2
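This method follows the same idea as Method-1 but differs in the details: instead of dropping the inner index level, it moves the outer level (the original row label) into a column, sets that column back as the index, and then joins the remaining columns onto the split values. The code is as follows: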
(df['Country'].str.split('/', expand=True)
    .stack()
    .reset_index(level=0)            # outer level becomes the column 'level_0'
    .set_index('level_0')            # and is turned back into the index
    .rename(columns={0: 'Country'})  # the value column 0 becomes 'Country'
    .join(df.drop('Country', axis=1)))

Out[8]:
      Country  Number  Value label
0       China     100      1     a
1          US     150      2     b
2       Japan     120      3     c
3          EU      90      4     d
4          UK      30      5     e
4   Australia      30      5     e
5          UK       2      6     f
5  Netherland       2      6     f
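To confirm that Method-1 and Method-2 produce the same rows (they differ only in column order), one can compute both and compare them after aligning the columns. This is an illustrative check, not part of the original article; it reuses the sample data, adds dropna() after stack() as the same version safeguard as above, and resets the index before comparing so the result does not depend on how each method names its index.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['China', 'US', 'Japan', 'EU', 'UK/Australia', 'UK/Netherland'],
                   'Number': [100, 150, 120, 90, 30, 2],
                   'Value': [1, 2, 3, 4, 5, 6],
                   'label': list('abcdef')})

# Shared first step: split, stack, and drop any padding (version safeguard).
pieces = df['Country'].str.split('/', expand=True).stack().dropna()

# Method-1: drop the inner level, name the Series, join it onto the other columns.
m1 = df.drop('Country', axis=1).join(
    pieces.reset_index(level=1, drop=True).rename('Country'))

# Method-2: move the outer level into 'level_0', make it the index again,
# rename the value column and join the other columns onto it.
m2 = (pieces.reset_index(level=0)
      .set_index('level_0')
      .rename(columns={0: 'Country'})
      .join(df.drop('Country', axis=1)))

# Same column order and a fresh 0..n-1 index on both sides before comparing.
cols = ['Country', 'Number', 'Value', 'label']
same = m1[cols].reset_index(drop=True).equals(m2[cols].reset_index(drop=True))
print(same)  # True
```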
The process is described step by step as follows:
# Step 1: after stacking, move the outer index level (the original row label) into a column
df['Country'].str.split('/', expand=True).stack().reset_index(level=0)

Out[9]:
   level_0           0
0        0       China
0        1          US
0        2       Japan
0        3          EU
0        4          UK
1        4   Australia
0        5          UK
1        5  Netherland

# Step 2: make 'level_0' the index again
df['Country'].str.split('/', expand=True).stack().reset_index(level=0).set_index('level_0')

Out[10]:
                  0
level_0
0             China
1                US
2             Japan
3                EU
4                UK
4         Australia
5                UK
5        Netherland

# Step 3: rename the value column 0 to 'Country'
df['Country'].str.split('/', expand=True).stack().reset_index(level=0).set_index('level_0').rename(columns={0: 'Country'})

Out[11]:
            Country
level_0
0             China
1                US
2             Japan
3                EU
4                UK
4         Australia
5                UK
5        Netherland

# Step 4: the remaining columns, which are joined onto the split values
df.drop('Country', axis=1)

Out[12]:
   Number  Value label
0     100      1     a
1     150      2     b
2     120      3     c
3      90      4     d
4      30      5     e
5       2      6     f
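Not covered by the two methods above: since pandas 0.25, DataFrame.explode() can do the same job once each cell is turned into a list. This is a swapped-in alternative technique rather than part of the original article, shown here as a sketch on the same sample data.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['China', 'US', 'Japan', 'EU', 'UK/Australia', 'UK/Netherland'],
                   'Number': [100, 150, 120, 90, 30, 2],
                   'Value': [1, 2, 3, 4, 5, 6],
                   'label': list('abcdef')})

# Without expand=True, str.split returns one Python list per cell;
# explode() then emits one row per list element, repeating the other columns.
result = df.assign(Country=df['Country'].str.split('/')).explode('Country')
print(result)
```

Like Method-1 and Method-2, explode() repeats the original row label (4, 4, 5, 5); pass ignore_index=True (available since pandas 1.1) if a fresh 0..n-1 index is preferred. Unlike Method-1, it keeps the "Country" column in its original position.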
This concludes the article on how to split the text in one Pandas column into multiple rows. I hope it helps you; if you found it useful, please share it so more people can see it.