
How to split text in a pandas column into multiple rows

Updated: 2025-02-22. Source: SLTechnology News&Howtos (shulou)


This article explains in detail how to split the text in a pandas column into multiple rows. The technique is quite practical, so it is shared here as a reference.

During data processing, you often run into data like the following:

Values that belong in separate rows have been entered in a single cell of one column, and they must be split into multiple rows before analysis.

In the example data below, the "Country" column holds "UK/Australia" and "UK/Netherland" in the rows with index 4 and 5.

This article introduces two ways to split a cell that contains multiple values into multiple rows.

Load data


    import pandas as pd

    df = pd.DataFrame({'Country': ['China', 'US', 'Japan', 'EU', 'UK/Australia', 'UK/Netherland'],
                       'Number': [100, 150, 120, 90, 30, 2],
                       'Value': [1, 2, 3, 4, 5, 6],
                       'label': list('abcdef')})
    df

    Out[2]:
             Country  Number  Value label
    0          China     100      1     a
    1             US     150      2     b
    2          Japan     120      3     c
    3             EU      90      4     d
    4   UK/Australia      30      5     e
    5  UK/Netherland       2      6     f
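Before splitting, it can help to confirm which rows actually hold more than one value. The following is a minimal sketch, assuming "/" is the only separator in the data; the names has_multiple and rows_to_split are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['China', 'US', 'Japan', 'EU', 'UK/Australia', 'UK/Netherland'],
                   'Number': [100, 150, 120, 90, 30, 2],
                   'Value': [1, 2, 3, 4, 5, 6],
                   'label': list('abcdef')})

# True where the cell contains the separator; regex=False treats '/' literally.
has_multiple = df['Country'].str.contains('/', regex=False)

# Index labels of the rows that will need splitting.
rows_to_split = df[has_multiple].index.tolist()
print(rows_to_split)
```

For the sample data this should report the rows with index 4 and 5, the same cells described earlier.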

1 Method-1

It consists of the following steps:

Split the column that contains multiple values with str.split(expand=True), reshape the result into a single column with the stack() method, and reset the index so that it matches the original row labels.

Remove the multi-value column from the DataFrame with the drop() method.

Then merge the two with the join() method.

    df.drop('Country', axis=1).join(df['Country'].str.split('/', expand=True).stack().reset_index(level=1, drop=True).rename('Country'))

    Out[3]:
       Number  Value label     Country
    0     100      1     a       China
    1     150      2     b          US
    2     120      3     c       Japan
    3      90      4     d          EU
    4      30      5     e          UK
    4      30      5     e   Australia
    5       2      6     f          UK
    5       2      6     f  Netherland
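The one-liner above can be wrapped in a small reusable function. This is only a sketch: the name split_column_to_rows and its parameters are hypothetical, and the extra dropna() call is a precaution that is not in the original code. It assumes that some pandas versions keep the missing values that expand=True uses as padding when stack() is called, whereas older versions drop them by default.

```python
import pandas as pd


def split_column_to_rows(frame, column, sep):
    """Hypothetical helper: return one row per separated value in `column`."""
    parts = (
        frame[column]
        .str.split(sep, expand=True)       # one column per piece, padded with missing values
        .stack()                           # long format, index = (row label, piece number)
        .dropna()                          # assumption: guards versions where stack() keeps padding
        .reset_index(level=1, drop=True)   # drop the piece number, keep the original row label
        .rename(column)
    )
    # join aligns on the index, so the remaining columns repeat once per piece
    return frame.drop(column, axis=1).join(parts)


df = pd.DataFrame({'Country': ['China', 'US', 'Japan', 'EU', 'UK/Australia', 'UK/Netherland'],
                   'Number': [100, 150, 120, 90, 30, 2],
                   'Value': [1, 2, 3, 4, 5, 6],
                   'label': list('abcdef')})

result = split_column_to_rows(df, 'Country', '/')
print(result)
```

Passing the column name and separator as arguments keeps the chain usable for other delimited columns without editing the expression itself.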

The process, step by step:

    df['Country'].str.split('/', expand=True).stack()

    Out[4]:
    0  0         China
    1  0            US
    2  0         Japan
    3  0            EU
    4  0            UK
       1     Australia
    5  0            UK
       1    Netherland
    dtype: object

    df['Country'].str.split('/', expand=True).stack().reset_index(level=1, drop=True)

    Out[5]:
    0         China
    1            US
    2         Japan
    3            EU
    4            UK
    4     Australia
    5            UK
    5    Netherland
    dtype: object

    df['Country'].str.split('/', expand=True).stack().reset_index(level=1, drop=True).rename('Country')

    Out[6]:
    0         China
    1            US
    2         Japan
    3            EU
    4            UK
    4     Australia
    5            UK
    5    Netherland
    Name: Country, dtype: object

    df.drop('Country', axis=1)

    Out[7]:
       Number  Value label
    0     100      1     a
    1     150      2     b
    2     120      3     c
    3      90      4     d
    4      30      5     e
    5       2      6     f
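The final join() is what repeats the other columns: it aligns on index labels, and the split Series carries the same label once per piece. A reduced sketch with hand-built data (only the rows with index 4 and 5, and only the Number column) shows the effect in isolation.

```python
import pandas as pd

# Left side: one row per original record.
left = pd.DataFrame({'Number': [30, 2]}, index=[4, 5])

# Right side: index labels 4 and 5 each appear twice, one entry per split value.
right = pd.Series(['UK', 'Australia', 'UK', 'Netherland'],
                  index=[4, 4, 5, 5], name='Country')

# Each left row is matched with every right entry that shares its label.
joined = left.join(right)
print(joined)
```

This is why resetting the index back to the original row labels matters: without matching labels, join() would have nothing to align on.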

2 Method-2

The idea is basically the same as in Method-1; only some of the details differ. The code is as follows:

    df['Country'].str.split('/', expand=True).stack().reset_index(level=0).set_index('level_0').rename(columns={0: 'Country'}).join(df.drop('Country', axis=1))

    Out[8]:
          Country  Number  Value label
    0       China     100      1     a
    1          US     150      2     b
    2       Japan     120      3     c
    3          EU      90      4     d
    4          UK      30      5     e
    4   Australia      30      5     e
    5          UK       2      6     f
    5  Netherland       2      6     f
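Since the two methods differ only in how the index is rebuilt and in which side starts the join, their results should contain the same rows with the columns in a different order. The sketch below checks this on the sample data. The intermediate names are made up for illustration, and dropna() is added as a precaution, on the assumption that some pandas versions keep the padding values after stack().

```python
import pandas as pd

df = pd.DataFrame({'Country': ['China', 'US', 'Japan', 'EU', 'UK/Australia', 'UK/Netherland'],
                   'Number': [100, 150, 120, 90, 30, 2],
                   'Value': [1, 2, 3, 4, 5, 6],
                   'label': list('abcdef')})

# Shared first step of both methods.
stacked = df['Country'].str.split('/', expand=True).stack().dropna()

# Method-1: start from the other columns and join the split values onto them.
method_1 = df.drop('Country', axis=1).join(
    stacked.reset_index(level=1, drop=True).rename('Country'))

# Method-2: start from the split values and join the other columns onto them.
method_2 = (stacked.reset_index(level=0)
            .set_index('level_0')
            .rename(columns={0: 'Country'})
            .join(df.drop('Country', axis=1)))

# Compare values only, after putting the columns in the same order.
cols = ['Country', 'Number', 'Value', 'label']
same = method_1[cols].values.tolist() == method_2[cols].values.tolist()
print(same)
```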

The process is described step by step as follows:

    df['Country'].str.split('/', expand=True).stack().reset_index(level=0)

    Out[9]:
       level_0           0
    0        0       China
    0        1          US
    0        2       Japan
    0        3          EU
    0        4          UK
    1        4   Australia
    0        5          UK
    1        5  Netherland

    df['Country'].str.split('/', expand=True).stack().reset_index(level=0).set_index('level_0')

    Out[10]:
                      0
    level_0
    0             China
    1                US
    2             Japan
    3                EU
    4                UK
    4         Australia
    5                UK
    5        Netherland

    df['Country'].str.split('/', expand=True).stack().reset_index(level=0).set_index('level_0').rename(columns={0: 'Country'})

    Out[11]:
                Country
    level_0
    0             China
    1                US
    2             Japan
    3                EU
    4                UK
    4         Australia
    5                UK
    5        Netherland

    df.drop('Country', axis=1)

    Out[12]:
       Number  Value label
    0     100      1     a
    1     150      2     b
    2     120      3     c
    3      90      4     d
    4      30      5     e
    5       2      6     f
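For comparison, and not as one of the two methods described in this article, pandas 0.25 and later also provide DataFrame.explode, which performs the same reshaping in one step. A minimal sketch, assuming such a version is installed:

```python
import pandas as pd

df = pd.DataFrame({'Country': ['China', 'US', 'Japan', 'EU', 'UK/Australia', 'UK/Netherland'],
                   'Number': [100, 150, 120, 90, 30, 2],
                   'Value': [1, 2, 3, 4, 5, 6],
                   'label': list('abcdef')})

# Without expand=True, str.split returns a list per cell;
# explode() then gives every list element its own row and repeats the other columns.
exploded = df.assign(Country=df['Country'].str.split('/')).explode('Country')
print(exploded)
```

Here the column order of the original DataFrame is kept, and the index labels repeat for the split rows just as in Method-1 and Method-2.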

That concludes this article on how to split the text in a pandas column into multiple rows. I hope it is of some help.
