Pandas tips worth knowing

2025-04-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article walks through several Pandas tips. Each one is simple, fast, and practical, and is worth a look if you work with Pandas regularly.

Create a DataFrame from the clipboard

It is well known that Pandas can easily read data from CSV, JSON, or even directly from databases using SQLAlchemy, but did you know that Pandas can also read data from your operating system's clipboard? Suppose you have an Excel file that contains several worksheets, and you need to process part of the data from one of them in Python. What do you usually do?

Copy the data you need from the worksheet.

Paste it into a new worksheet.

Save that worksheet as a CSV file.

Get the path to the new CSV file.

In Python, use pd.read_csv('path/to/csv/file') to read the file into a Pandas DataFrame.

Of course, there is an easier way: pd.read_clipboard().

Copy the required data range.

In Python, call pd.read_clipboard().

With this approach, if you just want to load some data into Pandas, you don't need a separate CSV or Excel file.

This function has a few tricks of its own. For example, a column that contains dates may not load correctly; it may come through as plain strings rather than datetime values.

The trick is to tell Pandas which columns hold dates that need to be parsed.

df = pd.read_clipboard(parse_dates=['dob'])
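The effect of parse_dates can be sketched without a real clipboard. pd.read_clipboard() passes the copied text on to pd.read_csv(), so the minimal example below feeds a hypothetical tab-separated string (standing in for cells copied from a spreadsheet) to pd.read_csv() instead. The sample rows and the 'name' column are made up for illustration; only the 'dob' column name comes from the text above.

```python
import io

import pandas as pd

# Hypothetical tab-separated text, standing in for cells copied from a spreadsheet.
clipboard_text = "name\tdob\nAlice\t1990-01-15\nBob\t1985-07-30\n"

# Without parse_dates, the 'dob' column is not converted to datetime values.
raw = pd.read_csv(io.StringIO(clipboard_text), sep="\t")

# With parse_dates, the same column is parsed into datetime values.
parsed = pd.read_csv(io.StringIO(clipboard_text), sep="\t", parse_dates=["dob"])

print(raw["dob"].dtype)
print(parsed["dob"].dtype)
```

With a real selection on the clipboard, the equivalent call would be pd.read_clipboard(parse_dates=['dob']).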

Generate dummy data with the testing helpers

Sometimes you need a sample DataFrame to experiment with. The most common approach is to use NumPy to generate an array of random values and then build a DataFrame from that array.

That approach is needed when the data must follow a particular distribution, such as a normal distribution. Often, though, the distribution does not matter and any data will do. In that case there is an easier way to generate a sample DataFrame: the pandas.util.testing module. Note that pandas.util.testing was deprecated in pandas 1.0 and is no longer available in recent releases, so this tip applies to older versions.

pd.util.testing.makeDataFrame()

The index of the DataFrame is generated from random strings, and by default the frame has 4 columns and 30 rows.

If you need a different number of rows and columns, set testing.N to the number of rows and testing.K to the number of columns.

pd.util.testing.N = 10
pd.util.testing.K = 5
pd.util.testing.makeDataFrame()
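Because pandas.util.testing is deprecated and missing from recent Pandas releases, a similar frame can be built by hand with NumPy. The helper below is a rough stand-in, not the library's implementation: make_sample_frame, its parameters, and the seed are hypothetical names chosen for this sketch, and the defaults simply mirror the 30 rows and 4 columns described above.

```python
import string

import numpy as np
import pandas as pd


def make_sample_frame(n_rows=30, n_cols=4, seed=0):
    """Build a DataFrame of random floats with a random-string index.

    Hypothetical helper for illustration; supports up to 26 columns (A-Z).
    """
    rng = np.random.default_rng(seed)
    alphabet = np.array(list(string.ascii_letters + string.digits))
    # One random 10-character label per row.
    index = ["".join(rng.choice(alphabet, size=10)) for _ in range(n_rows)]
    columns = list(string.ascii_uppercase[:n_cols])
    values = rng.standard_normal((n_rows, n_cols))
    return pd.DataFrame(values, index=index, columns=columns)


sample = make_sample_frame(n_rows=10, n_cols=5)
print(sample.shape)
```

Passing n_rows and n_cols plays the role that testing.N and testing.K play in the older API.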

Write DataFrames to a compressed file

DataFrames can easily be written to a file with methods such as df.to_csv() and df.to_json(). Sometimes, however, you need to compress the file to save disk space or for other reasons. For example, a data engineer who writes a Pandas DataFrame to a CSV file and transfers it to a remote server will want to compress the file before sending it, to save space and bandwidth.

The usual solution is to add an extra compression step in the scheduling tool, such as Airflow or Oozie. Pandas, however, can write compressed files directly, so the whole task takes fewer steps and stays simpler.

First, use the previous tip to generate a random DataFrame:

pd.util.testing.N = 100000
pd.util.testing.K = 5
df = pd.util.testing.makeDataFrame()

In this example we only need a DataFrame to work with; the values themselves do not matter. Now save the DataFrame to a CSV file and check its size.

import os
df.to_csv('sample.csv')
os.path.getsize('sample.csv')

You can then write the same DataFrame to a compressed file and check the size of that file.

df.to_csv('sample.csv.gz', compression='gzip')
os.path.getsize('sample.csv.gz')

As you can see, the compressed file is less than half the size of the plain CSV file.

This may not be the best example, because the random DataFrame contains no repeated values. In practice, data with categorical values typically compresses much better. As you might expect, Pandas can also read a compressed file directly into a DataFrame, without decompressing it on the file system first.

df = pd.read_csv('sample.csv.gz', compression='gzip', index_col=0)

Gzip is the preferred choice because it is available by default on most Linux systems. Pandas also supports other compression formats, such as "zip" and "bz2".
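Putting the steps of this section together, the sketch below writes the same DataFrame as a plain and as a gzip-compressed CSV, compares the sizes, and reads the compressed file back. The column names, row count, and the use of a temporary directory are assumptions for illustration; a low-cardinality 'category' column is included because repeated values tend to compress well.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical sample data: one repetitive text column and one random numeric column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "category": rng.choice(["red", "green", "blue"], size=10_000),
    "value": rng.standard_normal(10_000),
})

with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "sample.csv")
    gzip_path = os.path.join(tmp, "sample.csv.gz")

    # Write the same frame twice: once as plain text, once gzip-compressed.
    df.to_csv(plain_path)
    df.to_csv(gzip_path, compression="gzip")

    plain_size = os.path.getsize(plain_path)
    gzip_size = os.path.getsize(gzip_path)

    # Read the compressed file straight back into a DataFrame.
    restored = pd.read_csv(gzip_path, compression="gzip", index_col=0)

print(f"plain: {plain_size} bytes, gzip: {gzip_size} bytes")
```

Exact sizes depend on the data, so none are quoted here; the same pattern should work with the other formats mentioned above by changing the compression argument and the file extension.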

Build a DateTime from multiple columns

In Pandas, you have probably used pd.to_datetime() to convert strings to DateTime values, usually with a format string such as %Y%m%d. Sometimes, however, the raw data looks like the DataFrame below.

df = pd.DataFrame({'year': np.arange(2000, 2012), 'month': np.arange(1, 13), 'day': np.arange(1, 13), 'value': np.random.randn(12)})

It is common for data to store the year, month, and day in separate columns. These can be converted to a DateTime column in one step with pd.to_datetime().

df['date'] = pd.to_datetime(df[['year', 'month', 'day']])
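pd.to_datetime() recognises the component columns by name (year, month, day), so columns with other names need renaming first. A minimal sketch, assuming hypothetical source columns 'yr', 'mo', and 'dy':

```python
import pandas as pd

# Hypothetical raw data whose date parts use non-standard column names.
raw = pd.DataFrame({
    "yr": [2000, 2001],
    "mo": [1, 2],
    "dy": [15, 28],
    "value": [0.5, 1.5],
})

# Rename to the names pd.to_datetime() expects, then assemble the dates.
parts = raw.rename(columns={"yr": "year", "mo": "month", "dy": "day"})
raw["date"] = pd.to_datetime(parts[["year", "month", "day"]])

print(raw["date"].tolist())
```

Renaming a copy keeps the original column names intact while still producing the combined 'date' column.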

That covers these Pandas tips. The best way to absorb them is to try them out in practice.

© 2024 shulou.com SLNews company. All rights reserved.
