What are the little-known Pandas skills?


This article introduces some little-known Pandas tricks. It should be a useful reference for interested readers, and I hope you learn a lot from it.

Pandas provides a high-level environment for the Python programming language, with easy-to-use data structures and analysis tools.

For Pandas newcomers: the name is derived from "panel data", an econometric term for a dataset built from observations of the same individuals over multiple time periods.

1. Date ranges

When fetching data from an external application programming interface (API) or database, you often need to specify a date range. Pandas handles this well: its date_range function generates dates that increase by day, month, year, and so on.

Suppose you need a date range that increments by days.

import pandas as pd

date_from = "2019-01-01"
date_to = "2019-01-12"
date_range = pd.date_range(date_from, date_to, freq="D")
date_range

The generated date_range can then be converted into pairs of start and end dates with the following loop.

for i, (date_from, date_to) in enumerate(zip(date_range[:-1], date_range[1:]), 1):
    date_from = date_from.date().isoformat()
    date_to = date_to.date().isoformat()
    print("%d. date_from: %s, date_to: %s" % (i, date_from, date_to))

1. date_from: 2019-01-01, date_to: 2019-01-02
2. date_from: 2019-01-02, date_to: 2019-01-03
3. date_from: 2019-01-03, date_to: 2019-01-04
4. date_from: 2019-01-04, date_to: 2019-01-05
5. date_from: 2019-01-05, date_to: 2019-01-06
6. date_from: 2019-01-06, date_to: 2019-01-07
7. date_from: 2019-01-07, date_to: 2019-01-08
8. date_from: 2019-01-08, date_to: 2019-01-09
9. date_from: 2019-01-09, date_to: 2019-01-10
10. date_from: 2019-01-10, date_to: 2019-01-11
11. date_from: 2019-01-11, date_to: 2019-01-12
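The freq argument is not limited to daily steps. As a minimal sketch (the bounds and the month-end alias here are my own choice, not part of the original example):

import pandas as pd

# Month-end dates across 2019; "M" is the month-end offset alias
# (recent pandas versions also accept "ME").
monthly = pd.date_range("2019-01-01", "2019-12-31", freq="M")
print(monthly)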

2. Merge with indicator

Merging two datasets is the process of combining them into one, aligning the rows of each according to their common attributes or columns.

The merge function takes many arguments. One of them, indicator, adds a _merge column to the resulting DataFrame that shows whether each row comes from the left frame, the right frame, or both. The _merge column is useful when working with larger datasets, especially when you need to check the correctness of a merge operation.

left = pd.DataFrame({"key": ["key1", "key2", "key3", "key4"], "value_l": [1, 2, 3, 4]})
right = pd.DataFrame({"key": ["key3", "key2", "key1", "key6"], "value_r": [3, 2, 1, 6]})

df_merge = left.merge(right, on='key', how='left', indicator=True)

The _merge column can be used to check that we got the expected number of rows with values from both DataFrames.

df_merge._merge.value_counts()

both          3
left_only     1
right_only    0
Name: _merge, dtype: int64
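Building on the frames above, a small follow-up sketch (my own addition, not part of the original) shows how the _merge column can isolate the rows that found no match on the right:

# Rows that exist only in the left frame; here "key4" has no counterpart,
# so its value_r is NaN.
unmatched = df_merge[df_merge["_merge"] == "left_only"]
print(unmatched)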

3. Nearest merge

When working with financial data such as stocks or cryptocurrencies, you may need to combine quotes (price changes) with actual trades. Suppose the goal is to merge each trade with the quote that was issued a few milliseconds before it. Pandas offers a merge_asof function that merges DataFrames on the nearest key (a timestamp in this article). The quotes and trades datasets are taken from the pandas examples.

The quotes DataFrame contains price changes for different stocks. Usually there are many more quotes than trades.

quotes = pd.DataFrame(
    [
        ["2016-05-25 13:30:00.023", "GOOG", 720.50, 720.93],
        ["2016-05-25 13:30:00.023", "MSFT", 51.95, 51.96],
        ["2016-05-25 13:30:00.030", "MSFT", 51.97, 51.98],
        ["2016-05-25 13:30:00.041", "MSFT", 51.99, 52.00],
        ["2016-05-25 13:30:00.048", "GOOG", 720.50, 720.93],
        ["2016-05-25 13:30:00.072", "MSFT", 52.01, 52.03],
    ],
    columns=["timestamp", "ticker", "bid", "ask"],
)
quotes['timestamp'] = pd.to_datetime(quotes['timestamp'])

The trades DataFrame contains trade information for different stocks.

trades = pd.DataFrame(
    [
        ["2016-05-25 13:30:00.023", "MSFT", 51.95, 75],
        ["2016-05-25 13:30:00.038", "MSFT", 51.95, 155],
        ["2016-05-25 13:30:00.048", "GOOG", 720.77, 100],
        ["2016-05-25 13:30:00.048", "GOOG", 720.92, 100],
        ["2016-05-25 13:30:00.048", "AAPL", 98.00, 100],
    ],
    columns=["timestamp", "ticker", "price", "quantity"],
)
trades['timestamp'] = pd.to_datetime(trades['timestamp'])

Trades and quotes are merged by ticker, taking the quote that was issued at most 10 milliseconds before each trade. If the nearest quote is more than 10 milliseconds older than the trade, or there is no quote at all, the bid and ask for that trade are set to missing values (as happens with the AAPL ticker* here).

* AAPL: Apple's stock ticker.

pd.merge_asof(trades, quotes, on="timestamp", by="ticker",
              tolerance=pd.Timedelta("10ms"), direction="backward")
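As a small follow-up sketch (my own addition; it only assumes the trades and quotes frames defined above), you can check which trades found no quote within the tolerance by looking for missing bids in the merged result:

merged = pd.merge_asof(trades, quotes, on="timestamp", by="ticker",
                       tolerance=pd.Timedelta("10ms"), direction="backward")

# Trades with no quote at most 10 ms earlier end up with NaN bid/ask.
print(merged[merged["bid"].isna()])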

4. Create an Excel report

Using Pandas together with the XlsxWriter library, we can create Excel reports directly from a DataFrame. This saves a lot of time compared with saving the DataFrame to CSV and then formatting it by hand in Excel, and it also lets you add various charts and other conveniences directly.

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["a", "b", "c"])

The following short piece of code creates an Excel report. To actually store the DataFrame in an Excel file, uncomment the writer.save() line.

report_name = 'example_report.xlsx'
sheet_name = 'Sheet1'

writer = pd.ExcelWriter(report_name, engine='xlsxwriter')
df.to_excel(writer, sheet_name=sheet_name, index=False)
# writer.save()

As mentioned earlier, the library also supports adding charts to the Excel report. You need to define the type of the chart (a line chart in this article) and the data series for the chart, which refer to cells in the spreadsheet.

# define the workbook
workbook = writer.book
worksheet = writer.sheets[sheet_name]

# create a line chart object
chart = workbook.add_chart({'type': 'line'})

# configure the series of the chart from the spreadsheet
# using a list of values instead of category/value formulas:
# [sheetname, first_row, first_col, last_row, last_col]
chart.add_series({
    'categories': [sheet_name, 1, 0, 3, 0],
    'values': [sheet_name, 1, 1, 3, 1],
})

# configure the chart axes
chart.set_x_axis({'name': 'Index', 'position_axis': 'on_tick'})
chart.set_y_axis({'name': 'Value', 'major_gridlines': {'visible': False}})

# place the chart on the worksheet
worksheet.insert_chart('E2', chart)

# output the excel file
writer.save()
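If your pandas version no longer provides writer.save() (newer releases replaced it with writer.close()), a context manager avoids the explicit save step entirely; a minimal sketch under that assumption:

# The file is written when the with-block exits, so no save()/close() call is needed.
with pd.ExcelWriter('example_report.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False)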

5. Save disk space

When working on several data science projects at the same time, you usually accumulate many preprocessed datasets from different experiments, and a laptop's solid-state drive fills up quickly. Pandas helps here: it can compress the data when saving a dataset and read it back in decompressed form.

Let's create a large Pandas DataFrame filled with random numbers.

import numpy as np

df = pd.DataFrame(np.random.randn(50000, 300))

If you save this DataFrame in CSV format, it takes up about 300 MB of space on your hard drive.

df.to_csv('random_data.csv', index=False)

With the compression='gzip' argument, you can reduce the file size to 136 MB.

df.to_csv('random_data.gz', compression='gzip', index=False)
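As a minimal sketch to verify the savings (the file names follow the snippets above; exact sizes depend on the random data):

import os

# Compare the uncompressed and gzip-compressed file sizes in megabytes.
for name in ('random_data.csv', 'random_data.gz'):
    print("%s: %.1f MB" % (name, os.path.getsize(name) / 1e6))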

At the same time, reading the gzipped data back into a DataFrame is just as easy, so no functionality is lost.

df = pd.read_csv('random_data.gz')

Thank you for reading this article carefully. I hope "What are the little-known Pandas skills?" has been helpful to you.
