How to build your own quantitative Analysis Database in Python

2025-02-24 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article shares how to build your own quantitative analysis database in Python. The editor finds it very practical and shares it here in the hope that you will get something out of it.

A hands-on guide to building your own quantitative analysis database

Introduction:

Data is the foundation of quantitative financial analysis: historical stock trading data, fundamental data on listed companies, macroeconomic and industry data, and so on. As the flow of information keeps expanding, knowing how to obtain, query, and process data is becoming more and more important. For anyone tinkering with quantitative trading, a working knowledge of databases is essential. The most commonly used open-source (free) databases at present are MySQL, PostgreSQL, MongoDB, and SQLite (bundled with Python), all of which placed in the top 10 of the DB-Engines rankings for 2018-2019 (see figure below), reflecting their wide usage and popularity. Each has its own characteristics and suitable use cases, and there is plenty of material online about which one to choose and how to learn it. This article briefly introduces how to operate a PostgreSQL database from Python (other databases are similar), using psycopg2 and sqlalchemy to move data between PostgreSQL and pandas DataFrames, building a quantitative analysis database step by step.

Installation and use of PostgreSQL

Install PostgreSQL. Download a version suited to your system from the official website and install it. During installation you only need to set a password (this article uses "123456"); everything else can be left at the defaults. If you get stuck, refer to the CSDN article "PostgreSQL installation detailed steps (windows)". After installation you will also find pgAdmin 4 in the installation directory, a bundled graphical tool for the database. The latest version is a web application, similar in spirit to Python's Jupyter Notebook, that can be used to view and manipulate PostgreSQL databases.

Install the psycopg2 and sqlalchemy libraries for Python. psycopg2 is a Python adapter for connecting to PostgreSQL; sqlalchemy is more general, connecting to MySQL, SQLite, and PostgreSQL alike, and is particularly convenient for working with pandas DataFrames. Both libraries are well documented online, so they are not covered in detail here; each can be installed with pip install xxx at the command line.
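Once both libraries are installed, a quick round trip between a DataFrame and a SQL table confirms the stack works. The sketch below is not from the original article: it uses an in-memory SQLite database so it runs without a server; for PostgreSQL, only the engine URL changes to the one used later in the article.

```python
# Smoke test of the pandas <-> SQL round trip via sqlalchemy.
# SQLite is used here so this runs without a server; for PostgreSQL swap
# the URL for 'postgresql+psycopg2://postgres:123456@localhost:5432/postgres'.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite:///:memory:')

df = pd.DataFrame({'ts_code': ['000001.SZ', '600000.SH'], 'close': [13.5, 11.2]})
df.to_sql('smoke_test', engine, index=False, if_exists='replace')

roundtrip = pd.read_sql('smoke_test', engine)
print(len(roundtrip))  # 2
```

If this prints 2, writing and reading DataFrames through sqlalchemy is working, and the same two calls (`to_sql` and `read_sql`) are all the article relies on.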

Example application

First, use tushare to download market data for more than 3,000 stocks to the local machine, then use psycopg2 and sqlalchemy as the interface to store the data in a local PostgreSQL database for further querying and processing.

# first import the libraries that will be used later for analysis and visualization
import tushare as ts
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# display Chinese characters and the minus sign correctly
from pylab import mpl
mpl.rcParams['font.sans-serif'] = ['SimHei']
mpl.rcParams['axes.unicode_minus'] = False

# set the tushare token
token = 'input your token'
pro = ts.pro_api(token)

Data acquisition function; the default date range can be changed as needed.

# if this raises an error, upgrade tushare to the latest version
def get_data(code, start='20190101', end='20190425'):
    df = ts.pro_bar(ts_code=code, adj='qfq', start_date=start, end_date=end)
    return df

Stock code acquisition function, which returns the codes of stocks listed as of the latest trading day:

# get the latest stock codes and abbreviations
def get_code():
    codes = pro.stock_basic(list_status='L').ts_code.values
    return codes
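A practical note, not from the original article: tushare's API can rate-limit or time out during bulk downloads, so it may help to wrap `get_data` in a small retry helper. `with_retry` below is a hypothetical helper built only on the standard library.

```python
import time

def with_retry(func, retries=3, delay=1.0):
    """Return a version of func that retries on any exception,
    pausing `delay` seconds between attempts."""
    def wrapper(*args, **kwargs):
        for attempt in range(retries):
            try:
                return func(*args, **kwargs)
            except Exception:
                if attempt == retries - 1:
                    raise   # give up after the last attempt
                time.sleep(delay)
    return wrapper

# hypothetical usage with the article's function:
# get_data_safe = with_retry(get_data, retries=3, delay=2.0)
```

The download loops later in the article could then call the wrapped version instead, so a transient API hiccup does not skip a stock.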

Insert into the PostgreSQL database. The try...except...pass in the function prevents occasional bad data from crashing the whole run.

from sqlalchemy import create_engine
import psycopg2

engine = create_engine('postgresql+psycopg2://postgres:123456@localhost:5432/postgres')

def insert_sql(data, db_name, if_exists='append'):
    # use try...except...pass so an error on one stock does not crash the run
    try:
        data.to_sql(db_name, engine, index=False, if_exists=if_exists)
        # print(code + ' written to the database successfully')
    except:
        pass
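One drawback of swallowing every error with `pass` is that you never learn which stocks failed, so they cannot be re-run later. A hypothetical variant (not from the article) records the failing code and error message instead; SQLite is used here so the example runs without a server, but the function body is identical for the PostgreSQL engine.

```python
# A less silent variant of insert_sql: failed inserts are recorded rather
# than dropped, so they can be retried. SQLite keeps this runnable standalone.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite:///:memory:')
failed = []   # (code, error message) pairs to retry later

def insert_sql_logged(data, db_name, code, if_exists='append'):
    try:
        data.to_sql(db_name, engine, index=False, if_exists=if_exists)
    except Exception as e:
        failed.append((code, str(e)))

df = pd.DataFrame({'ts_code': ['000001.SZ'], 'close': [13.5]})
insert_sql_logged(df, 'stock_data', '000001.SZ')
insert_sql_logged(None, 'stock_data', 'BAD.CODE')   # None has no .to_sql
print(len(failed))  # 1
```

After a bulk run, iterating over `failed` and re-downloading just those codes is much cheaper than repeating the whole 25-35 minute download.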

Because the market data are large and slow to download, first download the daily data for 20190101-20190425. The data will be updated continuously later.

# download the 20190101-20190425 data and insert it into table stock_data
# this is a time-consuming step: about 25-35 minutes
for code in get_code():
    data = get_data(code)
    insert_sql(data, 'stock_data')

# read the entire table
df = pd.read_sql('stock_data', engine)
print(len(df))  # output result: 27099A

# select the stock data for ts_code = 000001.SZ
df = pd.read_sql("select * from stock_data where ts_code='000001.SZ'", engine)
print(len(df))
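A tuning note not in the original article: queries that filter on `ts_code` or `trade_date` scan the whole table unless an index exists. A one-off composite index makes the common lookups much faster; the same `CREATE INDEX` statement works in PostgreSQL, and SQLite is used below only so the snippet is runnable standalone.

```python
# Create a composite index on (ts_code, trade_date) to speed up the
# "where ts_code='000001.SZ'" style queries used throughout the article.
from sqlalchemy import create_engine, text

engine = create_engine('sqlite:///:memory:')  # or the article's PostgreSQL URL
with engine.connect() as conn:
    conn.execute(text(
        "CREATE TABLE IF NOT EXISTS stock_data (ts_code TEXT, trade_date TEXT, close REAL)"))
    conn.execute(text(
        "CREATE INDEX IF NOT EXISTS idx_code_date ON stock_data (ts_code, trade_date)"))
    names = [r[0] for r in conn.execute(
        text("SELECT name FROM sqlite_master WHERE type='index'"))]
print(names)  # ['idx_code_date']
```

With 1 million+ rows, as reached later in the article, the index pays for itself on the very first per-stock query.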

Build a data update function that can download and insert data for other periods. From January 1, 2018 to April 25, 2019, the row count reaches about 1.08 million.

# update data, or download data for another period
def update_sql(start, end, db_name):
    from datetime import datetime, timedelta
    for code in get_code():
        data = get_data(code, start, end)
        insert_sql(data, db_name)
    print(f'{start}:{end} data updated successfully')

# download the 20180101-20181231 data; this only needs to run once,
# so comment it out after it has been run
# downloading is slow: about 20-35 minutes
start = '20180101'
end = '20181231'
db_name = 'stock_data'
update_sql(start, end, db_name)

# read the data back with pandas read_sql
df_all_data = pd.read_sql('stock_data', engine)
print(len(df_all_data))
# output result: 1087050

# check the number of stock codes and trading dates
print(len(df_all_data.ts_code.unique()))
print(len(df_all_data.trade_date.unique()))
# output result: 3604  319

d = df_all_data.trade_date.unique()
print(d.max())
print(d.min())
# 2019-04-25T00:00:00.000000000
# 2018-01-02T00:00:00.000000000

# get the data for a given trading day
pd.read_sql("select * from stock_data where trade_date='2019-04-25'", engine).head()
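Rather than hard-coding the start date on each update run, one can resume from wherever the table left off. `next_start_date` below is a hypothetical helper (not part of the article) that looks up the latest stored `trade_date` and returns the following day; SQLite keeps the sketch runnable here, and only the engine URL differs for PostgreSQL.

```python
# Find where the stored data ends, so update_sql can resume from there.
import pandas as pd
from datetime import timedelta
from sqlalchemy import create_engine

engine = create_engine('sqlite:///:memory:')

def next_start_date(engine, db_name='stock_data', default='20180101'):
    try:
        last = pd.read_sql(f"select max(trade_date) as d from {db_name}", engine).d[0]
    except Exception:
        return default                      # table does not exist yet
    if last is None or pd.isna(last):
        return default                      # table exists but is empty
    return (pd.to_datetime(last) + timedelta(days=1)).strftime('%Y%m%d')

pd.DataFrame({'ts_code': ['000001.SZ'], 'trade_date': ['2019-04-25']}).to_sql(
    'stock_data', engine, index=False)
print(next_start_date(engine))  # 20190426
```

A daily cron job could then call `update_sql(next_start_date(engine), today, 'stock_data')` and never re-download already-stored days.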

Build a data query and visualization function:

def plot_data(condition, title):
    from pyecharts import Bar
    from sqlalchemy import create_engine
    engine = create_engine('postgresql+psycopg2://postgres:123456@localhost:5432/postgres')
    data = pd.read_sql("select * from stock_data where " + condition, engine)
    count_ = data.groupby('trade_date')['ts_code'].count()
    attr = count_.index
    v1 = count_.values
    bar = Bar(title, title_text_size=15)
    bar.add('', attr, v1, is_splitline_show=False, linewidth=2)
    return bar

Query the time distribution of stock prices below 2 yuan:

c1 = "close < 2"
t1 = "time distribution of stock prices below 2 yuan"
plot_data(c1, t1)

Query the time distribution of daily gains of more than 9.5%:

c2 = "pct_chg > 9.5"
t2 = "time distribution of daily gains of more than 9.5%"
plot_data(c2, t2)
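Note that the `Bar` API above comes from pyecharts 0.x and was removed in pyecharts 1.x. As an alternative, the same query-and-count helper can be plotted with matplotlib, which the article already imports. The sketch below reuses the article's table layout but runs against in-memory SQLite with a few hand-made rows so it is self-contained; swap the engine URL for the PostgreSQL one to use it on real data.

```python
# A matplotlib version of plot_data, assuming the article's stock_data schema.
import pandas as pd
import matplotlib
matplotlib.use('Agg')                 # non-interactive backend
import matplotlib.pyplot as plt
from sqlalchemy import create_engine

engine = create_engine('sqlite:///:memory:')  # or the article's PostgreSQL URL
pd.DataFrame({'ts_code': ['000001.SZ', '000002.SZ', '000001.SZ'],
              'trade_date': ['2019-04-24', '2019-04-24', '2019-04-25'],
              'pct_chg': [9.9, 10.0, 9.8]}).to_sql('stock_data', engine, index=False)

def plot_data_mpl(condition, title):
    data = pd.read_sql("select * from stock_data where " + condition, engine)
    count_ = data.groupby('trade_date')['ts_code'].count()
    count_.plot(kind='bar', title=title, figsize=(10, 4))
    plt.tight_layout()
    return count_

counts = plot_data_mpl("pct_chg > 9.5", "days with gains above 9.5%")
print(counts.to_dict())  # {'2019-04-24': 2, '2019-04-25': 1}
```

Returning the counts as well as drawing them makes the helper easy to reuse in further analysis.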

Query the time distribution of daily declines of more than 9.5%:

c3 = "pct_chg < -9.5"
t3 = "time distribution of daily declines of more than 9.5%"
plot_data(c3, t3)
