How to Analyze and Visualize an Urban Public Transportation Network with Python

2025-07-08 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article shows how to analyze and visualize an urban public transport network with Python. It is detailed and written from a professional point of view; I hope you find it useful.

I. Data viewing and preprocessing

The data were obtained from the Amap API and include the names of the bus lines and stations in Tianjin together with their longitude and latitude.

    import pandas as pd

    df = pd.read_excel('site_information.xlsx')
    df.head()

Field description:

Line name: the name of the bus line

Uplink/downlink: 0 for uplink; 1 for downlink

Station serial number: the position of the station in the sequence of stops along the line

Station name: the name of the station

Longitude (min): longitude of the station, in arc minutes

Latitude (min): latitude of the station, in arc minutes

The dataset has few fields and a simple structure, so we first get to know the data and then preprocess it.

The dataset has 30,396 rows in total: 5 are missing the station name, 1 is missing the latitude (min), and 38 are missing the longitude (min). To keep processing simple, rows with missing values are dropped.
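The article does not show the step that counts the missing values and produces the cleaned frame `df1` used later. A minimal sketch of how that might look, assuming the column names used elsewhere in this article and a small made-up sample in place of the real spreadsheet:

```python
import pandas as pd

# Made-up sample standing in for site_information.xlsx (values are illustrative only)
df = pd.DataFrame({
    "line name": ["1", "1", "2", "2"],
    "station name": ["A", None, "B", "C"],
    "longitude (min)": [7031.982, 7032.5, None, 7030.1],
    "latitude (min)": [2348.1016, 2348.2, 2347.9, 2348.0],
})

# Number of missing values in each column
missing = df.isnull().sum()
print(missing)

# Drop every row that has at least one missing value
df1 = df.dropna()
print(len(df), "rows before,", len(df1), "rows after")
```

Dropping rows is reasonable here because only a few dozen of roughly 30,000 rows are affected; with a larger share of gaps, filling values from another source might be preferable.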

The longitude and latitude values are stored in arc minutes (for example 7031.982 and 2348.1016), so they must be divided by 60 to convert them to degrees (roughly 117.20 and 39.14).

    df2 = df1.copy()
    df2['longitude (min)'] = df1['longitude (min)'].apply(float) / 60
    df2['latitude (min)'] = df1['latitude (min)'].apply(float) / 60
    df2.head()

In the processed data, there are 618 bus lines and 4851 stations.
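The counts above can be obtained with `nunique()`. The sketch below uses a made-up frame in place of `df2`, and the column names are the ones assumed throughout this article:

```python
import pandas as pd

# Made-up sample; in the article this would be the cleaned frame df2
df2 = pd.DataFrame({
    "line name": ["1", "1", "2", "2", "2"],
    "station name": ["A", "B", "B", "C", "A"],
})

n_lines = df2["line name"].nunique()        # number of distinct bus lines
n_stations = df2["station name"].nunique()  # number of distinct station names
print(f"{n_lines} bus lines, {n_stations} stations")
```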

Save the processed data to a new file:

    df2.to_excel("processed data.xlsx", index=False)

II. Data analysis

Analysis of the distribution of bus stops in Tianjin

    # -*- coding: UTF-8 -*-
    """
    @Author: Ye Tingyun
    @Official account: practice Python
    @CSDN: https://yetingyun.blog.csdn.net/
    """
    import pandas as pd
    import matplotlib.pyplot as plt
    import matplotlib as mpl
    import random

    df = pd.read_excel("processed data.xlsx")
    # after preprocessing, these columns hold degrees
    x_data = df['longitude (min)']
    y_data = df['latitude (min)']
    colors = ['#FF0000', '#0000CD', '#00BFFF', '#008000', '#FF1493',
              '#FFD700', '#FF4500', '#00FA9A', '#191970', '#9932CC']
    # pick a random color for every point
    colors = [random.choice(colors) for i in range(len(x_data))]
    mpl.rcParams['font.family'] = 'SimHei'
    plt.style.use('ggplot')
    # set the figure size and resolution
    plt.figure(figsize=(12, 6), dpi=200)
    # scatter plot of longitude against latitude; s sets the marker size, c the colors
    plt.scatter(x_data, y_data, marker="o", s=9., c=colors)
    # add axis labels and a title
    plt.xlabel("longitude")
    plt.ylabel("latitude")
    plt.title("Tianjin bus stop distribution")
    plt.savefig('longitude and latitude scatter plot.png')
    plt.show()

The results are as follows:

Plotting the stations as a matplotlib scatter plot shows how the bus stops are distributed across Tianjin and makes the hotspot areas easy to see. To view the bus network more vividly, we can draw the data on an actual map using BMap from Pyecharts.

    # -*- coding: UTF-8 -*-
    """
    @Author: Ye Tingyun
    @Official account: practice Python
    @CSDN: https://yetingyun.blog.csdn.net/
    """
    import pandas as pd
    from pyecharts.charts import BMap
    from pyecharts import options as opts
    from pyecharts.globals import CurrentConfig

    # use local JS resources for rendering
    CurrentConfig.ONLINE_HOST = 'D:/python/pyecharts-assets-master/assets/'

    # recent pandas versions do not accept an encoding argument in read_excel
    df = pd.read_excel('processed data.xlsx')
    df.drop_duplicates(subset='station name', inplace=True)
    longitude = list(df['longitude (min)'])
    latitude = list(df['latitude (min)'])
    datas = []
    a = []
    for i, j in zip(longitude, latitude):
        a.append([i, j])
    datas.append(a)
    print(datas)

    BAIDU_MAP_AK = "change to your Baidu map AK"

    c = (
        BMap(init_opts=opts.InitOpts(width="1200px", height="800px"))
        .add_schema(
            baidu_ak=BAIDU_MAP_AK,   # the AK you applied for
            center=[117.20, 39.13],  # longitude and latitude of central Tianjin
            zoom=10,
            is_roam=True,
        )
        .add(
            "",
            type_="lines",
            is_polyline=True,
            data_pair=datas,
            linestyle_opts=opts.LineStyleOpts(opacity=0.2, width=0.5, color='red'),
            # if you are not on the latest version, comment out the two parameters below
            progressive=200,
            progressive_threshold=500,
        )
    )
    c.render('bus network map.html')

The results are as follows:

As can be seen on the map, Heping District and Nankai District have a dense network of bus lines and convenient transportation.

In the bus line network, node i represents line i. The degree of node i is defined as the number of lines to which a passenger can transfer from line i (in the code, two lines count as connected when they share at least one station), so the degree reflects how well a line is connected to the other lines. The script in this section computes the degree of every line and plots the result.
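The degree definition above can be sketched compactly with sets. This is an illustrative version on made-up data, not the author's script; the helper name `line_degrees` and the toy lines are assumptions. It visits each pair of lines once and adds one to both degrees when the two lines share a station.

```python
from itertools import combinations

# Hypothetical toy data: line name -> stations served in the uplink direction
line_stations = {
    "L1": ["A", "B", "C"],
    "L2": ["C", "D"],
    "L3": ["D", "E"],
    "L4": ["F"],
}

def line_degrees(line_stations):
    """Degree of a line = number of other lines sharing at least one station with it."""
    station_sets = {name: set(stops) for name, stops in line_stations.items()}
    degree = {name: 0 for name in station_sets}
    for a, b in combinations(station_sets, 2):   # every unordered pair of lines, once
        if station_sets[a] & station_sets[b]:    # non-empty intersection: a transfer is possible
            degree[a] += 1
            degree[b] += 1
    return degree

degrees = line_degrees(line_stations)
print(degrees)
print("average degree:", sum(degrees.values()) / len(degrees))
```

Set intersection replaces the inner station-by-station search, which keeps the pairwise comparison short and avoids scanning lists repeatedly.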

    # -*- coding: UTF-8 -*-
    """
    @Author: Ye Tingyun
    @Official account: practice Python
    @CSDN: https://yetingyun.blog.csdn.net/
    """
    import xlrd
    import matplotlib.pyplot as plt
    import pandas as pd
    import matplotlib as mpl

    df = pd.read_excel("site_information.xlsx")
    # use pandas to get the unique line names
    loc = df['line name'].unique()
    # list of line names
    line_list = list(loc)
    print(line_list)

    # open the Excel workbook (xlrd 2.0+ no longer reads .xlsx; this call needs xlrd < 2.0)
    data = xlrd.open_workbook("site_information.xlsx")
    # print(data)   # workbook object in memory
    # get the sheet with index 0, i.e. the first sheet
    table = data.sheets()[0]

    # dictionary mapping each line to the stations it serves
    site_dic = {k: [] for k in line_list}
    site_list = []
    for i in range(1, table.nrows):
        # each row is returned as a list
        x = table.row_values(i)
        if x[1] == "0":
            # keep only uplink data: add the station to its line
            site_dic[x[0]].append(x[3])
            site_list.append(x[3])
        else:
            continue
    # print(len(site_dic))    # 618 lines
    # print(len(site_list))   # 15248 station records
    print(f"there are {len(line_list)} lines in the bus network")   # 618 lines

    # degree of each node; indices correspond to line_list
    node_count = [m * 0 for m in range(len(line_list))]
    # each line is a node; sites[i] lists the uplink stations of line i
    sites = [site for site in site_dic.values()]
    # print(sites)
    for j in range(len(sites)):
        # compare line j with every later line, similar to the passes of a bubble sort
        for k in range(j, len(sites) - 1):
            if len(sites[j]) > len(sites[k + 1]):
                for x in sites[j]:
                    if x in sites[j] and x in sites[k + 1]:
                        # the two lines share a station: add 1 to both degrees
                        node_count[j], node_count[k + 1] = node_count[j] + 1, node_count[k + 1] + 1
                        break   # this pair is done
            else:
                for x in sites[k + 1]:
                    if x in sites[j] and x in sites[k + 1]:
                        # the two lines share a station: add 1 to both degrees
                        node_count[j], node_count[k + 1] = node_count[j] + 1, node_count[k + 1] + 1
                        break   # this pair is done
    # print(node_count)
    # node numbers, matching the indices of node_count
    node_number = [y for y in range(len(node_count))]
    # the maximum degree of the line network is 175
    print(f"the maximum degree of the line network is: {max(node_count)}")
    print(f"the minimum degree of the line network is: {min(node_count)}")
    print(f"the average degree of the line network is: {sum(node_count) / len(node_count)}")

    # set the figure size and resolution
    plt.figure(figsize=(10, 6), dpi=150)
    # matplotlib does not display Chinese by default, so set a font that supports it
    mpl.rcParams['font.family'] = 'SimHei'
    # draw the degree of each node
    plt.bar(node_number, node_count, color="purple")
    # add axis labels and a title
    plt.xlabel("node number n")
    plt.ylabel("node degree K")
    plt.title("degree distribution of each node in the line network", fontsize=15)
    plt.savefig("degree size of each node in the line network.png")
    plt.show()

The results are as follows:

There are 618 lines in the public transport network.

The maximum degree of the line network is 175

The minimum degree of a line network is: 0

The average degree of the line network is 55.41423948220065

    import xlrd
    import matplotlib.pyplot as plt
    import pandas as pd
    import matplotlib as mpl
    import collections

    df = pd.read_excel("site_information.xlsx")
    # use pandas to get the unique line names
    loc = df['line name'].unique()
    # list of line names
    line_list = list(loc)
    print(line_list)

    # open the Excel workbook (xlrd 2.0+ no longer reads .xlsx; this call needs xlrd < 2.0)
    data = xlrd.open_workbook("site_information.xlsx")
    # print(data)   # workbook object in memory
    # get the sheet with index 0, i.e. the first sheet
    table = data.sheets()[0]

    # dictionary mapping each line to the stations it serves
    site_dic = {k: [] for k in line_list}
    site_list = []
    for i in range(1, table.nrows):
        # each row is returned as a list
        x = table.row_values(i)
        if x[1] == "0":
            # keep only uplink data: add the station to its line
            site_dic[x[0]].append(x[3])
            site_list.append(x[3])
        else:
            continue
    # print(len(site_dic))    # 618 lines
    # print(len(site_list))   # 15248 station records

    # degree of each node; indices correspond to line_list
    node_count = [m * 0 for m in range(len(line_list))]
    # each line is a node; sites[i] lists the uplink stations of line i
    sites = [site for site in site_dic.values()]
    # print(sites)
    for j in range(len(sites)):
        # compare line j with every later line, similar to the passes of a bubble sort
        for k in range(j, len(sites) - 1):
            if len(sites[j]) > len(sites[k + 1]):
                for x in sites[j]:
                    if x in sites[j] and x in sites[k + 1]:
                        # the two lines share a station: add 1 to both degrees
                        node_count[j], node_count[k + 1] = node_count[j] + 1, node_count[k + 1] + 1
                        break   # this pair is done
            else:
                for x in sites[k + 1]:
                    if x in sites[j] and x in sites[k + 1]:
                        # the two lines share a station: add 1 to both degrees
                        node_count[j], node_count[k + 1] = node_count[j] + 1, node_count[k + 1] + 1
                        break   # this pair is done
    # print(node_count)
    # node numbers, matching the indices of node_count
    node_number = [y for y in range(len(node_count))]
    # the maximum degree of the line network is 175
    print(max(node_count))

    # set the figure size and resolution
    plt.figure(figsize=(10, 6), dpi=150)
    # matplotlib does not display Chinese by default, so set a font that supports it
    mpl.rcParams['font.family'] = 'SimHei'

    # probability distribution of the node degree K
    # count how many nodes have each degree
    node_count = collections.Counter(node_count)
    node_count = node_count.most_common()
    # dictionary {degree: number of nodes with that degree}
    node_dic = {_k: _v for _k, _v in node_count}
    # degrees sorted from small to large
    sort_node = sorted(node_dic)
    # number of nodes for each degree, in the same order
    sort_num = [node_dic[q] for q in sort_node]
    # mean of the distinct degree values: sum / number
    # print(sum(sort_node) / len(sort_node))
    # largest count in the distribution (a number of nodes, not a probability)
    print(f"the maximum probability value in the probability distribution is: {max(sort_num)}")
    # turn the counts into probabilities
    probability = [s1 / sum(sort_num) for s1 in sort_num]
    print(probability)

    # plot the degree probability distribution of the Tianjin bus line network
    plt.bar(sort_node, probability, color="red")
    # add axis labels and a title
    plt.xlabel("node degree K")
    plt.ylabel("probability P(K) of node degree K")
    plt.title("probability distribution of node degree in line network", fontsize=15)
    plt.savefig("probability distribution of node degree in line network.png")
    plt.show()

The results are as follows:

The maximum probability in the probability distribution is 16.

The figure above shows the degree distribution of the Tianjin bus line network. The network collected here consists of 618 lines, with a maximum degree of 175 and an average degree of 55.41, which indicates that the Tianjin public transport network offers many transfer opportunities and therefore good accessibility. The value 16 printed by the script is the largest count in the distribution rather than a probability: the most common degree value is shared by 16 lines, which corresponds to a probability of 16/618 ≈ 0.026. The degrees with higher probability are mostly concentrated between 7 and 26. The distribution across nodes is relatively uneven: many road sections in Tianjin are served by few bus lines, while a few sections carry too many lines, which wastes resources.
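The difference between a count and a probability in the degree distribution can be illustrated with a short sketch. The degree list below is made up and is not the Tianjin data; the variable names are illustrative.

```python
from collections import Counter

# Hypothetical node degrees for a small network (not the Tianjin data)
node_degrees = [0, 1, 1, 2, 2, 2, 3, 5]

counts = Counter(node_degrees)   # degree K -> number of nodes with that degree
total = sum(counts.values())     # total number of nodes

# P(K): share of nodes with degree K, ordered by K
distribution = {k: counts[k] / total for k in sorted(counts)}

# The most common degree, how many nodes have it, and the matching probability
most_common_degree, most_common_count = counts.most_common(1)[0]
print(distribution)
print(most_common_degree, most_common_count, most_common_count / total)
```

Applied to the article's figures, the same step turns the count 16 into 16/618, which is the largest value actually plotted on the P(K) axis.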

The clustering coefficient measures how tightly the neighbors of a node are connected to one another, so edge direction does not need to be considered; a directed graph is treated as undirected. A large network clustering coefficient indicates a high density of connections between the nodes and their neighbors, that is, the bus lines serving nearby stations are densely interconnected. The clustering coefficient of the Tianjin public transport complex network is calculated to be 0.091, which is lower than that of other cities.

According to the formula:

The clustering coefficient of a random network of the same size is about 0.00044, far smaller than the 0.091 measured here, which further reflects the small-world character of the network.
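The per-node clustering coefficient used in this section is C_i = 2 * E_i / (k_i * (k_i - 1)), where k_i is the degree of node i and E_i is the number of edges that actually exist among its neighbors. A minimal sketch on a made-up undirected graph follows; the graph and the helper name `local_clustering` are assumptions for illustration, not the author's code.

```python
from itertools import combinations

# Hypothetical undirected graph: node -> set of neighbours
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

def local_clustering(graph, node):
    """C_i = 2 * E_i / (k_i * (k_i - 1)); taken as 0 when the node has fewer than 2 neighbours."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    # E_i: edges that actually exist between pairs of neighbours of this node
    e = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return 2 * e / (k * (k - 1))

coefficients = {n: local_clustering(graph, n) for n in graph}
average = sum(coefficients.values()) / len(coefficients)
print(coefficients)
print(f"average clustering coefficient: {average:.4f}")
```

Note that E_i is counted among the neighbors of the node itself, which is the point to check when comparing this sketch with a longer hand-written loop.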

    import xlrd
    import matplotlib.pyplot as plt
    import pandas as pd
    import matplotlib as mpl

    # read the data
    df = pd.read_excel("site_information.xlsx")
    # use pandas to get each line name once
    loc = df['line name'].drop_duplicates()
    # list of line names, in the same order as in the Excel sheet
    line_list = list(loc)
    # print(line_list)

    # open the Excel workbook (xlrd 2.0+ no longer reads .xlsx; this call needs xlrd < 2.0)
    data = xlrd.open_workbook("site_information.xlsx")
    # print(data)   # workbook object in memory
    # get the sheet with index 0, i.e. the first sheet
    table = data.sheets()[0]

    # dictionary mapping each line to the stations it serves
    site_dic = {k: [] for k in line_list}
    site_list = []
    for i in range(1, table.nrows):
        # each row is returned as a list
        x = table.row_values(i)
        if x[1] == "0":
            # keep only uplink data: add the station to its line
            site_dic[x[0]].append(x[3])
            site_list.append(x[3])
        else:
            continue
    # print(len(site_dic))    # 618 lines
    # print(len(site_list))   # 15248 station records

    # degree of each node; indices correspond to line_list
    node_count = [m * 0 for m in range(len(line_list))]
    # each line is a node; sites[i] lists the uplink stations of line i
    sites = [site for site in site_dic.values()]
    # print(sites)
    # count the degree of each node
    for j in range(len(sites) - 1):
        # compare line j with every later line, similar to the passes of a bubble sort
        for k in range(j, len(sites) - 1):
            if len(sites[j]) > len(sites[k + 1]):
                for x in sites[j]:
                    if x in sites[j] and x in sites[k + 1]:
                        # the two lines share a station: add 1 to both degrees
                        node_count[j], node_count[k + 1] = node_count[j] + 1, node_count[k + 1] + 1
                        break   # this pair is done
            else:
                for x in sites[k + 1]:
                    if x in sites[j] and x in sites[k + 1]:
                        # the two lines share a station: add 1 to both degrees
                        node_count[j], node_count[k + 1] = node_count[j] + 1, node_count[k + 1] + 1
                        break   # this pair is done

    # actual number of edges among the neighbor nodes of each node
    Ei = []
    # for each line, find its neighbor nodes and count the edges among them
    for a in range(len(sites)):
        neighbor = []
        if node_count[a] == 0:
            Ei.append(0)
            continue
        if node_count[a] == 1:
            Ei.append(0)
            continue
        for b in range(len(sites)):
            if a == b:   # do not compare a line with itself
                continue
            if len(sites[a]) > len(sites[b]):
                # look for a shared station
                for x in sites[a]:
                    if x in sites[a] and x in sites[b]:
                        # found a neighbor node
                        neighbor.append(sites[b])
                        break
            else:
                for x in sites[b]:
                    if x in sites[a] and x in sites[b]:
                        # found a neighbor node
                        neighbor.append(sites[b])
                        break
        # count the edges among the neighbor nodes with the same shared-station test
        # NOTE: as published, the loops below index `sites`, not `neighbor`, so they test
        # the first len(neighbor) lines of the full list rather than the neighbors themselves.
        # Indexing neighbor[c] and neighbor[d + 1] would count edges among the actual
        # neighbors and would likely change the reported average.
        count = 0
        for c in range(len(neighbor) - 1):
            for d in range(c, len(neighbor) - 1):
                try:
                    if len(sites[c]) > len(sites[d + 1]):
                        for y in sites[c]:
                            if y in sites[c] and y in sites[d + 1]:
                                # these two nodes are also connected
                                count += 1
                                break
                            else:
                                continue
                    else:
                        for y in sites[d + 1]:
                            if y in sites[c] and y in sites[d + 1]:
                                # these two nodes are also connected
                                count += 1
                                break
                            else:
                                continue
                except IndexError:
                    break
        Ei.append(count)   # edges among the neighbors of node a
    # print(Ei)

    # node numbers, matching the indices of node_count
    node_number = [y for y in range(len(node_count))]
    # matplotlib does not display Chinese by default, so set a font that supports it
    mpl.rcParams['font.family'] = 'SimHei'
    # set the figure size and resolution
    plt.figure(figsize=(10, 6), dpi=150)

    # clustering coefficient of each node in the bus line network
    Ci = []
    for m in range(len(node_number)):
        if node_count[m] == 0:
            Ci.append(0)
        elif node_count[m] == 1:
            Ci.append(0)
        else:
            # 2 * actual edges among the neighbors / maximum possible edges among them
            Ci.append(2 * Ei[m] / (node_count[m] * (node_count[m] - 1)))
    # average clustering coefficient
    print("average clustering coefficient of Tianjin bus route network is: {:.4f}".format(sum(Ci) / len(Ci)))

    plt.bar(node_number, Ci, color="blue")
    # add axis labels and a title
    plt.xlabel("node number n")
    plt.ylabel("node clustering coefficient")
    plt.title("clustering coefficient distribution of each node in the line network", fontsize=15)
    plt.savefig("clustering coefficient distribution.png")
    plt.show()

The results are as follows:

The average clustering coefficient of Tianjin bus line network is 0.0906.

That concludes this walkthrough of analyzing and visualizing an urban public transport network with Python. If you have similar questions, the analysis above may serve as a reference.
