2025-03-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains how to use NumPy to analyze the riding time of shared bicycles. The explanation is short and easy to follow, so let's work through it step by step.
Purpose of the analysis
As the title suggests, the goal is to analyze the average riding time of shared bicycles in each season.
Data collection
The data was downloaded from the internet, so let's take a quick look at its structure first.
The data has nine fields:
"Duration (ms)","Start date","End date","Start station number","Start station","End station number","End station","Bike number","Member type"
For our goal, only the first field, Duration (ms), is needed.
So the first step is to read the downloaded data; the required field is then extracted in the second step, data cleaning:
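# Data collection
import os
import numpy as np

# data_path and data_filenames are assumed to be defined elsewhere in the script
def data_collection():
    data_arr_list = []
    for data_filename in data_filenames:
        file = os.path.join(data_path, data_filename)
        # Read every field as bytes, then convert to text strings
        data_arr = np.loadtxt(file, dtype=bytes, delimiter=',', skiprows=1).astype(str)
        data_arr_list.append(data_arr)
    return data_arr_list
Data cleaning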
Because the data was already tidied up before it was exported, there is no need to handle missing values or similar problems. We simply extract the required field and process it.
The riding time is given in milliseconds, so it is converted to minutes by dividing by 1000 and then by 60.
# Data cleaning
def data_clean(data_arr_list):
    duration_min_list = []
    for data_arr in data_arr_list:
        # Keep only the first column, "Duration (ms)"
        data_arr = data_arr[:, 0]
        # Strip the double quotes around each value
        duration_ms = np.char.replace(data_arr, '"', '')
        # Milliseconds -> minutes
        duration_min = duration_ms.astype('float') / 1000 / 60
        duration_min_list.append(duration_min)
    return duration_min_list
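As a quick check of the cleaning step, here is a minimal sketch on a tiny hand-made array. The sample values are invented for illustration, and np.char.replace is used as the public name for np.core.defchararray.replace.

```python
import numpy as np

# Invented sample: the first column as it looks after loading, quotes included
raw = np.array(['"60000"', '"90000"', '"1500000"'])

# Strip the double quotes, then convert milliseconds to minutes
duration_ms = np.char.replace(raw, '"', '').astype(float)
duration_min = duration_ms / 1000 / 60

print(duration_min)           # minutes per ride
print(np.mean(duration_min))  # average riding time in minutes
```

The three invented rides last 1, 1.5 and 25 minutes, so the printed average should be a little over 9 minutes.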
Data analysis
NumPy provides a function for computing the mean, np.mean, which can be called directly.
# Data analysis
def mean_data(duration_min_list):
    duration_mean_list = []
    for duration_min in duration_min_list:
        duration_mean = np.mean(duration_min)
        duration_mean_list.append(duration_mean)
    return duration_mean_list
Displaying the results
The visualization uses the matplotlib.pyplot library. Salted Fish has not written an introductory article on it yet; for now, the online documentation is enough to learn the basics. A series of articles on visualization will follow.
# Data display
import matplotlib.pyplot as plt

def show_data(duration_mean_list):
    plt.figure()
    name_list = ['Q1', 'Q2', 'Q3', 'Q4']
    plt.bar(range(len(duration_mean_list)), duration_mean_list, tick_label=name_list)
    plt.show()
Results
From the figure alone, we can see that the average riding time in the second and third quarters, dominated by hot summer and cool autumn, is higher than in the first and fourth quarters, dominated by spring and winter. This suggests that temperature changes affect how people use shared bicycles.
Pitfalls when reading data (1)
In Python 3, strings are divided into byte strings and text strings; when we say "string" we usually mean a text string. The strings returned by numpy's loadtxt function here are byte strings by default, so they print with a leading b, as in b'...'. They usually need to be converted, otherwise problems follow.
If you overlook this in the data collection step, the field format will be wrong in the data cleaning step, because each Duration value carries an extra b.
Treatment:
Problem: numpy.loadtxt always reads bytes, so every value is preceded by a b.
Cause: np.loadtxt and np.genfromtxt operate in byte mode, which is the default string type in Python 2. Python 3 uses unicode and marks byte strings with the b prefix. The numpy.loadtxt documentation also states: "Note that generators should return byte strings for Python 3k."
Resolution: when reading strings from a file with numpy.loadtxt, prefer np.loadtxt(filename, dtype=bytes).astype(str).
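The difference between the two string types can be seen without any data file. The sketch below uses an invented two-element byte-string array; it only illustrates the conversion and is not the article's loading code.

```python
import numpy as np

# Invented sample: byte strings, like the values described above
raw = np.array([b'"1000"', b'"2500"'])
print(raw.dtype.kind)   # 'S' means byte strings

# astype(str) decodes the bytes into text strings
text = raw.astype(str)
print(text.dtype.kind)  # 'U' means text (unicode) strings

# Plain Python str() on bytes keeps the b prefix, which is what breaks float()
broken = str(b"1000")   # "b'1000'"
fixed = b"1000".decode("ascii")
print(broken, fixed)
```

Calling float() on the value in broken raises a ValueError, while the decoded value in fixed converts cleanly.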
Pitfalls when reading data (2)
As you can see, Salted Fish reads the data with numpy.loadtxt. This is convenient, but the whole file is loaded at once, so memory usage explodes. Fortunately the data this time is only about 500 MB. I do not recommend this approach for larger files, and I may improve it later.
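One possible way to avoid loading the whole file, sketched under assumptions: since only the mean of the first column is needed, the file can be streamed row by row while keeping a running sum and count. The function name streaming_mean_minutes and the in-memory sample are hypothetical; Python's csv module takes care of the quoted fields.

```python
import csv
import io

def streaming_mean_minutes(lines):
    """Average the first CSV column (milliseconds) in minutes, one row at a time."""
    reader = csv.reader(lines)
    next(reader, None)          # skip the header row
    total_ms = 0.0
    count = 0
    for row in reader:
        if not row:             # ignore blank lines
            continue
        total_ms += float(row[0])
        count += 1
    return total_ms / count / 1000 / 60 if count else float("nan")

# Invented sample with the same quoting style as the real data
sample = io.StringIO(
    '"Duration (ms)","Start date","Member type"\n'
    '"60000","2016-01-01 00:06","Member"\n'
    '"120000","2016-01-01 00:15","Casual"\n'
)
mean_min = streaming_mean_minutes(sample)
print(mean_min)
```

With a real file, an open file object can be passed in place of the sample, for example one opened with open(path, newline=''). Memory use then stays roughly constant regardless of file size.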
Here is a piece of code from teacher bobby's hands-on course on imooc, showing how to use a generator to read a large text file:
# 500 GB file, special case: everything is on a single line
def myreadlines(f, newline):
    buf = ""
    while True:
        while newline in buf:
            pos = buf.index(newline)
            yield buf[:pos]
            buf = buf[pos + len(newline):]
        chunk = f.read(4096)
        if not chunk:
            # Reached the end of the file
            yield buf
            break
        buf += chunk

with open("input.txt") as f:
    for line in myreadlines(f, "{|}"):
        print(line)
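To see why the buffer is needed, the sketch below reruns the same idea on an in-memory string with a deliberately tiny chunk, so that the delimiter is split across reads. The name read_records and the chunk_size parameter are additions for illustration, not part of the course code.

```python
import io

def read_records(f, delimiter, chunk_size=4096):
    """Yield delimiter-separated records from f without reading it all at once."""
    buf = ""
    while True:
        # Emit every complete record currently in the buffer
        while delimiter in buf:
            pos = buf.index(delimiter)
            yield buf[:pos]
            buf = buf[pos + len(delimiter):]
        chunk = f.read(chunk_size)
        if not chunk:
            # End of file: whatever is left is the last record
            yield buf
            break
        buf += chunk

# chunk_size=2 forces the 3-character delimiter to straddle two reads
records = list(read_records(io.StringIO("a{|}b{|}c"), "{|}", chunk_size=2))
print(records)
```

Because unfinished text stays in buf until the rest of the delimiter arrives, the records come out intact even though no single read contains a whole delimiter. Note that input ending in a delimiter produces one final empty record.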
A pitfall when using matplotlib.pyplot
When the bar chart is labeled in Chinese, the labels are displayed as empty squares instead of Chinese characters, as follows:
Treatment:
Solution 1: Modify the configuration file
(1) Find the matplotlibrc file (a file search will locate it).
(2) Modify font.serif and font.sans-serif (lines 205 and 206 in my copy) so that SimHei comes first:
font.serif: SimHei, Bitstream Vera Serif, New Century Schoolbook, Century Schoolbook L, Utopia, ITC Bookman, Bookman, Nimbus Roman No9 L, Times New Roman, Times, Palatino, Charter, serif
font.sans-serif: SimHei, Bitstream Vera Sans, Lucida Grande, Verdana, Geneva, Lucid, Arial, Helvetica, Avant Garde, sans-serif
Solution 2: Modify in code
import matplotlib
# Specify the default font
matplotlib.rcParams['font.sans-serif'] = ['SimHei']
matplotlib.rcParams['font.family'] = 'sans-serif'
# Fix the minus sign '-' being displayed as a square
matplotlib.rcParams['axes.unicode_minus'] = False
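Both solutions assume that the SimHei font is actually installed. The following is a hedged sketch for locating the active matplotlibrc file and checking whether a font is known to matplotlib; pick_font is a hypothetical helper, and the candidate list is only an example.

```python
import matplotlib
from matplotlib import font_manager

# Path of the matplotlibrc file that matplotlib is currently using
print(matplotlib.matplotlib_fname())

def pick_font(candidates):
    """Return the first candidate font that matplotlib has registered, else None."""
    available = {entry.name for entry in font_manager.fontManager.ttflist}
    for name in candidates:
        if name in available:
            return name
    return None

font = pick_font(["SimHei", "Microsoft YaHei"])
if font is not None:
    matplotlib.rcParams['font.sans-serif'] = [font]
    matplotlib.rcParams['axes.unicode_minus'] = False
else:
    print("No candidate font is installed; install one first")
```

If no candidate is found, the settings are left untouched, which makes a missing font easier to spot than silently rendered squares.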
Thank you for reading. That concludes "How to use Numpy to analyze the riding time of a bicycle". Hopefully you now have a better understanding of the approach; the details still need to be verified in practice.