2025-01-14 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how to crawl weather data with Python and analyze it visually. The method introduced here is simple, fast, and practical; let's walk through it step by step.
Core function design
In short, we need to crawl weather data from the China Weather Network, save it as a CSV file, and then visualize and analyze the data.
Breaking the requirement down, we need to complete the following steps:
1. Use a crawler to obtain the July 20-21 rainfall data from the China Weather Network, including city, wind direction, wind scale, precipitation, relative humidity, and air quality.
2. Preprocess the obtained weather data, analyze the wind scale and wind direction across Henan, and draw a wind-direction and wind-scale radar chart.
3. Draw a temperature-humidity correlation chart from the obtained temperature and humidity data, and compare the two.
4. Visualize each city's hourly precipitation over the last 24 hours.
5. Map the 24-hour cumulative rainfall of each city.
Implementation steps
Crawl the data
First, we need to obtain each city's rainfall data. Analyzing the China Weather Network site shows that a city's weather page is at: http://www.weather.com.cn/weather/101180101.shtml.
Inspecting the page, it is not hard to see that the hourly data is returned in JSON format:
- 101180101 is the city code.
- The 7-day forecast data sits inside the div tag with id="7d".
- The date, weather, temperature, wind scale, and other fields sit in ul and li tags.
Having analyzed the page structure, we can start crawling the data we need, and save it once all resources are fetched.
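To illustrate the string slicing used to turn the embedded script payload into parseable JSON, here is a minimal sketch; the sample string and the od21/od22 field names are made-up stand-ins for the real page content:

```python
import json

# Hypothetical stand-in for the <script> text found on the page;
# the real payload holds 24 hours of observations.
script_text = 'var data={"od": {"od2": [{"od21": "23", "od22": "27"}]}};'

# Drop everything up to and including '=' plus the trailing ';',
# leaving a pure JSON string.
json_text = script_text[script_text.index('=') + 1:-1]
payload = json.loads(json_text)

hourly = payload['od']['od2']
print(hourly[0]['od21'])  # → 23
```

The same idea applies regardless of which fields the site exposes: isolate the JSON literal, then hand it to json.loads.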
Request website
The Weather Network URL is http://www.weather.com.cn/weather/101180101.shtml. To crawl a different region, just change the trailing 101180101 area code; the weather path segment selects the 7-day forecast page.
import requests

def getHTMLtext(url):
    """Request the page and return its text."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        print("Success")
        return r.text
    except requests.RequestException:
        print("Fail")
        return ""
Processing data
We use the BeautifulSoup library to extract, from the string just obtained, the wind direction, wind scale, precipitation, relative humidity, air quality, and so on.
import json
from bs4 import BeautifulSoup

def get_content(html, cityname):
    """Extract the useful information and return it for saving."""
    final = []  # holds the 7-day data
    bs = BeautifulSoup(html, "html.parser")
    body = bs.body
    data = body.find('div', {'id': '7d'})  # the div with id="7d" holds the 7-day forecast

    # The current day's hourly data sits in a script tag inside the third left-div
    data2 = body.find_all('div', {'class': 'left-div'})
    text = data2[2].find('script').string
    text = text[text.index('=') + 1:-2]  # strip "var data=" and the trailing ";" to get JSON
    jd = json.loads(text)
    dayone = jd['od']['od2']  # the current day's observations
    final_day = []  # holds the current day's data
    count = 0
    for i in dayone:
        temp = []
        if count <= 23:
            # hourly fields as exposed by the site's JSON
            temp.append(i['od21'])  # hour
            temp.append(cityname)   # city
            temp.append(i['od22'])  # temperature
            temp.append(i['od24'])  # wind direction
            temp.append(i['od25'])  # wind scale
            temp.append(i['od26'])  # precipitation
            temp.append(i['od27'])  # relative humidity
            temp.append(i['od28'])  # air quality
            final_day.append(temp)
        count = count + 1

    # 7-day forecast: one li per day under the ul
    ul = data.find('ul')
    li = ul.find_all('li')
    i = 0  # day counter
    for day in li:
        if 0 < i < 7:
            temp = []                          # temporarily store one day's data
            date = day.find('h2').string       # get the date
            date = date[0:date.index('日')]     # keep just the day number
            temp.append(date)
            inf = day.find_all('p')            # the first p tag under li is the weather
            temp.append(inf[0].string)
            tem_low = inf[1].find('i').string  # lowest temperature
            if inf[1].find('span') is None:    # the forecast may omit the highest temperature
                tem_high = None
            else:
                tem_high = inf[1].find('span').string  # highest temperature
            temp.append(tem_low[:-1])
            if tem_high is not None and tem_high[-1] == '℃':
                temp.append(tem_high[:-1])
            else:
                temp.append(tem_high)
            wind = inf[2].find_all('span')     # wind directions
            for j in wind:
                temp.append(j['title'])
            wind_scale = inf[2].find('i').string  # wind scale, e.g. "3级"
            index1 = wind_scale.index('级')
            temp.append(int(wind_scale[index1 - 1:index1]))
            final.append(temp)
        i = i + 1
    return final_day, final
Now that we have one city's weather data, we can obtain the data for every prefecture-level city in Henan Province in the same way by substituting the corresponding area codes.
Citycode = {
    "Zhengzhou": "101180101", "Anyang": "101180201", "Xinxiang": "101180301",
    "Xuchang": "101180401", "Pingdingshan": "101180501", "Xinyang": "101180601",
    "Nanyang": "101180701", "Kaifeng": "101180801", "Luoyang": "101180901",
    "Shangqiu": "101181001", "Jiaozuo": "101181101", "Hebi": "101181201",
    "Puyang": "101181301", "Zhoukou": "101181401", "Luohe": "101181501",
    "Zhumadian": "101181601", "Sanmenxia": "101181701", "Jiyuan": "101181801",
}
citycode_lists = list(Citycode.items())
for city_code in citycode_lists:
    city_code = list(city_code)
    print(city_code)
    citycode = city_code[1]
    cityname = city_code[0]
    url1 = 'http://www.weather.com.cn/weather/' + citycode + '.shtml'  # 7-day page on China Weather Network
    html1 = getHTMLtext(url1)
    data1, data1_7 = get_content(html1, cityname)  # current-day hourly data and 7-day data
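Since every request differs only in the area code, the URL construction can be sketched as a tiny helper (the function name is my own):

```python
def build_weather_url(area_code):
    """Build the 7-day forecast URL on www.weather.com.cn for a given area code."""
    return 'http://www.weather.com.cn/weather/{}.shtml'.format(area_code)

print(build_weather_url('101180101'))
# → http://www.weather.com.cn/weather/101180101.shtml
```

Mapping this helper over the city-code dictionary yields the full list of pages to fetch.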
Store data
import csv

def write_to_csv(file_name, data, day=14):
    """Save the data as a csv file."""
    with open(file_name, 'a', errors='ignore', newline='') as f:
        if day == 14:
            header = ['date', 'city', 'weather', 'lowest temperature', 'highest temperature',
                      'wind direction 1', 'wind direction 2', 'wind scale']
        else:
            header = ['hour', 'city', 'temperature', 'wind direction', 'wind scale',
                      'precipitation', 'relative humidity', 'air quality']
        f_csv = csv.writer(f)
        f_csv.writerow(header)
        f_csv.writerows(data)

write_to_csv('Henan Weather.csv', data_all, 1)
In this way, we can save the weather data of various prefecture-level cities in the province.
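As a quick illustration of the resulting file layout, this sketch writes and reads back a small CSV the same way the saving code does, using append mode; the column names and the data row are hypothetical:

```python
import csv
import os
import tempfile

# Hypothetical header and one data row, mirroring the hourly CSV layout.
header = ['hour', 'city', 'temperature', 'wind direction', 'wind scale',
          'precipitation', 'relative humidity', 'air quality']
rows = [['01', 'Zhengzhou', '26', 'northeast wind', '2', '0.0', '78', '35']]

fd, path = tempfile.mkstemp(suffix='.csv')
os.close(fd)
with open(path, 'a', errors='ignore', newline='') as f:  # append mode, as in the article
    w = csv.writer(f)
    w.writerow(header)
    w.writerows(rows)

with open(path, newline='') as f:
    read_back = list(csv.reader(f))
os.remove(path)
print(read_back[1][1])  # → Zhengzhou
```

Note that because the file is opened in append mode, running the crawler twice against the same file appends a second header and data set rather than overwriting the first.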
Wind direction and wind level radar chart
We now aggregate wind scale and wind direction across the province. Because wind direction reads most clearly in polar coordinates, we use a polar plot to show the day's wind: the circle is divided into 8 sectors, each representing one direction; the radius represents the average wind scale, and the blue deepens as the wind scale increases.
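The direction-to-angle assignment can be factored into a small lookup first (a sketch; the dictionary and helper names are my own, but the angles mirror the mapping in the radar-chart code below):

```python
# Map a compass wind-direction label to the angle (degrees) at which it
# is placed on the polar plot: north -> 90, south -> 270, and so on.
DIRECTION_DEGREES = {
    "north wind": 90, "northeast wind": 45, "east wind": 360,
    "southeast wind": 315, "south wind": 270, "southwest wind": 225,
    "west wind": 180, "northwest wind": 135,
}

def direction_to_degree(label):
    """Return the plotting angle for one of the 8 compass directions."""
    return DIRECTION_DEGREES[label]

print(direction_to_degree("northeast wind"))  # → 45
```

A dictionary lookup like this replaces the long if/elif chain and makes the 8-sector layout explicit.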
import numpy as np
import matplotlib.pyplot as plt

def wind_radar(data):
    """Wind direction radar chart."""
    wind = list(data['wind direction'])
    wind_speed = list(data['wind scale'])
    for i in range(0, 24):
        if wind[i] == "north wind":
            wind[i] = 90
        elif wind[i] == "south wind":
            wind[i] = 270
        elif wind[i] == "west wind":
            wind[i] = 180
        elif wind[i] == "east wind":
            wind[i] = 360
        elif wind[i] == "northeast wind":
            wind[i] = 45
        elif wind[i] == "northwest wind":
            wind[i] = 135
        elif wind[i] == "southwest wind":
            wind[i] = 225
        elif wind[i] == "southeast wind":
            wind[i] = 315
    degs = np.arange(45, 361, 45)
    temp = []
    for deg in degs:
        speed = []
        # average wind scale over the hours whose direction falls in this sector
        for i in range(0, 24):
            if wind[i] == deg:
                speed.append(wind_speed[i])
        if len(speed) == 0:
            temp.append(0)
        else:
            temp.append(sum(speed) / len(speed))
    print(temp)
    N = 8
    theta = np.arange(np.pi / 8, 2 * np.pi + np.pi / 8, 2 * np.pi / N)
    radii = np.array(temp)
    plt.axes(polar=True)  # polar coordinate system
    # RGB per sector: the larger the value, the closer the color is to blue
    colors = [(1 - x / max(temp), 1 - x / max(temp), 0.6) for x in radii]
    plt.bar(theta, radii, width=(2 * np.pi / N), bottom=0.0, color=colors)
    plt.title('Henan wind scale', x=0.2)
    plt.show()

wind_radar(data)
The results are as follows:
Observation shows that northeast winds dominated that day, with an average wind scale of 1.75.
Correlation analysis of temperature and humidity
We can check whether temperature and humidity are related. To verify this clearly and intuitively, we use the scatter-plot method plt.scatter() with temperature on the x-axis and humidity on the y-axis, plot a point for each hour, and compute the correlation coefficient.
import math
import matplotlib.pyplot as plt

def calc_corr(a, b):
    """Compute the correlation coefficient of two sequences."""
    a_avg = sum(a) / len(a)
    b_avg = sum(b) / len(b)
    cov_ab = sum([(x - a_avg) * (y - b_avg) for x, y in zip(a, b)])
    sq = math.sqrt(sum([(x - a_avg) ** 2 for x in a]) * sum([(y - b_avg) ** 2 for y in b]))
    corr_factor = cov_ab / sq
    return corr_factor

def corr_tem_hum(data):
    """Temperature-humidity correlation analysis."""
    tem = data['temperature']
    hum = data['relative humidity']
    plt.scatter(tem, hum, color='blue')
    plt.title("Temperature-humidity correlation")
    plt.xlabel("temperature / ℃")
    plt.ylabel("relative humidity / %")
    plt.text(20, 40, "correlation coefficient: " + str(calc_corr(tem, hum)),
             fontdict={'size': '10', 'color': 'red'})
    plt.show()
    print("correlation coefficient: " + str(calc_corr(tem, hum)))
The results are as follows:
We find a strong negative correlation between temperature and humidity over the day: as the temperature rises, warmer air can hold more water vapor, so even though the absolute moisture content of the air often increases, the relative humidity falls; when the temperature drops, the relative humidity rises again.
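The calc_corr formula above is simply the sample Pearson correlation coefficient; a quick cross-check against numpy.corrcoef confirms the arithmetic (the temperature and humidity numbers here are invented for illustration):

```python
import math
import numpy as np

def pearson(a, b):
    # Same formula as calc_corr: covariance over the product of deviations.
    a_avg, b_avg = sum(a) / len(a), sum(b) / len(b)
    cov_ab = sum((x - a_avg) * (y - b_avg) for x, y in zip(a, b))
    sq = math.sqrt(sum((x - a_avg) ** 2 for x in a) * sum((y - b_avg) ** 2 for y in b))
    return cov_ab / sq

tem = [22, 24, 27, 30, 33]   # hypothetical hourly temperatures
hum = [90, 85, 70, 60, 52]   # hypothetical hourly humidities

print(round(pearson(tem, hum), 6))
print(round(float(np.corrcoef(tem, hum)[0, 1]), 6))  # matches the line above
```

On this made-up sample the coefficient is negative, matching the qualitative observation in the text.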
Precipitation per hour over 24 hours

from pyecharts import options as opts
from pyecharts.charts import Map, Timeline

time_line_final = list(data1['hour'].iloc[0:24])  # the 24 hour labels

# Combine a timeline with a map
def timeline_map(data):
    tl = Timeline().add_schema(play_interval=300, is_rewind_play=False,
                               orient="horizontal", is_loop_play=True,
                               is_auto_play=False)  # playback speed, looping, and so on
    for h in time_line_final:
        x = data[data["hour"] == h]['city'].values.tolist()           # cities at this hour
        y = data[data["hour"] == h]['precipitation'].values.tolist()  # precipitation at this hour
        map_shape = (
            Map()
            # pyecharts' built-in province map is registered under the Chinese name
            .add("{}h precipitation (mm)".format(h), [list(z) for z in zip(x, y)], "河南")
            .set_series_opts(label_opts=opts.LabelOpts("{b}"))  # {b} displays the area name
            .set_global_opts(
                title_opts=opts.TitleOpts(title="Rainfall distribution in Henan Province"),
                visualmap_opts=opts.VisualMapOpts(
                    max_=300,           # maximum of the visual map
                    is_piecewise=True,  # show a segmented legend
                    pos_top="60%",      # distance from the top of the image
                    pieces=[            # each piece sets a range, label, and color
                        {"min": 101, "label": '>100 mm', "color": "#FF0000"},
                        {"min": 11, "max": 50, "label": '11-50 mm', "color": "#FF3333"},
                        {"min": 6, "max": 10, "label": '6-10 mm', "color": "#FF9999"},
                        {"min": 0.1, "max": 5, "label": '0.1-5 mm', "color": "#FFCCCC"},
                    ])))
        tl.add(map_shape, "{}h".format(h))  # add each hour's map to the timeline
    return tl

timeline_map(data).render("rainfall.html")

24-hour cumulative rainfall

def timeline_map_total(data1):
    tl = Timeline().add_schema(play_interval=200, is_rewind_play=False,
                               orient="horizontal", is_loop_play=True,
                               is_auto_play=True)
    for h in time_line_final:
        x = data1[data1["hour"] == h]['city'].values.tolist()
        y = data1[data1["hour"] == h]['precipitation'].values.tolist()
        map_shape1 = (
            Map()
            .add("{}h cumulative precipitation (mm)".format(h), [list(z) for z in zip(x, y)], "河南")
            .set_series_opts(label_opts=opts.LabelOpts("{b}"))
            .set_global_opts(
                title_opts=opts.TitleOpts(title="Cumulative rainfall distribution in Henan Province"),
                visualmap_opts=opts.VisualMapOpts(
                    max_=300,
                    is_piecewise=True,
                    pos_top="60%",
                    pieces=[
                        {"min": 251, "label": 'extreme rainstorm', "color": "#800000"},
                        {"min": 101, "max": 250, "label": 'heavy rainstorm', "color": "#FF4500"},
                        {"min": 51, "max": 100, "label": 'rainstorm', "color": "#FF7F50"},
                        {"min": 25, "max": 50, "label": 'heavy rain', "color": "#FFFF00"},
                        {"min": 10, "max": 25, "label": 'moderate rain', "color": "#1E90FF"},
                        {"min": 0.1, "max": 9.9, "label": 'light rain', "color": "#87CEFA"},
                    ])))
        tl.add(map_shape1, "{}h".format(h))  # add each hour's map to the timeline
    return tl

timeline_map_total(data1).render("rainfalltoall_1.html")

At this point, I believe you have a deeper understanding of how to crawl weather data and analyze it visually with Python. Why not try it out in practice?