2025-03-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how to use Python to analyze second-hand housing prices in Shanghai. The method introduced here is simple, fast and practical.
Dashboard display
Project background
At an interview, I was asked about the second-hand housing market in Shanghai. As a native, I only knew roughly what prices were like in the expensive districts, but I couldn't give exact numbers. Hence this article.
From a policy point of view, house prices in China have dropped markedly since 2016, when the government put forward the slogan "no housing speculation". In 2019, the state announced several more policies to "rescue the market":
1. The government no longer monopolizes the housing supply, which eases the financial pressure on developers and lowers the sale price of new homes. The result is a hot new-home market and a deserted second-hand market.
2. Adjust the provident fund loan interest rate and raise the mortgage interest rate for second homes, to curb speculative demand in the market.
3. Encourage both renting and selling, and encourage enterprises whose main business is housing leasing to buy houses.
4. Lower the application threshold for residence permits, making it easier to settle down and buy a house.
5. Subsidies for rural residents help migrant workers cover part of the cost of buying a home.
From an economic point of view, Shanghai's per capita disposable income led the country at 36577 yuan in the first half of 2020, an increase of 3.64 percent over the same period of the previous year. Behind this seemingly impressive figure, how many ordinary people are hidden by the average?
We can use pandas_profiling for quick statistical analysis before cleaning the data.
import pandas_profiling
pandas_profiling.ProfileReport(data).to_file("./report/html")
According to the report, this dataset has 37491 rows and 20 columns. 7 rows are duplicates, a proportion of less than 0.1%. Scrolling further down the report shows the statistics for each column.
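The row, column and duplicate counts that the report shows can also be checked directly in pandas. The following is a minimal sketch on a made-up frame; the name `sample` and its values are assumptions for illustration, not the article's data.

```python
import pandas as pd

# Made-up sample rows; the real dataset has 20 columns
sample = pd.DataFrame({
    "community": ["A", "A", "B", "C"],
    "price": [350.0, 350.0, 520.0, 280.0],
})

n_rows, n_cols = sample.shape
# duplicated(keep="first") flags every row that repeats an earlier row
n_dup = int(sample.duplicated(keep="first").sum())
dup_ratio = n_dup / n_rows

print(n_rows, n_cols, n_dup, f"{dup_ratio:.1%}")
```

For the figures quoted in the article, 7 duplicates out of 37491 rows is roughly 0.02%, which is consistent with "less than 0.1%".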
Here are a few things we need to clean:
1. Remove duplicate rows
2. Replace the None values
3. Split and merge the region, housing type, floor and mortgage information columns
4. Convert data types
5. Delete redundant characters
6. Re-assign the price column, because of an error during crawling
7. Eliminate abnormal data
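Steps 3 to 6 rely on a few pandas idioms that are easier to follow on a small example. The sketch below uses made-up English strings and hypothetical group names (`level`, `total_floors`); the real Lianjia fields are in Chinese and may be formatted differently, so the patterns are assumptions.

```python
import pandas as pd

# Made-up values for illustration only
sample = pd.DataFrame({
    "floor": ["high floor (total 6 layer)", "low floor (total 18 layer)"],
    "floor area": ["89.5㎡", "120㎡"],
    "price": [350.0, 1.2],
})

# Each named group (?P<name>...) becomes a column named after the group
floors = sample["floor"].str.extract(
    r"(?P<level>.+)\s\(total (?P<total_floors>\d+) layer\)"
)
sample = pd.concat([sample.drop(columns="floor"), floors], axis=1)
sample["total_floors"] = sample["total_floors"].astype(int)

# Strip the trailing unit character and convert to float
sample["floor area"] = sample["floor area"].str.rstrip("㎡").astype(float)

# Prices below 20 are assumed to be in hundreds of millions of yuan;
# multiply by 10000 so every price is in units of ten thousand yuan
sample["price"] = sample["price"].map(lambda x: x * 10000 if x < 20 else x)

# Unit price in yuan per square meter
sample["average price"] = (sample["price"] / sample["floor area"] * 10000).round(2)

print(sample)
```

The threshold of 20 only works because no genuine listing is expected to cost less than 200,000 yuan; a different dataset would need its own sanity check.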
import pandas as pd

# If there are duplicate rows, keep the first one
data.drop_duplicates(keep='first', inplace=True)
# Replace the string 'None'
data = data.applymap(lambda x: 'no data yet' if x == 'None' else x)
# Split the region, housing type, floor and mortgage information columns,
# merge the new columns into the original data, then delete the original columns.
# The named-group labels were lost from these patterns; the group names below are
# assumptions, with Zone, town and ring inferred from the columns used later.
data = pd.concat([
    data,
    data['region'].str.extract(pat=r'(?P<Zone>.*?)\s(?P<town>.*?)\s(?P<ring>.*)'),
    data['housing'].str.extract(pat=r'(?P<rooms>\d+)room(?P<halls>\d+)hall(?P<kitchens>\d+)kitchen(?P<bathrooms>\d+)bathroom'),
    data['floor'].str.extract(pat=r'(?P<floor_level>.+)\(total(?P<total_floors>\d+)layer\)'),
    data['Mortgage Information'].map(lambda x: x.strip()).str.extract(pat=r'(?P<mortgaged>.{1})Mortgage(?P<mortgage_detail>.*)?'),
], axis=1)
data.drop(['region', 'floor', 'Mortgage Information'], axis=1, inplace=True)
# Append the district suffix to the district name
data['Zone'] = data['Zone'] + 'Zone'
# Remove the square-meter unit after the floor area and convert to float
data['floor area'] = data['floor area'].map(lambda x: float(x[:-1]))
# Convert the data type
data['price'] = data['price'].astype(float)
# Convert the date type
data['listing time'] = pd.to_datetime(data['listing time'])
# Strings that are not valid dates are replaced with NaT
data['last transaction'] = pd.to_datetime(data['last transaction'], errors="coerce")
# Some community names contain a parenthesized area; delete it
data['community'] = data['community'].str.replace(r"[\(].*?[\)]", "", regex=True)
# Filtering for prices below 20 shows homes whose area and location are both good,
# so the recorded data may be wrong. Searching for these homes on Lianjia's website
# showed that their prices are all in units of "hundred million".
# So clean the data again and use "ten thousand" as the unit of total price throughout
data['price'] = data['price'].map(lambda x: x * 10000 if x < 20 else x)
# Compute the unit price per square meter
data['average price'] = round(data['price'] / data['floor area'] * 10000, 2)

From the scatter plot above we can see an outlier on the right: a floor area of 4702 square meters with a total price of 680,000 yuan. I went back to Lianjia to look up this listing and found that it is priced exactly this way on the website. Prices in the same community are shown below.
1. After cleaning, 37483 rows remain in total.
2. The data covers the period from 2013-01-18 to 2020-07-24.
3. Second-hand homes currently for sale in Shanghai range from 13 to 1663.1 square meters.
4. According to the crawled data, the most expensive second-hand home in Shanghai has a unit price of 319960.62 yuan per square meter, and the overall average is 56466.26 yuan per square meter.
So where exactly is the home priced at more than 300,000 yuan per square meter?
That's right, it is the Wukang Building, originally called the "Normandie Apartments", though we prefer to call it the "nine-story building". This is how it looks after the overhead wires were removed; my childhood memory of it is actually like this. Wires woven like cobwebs, that is the old Shanghai flavor.
The "nine-story building" stands at a fork in the road. If you go there to take photos, please be careful not to linger in the middle of the road. Wukang Road next to it is also a "road of celebrities", rich in history.
Popular business areas
# The pyc and opts aliases are assumed to come from these imports
import pyecharts.charts as pyc
from pyecharts import options as opts

hot_list = ['Sichuan North Road', 'Zhongshan Park', 'Caohejing', 'Xujiahui', 'Lujiazui', 'Nanjing West Road', 'Nanjing East Road', "People's Square", 'Central Huaihai Road', 'Hongqiao', 'North Bund', 'Xintiandi', "Jing'an Temple"]
# Average unit price and number of listings per business area, sorted by listing count
hot = data[data['town'].isin(hot_list)].groupby(by='town')['average price'].agg(['mean', 'count']).sort_values(by='count', ascending=True)
(
    pyc.Bar()
    .add_xaxis(hot.index.to_list())
    .add_yaxis(series_name="", yaxis_data=hot['count'].tolist(), label_opts=opts.LabelOpts(is_show=False))
    .reversal_axis()
    .set_global_opts(
        title_opts=opts.TitleOpts(title="Number of listings in popular business areas", subtitle="Chongming District lacks corresponding housing data; data as of July 2020\nSource: Lianjia"),
        toolbox_opts=opts.ToolboxOpts(),
    )
    .render_notebook()
)
Housing type
# Keep the four most common layouts and group everything else as 'other'
huxing = data['housing'].where(data['housing'].isin(['2 rooms, 1 hall, 1 kitchen, 1 bathroom', '2 rooms, 2 halls, 1 kitchen, 1 bathroom', '3 rooms, 2 halls, 1 kitchen, 2 bathrooms', '3 rooms, 1 hall, 1 kitchen, 1 bathroom']), other='other')
(
    pyc.Pie(init_opts=opts.InitOpts(height='600px', width='600px'))
    .add(series_name='housing type', data_pair=huxing.value_counts().items(), radius=(100, 150), rosetype="radius", label_opts=opts.LabelOpts(is_show=True, formatter="{b}\n{c} units\n{d}%"))
    .set_global_opts(
        title_opts=opts.TitleOpts(title="Housing types of second-hand listings in Shanghai", subtitle="Chongming District lacks corresponding housing data; data as of July 2020\nSource: Lianjia"),
        toolbox_opts=opts.ToolboxOpts(),
    )
    .render_notebook()
)
All sorts of unusual layouts are listed among Shanghai's second-hand homes, but two-bedroom homes are still the most common and one-bedroom homes are fewer.
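The grouping behind this pie chart, keeping a few common layouts and folding the rest into "other", can be sketched on its own. The layout codes and the names `layouts` and `common` below are made up for illustration.

```python
import pandas as pd

# Made-up layout codes in the form rooms-halls-kitchens-bathrooms
layouts = pd.Series(["2-1-1-1", "2-2-1-1", "2-1-1-1", "5-3-1-3", "1-1-1-1", "2-1-1-1"])
common = ["2-1-1-1", "2-2-1-1"]

# where() keeps values for which the condition holds and replaces the rest
grouped = layouts.where(layouts.isin(common), other="other")
counts = grouped.value_counts()

# Plain Python (name, count) pairs, the shape a pie chart's data_pair expects
pairs = [(name, int(n)) for name, n in counts.items()]
print(pairs)
```

Converting the counts to plain `int` avoids passing NumPy integer types on to a charting library, which some serializers do not accept.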
Second-hand housing price
import numpy as np

# Total price is in units of ten thousand yuan, so 100 means 1 million yuan
data['price stratification'] = pd.cut(data['price'], bins=[-np.inf, 100, 300, 500, 800, 1000, np.inf], right=True, labels=['within 1 million', '1-3 million', '3-5 million', '5-8 million', '8-10 million', '10 million and above'])
(
    pyc.Pie(init_opts=opts.InitOpts(height='500px', width='500px'))
    .add(series_name="house price", data_pair=data['price stratification'].value_counts().items(), radius=(100, 150), rosetype=True, label_opts=opts.LabelOpts(formatter="{b}\n{c} units\n{d}%"))
    .set_global_opts(
        title_opts=opts.TitleOpts(title="Shanghai housing price stratification", subtitle="Chongming District lacks corresponding housing data; data as of July 2020\nSource: Lianjia"),
        toolbox_opts=opts.ToolboxOpts(),
    )
    .render_notebook()
)
It is almost impossible to buy a home in Shanghai for under 1 million yuan; with that budget you could consider Hegang. Keep working hard, comrades: more than 13000 homes are waiting for you at the 3 million yuan level!
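How `pd.cut` assigns prices to tiers depends on which side of each bin is closed. A small sketch with made-up prices (in units of ten thousand yuan) and shortened, hypothetical labels shows the effect of `right=True`:

```python
import numpy as np
import pandas as pd

# Made-up total prices in units of ten thousand yuan
prices = pd.Series([95, 100, 250, 300.5, 799, 1000, 2600])

# right=True makes each bin closed on the right: (-inf, 100], (100, 300], ...
tiers = pd.cut(
    prices,
    bins=[-np.inf, 100, 300, 500, 800, 1000, np.inf],
    right=True,
    labels=["<=1M", "1-3M", "3-5M", "5-8M", "8-10M", ">10M"],
)

# sort=False keeps the tiers in their natural order rather than by frequency
tier_counts = tiers.value_counts(sort=False)
print(tier_counts)
```

A home priced at exactly 1 million yuan (100) therefore falls into the lowest tier, and one at exactly 10 million (1000) into the "8-10M" tier.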
(
    pyc.Pie(init_opts=opts.InitOpts(height='500px', width='500px'))
    # Listings with an empty ring value are shown as 'no data available'
    .add(series_name="ring", data_pair=data['ring'].replace("", "no data available").value_counts().items(), radius=(100, 150), rosetype=True, label_opts=opts.LabelOpts(formatter="{b}\n{c} units\n{d}%"))
    .set_global_opts(
        title_opts=opts.TitleOpts(title="Ring location of second-hand housing", subtitle="Chongming District lacks corresponding housing data; data as of July 2020\nSource: Lianjia"),
        toolbox_opts=opts.ToolboxOpts(),
    )
    .render_notebook()
)
Homes outside the Outer Ring clearly make up the largest share of listings. Prices there are probably lower, which would explain the stronger activity. Let's keep looking.
Shanghai average price map
(
    pyc.Map(init_opts=opts.InitOpts(height='500px', width='500px'))
    .add(
        maptype="上海",  # pyecharts registers this city map under its Chinese name (Shanghai)
        series_name="average price",
        # Mean unit price per district for homes listed in 2020, rounded to whole yuan
        data_pair=[list(i) for i in data[data['listing time'].dt.year == 2020].groupby(by=['Zone'])['average price'].mean().apply(round).items()],
        is_map_symbol_show=False,
        is_selected=True,
        label_opts=opts.LabelOpts(is_show=False),
    )
    .set_global_opts(
        tooltip_opts=opts.TooltipOpts(formatter="{b}: {c} yuan / square meter"),
        visualmap_opts=opts.VisualMapOpts(max_=100000, pos_right='5%', pos_bottom='20%', is_calculable=True),
        title_opts=opts.TitleOpts(title="Average price map of second-hand housing in Shanghai in the first half of 2020", subtitle="Chongming District lacks corresponding housing data; data as of July 2020\nSource: Lianjia"),
        toolbox_opts=opts.ToolboxOpts(),
        legend_opts=opts.LegendOpts(is_show=False),
    )
    .render_notebook()
)
Look at the red area in the middle: prices in the city center are much higher than in the outskirts.
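The per-district figures behind such a map come from a filter followed by a group-by. A minimal sketch with made-up listings follows; the district names and prices are illustrative assumptions, not values from the dataset.

```python
import pandas as pd

# Made-up listings for illustration
listings = pd.DataFrame({
    "Zone": ["Huangpu", "Huangpu", "Jiading", "Jiading", "Jiading"],
    "listing time": pd.to_datetime(
        ["2020-03-01", "2020-05-12", "2019-11-30", "2020-01-15", "2020-06-20"]
    ),
    "average price": [98000.4, 102000.2, 41000.0, 39000.0, 42002.0],
})

# Keep only homes listed in 2020, then average the unit price per district
in_2020 = listings[listings["listing time"].dt.year == 2020]
district_avg = in_2020.groupby("Zone")["average price"].mean().round().astype(int)

# [name, value] pairs with plain Python ints, ready for a map series
data_pair = [[zone, int(price)] for zone, price in district_avg.items()]
print(data_pair)
```

The 2019 listing in Jiading is excluded by the year filter, so it does not pull that district's average.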
Listing quantity
Affected by the epidemic, the number of second-hand listings in Shanghai was relatively low in January and February this year. As the epidemic eased, listings gradually increased from March, and June had the most listings in the first half of 2020.
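The article does not show the code behind the listing counts. One way to count listings per month, sketched here on made-up dates, is to group on the month of the listing time:

```python
import pandas as pd

# Made-up listing dates for illustration
listing_time = pd.Series(pd.to_datetime([
    "2020-01-05", "2020-02-11", "2020-03-02", "2020-03-19",
    "2020-06-01", "2020-06-15", "2020-06-28", "2019-12-30",
]))

# Restrict to 2020, then count listings per calendar month
in_2020 = listing_time[listing_time.dt.year == 2020]
monthly = in_2020.dt.month.value_counts().sort_index()
busiest_month = int(monthly.idxmax())

print(monthly)
print(busiest_month)
```

Months with no listings simply do not appear in `monthly`; reindexing over `range(1, 13)` with a fill value of 0 would make the gaps explicit before plotting.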
Although house prices in Shanghai are trending downward, I still can't afford one.
Looking at the trend in a line chart, all districts appear fairly stable, but Pudong New Area has risen since April, and Hongkou District also saw a small rise in July.
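A per-district trend like this needs one average per district per month. The sketch below builds that table with `pivot_table` on made-up data; the prices are assumptions and do not reproduce the article's chart.

```python
import pandas as pd

# Made-up listings for illustration
listings = pd.DataFrame({
    "Zone": ["Pudong", "Pudong", "Pudong", "Hongkou", "Hongkou", "Hongkou"],
    "listing time": pd.to_datetime(
        ["2020-04-03", "2020-05-10", "2020-05-20", "2020-04-08", "2020-05-02", "2020-05-30"]
    ),
    "average price": [60000.0, 62000.0, 64000.0, 70000.0, 69000.0, 71000.0],
})

# One row per month, one column per district, mean unit price in each cell
listings["month"] = listings["listing time"].dt.month
trend = listings.pivot_table(
    index="month", columns="Zone", values="average price", aggfunc="mean"
)
print(trend)
```

Each column of `trend` can then be passed as one series to a line chart, with the month index as the shared x-axis.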