

How to use Python to analyze the data of second-hand houses in Beijing

2025-02-14 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article introduces how to use Python to analyze second-hand housing data for Beijing. Many people have questions about how to do this in practice, so the editor consulted a range of materials and put together a simple, easy-to-use method. I hope it helps answer your questions about analyzing Beijing's second-hand housing data with Python. Let's get started!

We used Python to collect second-hand housing listings from Lianjia for the 16 districts of Beijing. First, import the packages we will use: pandas for data processing, and pyecharts and plotly for visualization.

# Import required packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
from pyecharts.charts import Pie, Map, Bar, Line, Grid, Page
from pyecharts import options as opts
import plotly as py
import plotly.graph_objs as go
import plotly.express as px

Data reading

Read each file in a loop, remove duplicate rows, and check the size of the dataset. After deduplication there are 33,509 records.

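# Read data: one CSV file per district in ../data/
file_list = os.listdir('../data/')
frames = []
for file in file_list:
    file_name = file.split('.')[0]  # district name taken from the file name
    df = pd.read_csv(f'../data/{file}')
    df['region_name'] = file_name
    frames.append(df)
# DataFrame.append was removed in pandas 2.0, so concatenate once instead
df_all = pd.concat(frames, ignore_index=True)
# Remove duplicate rows
df_all = df_all.drop_duplicates()
print(df_all.shape)
# (33509, 9)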

Preview the data:

df_all.head(2)

Data preprocessing

We extract and process the dataset's features to make the subsequent analysis easier. The main processing steps are:

title: not needed for the analysis; deleted

detail_url: not needed for the analysis; deleted

position: too fine-grained to be useful; deleted

houseInfo: extract the number of rooms and halls, area, orientation, decoration, floor level (high, middle, low), year built, and building type (slab or tower)

followInfo: not needed for the analysis; deleted

tag_info: extract whether the house is close to a subway

total_price: extract the total price of the house

unitPrice: extract the unit price of the house

region_name: no processing needed

# Delete columns that are not analyzed
df_all = df_all.drop(['title', 'detail_url', 'position', 'followInfo'], axis=1)

# houseInfo is a '|'-separated string; split it once and reuse the parts.
# The patterns use the English labels shown in this article; the raw Lianjia
# text is in Chinese, so the literals must match your data exactly.
info = df_all['houseInfo'].str.split('|')
# Layout: 'halls' receives the room count and 'bedrooms' the hall count
# (column names kept from the original code)
df_all['halls'] = info.str[0].str.extract(r'(\d+) room', expand=False)
df_all['bedrooms'] = info.str[0].str.extract(r'\d+ room (\d+) hall', expand=False)
# Area in square meters
df_all['area'] = info.str[1].str.extract(r'(\d+(?:\.\d+)?) square meter', expand=False)
# Orientation, decoration type and floor
df_all['orient'] = info.str[2]
df_all['decorate_type'] = info.str[3]
df_all['floor'] = info.str[4]
# Year built (first run of digits)
df_all['built_year'] = info.str[5].str.extract(r'(\d+)', expand=False)
# Building type (slab / tower)
df_all['banta'] = info.str[6]
df_all = df_all.drop('houseInfo', axis=1)

# Subway flag: 1 if the tags mention the subway; missing tags count as 0
df_all['subway'] = df_all['tag_info'].str.contains('subway', na=False).astype(int)
df_all = df_all.drop('tag_info', axis=1)

# Keep only the numeric part of the prices
df_all['total_price'] = df_all['total_price'].str.extract(r'(\d+)', expand=False)
df_all['unitPrice'] = df_all['unitPrice'].str.extract(r'(\d+)', expand=False)

# Null values: delete the rows, then convert data types
df_all = df_all.dropna()
int_cols = ['total_price', 'unitPrice', 'halls', 'bedrooms', 'built_year', 'subway']
df_all[int_cols] = df_all[int_cols].astype(int)
df_all['area'] = df_all['area'].astype(float)
df_all.head()
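To make the houseInfo extraction concrete, here is a small plain-Python sketch that applies the same kind of patterns to a single listing string. The sample string, the function name parse_house_info and the output keys are hypothetical; the sketch assumes the seven '|'-separated fields appear in the order listed above and uses the English labels from this article rather than the raw Chinese text.

```python
import re

# Hypothetical listing text using this article's English labels (not real data)
SAMPLE = ("2 room 1 hall | 75.2 square meter | south north | refined | "
          "middle floor (total 6 floors) | 1990 built | slab")


def parse_house_info(text):
    """Split a '|'-separated houseInfo string into named fields.

    Returns None when the string does not have the seven expected fields.
    """
    parts = [p.strip() for p in text.split('|')]
    if len(parts) < 7:
        return None
    layout = re.search(r'(\d+) room (\d+) hall', parts[0])
    area = re.search(r'(\d+(?:\.\d+)?) square meter', parts[1])
    year = re.search(r'(\d+)', parts[5])
    return {
        'rooms': int(layout.group(1)) if layout else None,
        'living_rooms': int(layout.group(2)) if layout else None,
        'area': float(area.group(1)) if area else None,
        'orient': parts[2],
        'decorate_type': parts[3],
        # drop the parenthetical "(total N floors)" part before classifying
        'floor': re.sub(r'\(.*?\)', '', parts[4]).strip(),
        'built_year': int(year.group(1)) if year else None,
        'banta': parts[6],
    }


print(parse_house_info(SAMPLE))
```

Splitting once and indexing the parts mirrors what the pandas code does column-wise with `.str.split('|')` and `.str[i]`.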

Next, further process the floor, year of construction, orientation, and building-type fields.

def transform_floor(x):
    """Map a raw floor label to a coarse floor category.

    The labels must match the (Chinese) floor text in the raw data exactly.
    """
    if x in ('high floor', 'top floor', 'upper floor'):
        return 'upper floor'
    elif x in ('lower floor', 'bottom floor', 'understack', '1 floor', '2 floor', '3 floor'):
        return 'lower floor'
    elif x in ('middle floor', '4 floor', '5 floor', '6 floor'):
        return 'middle'
    elif x == 'basement':
        return 'basement'
    else:
        # all other values are classified as high-rise
        return 'high-level'

# Floor generalization: remove the "(total N floors)" part, then classify
df_all['floor_type'] = df_all['floor'].str.replace(r'\(.*?\)', '', regex=True).str.strip()
df_all['floor_type'] = df_all['floor_type'].apply(transform_floor)
df_all = df_all.drop('floor', axis=1)
# Orientation generalization: keep the first Chinese character (first listed direction)
df_all['orient'] = df_all['orient'].str.extract(r'([\u4e00-\u9fa5])', expand=False)
# built_year becomes the building age relative to 2020
df_all['built_year'] = 2020 - df_all['built_year']
# Building type (banta) generalization: strip surrounding whitespace
df_all['banta'] = df_all['banta'].str.strip()
df_all.head()

First, the trend chart of second-hand housing prices in Beijing over the past year shows a pullback. The current average price is 57,589 yuan per square meter.

The number of second-hand houses in different areas of Beijing

So what is the distribution of second-hand housing in various areas of Beijing?

What about prices in each district? Xicheng District leads Beijing's second-hand housing market at 114,980 yuan per square meter, followed by Dongcheng District at 97,295 yuan and Haidian District at 85,954 yuan per square meter.

The code is as follows:

# Generate data: average unit price per district, highest first
s_region = df_all.groupby('region_name')['unitPrice'].mean().sort_values(ascending=False)
x_data = s_region.index.tolist()
y_data = [round(i) for i in s_region.values.tolist()]
data_pair = [list(z) for z in zip(x_data, y_data)]

# Map: pyecharts registers the Beijing map under its Chinese name '北京';
# the district names in data_pair must match the map's district names
map1 = Map(init_opts=opts.InitOpts(width='1350px', height='750px'))
map1.add('', data_pair, maptype='北京')
map1.set_global_opts(
    title_opts=opts.TitleOpts(title='Average price of second-hand housing in different districts of Beijing (yuan / square meter)'),
    visualmap_opts=opts.VisualMapOpts(max_=max(y_data)),
)
map1.render('region_price_map.html')

# Bar chart
bar2 = Bar(init_opts=opts.InitOpts(width='1350px', height='750px'))
bar2.add_xaxis(x_data)
bar2.add_yaxis('', y_data)
bar2.set_global_opts(
    title_opts=opts.TitleOpts(title='Average price of second-hand housing in different districts of Beijing (yuan / square meter)'),
    visualmap_opts=opts.VisualMapOpts(max_=max(y_data)),
)
# Separate file names: render() with no argument writes render.html for both charts
bar2.render('region_price_bar.html')
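The groupby, mean and descending sort above can be sketched in plain Python to show exactly what s_region holds. The function name mean_price_by_region and the toy rows are hypothetical illustrations, not the article's data.

```python
from collections import defaultdict


def mean_price_by_region(rows):
    """Average unit price per region, sorted from highest to lowest.

    rows: iterable of (region_name, unit_price) pairs.
    Returns a list of (region_name, rounded_mean) tuples.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for region, unit_price in rows:
        totals[region] += unit_price
        counts[region] += 1
    means = [(region, round(totals[region] / counts[region])) for region in totals]
    # Highest average first, like sort_values(ascending=False)
    return sorted(means, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy rows, not the article's data
rows = [
    ('Xicheng', 110000), ('Xicheng', 120000),
    ('Haidian', 86000), ('Haidian', 85000),
    ('Dongcheng', 97000),
]
print(mean_price_by_region(rows))
# [('Xicheng', 115000), ('Dongcheng', 97000), ('Haidian', 85500)]
```

The returned pairs have the same shape as data_pair, which is what both the map and the bar chart consume.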

What is the price of second-hand houses in Beijing?

So how much does it cost to buy a second-hand house in Beijing? Looking at total prices, houses priced between 3 and 5 million yuan make up the largest share at 35.9%, followed by those between 5 and 8 million yuan at 26.54%. Houses priced at 3 million yuan or less account for 19.54%.

The code is as follows:

# Total prices are in units of 10,000 yuan
bins = [74, 300, 500, 800, 1000, 8299]
bins_label = ['3 million and below', '3-5 million', '5-8 million', '8-10 million', 'over 10 million']
# New field; include_lowest=True keeps a price equal to the lowest edge (74)
df_all['price_cut'] = pd.cut(df_all['total_price'], bins=bins, labels=bins_label, include_lowest=True)
price_num = df_all.price_cut.value_counts()
# Data pairs
data_pair = [list(z) for z in zip(price_num.index.tolist(), price_num.values.tolist())]
# Pie chart
pie1 = Pie(init_opts=opts.InitOpts(width='1350px', height='750px'))
pie1.add('', data_pair=data_pair, radius=['30%', '60%'], rosetype='radius')
pie1.set_global_opts(
    title_opts=opts.TitleOpts(title='What are the prices of second-hand houses in Beijing?'),
    legend_opts=opts.LegendOpts(orient='vertical', pos_top='15%', pos_left='2%'),
)
pie1.set_series_opts(label_opts=opts.LabelOpts(formatter="{b}: {d}%"))
pie1.set_colors(['#FF7F0E', '#1F77B4', '#2CA02C', '#D62728', '#946C8B'])
pie1.render('price_pie.html')
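pd.cut assigns each total price to a right-closed interval by default (right=True), so the bands are (74, 300], (300, 500], and so on. The plain-Python sketch below mimics that rule with bisect using the article's bin edges; the function name price_band and the English labels are illustrative assumptions. A price exactly equal to the lowest edge falls outside every interval, which is the case pd.cut's include_lowest=True option is meant to cover.

```python
from bisect import bisect_left

# Bin edges from the article (total price in units of 10,000 yuan)
BINS = [74, 300, 500, 800, 1000, 8299]
LABELS = ['3 million and below', '3-5 million', '5-8 million',
          '8-10 million', 'over 10 million']


def price_band(total_price):
    """Return the label of the interval (BINS[i], BINS[i+1]] containing total_price.

    Mimics pd.cut with right=True and include_lowest=False: values equal to
    the lowest edge, or outside the edges, get None (pandas gives NaN).
    """
    i = bisect_left(BINS, total_price)
    if i == 0 or i == len(BINS):
        return None
    return LABELS[i - 1]


for price in (74, 150, 300, 301, 950, 8299):
    print(price, price_band(price))
```

bisect_left returns the index of the first edge that is greater than or equal to the price, which is exactly the right-hand edge of a right-closed interval.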

Age distribution of second-hand houses in Beijing

So how old are these second-hand houses? The largest group is over 20 years old: 10,946 houses, or 33.73%. Next come houses aged 15 to 20 years, with 7,835 houses or 24.15%. Only 1,441 houses, 4.44%, are less than five years old.
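An age distribution like this can be computed by bucketing each house age (2020 minus the year built, as in the preprocessing step) into bands and counting. The sketch below is a plain-Python illustration; the intermediate band edges (5, 10, 15, 20 years), the label names and the sample ages are hypothetical, since the article only names the "within five years", "15-20 years" and "over 20 years" groups.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical band edges in years: (0, 5], (5, 10], (10, 15], (15, 20], over 20
EDGES = [5, 10, 15, 20]
LABELS = ['within 5 years', '5-10 years', '10-15 years', '15-20 years', 'over 20 years']


def age_distribution(ages):
    """Return {label: (count, percent rounded to 2 decimals)} for house ages."""
    if not ages:
        return {label: (0, 0.0) for label in LABELS}
    counts = Counter(LABELS[bisect_left(EDGES, age)] for age in ages)
    total = len(ages)
    return {label: (counts[label], round(100 * counts[label] / total, 2))
            for label in LABELS}


# Hypothetical sample ages, not the article's data
ages = [3, 8, 12, 18, 19, 25, 30, 40]
for label, (count, pct) in age_distribution(ages).items():
    print(f'{label}: {count} ({pct}%)')
```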

The relationship between subway proximity and unit price

In terms of orientation, south-facing houses are the most common, accounting for 68.97%, followed by east-facing houses at 18.25%.

Distribution of different house layouts

The scatter plot shows a positive correlation between floor area and house price; the Pearson correlation coefficient is 0.67, a fairly strong correlation.

The code is as follows:

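# Scatter plot of floor area against total price
fig = px.scatter(df_all, x='area', y='total_price')
fig.update_layout(title='Relationship between housing area and house price (ten thousand yuan)')
py.offline.plot(fig, filename='relationship between house area and house price.html')
# Pearson correlation between area and total price
print(df_all['area'].corr(df_all['total_price']))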
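The Pearson coefficient quoted above is the covariance of the two variables divided by the product of their standard deviations, r = cov(x, y) / (σx σy). pandas' Series.corr uses Pearson by default, and Python 3.10+ also offers statistics.correlation; the sketch below spells the formula out by hand. The function name pearson and the sample pairs are hypothetical, not the article's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    # The 1/n factors in covariance and standard deviation cancel, so use raw sums
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical (area in square meters, total price in 10,000 yuan) pairs
area = [50, 75, 90, 120, 150]
price = [300, 420, 610, 700, 980]
print(round(pearson(area, price), 3))
```

A value near +1 means the points lie close to an upward-sloping line; 0.67 indicates a clear but imperfect linear trend.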

The relationship between the number of bedrooms and house prices

The numbers of living rooms and bedrooms are reflected in the floor area: the more living rooms a house has, the higher its total price.

The code is as follows:

# merge
df_all['halls'] = [i if i


© 2024 shulou.com SLNews company. All rights reserved.
