2025-03-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
In this issue, we look at how to use Python to analyze second-hand housing prices in 2019.
The data was collected with a web crawler and is then analyzed here.
The article uses pandas, seaborn, and Matplotlib, and the charts include violin plots, box plots, and scatter plots.
Descriptive analysis
First, import the required libraries for all later steps, then read the data table and print its summary statistics.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

# preset plot parameters
sns.set(style='darkgrid')
plt.rcParams['font.family'] = 'Arial Unicode MS'
plt.rcParams['axes.unicode_minus'] = False
warnings.filterwarnings('ignore')

data = pd.read_csv("Lianjia's new house .csv")
data.describe()
The most expensive and the cheapest
From the summary table above, the preliminary conclusions are:
The smallest of these second-hand homes is 9.6 square meters and the largest is 718 square meters; the cheapest costs 560,000 yuan and the most expensive 52 million yuan. Most listings are about 59 to 102 square meters and priced at about 3.25 to 6.3 million yuan. That gives a first impression; the detailed analysis follows.
First, I was curious about the 9.6-square-meter home, so I ran the code below to pull it out. The result: CBD core area, villa, 9.64 square meters, 560,000 yuan. Presumably someone is selling off a bathroom.
Never mind. Skip it and continue the analysis.
data.min()
The most expensive one is a townhouse on Gulou Street (on the edge of the Second Ring Road), priced at 52 million yuan. Hmm.
data.max()
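One caveat about the two calls above: `data.min()` and `data.max()` return the extreme of each column separately, so the values in the output may come from different listings. A minimal sketch of pulling the complete row for the smallest and the most expensive listing with `idxmin` and `idxmax` could look like this. The tiny DataFrame and its values are made up for illustration, the column names are assumed to match the article's table, and `money` is assumed to be in units of 10,000 yuan.

```python
import pandas as pd

# Hypothetical sample rows; 'money' is assumed to be in units of 10,000 yuan.
data = pd.DataFrame({
    "district": ["Chaoyang", "Changping", "Dongcheng"],
    "area": [9.64, 88.0, 310.0],
    "money": [150, 56, 5200],
})

# idxmin/idxmax return the row label of the extreme value,
# so .loc gives back one complete listing.
smallest = data.loc[data["area"].idxmin()]
priciest = data.loc[data["money"].idxmax()]
print(smallest)
print(priciest)

# Column-wise minimum: here the smallest area and the lowest price
# belong to two different rows.
column_minimums = data.min()
print(column_minimums)
```

In this sample the smallest home costs 150 while the column-wise minimum price is 56, which is why reading `data.min()` as a single listing can mislead.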
A rough look at the price and area distributions
Now for an intuitive look at the price distribution. The resulting plot shows that prices are concentrated below 10 million yuan.
sns.distplot(data['money'].dropna())
Looking at area the same way, these second-hand homes are concentrated around 100 square meters.
sns.distplot(data['area'].dropna())
The two plots can also be drawn side by side with the following code (both distributions are skewed to the right):
fig, ax = plt.subplots(1, 2)  # two subplots side by side
sns.distplot(data['money'], ax=ax[0])
sns.distplot(data['area'], ax=ax[1])
plt.show()
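The right skew seen in the plots can also be checked numerically. The sketch below uses `Series.skew()` from pandas on synthetic log-normal values that merely stand in for the real `money` column; the distribution and its parameters are assumptions, not the article's data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical prices: log-normal draws stand in for the real 'money' column.
money = pd.Series(rng.lognormal(mean=6.0, sigma=0.6, size=5000), name="money")

# Positive skewness means a long tail on the right (a few very expensive homes).
skewness = money.skew()
print(f"skewness: {skewness:.2f}")

# In a right-skewed distribution the mean is pulled above the median.
print(f"mean: {money.mean():.1f}, median: {money.median():.1f}")
```

On the real table the same two lines would be `data['money'].skew()` and `data['area'].skew()`.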
A closer look at price
A box plot of the price makes it obvious that the points above the 10-million line all lie outside the reasonable range.
sns.boxplot(data=data['money'])
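For context on which points a box plot draws individually: with the default setting (`whis=1.5` in Matplotlib and seaborn), any point more than 1.5 times the interquartile range beyond the quartiles is shown as an outlier. A minimal sketch of that rule follows; the helper name `boxplot_fences` and the sample values are hypothetical.

```python
import pandas as pd

def boxplot_fences(values: pd.Series, whis: float = 1.5):
    """Return the (low, high) limits beyond which a box plot draws points as outliers."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return q1 - whis * iqr, q3 + whis * iqr

# Hypothetical prices in units of 10,000 yuan, with one extreme listing.
money = pd.Series([300, 350, 400, 450, 500, 550, 600, 650, 5200])

low, high = boxplot_fences(money)
fliers = money[(money < low) | (money > high)]
print(low, high)
print(fliers.tolist())
```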
So what counts as reasonable data? The following code gives one answer:
mean, std = data['money'].mean(), data['money'].std()
# upper and lower bounds: three standard deviations from the mean
lower, upper = mean - 3 * std, mean + 3 * std
print('mean', mean)
print('standard deviation', std)
print('lower bound', lower)
print('upper bound', upper)
The printed result (in units of 10,000 yuan) shows a standard deviation of about 3.58 million and a reasonable upper bound of about 16.13 million. In practical terms: 3.58 million is enough to buy a home, and anyone paying more than 16.13 million is being taken for a ride.
Mean 538.44
Standard deviation 358.47
Lower bound -536.9763753150206
Upper bound 1613.8755022458467
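The printed bounds can be checked by hand from the rounded mean and standard deviation, and the same rule can be wrapped in a small reusable helper. This is only a sketch: the function name `three_sigma_bounds` and the sample series are hypothetical.

```python
import pandas as pd

def three_sigma_bounds(values: pd.Series, k: float = 3.0):
    """Return (mean - k*std, mean + k*std) for a numeric column."""
    mean, std = values.mean(), values.std()
    return mean - k * std, mean + k * std

# Arithmetic check against the printed figures (rounded inputs from the article).
check_lower = 538.44 - 3 * 358.47   # about -536.97
check_upper = 538.44 + 3 * 358.47   # about 1613.85
print(check_lower, check_upper)

# Hypothetical prices: 29 ordinary listings plus one extreme one.
money = pd.Series(list(range(400, 690, 10)) + [5200])
lower, upper = three_sigma_bounds(money)
outliers = money[(money < lower) | (money > upper)]
print(outliers.tolist())
```

Because the lower bound is negative for this kind of data, only the upper bound actually filters anything.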
The 20 lowest-priced listings
The following code shows where these homes are located.
If you are familiar with Beijing, the output shows that these homes are mostly outside the Fifth Ring Road, some in Shunyi, Changping, Mentougou, and similar districts.
t = data[['district', 'area', 'money']].sort_values('money')
display(t.iloc[:20])
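To summarize where the cheapest listings sit rather than reading them row by row, one option is to count districts among them. The sketch below uses `nsmallest` and `value_counts` on made-up rows; the values are hypothetical and only the column names follow the article.

```python
import pandas as pd

# Hypothetical listings; 'money' is assumed to be in units of 10,000 yuan.
data = pd.DataFrame({
    "district": ["Shunyi", "Changping", "Mentougou", "Changping", "Chaoyang", "Haidian"],
    "area": [45.0, 38.0, 60.0, 41.0, 90.0, 110.0],
    "money": [120, 90, 150, 100, 600, 800],
})

# Take the 3 cheapest rows (the article uses 20) and count them per district.
cheapest = data.nsmallest(3, "money")
counts = cheapest["district"].value_counts()
print(counts)
```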
A closer look at area
In the same way, replace the "money" column with the "area" column. The mean area is about 90 square meters, the standard deviation is about 50 square meters, and the reasonable upper bound is about 241 square meters.
Mean 89.8874210879787
Standard deviation 50.36697951495447
Lower bound -61.21351745688473
Upper bound 240.9883596328421
The listings with the smallest area can be pulled out the same way, by sorting on the "area" column.
Orientation and decoration level
Grouping the listings by orientation shows that homes in Beijing mostly face north-south; east-west orientations are far less common.
posit = data['direction'].value_counts()[:10]
display(posit)
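Raw counts can be turned into shares to make "far less common" concrete. A small sketch with `value_counts(normalize=True)` follows; the orientation labels and their frequencies are invented for illustration.

```python
import pandas as pd

# Hypothetical orientation labels standing in for the 'direction' column.
direction = pd.Series(
    ["South North"] * 6 + ["South"] * 2 + ["East West"] + ["East"],
    name="direction",
)

# normalize=True returns each category's share of all rows instead of a count.
share = direction.value_counts(normalize=True)
print(share)
```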
There are four decoration categories: finely decorated, simply decorated, unfinished (bare shell), and other.
The building types are slab building, tower building, slab-tower combination, villa, and so on.
How do these two dimensions relate to price?
To find out, make four plots first:
Figure 1: decoration status versus price
Figure 2: decoration status, broken down by building type, versus price
Figure 3: building type, broken down by decoration status, versus price
Figure 4: box plot by building type
The price distribution by decoration status shows that finely decorated homes cluster at about 4 million yuan, give or take 1 million; simply decorated homes are a little cheaper; unfinished second-hand homes are rare; and the "other" category is large, with prices concentrated around 3 to 5 million.
Breaking decoration status down by building type in a box plot shows that slab buildings, tower buildings, and slab-tower combinations are the most common, whether the home is finely decorated, simply decorated, or of other or unknown status.
As for building type broken down by decoration status versus price, every building type includes finely decorated, simply decorated, and unfinished homes. Slab-building prices span 1 to 10 million and are concentrated between 3 and 6 million. Slab-tower combinations run between 3.5 and 7 million, and tower buildings between 3.8 and 7 million.
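The ranges read off the plots can be cross-checked with a grouped summary table. The sketch below groups by building type and decoration status and reports count, median, minimum, and maximum price. The column names `decoration` and `building_type` and all values are hypothetical, since the source does not show the real column names for these fields.

```python
import pandas as pd

# Hypothetical rows; 'money' is assumed to be in units of 10,000 yuan.
data = pd.DataFrame({
    "decoration": ["fine", "fine", "simple", "simple", "fine", "other"],
    "building_type": ["slab", "tower", "slab", "slab", "slab", "tower"],
    "money": [450, 500, 320, 380, 550, 400],
})

# One row per (building type, decoration) pair with basic price statistics.
summary = (
    data.groupby(["building_type", "decoration"])["money"]
        .agg(["count", "median", "min", "max"])
)
print(summary)
```

The `count` column shows which combinations are common, and `median`, `min`, and `max` give the price band for each one.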
Preliminary conclusion: with more than 3 million yuan, you can take your pick of finely decorated slab or tower buildings.
With less than that, a budget of 0.5 to 3 million still works; there are fewer options, but you are not out of choices.
This preliminary conclusion raises a question: I have xxx million, so how many square meters can I buy?
Price versus area analysis
First group the listings by area. The grouping function is as follows:
def value_to_level(area): if area >= 0 and area [...] >= 41 and area [...] >= 61 and area [...] >= 81 and area [...] >= 81 and area [...] >= 131 and area [...] >= 181 and area [...]
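The grouping function is cut off in the source, so its exact upper bounds and return labels are unknown. As a rough sketch of the same idea, `pd.cut` can assign each listing to an area band, and a grouped median then shows which bands fit a given budget. The bin edges, the labels, the sample rows, and the budget below are all assumptions chosen for illustration, not the author's values.

```python
import numpy as np
import pandas as pd

# Assumed bin edges and labels (hypothetical, not recovered from the source).
edges = [0, 40, 60, 80, 100, 130, 180, np.inf]
labels = ["0-40", "41-60", "61-80", "81-100", "101-130", "131-180", "181+"]

# Hypothetical listings; 'money' is assumed to be in units of 10,000 yuan.
data = pd.DataFrame({
    "area": [9.64, 55.0, 78.0, 95.0, 120.0, 150.0, 300.0],
    "money": [56, 300, 420, 540, 700, 950, 5200],
})

# Each interval is closed on the right, e.g. (40, 60] gets the label "41-60".
data["area_level"] = pd.cut(
    data["area"], bins=edges, labels=labels, include_lowest=True
)

# Median price per area band, then the bands whose median fits the budget.
by_level = data.groupby("area_level", observed=True)["money"].median()
budget = 500  # hypothetical budget: 5 million yuan
affordable = by_level[by_level <= budget]
print(affordable)
```

Compared with a chain of `if`/`elif` comparisons, `pd.cut` keeps the band boundaries in one list, which makes gaps or overlaps between bands easier to spot.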