This article explains how to scrape second-hand housing price data with Python. The content is simple and clear and easy to follow; please work through it step by step to learn how the data is fetched.
Module installation
As with the previous article on new houses, the following modules need to be installed first (skip any that are already installed):
# install the required modules
pip3 install bs4
pip3 install requests
pip3 install lxml
pip3 install numpy
pip3 install pandas
Once installation is complete, you can start writing code. The code for configuring the request headers and proxy IPs was already covered in the previous article on new houses, so it is not repeated here; the code below goes straight to fetching the data.
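For completeness, a minimal sketch of the imports the script below relies on, together with a stub create_headers() helper, might look like the following. The User-Agent value and the absence of proxy handling are assumptions for illustration; the original article's header and proxy configuration is more complete.

# minimal sketch of imports and the header helper assumed by the code below
# (assumption: the original article also rotates proxy IPs, omitted here)
import re
import math
import requests
from bs4 import BeautifulSoup
from lxml import etree

def create_headers():
    # assumption: a single fixed User-Agent for illustration only
    return {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Referer": "http://www.ke.com",
    }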
Second-hand house price data object
Here we wrap the second-hand housing price information in an object; as long as the scraped data is stored as objects, it becomes much easier to process later. The SecHouse class is as follows:
# second-hand housing information object
class SecHouse(object):
    def __init__(self, district, area, name, price, desc, pic):
        self.district = district
        self.area = area
        self.price = price
        self.name = name
        self.desc = desc
        self.pic = pic

    def text(self):
        return self.district + "," + \
               self.area + "," + \
               self.name + "," + \
               self.price + "," + \
               self.desc + "," + \
               self.pic
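As a quick check, an object can be built and serialized like this (the values are hypothetical, not real listings):

# hypothetical example values, for illustration only
house = SecHouse("chaoyang", "wangjing", "某小区 2室1厅", "520万", "2室1厅 | 89平米", "http://example.com/pic.jpg")
print(house.text())  # prints the comma-separated record later written to sechouse.txt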
Obtain second-hand house price information and save it
With that ready, we again take Beike (ke.com) as the example, batch-crawl its Beijing second-hand housing data, and save it locally. Since the focus here is on how to fetch the data, the results are saved in the simplest txt text format; if you want to save to a database, you can modify the code to do so.
Obtain district and county information
When scraping second-hand housing information, we naturally want to know which district each listing is in, so here I write a method that fetches all of Beijing's districts and counties and temporarily keeps them in list variables for later use. The code is as follows:
# obtain district and county information
def get_districts():
    # request URL
    url = 'https://bj.ke.com/xiaoqu/'
    headers = create_headers()
    # request the page
    response = requests.get(url, timeout=10, headers=headers)
    html = response.content
    root = etree.HTML(html)
    # extract the district links
    elements = root.xpath('//div[3]/div[1]/dl[2]/dd/div/div/a')
    en_names = list()
    ch_names = list()
    # loop over the matched elements
    for element in elements:
        link = element.attrib['href']
        en_names.append(link.split('/')[-2])
        ch_names.append(element.text)
    # record the English-to-Chinese mapping of district names
    for index, name in enumerate(en_names):
        chinese_city_district_dict[name] = ch_names[index]
    return en_names
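Note that get_districts() (and get_areas() below) record the Chinese names in module-level dictionaries that are not shown in the snippets. As an assumption about the original script, they would be defined once near the top of sechouse.py:

# module-level lookup tables assumed by get_districts() and get_areas()
chinese_city_district_dict = dict()  # English district name -> Chinese name
chinese_area_dict = dict()           # English area (block) name -> Chinese name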
Obtain area (block) information
Besides the district and county information above, we also need the smaller area (block) information within each district. Within the same district, second-hand housing prices differ from one block to another, so the block is an important reference for us. The code for obtaining the block information is as follows:
# get all the area (block) information under a district
def get_areas(district):
    # request URL
    page = "http://bj.ke.com/xiaoqu/{0}".format(district)
    # area list definition
    areas = list()
    try:
        headers = create_headers()
        response = requests.get(page, timeout=10, headers=headers)
        html = response.content
        root = etree.HTML(html)
        # get the area links
        links = root.xpath('//div[3]/div[1]/dl[2]/dd/div/div[2]/a')
        # process each link
        for link in links:
            relative_link = link.attrib['href']
            # remove the trailing "/"
            relative_link = relative_link[:-1]
            # keep the last segment of the path
            area = relative_link.split("/")[-1]
            # skip the district name itself to avoid duplicates
            if area != district:
                chinese_area = link.text
                chinese_area_dict[area] = chinese_area
                # add the area to the list
                areas.append(area)
        return areas
    except Exception as e:
        print(e)
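Putting the two helpers together, a quick sanity check could look like the snippet below; the actual output depends on the live ke.com page structure.

# list every district and its areas; output depends on the live ke.com pages
districts = get_districts()
for district in districts:
    areas = get_areas(district)
    print(chinese_city_district_dict.get(district, district), areas)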
Obtain second-hand housing information and save it
# create the file the results will be written to
with open("sechouse.txt", "w", encoding='utf-8') as f:
    # define variables
    total_page = 1
    # initialize the result list
    sec_house_list = list()
    # get all district and county information
    districts = get_districts()
    # loop over every district
    for district in districts:
        # get all areas (blocks) under the district
        arealist = get_areas(district)
        # loop over the second-hand housing listings under every area
        for area in arealist:
            # Chinese district name
            chinese_district = chinese_city_district_dict.get(district, "")
            # Chinese area name
            chinese_area = chinese_area_dict.get(area, "")
            # request URL
            page = 'http://bj.ke.com/ershoufang/{0}/'.format(area)
            headers = create_headers()
            response = requests.get(page, timeout=10, headers=headers)
            html = response.content
            # parse the HTML
            soup = BeautifulSoup(html, "lxml")
            # get the total number of pages
            try:
                page_box = soup.find_all('div', class_='page-box')[0]
                matches = re.search(r'.*data-total-count="(\d+)".*', str(page_box))
                # total number of pages (10 listings per page)
                total_page = int(math.ceil(int(matches.group(1)) / 10))
            except Exception as e:
                print(e)
            print(total_page)
            # set the request headers
            headers = create_headers()
            # traverse from the first page to the last page
            for i in range(1, total_page + 1):
                # request URL
                page = 'http://bj.ke.com/ershoufang/{0}/pg{1}'.format(area, i)
                print(page)
                # get the returned content
                response = requests.get(page, timeout=10, headers=headers)
                html = response.content
                soup = BeautifulSoup(html, "lxml")
                # get the list of second-hand house entries
                house_elements = soup.find_all('li', class_="clear")
                # iterate through every entry
                for house_elem in house_elements:
                    # price
                    price = house_elem.find('div', class_="totalPrice")
                    # title
                    name = house_elem.find('div', class_='title')
                    # description
                    desc = house_elem.find('div', class_="houseInfo")
                    # picture address
                    pic = house_elem.find('a', class_="img").find('img', class_="lj-lazy")
                    # clean the data
                    price = price.text.strip()
                    name = name.text.replace("\n", "")
                    desc = desc.text.replace("\n", "").strip()
                    pic = pic.get('data-original').strip()
                    # save the listing as a SecHouse object
                    sec_house = SecHouse(chinese_district, chinese_area, name, price, desc, pic)
                    print(sec_house.text())
                    sec_house_list.append(sec_house)
    # write every record to the txt file
    for sec_house in sec_house_list:
        f.write(sec_house.text() + "\n")
At this point the code is complete. Run python sechouse.py to fetch the data, then open the sechouse.txt file in the current directory to view the crawled results.
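Since pandas was installed at the start and the records are saved as plain comma-separated text, one optional follow-up (an assumption, not part of the original script) is to load the file into a DataFrame for further analysis:

# assumption: six comma-separated fields per line, as produced by SecHouse.text(),
# and no commas inside the individual fields
import pandas as pd

columns = ["district", "area", "name", "price", "desc", "pic"]
df = pd.read_csv("sechouse.txt", names=columns, header=None)
print(df.head())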
Thank you for reading. That covers how to scrape second-hand housing price data with Python. After working through this article you should have a deeper understanding of the approach, and the details are best verified in practice.