How to Crawl Recruitment Website Data with Python and Visualize It

Shulou(Shulou.com)06/02 Report--

Many people with little experience don't know how to use Python to crawl recruitment website data and visualize it, so this article walks through the approach step by step. Hopefully it helps you solve this problem.

Basic development environment

Python 3.6

PyCharm

Related modules

Crawler module

import requests
import re
import parsel
import csv

Word cloud module

import jieba
import wordcloud
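If any of these modules are missing, they can be installed with pip. The analysis steps below additionally use pandas, pyecharts, and imageio; assuming the PyPI package names match the import names (which holds for all of these):

pip install requests parsel jieba wordcloud pandas pyecharts imageio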

Target web page analysis

'https://jobs.51job.com/beijing-ftq/127676506.html?s=01&t=0'

Each job posting's detail page has a corresponding ID. A regular expression is enough to extract the ID values from the listing page; splice each one into a URL, then request the detail page and extract the job data.

response = requests.get(url=url, headers=headers)
lis = re.findall(r'"jobid":"(\d+)"', response.text)
for li in lis:
    page_url = 'https://jobs.51job.com/beijing-hdq/{}.html?s=01&t=0'.format(li)
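The snippet above assumes that url (a listing page whose HTML embeds the "jobid" values) and headers are already defined. A minimal sketch of that setup; the search URL and User-Agent string here are illustrative assumptions, not taken from the original:

import requests
import re

# Assumption: a 51job search-results page that embeds "jobid" values in its HTML
url = 'https://search.51job.com/list/010000,000000,0000,00,9,99,python,2,1.html'
# Assumption: a minimal User-Agent header so the request resembles a normal browser
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

response = requests.get(url=url, headers=headers)
response.encoding = response.apparent_encoding  # same transcoding fix as used below
job_ids = re.findall(r'"jobid":"(\d+)"', response.text)
print(len(job_ids), 'job IDs found')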

Although the site serves static pages, the response text comes back garbled under the default encoding, so it has to be transcoded while crawling (handled below with response.encoding = response.apparent_encoding).

f = open('recruitment.csv', mode='a', encoding='utf-8', newline='')
csv_writer = csv.DictWriter(f, fieldnames=['title', 'region', 'work experience', 'education',
                                           'salary', 'benefits', 'recruitment', 'release date'])
csv_writer.writeheader()  # note: if this block runs once per detail page, move the open/writeheader outside the loop so the header row is not repeated

response = requests.get(url=page_url, headers=headers)
response.encoding = response.apparent_encoding  # transcode to fix the garbled default encoding
selector = parsel.Selector(response.text)

title = selector.css('div.cn h1::text').get()  # job title
salary = selector.css('div.cn strong::text').get()  # salary
welfare = selector.css('.jtag div.t1 span::text').getall()  # benefits
welfare_info = '|'.join(welfare)
data_info = selector.css('.cn p.msg.ltype::attr(title)').get().split('|')
area = data_info[0]  # region
work_experience = data_info[1]  # work experience
educational_background = data_info[2]  # education
number_of_people = data_info[3]  # number of recruits
release_date = data_info[-1].replace('发布', '')  # release date, stripping the "published" suffix
all_info_list = selector.css('div.tCompany_main > div:nth-child(1) > div p span::text').getall()
all_info = '\n'.join(all_info_list)

dit = {
    'title': title,
    'region': area,
    'work experience': work_experience,
    'education': educational_background,
    'salary': salary,
    'benefits': welfare_info,
    'recruitment': number_of_people,
    'release date': release_date,
}
csv_writer.writerow(dit)

with open('recruitment information.txt', mode='a', encoding='utf-8') as f:
    f.write(all_info)

The steps above complete the crawl of the recruitment data.

Simple and rough data cleaning

Salary

import pandas as pd

content = pd.read_csv(r'D:\python\demo\data analysis\recruitment\recruitment.csv', encoding='utf-8')
salary = content['salary']
salary_1 = salary[salary.notnull()]  # drop rows with no salary listed
salary_count = pd.value_counts(salary_1)
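The salary counts can be charted with the same pyecharts Bar pattern used for education and experience below. A sketch, where the top-ten cutoff is an assumption to keep the axis readable:

from pyecharts.charts import Bar

salary_top = salary_count.head(10)  # assumption: chart only the ten most common salary ranges
bar = Bar()
bar.add_xaxis(salary_top.index.tolist())
bar.add_yaxis('salary', salary_top.values.tolist())
bar.render('salary_bar.html')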

Educational requirements

from pyecharts.charts import Bar

content = pd.read_csv(r'D:\python\demo\data analysis\recruitment\recruitment.csv', encoding='utf-8')
educational_background = content['education']
educational_background_1 = educational_background[educational_background.notnull()]
educational_background_count = pd.value_counts(educational_background_1).head()
print(educational_background_count)
bar = Bar()
bar.add_xaxis(educational_background_count.index.tolist())
bar.add_yaxis('education requirement', educational_background_count.values.tolist())
bar.render('bar.html')

The chart shows that the largest number of postings state no fixed education requirement.

Work experience

content = pd.read_csv(r'D:\python\demo\data analysis\recruitment\recruitment.csv', encoding='utf-8')
work_experience = content['work experience']
work_experience_count = pd.value_counts(work_experience)
print(work_experience_count)
bar = Bar()
bar.add_xaxis(work_experience_count.index.tolist())
bar.add_yaxis('experience requirement', work_experience_count.values.tolist())
bar.render('bar.html')  # note: this overwrites the education chart; use a different filename to keep both

Word cloud analysis of required technical skills

import re
import imageio
import jieba
import wordcloud

py = imageio.imread('python.png')  # mask image that shapes the word cloud
f = open('python recruitment information.txt', encoding='utf-8')
re_txt = f.read()
result = re.findall(r'[a-zA-Z]+', re_txt)  # keep only English words (technology names)
txt = ' '.join(result)
# jieba word segmentation
txt_list = jieba.lcut(txt)
string = ' '.join(txt_list)
# word cloud settings
wc = wordcloud.WordCloud(
    width=1000,                # image width
    height=700,                # image height
    background_color='white',  # image background color
    font_path='msyh.ttc',      # word cloud font (Microsoft YaHei; the .ttc file must be available)
    mask=py,                   # word cloud mask image
    scale=15,
    stopwords={' '},           # stopwords
    # contour_width=5,         # outline width
    # contour_color='red'      # outline color
)
wc.generate(string)  # feed the text into the word cloud
wc.to_file(r'python recruitment information.png')  # save the word cloud image
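To preview the word cloud without opening the saved file, matplotlib can display the WordCloud object directly. This is an optional extra, not part of the original workflow:

import matplotlib.pyplot as plt

plt.imshow(wc, interpolation='bilinear')  # render the generated word cloud
plt.axis('off')                           # hide the axes
plt.show()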

Summary:

The data analysis here is admittedly simple and rough, but it gets results.

After reading the above, do you know how to use Python to crawl recruitment website data and visualize it? If you want to learn more skills or related content, you are welcome to follow the industry information channel. Thank you for reading!
