How to use Python to crawl and analyze Lagou position data
This article explains how to use Python to crawl and analyze position data from Lagou (lagou.com), a Chinese tech job board. The content is straightforward and easy to follow.
By now we all accept that data holds a great deal of value waiting to be mined.
But as individuals, how can we use that to create value for ourselves?
The first prerequisite is having the data.
For an individual, a crawler is a very convenient way to get data. Once you have the data, you can run some analysis and statistics, and then use the results to guide your study, your work, and even your side-project direction.
For example, if you want to find a Python-related job, you can crawl the list of Python positions and run a statistical analysis on it. You can answer questions such as:
What industries, company sizes, financing stages, education requirements, and experience requirements do the companies hiring for Python positions have? What is the salary distribution of Python positions, and how does it differ from other positions? What keywords or skills do the Python job postings themselves emphasize?
Of course, the search keyword need not be Python; it could just as well be Java, big data, recommendation algorithms, and so on. Any of them can be mined to get a macro view of a role and to guide where to direct your learning and effort.
This article shows how to crawl the summary information of Python positions in Beijing and run some simple analysis on the data.
Confirm the target to be crawled
This time the target is Lagou > Beijing > Python positions:
First, open the Lagou home page, switch the city to Beijing, and search for python:
The goal this time is to crawl the summary information of the job list:
Analysis of crawling method
Clicking the paging buttons at the bottom of the page, we find that the URL never changes; it stays https://www.lagou.com/jobs/list_python/p-city_2?&cl=false&fromSearch=true&labelWords=&suginput=.
Opening the browser's developer tools, we find that the page data comes from an Ajax request that returns JSON.
And this request is a POST request:
After some experimentation, the data can be fetched by requesting the JSON URL directly. However, Lagou has fairly strong anti-crawling measures in place, so pay attention to the following:
To request the JSON, you must carry a cookie, which can be obtained by requesting the list page first. Sleep for a few seconds after each request to avoid getting your IP banned.
Code implementation
Set up the URL used to obtain the cookie and the URL for the POST request, and copy the headers from the browser. Then crawl the list pages and extract the position information from the response, which is JSON and can be accessed much like a nested Python dictionary (see the sketch below).
Data analysis
Use pandas to load the crawled data for analysis.
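Putting these steps together, here is a minimal sketch of the whole flow, from fetching the cookie to loading the results into pandas. The positionAjax.json endpoint, the form fields first/pn/kd, and the content > positionResult > result path are what the developer tools showed at the time; treat them as assumptions that may no longer match the live site.

import time

import pandas as pd
import requests

# URL of the list page (used to obtain the cookie) and URL for the POST.
LIST_URL = ("https://www.lagou.com/jobs/list_python/p-city_2"
            "?&cl=false&fromSearch=true&labelWords=&suginput=")
JSON_URL = ("https://www.lagou.com/jobs/positionAjax.json"
            "?city=%E5%8C%97%E4%BA%AC&needAddtionalResult=false")

# Headers copied from the browser; at minimum a realistic User-Agent
# and the Referer of the list page.
HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Referer": LIST_URL,
}

def fetch_page(session, page, keyword="python"):
    """POST one page of search results and return the parsed JSON."""
    form = {"first": "true" if page == 1 else "false",
            "pn": page, "kd": keyword}
    resp = session.post(JSON_URL, headers=HEADERS, data=form, timeout=10)
    return resp.json()

session = requests.Session()
# Request the list page first so the session picks up the required cookies.
session.get(LIST_URL, headers=HEADERS, timeout=10)

results = []
for page in range(1, 31):
    payload = fetch_page(session, page)
    # The position summaries sit under content -> positionResult -> result,
    # accessed just like a nested Python dictionary.
    results.extend(payload["content"]["positionResult"]["result"])
    time.sleep(5)  # sleep a few seconds between requests to avoid an IP ban

df = pd.DataFrame(results)  # load the crawled records for analysis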
Distribution of financing stage
The companies needing the most Python positions: those that do not need financing, listed companies, and A-round companies. The companies needing fewer: angel-round, C-round, and D-round-and-above companies.
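The count behind this observation can be reproduced with a quick sketch (the financeStage column name is assumed from the crawled JSON):

import matplotlib.pyplot as plt

# Count positions per financing stage and plot the distribution.
df["financeStage"].value_counts().plot(kind="bar", rot=45)
plt.ylabel("number of positions")
plt.tight_layout()
plt.show()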
Distribution of company size
Companies with 50-150 employees need the most Python positions, followed by large companies with more than 2,000 employees.
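The company-size count works the same way (companySize is again an assumed column name):

import matplotlib.pyplot as plt

# Count positions per company-size bracket; a horizontal bar chart
# keeps the bracket labels readable.
df["companySize"].value_counts().plot(kind="barh")
plt.xlabel("number of positions")
plt.tight_layout()
plt.show()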
Distribution of salary
Because the salary is given as a range, we process it to keep only the lower bound of the range as a reference value:
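A sketch of that processing, assuming every salary string has the form "15k-25k" (open-ended values such as "15k以上", meaning 15k and above, would need extra handling; the salary column name is also assumed):

import matplotlib.pyplot as plt

# Keep only the lower bound of each "15k-25k" style range as the
# reference salary.
df["salary_low"] = (
    df["salary"]
    .str.split("-").str[0]   # "15k-25k" -> "15k"
    .str.rstrip("kK")        # "15k"     -> "15"
    .astype(int)
)
df["salary_low"].value_counts().sort_index().plot(kind="bar")
plt.ylabel("number of positions")
plt.show()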
The distribution peaks at 15K and 20K.
The relationship between financing stage and salary
Seaborn is more convenient here:
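For example, one boxplot of the lower-bound salary per financing stage (column names as assumed in the sketches above):

import matplotlib.pyplot as plt
import seaborn as sns

# One box per financing stage, showing the spread of lower-bound salaries.
sns.boxplot(x="financeStage", y="salary_low", data=df)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()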
The lowest-paying are the angel-round and C-round companies, and that is not up for debate.
Thank you for reading. That covers how to use Python to crawl and analyze Lagou position data. After working through this article, you should have a deeper understanding of the process; the specifics still need to be verified in practice.