This article shows how to crawl recruitment information with Python using the Scrapy framework. The editor finds it very practical and shares it here; hopefully you will take something away after reading it. Without further ado, let's take a look.
1. Target
Crawl all position information from the Tencent recruitment site, including:
Job title
Position URL
Job type
Number of openings
Work location
Release time
2. Website structure analysis
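Judging from the XPath expressions used in the spider below, the position list on hr.tencent.com is rendered as an HTML table: each job is a row with class "even" or "odd", and its five cells hold the job title (with a link to the detail page), the job type, the number of openings, the work location, and the release time. The following minimal sketch, using a hypothetical HTML fragment (not taken from the real page), shows how those XPath expressions map onto that structure.

# Sketch only: the sample HTML below is made up to mirror the structure
# implied by the spider's XPath expressions.
from scrapy import Selector

sample_html = """
<table>
  <tr class="even">
    <td><a href="position_detail.php?id=1">Backend Engineer</a></td>
    <td>Technology</td><td>2</td><td>Shenzhen</td><td>2019-01-15</td>
  </tr>
</table>
"""

sel = Selector(text=sample_html)
for row in sel.xpath("//tr[@class='even'] | //tr[@class='odd']"):
    print(row.xpath("./td[1]/a/text()").get(),   # job title
          row.xpath("./td[1]/a/@href").get(),    # relative detail URL
          row.xpath("./td[4]/text()").get())     # work location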
3. Write the crawler program
3.1. Define the fields to be crawled (items.py)
import scrapy


class TecentjobItem(scrapy.Item):
    # define the fields for your item here:
    positionname = scrapy.Field()
    positionlink = scrapy.Field()
    positionType = scrapy.Field()
    peopleNum = scrapy.Field()
    workLocation = scrapy.Field()
    publishTime = scrapy.Field()
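As a quick, self-contained sketch (not part of the original project), the item behaves like a dictionary: fields are set and read with dict-style access, and dict(item), which the pipeline uses later, turns it into a plain dict for serialisation. The values below are hypothetical.

import scrapy

class DemoItem(scrapy.Item):          # stand-in for TecentjobItem, two fields only
    positionname = scrapy.Field()
    workLocation = scrapy.Field()

item = DemoItem()
item['positionname'] = 'Backend Engineer'   # hypothetical value
item['workLocation'] = 'Shenzhen'           # hypothetical value
print(dict(item))   # e.g. {'positionname': 'Backend Engineer', 'workLocation': 'Shenzhen'}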
3.2. Write the spider file
# -*- coding: utf-8 -*-
import scrapy
from tecentJob.items import TecentjobItem


class TencentSpider(scrapy.Spider):
    name = 'tencent'
    allowed_domains = ['tencent.com']
    url = 'https://hr.tencent.com/position.php?&start='
    offset = 0
    start_urls = [url + str(offset)]

    def parse(self, response):
        for each in response.xpath("//tr[@class='even'] | //tr[@class='odd']"):
            # initialize the item object
            item = TecentjobItem()
            item['positionname'] = each.xpath("./td[1]/a/text()").extract()[0]
            item['positionlink'] = each.xpath("./td[1]/a/@href").extract()[0]
            item['positionType'] = each.xpath("./td[2]/text()").extract()[0]
            item['peopleNum'] = each.xpath("./td[3]/text()").extract()[0]
            item['workLocation'] = each.xpath("./td[4]/text()").extract()[0]
            item['publishTime'] = each.xpath("./td[5]/text()").extract()[0]
            yield item

        if self.offset < 100:
            self.offset += 10
            # stitch together the next page URL and send the request back to the
            # scheduler; the downloader fetches it and parse() handles the response again
            yield scrapy.Request(self.url + str(self.offset), callback=self.parse)
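The usual way to run this spider is from the project root with the command scrapy crawl tencent. As an alternative, here is a hedged sketch of launching the same spider from a plain Python script; the tecentJob.spiders.tencent module path is an assumption based on the default Scrapy project layout.

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# assumed module path under the default Scrapy project layout
from tecentJob.spiders.tencent import TencentSpider

if __name__ == "__main__":
    process = CrawlerProcess(get_project_settings())  # picks up the project's settings.py
    process.crawl(TencentSpider)
    process.start()   # blocks until the crawl finishes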
3.3. Write the pipeline that handles the yielded items
import json


class TecentjobPipeline(object):
    def __init__(self):
        self.filename = open("tencent.json", 'wb')

    def process_item(self, item, spider):
        text = json.dumps(dict(item), ensure_ascii=False) + "\n"
        self.filename.write(text.encode('utf-8'))
        return item

    def close_spider(self, spider):
        self.filename.close()
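For Scrapy to actually call this pipeline, it has to be enabled in the project settings. The snippet below is a sketch assuming the default layout, where the class above lives in tecentJob/pipelines.py; the number is the pipeline's priority (lower values run earlier).

ITEM_PIPELINES = {
    'tecentJob.pipelines.TecentjobPipeline': 300,
}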
3.4. Configure request headers in settings.py
DEFAULT_REQUEST_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*',
}

4. The final result
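Given the pipeline above, the final result is a tencent.json file with one JSON object per crawled position, using the field names defined in items.py. The short script below is illustrative only, a way to spot-check the output once the crawl has run.

import json

with open("tencent.json", encoding="utf-8") as f:
    for line in f:
        job = json.loads(line)               # one position per line
        print(job["positionname"], job["workLocation"], job["publishTime"])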
The above is how to crawl recruitment information with Python. The editor believes some of the points covered here may come up in everyday work, and hopes you can learn more from this article.