This article shows how to crawl recruitment information with Python using the Scrapy framework. The editor finds it very practical and shares it here; hopefully you will take something away after reading it. Without further ado, let's take a look.
1. Target
Crawl all position information from the Tencent recruitment site, including:
Job title
Position URL
Job type
Number of openings
Work location
Release time
2. Website structure analysis
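Judging from the XPath expressions used in the spider below, the position list on hr.tencent.com is rendered as an HTML table: each job is a row with class "even" or "odd", and its five cells hold the job title (with a link to the detail page), the job type, the number of openings, the work location, and the release time. The following minimal sketch, using a hypothetical HTML fragment (not taken from the real page), shows how those XPath expressions map onto that structure.

# Sketch only: the sample HTML below is made up to mirror the structure
# implied by the spider's XPath expressions.
from scrapy import Selector

sample_html = """
<table>
  <tr class="even">
    <td><a href="position_detail.php?id=1">Backend Engineer</a></td>
    <td>Technology</td><td>2</td><td>Shenzhen</td><td>2019-01-15</td>
  </tr>
</table>
"""

sel = Selector(text=sample_html)
for row in sel.xpath("//tr[@class='even'] | //tr[@class='odd']"):
    print(row.xpath("./td[1]/a/text()").get(),   # job title
          row.xpath("./td[1]/a/@href").get(),    # relative detail URL
          row.xpath("./td[4]/text()").get())     # work location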
3. Write the crawler program
3.1. Define the fields to be crawled (items.py)
import scrapy


class TecentjobItem(scrapy.Item):
    # define the fields for your item here:
    positionname = scrapy.Field()
    positionlink = scrapy.Field()
    positionType = scrapy.Field()
    peopleNum = scrapy.Field()
    workLocation = scrapy.Field()
    publishTime = scrapy.Field()
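As a quick, self-contained sketch (not part of the original project), the item behaves like a dictionary: fields are set and read with dict-style access, and dict(item), which the pipeline uses later, turns it into a plain dict for serialisation. The values below are hypothetical.

import scrapy

class DemoItem(scrapy.Item):          # stand-in for TecentjobItem, two fields only
    positionname = scrapy.Field()
    workLocation = scrapy.Field()

item = DemoItem()
item['positionname'] = 'Backend Engineer'   # hypothetical value
item['workLocation'] = 'Shenzhen'           # hypothetical value
print(dict(item))   # e.g. {'positionname': 'Backend Engineer', 'workLocation': 'Shenzhen'}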
3.2. Write the spider file
# -*- coding: utf-8 -*-
import scrapy
from tecentJob.items import TecentjobItem


class TencentSpider(scrapy.Spider):
    name = 'tencent'
    allowed_domains = ['tencent.com']
    url = 'https://hr.tencent.com/position.php?&start='
    offset = 0
    start_urls = [url + str(offset)]

    def parse(self, response):
        for each in response.xpath("//tr[@class='even'] | //tr[@class='odd']"):
            # initialize the item object
            item = TecentjobItem()
            item['positionname'] = each.xpath("./td[1]/a/text()").extract()[0]
            item['positionlink'] = each.xpath("./td[1]/a/@href").extract()[0]
            item['positionType'] = each.xpath("./td[2]/text()").extract()[0]
            item['peopleNum'] = each.xpath("./td[3]/text()").extract()[0]
            item['workLocation'] = each.xpath("./td[4]/text()").extract()[0]
            item['publishTime'] = each.xpath("./td[5]/text()").extract()[0]
            yield item

        if self.offset < 100:
            self.offset += 10
            # stitch together the next page URL and send the request back to the
            # scheduler; the downloader fetches it and parse() handles the response again
            yield scrapy.Request(self.url + str(self.offset), callback=self.parse)
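The usual way to run this spider is from the project root with the command scrapy crawl tencent. As an alternative, here is a hedged sketch of launching the same spider from a plain Python script; the tecentJob.spiders.tencent module path is an assumption based on the default Scrapy project layout.

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# assumed module path under the default Scrapy project layout
from tecentJob.spiders.tencent import TencentSpider

if __name__ == "__main__":
    process = CrawlerProcess(get_project_settings())  # picks up the project's settings.py
    process.crawl(TencentSpider)
    process.start()   # blocks until the crawl finishes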
3.3. Write the pipeline that handles the yielded items
import json


class TecentjobPipeline(object):
    def __init__(self):
        self.filename = open("tencent.json", 'wb')

    def process_item(self, item, spider):
        text = json.dumps(dict(item), ensure_ascii=False) + "\n"
        self.filename.write(text.encode('utf-8'))
        return item

    def close_spider(self, spider):
        self.filename.close()
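For Scrapy to actually call this pipeline, it has to be enabled in the project settings. The snippet below is a sketch assuming the default layout, where the class above lives in tecentJob/pipelines.py; the number is the pipeline's priority (lower values run earlier).

ITEM_PIPELINES = {
    'tecentJob.pipelines.TecentjobPipeline': 300,
}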
3.4. Configure request headers in settings.py
DEFAULT_REQUEST_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*',
}

4. The final result
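Given the pipeline above, the final result is a tencent.json file with one JSON object per crawled position, using the field names defined in items.py. The short script below is illustrative only, a way to spot-check the output once the crawl has run.

import json

with open("tencent.json", encoding="utf-8") as f:
    for line in f:
        job = json.loads(line)               # one position per line
        print(job["positionname"], job["workLocation"], job["publishTime"])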
The above is how to crawl recruitment information with Python. The editor believes some of the points covered here may come up in everyday work, and hopes you can learn more from this article.