What are Python's segmentation techniques?

This article introduces "Python's segmentation techniques." Many people run into difficulties with this in real projects, so let's work through how to handle these situations together. I hope you read carefully and learn something!

List segmentation

Setting memory usage aside for the moment, let's split the large crawling task above into smaller ones: for example, we allow ourselves to access at most 5 URLs per second.

import os
import time

CURRENT_DIR = os.path.dirname(os.path.abspath(__file__))


def read_file():
    file_path = os.path.join(CURRENT_DIR, "url_list.txt")
    with open(file_path, "r", encoding="utf-8") as fs:
        result = [i.strip() for i in fs.readlines()]
    return result


def fetch(url):
    print(url)


def run():
    max_count = 5
    url_list = read_file()
    for index in range(0, len(url_list), max_count):
        start = time.time()
        fetch(url_list[index:index + max_count])
        end = time.time() - start
        if end < 1:
            # keep each batch of at most 5 URLs within a one-second window
            time.sleep(1 - end)


if __name__ == '__main__':
    run()

The key code is in the for loop. First, we pass a third argument to range, which sets the iteration step to 5, so index increases in steps of 5: 0, 5, 10, and so on.

Then we slice url_list, taking five elements at a time; this window of five moves forward as index grows. If fewer than five elements remain at the end, the slice simply returns whatever is left, as shown in the sketch below, so there is no index-out-of-range problem.
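
A minimal sketch of this stepping-and-slicing behaviour, using a made-up list of 12 items in place of the URL list (note that the last slice simply returns the remaining two items):

items = list(range(12))  # stand-in for 12 URLs
for index in range(0, len(items), 5):
    print(items[index:index + 5])
# [0, 1, 2, 3, 4]
# [5, 6, 7, 8, 9]
# [10, 11]  <- slicing past the end is safe, no IndexError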

As the URL list grows, memory usage grows with it. At this point we need to modify the code: we know that a generator saves memory, so after the change the code looks like this.

Generator segmentation

# -*- coding: utf-8 -*-
# @Time: 2019-11-23 23:47
# @Author: Chen Xiangan
# @File: g.py
# @Public account: Python learning development
import os
import time
from itertools import islice

CURRENT_DIR = os.path.dirname(os.path.abspath(__file__))


def read_file():
    file_path = os.path.join(CURRENT_DIR, "url_list.txt")
    with open(file_path, "r", encoding="utf-8") as fs:
        # yield one URL at a time instead of building the whole list in memory
        for i in fs:
            yield i.strip()


def fetch(url):
    print(url)


def run():
    max_count = 5
    url_gen = read_file()
    while True:
        # islice pulls the next max_count items from the generator
        url_list = list(islice(url_gen, 0, max_count))
        if not url_list:
            break
        start = time.time()
        fetch(url_list)
        end = time.time() - start
        if end < 1:
            time.sleep(1 - end)


if __name__ == '__main__':
    run()

First, we change the way the file is read, from returning a full list to yielding from a generator. This saves a lot of memory by the time the fetch method is called.

Then we modify the earlier for loop. Because every iteration consumes elements of the generator, a plain index-based for loop no longer fits; instead we use itertools.islice to slice url_gen. islice is the slicing tool for generators, and here we take 5 elements from the generator at a time. Since a generator has no __len__ method, we convert each slice to a list and check whether that list is empty to know when the iteration should end (see the sketch after this paragraph).
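
A minimal sketch of the same chunking pattern in isolation, using a made-up generator of numbers instead of the URL generator:

from itertools import islice

def gen():
    for i in range(12):
        yield i

g = gen()
while True:
    chunk = list(islice(g, 5))  # take up to 5 items from the generator
    if not chunk:               # an empty list means the generator is exhausted
        break
    print(chunk)
# [0, 1, 2, 3, 4]
# [5, 6, 7, 8, 9]
# [10, 11]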

The modified code saves a great deal of memory while keeping performance, so reading a file with tens of millions of lines is no longer a problem.

In addition, when writing asynchronous crawlers you may need to slice an asynchronous generator. Next, let's discuss the problem of segmenting an asynchronous generator.

Asynchronous generator segmentation

Let's start with a simple asynchronous generator.

We know that calling the following function returns an ordinary generator:

def foo():
    for i in range(20):
        yield i
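
For example, a quick check that calling foo() really returns a generator object (the address in the output will vary):

gen = foo()
print(gen)        # <generator object foo at 0x...>
print(next(gen))  # 0
print(next(gen))  # 1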

If you add async in front of def, then calling it produces an asynchronous generator instead.

The complete example code is as follows:

import asyncio


async def foo():
    for i in range(20):
        yield i


async def run():
    async_gen = foo()
    async for i in async_gen:
        print(i)


if __name__ == '__main__':
    asyncio.run(run())

Splitting with async for is a bit more involved, so here we recommend the aiostream module. After using it, the code becomes the following:

import asyncio
from aiostream import stream


async def foo():
    for i in range(22):
        yield i


async def run():
    index = 0
    limit = 5
    while True:
        xs = stream.iterate(foo())
        ys = xs[index:index + limit]
        t = await stream.list(ys)
        if not t:
            break
        print(t)
        index += limit


if __name__ == '__main__':
    asyncio.run(run())
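
As a side note, aiostream also provides a chunks operator that groups an asynchronous sequence into lists of a fixed size, which avoids re-creating the generator on every pass as the loop above does. A hedged sketch of that variant (same foo as above, batch size of 5 assumed):

import asyncio
from aiostream import stream


async def foo():
    for i in range(22):
        yield i


async def run():
    # stream.chunks groups the async sequence into lists of at most 5 items
    async with stream.chunks(foo(), 5).stream() as chunked:
        async for batch in chunked:
            print(batch)


if __name__ == '__main__':
    asyncio.run(run())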

That is all for "What are Python's segmentation techniques?" Thank you for reading. If you want to learn more industry-related knowledge, keep following this site, where more practical articles will be published.
