How to use Python for practical thread programming


This article introduces practical thread programming in Python through worked examples. The techniques shown are simple, fast, and practical, and should help you solve real problems with threads.

Introduction

It is important to first define the difference between a process and a thread. Threads differ from processes in that they share state, memory, and resources. This simple difference is both a strength and a weakness of threads. On the one hand, threads are lightweight and easy to communicate with, but on the other hand, they bring a whole set of problems, including deadlocks, race conditions, and sheer complexity. Fortunately, threads are easier to implement in Python than in other languages, thanks to the global interpreter lock (GIL) and the Queue module.
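Before diving in, it may help to see what a race condition actually looks like. The following is a minimal sketch of my own (not from the article's listings), assuming the same Python 2 environment used throughout: two threads perform an unsynchronized read-modify-write on shared state, and updates can be lost.

import threading

counter = 0  # shared state, visible to every thread

def increment():
    global counter
    for _ in range(100000):
        counter += 1  # read-modify-write is not atomic, so updates can be lost

threads = [threading.Thread(target=increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# often prints less than 200000 because the two threads race on counter
print "counter = %d" % counter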

Hello, Python threads

In what follows, I assume that you have Python 2.5 or later installed, because many of the examples use language features that first appeared in Python 2.5. To start using threads in Python, we will begin with a simple "Hello World" example:

Listing 1. hello_threads_example

import threading
import datetime

class ThreadClass(threading.Thread):
    def run(self):
        now = datetime.datetime.now()
        print "%s says Hello World at time: %s" % (self.getName(), now)

for i in range(2):
    t = ThreadClass()
    t.start()

If you run this example, you will get the following output:

# python hello_threads.py
Thread-1 says Hello World at time: 2008-05-13 13:22:50.252069
Thread-2 says Hello World at time: 2008-05-13 13:22:50.252576

Looking at this output, you can see that you received a Hello World statement from each of two date-stamped threads. If you look at the actual code, you will see two import statements: one imports the datetime module and the other imports the threading module. The ThreadClass class inherits from threading.Thread, and because of that you need to define a run method that executes the code you want to run in the thread. The only other thing to note in the run method is self.getName(), which returns the name of the current thread.

The last three lines of code actually call the class and start the threads. Note that t.start() is what actually starts a thread. The threading module was designed with inheritance in mind and is built on top of the lower-level thread module. In most cases it is considered a best practice to inherit from threading.Thread, because it creates a very natural API for thread programming.
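For comparison, the threading module also accepts a plain callable via its target argument, so subclassing is not the only option. Here is a minimal sketch of that alternative style, assuming the same Python 2 environment as the listings; say_hello is a hypothetical helper, not part of the article's code:

import threading
import datetime

def say_hello():
    now = datetime.datetime.now()
    # currentThread() returns the thread object running this function
    print "%s says Hello World at time: %s" % (
        threading.currentThread().getName(), now)

# target= points the thread at a callable instead of a run() override
for i in range(2):
    t = threading.Thread(target=say_hello)
    t.start()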

Use queues with threads

As I mentioned earlier, threading gets complicated when threads need to share data or resources. The threading module does provide many synchronization primitives, including semaphores, condition variables, events, and locks. Although those options exist, it is considered a best practice to focus on using queues instead. Queues are easier to handle, and they make threaded programming safer, because they effectively funnel all access to a resource through a single thread and allow a clearer, more readable design pattern.
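To make the contrast concrete, here is a minimal sketch (my own illustration, not from the article) of guarding shared state with one of those primitives, a threading.Lock; it fixes the lost-update problem from the earlier race-condition sketch:

import threading

lock = threading.Lock()
counter = 0

def safe_increment():
    global counter
    for _ in range(100000):
        lock.acquire()          # only one thread may enter at a time
        try:
            counter += 1
        finally:
            lock.release()      # always release, even on error

threads = [threading.Thread(target=safe_increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print "counter = %d" % counter  # reliably 200000

This works, but every new shared resource needs its own locking discipline, which is exactly the complexity that the queue-based pattern below avoids.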

In the next example, you will first create a program that fetches the URLs of several websites one after another, serially, and prints the first 1024 bytes of each page. This is a classic example of a task that threads can accomplish faster. First, let's use the urllib2 module to grab the pages one at a time and time the code:

Listing 2. URL fetch, sequential

import urllib2
import time

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

start = time.time()
# grab the URL of each host and print the first 1024 bytes of the page
for host in hosts:
    url = urllib2.urlopen(host)
    print url.read(1024)

print "Elapsed Time: %s" % (time.time() - start)

When you run it, you get a lot of output on standard output as the first part of each page is printed, but at the end you will see this:

Elapsed Time: 2.40353488922

Let's look at this code a little. You import only two modules. First, the urllib2 module does the heavy lifting and grabs the web pages. Second, you create a start time value by calling time.time(), then call it again and subtract the start value to determine how long the program took to execute. Finally, in terms of speed, about two and a half seconds is not terrible, but if you had hundreds of web pages to retrieve, it would take about 50 seconds at this average rate. Let's look at how a threaded version speeds things up:

Listing 3. URL fetch, threaded

#!/usr/bin/env python
import Queue
import threading
import urllib2
import time

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

queue = Queue.Queue()

class ThreadUrl(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            # grab a host from the queue
            host = self.queue.get()

            # grab the page and print the first 1024 bytes
            url = urllib2.urlopen(host)
            print url.read(1024)

            # signal to the queue that the job is done
            self.queue.task_done()

start = time.time()

def main():
    # spawn a pool of threads and pass them the queue instance
    for i in range(5):
        t = ThreadUrl(queue)
        t.setDaemon(True)
        t.start()

    # populate the queue with data
    for host in hosts:
        queue.put(host)

    # wait on the queue until everything has been processed
    queue.join()

main()
print "Elapsed Time: %s" % (time.time() - start)

There is more code to explain in this example, but thanks to the Queue module it is not much more complicated than the first threading example. This pattern is a very common and recommended way to use threads in Python. The steps are as follows:

Create an instance of Queue.Queue () and populate it with data.

Pass the populated queue instance into your thread class, which inherits from threading.Thread.

Generate a daemon thread pool.

Pull one item out of the queue at a time and process it inside the thread, in the run method.

When the work is done, call queue.task_done() to signal to the queue that the task has been completed (a condensed sketch of this consumer loop follows the list).

Join the queue, which actually means waiting for the queue to be empty and then exiting the main program.
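As a condensed sketch of steps 4 and 5 (my own illustration, using the same Queue module as the listings), consumers often wrap the work in try/finally so that task_done() is called even if processing an item raises an exception:

import Queue
import threading

queue = Queue.Queue()

def worker():
    while True:
        item = queue.get()          # step 4: pull one item at a time
        try:
            print "processing %s" % item
        finally:
            queue.task_done()       # step 5: always signal completion

for i in range(3):
    t = threading.Thread(target=worker)
    t.setDaemon(True)
    t.start()

for item in ["a", "b", "c", "d"]:
    queue.put(item)

queue.join()                        # step 6: wait until the queue is empty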

A note about this pattern: by setting the threads to daemon threads, you allow the main thread, and thus the program, to exit while only daemon threads are still alive. This creates a simple way to control the flow of the program, because you can join on the queue, waiting until it is empty, before exiting. The exact behavior is best described in the documentation for the Queue module:

join()

Blocks until all items in the queue have been gotten and processed. The count of unfinished tasks goes up whenever an item is added to the queue. The count goes down whenever a consumer thread calls task_done() to indicate that the item was retrieved and all work on it is complete. When the count of unfinished tasks drops to zero, join() unblocks.

Use multiple queues

Because the pattern demonstrated above is so effective, it is relatively simple to extend it by chaining an additional thread pool onto a second queue. In the example above, you were just printing out the first part of each page. The next example instead has each thread return the whole page it crawled and place it on a second queue. A second pool of threads then joins that second queue and does work on each page. The work performed in this example is parsing each page with a third-party Python module named Beautiful Soup. With only a few lines of code, you will use this module to extract the title tag of each page you visit and print it.

Listing 4. Multi-queue data mining website

import Queue
import threading
import urllib2
import time
from BeautifulSoup import BeautifulSoup

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

queue = Queue.Queue()
out_queue = Queue.Queue()

class ThreadUrl(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, queue, out_queue):
        threading.Thread.__init__(self)
        self.queue = queue
        self.out_queue = out_queue

    def run(self):
        while True:
            # grab a host from the queue
            host = self.queue.get()

            # grab the whole page for the host
            url = urllib2.urlopen(host)
            chunk = url.read()

            # place the page chunk into the out queue
            self.out_queue.put(chunk)

            # signal to the queue that the job is done
            self.queue.task_done()

class DatamineThread(threading.Thread):
    """Threaded data mining with Beautiful Soup"""
    def __init__(self, out_queue):
        threading.Thread.__init__(self)
        self.out_queue = out_queue

    def run(self):
        while True:
            # grab a page chunk from the out queue
            chunk = self.out_queue.get()

            # parse the chunk and print the title tag
            soup = BeautifulSoup(chunk)
            print soup.findAll(['title'])

            # signal to the queue that the job is done
            self.out_queue.task_done()

start = time.time()

def main():
    # spawn a pool of fetch threads and pass them both queue instances
    for i in range(5):
        t = ThreadUrl(queue, out_queue)
        t.setDaemon(True)
        t.start()

    # populate the queue with data
    for host in hosts:
        queue.put(host)

    # spawn a pool of data mining threads
    for i in range(5):
        dt = DatamineThread(out_queue)
        dt.setDaemon(True)
        dt.start()

    # wait on both queues until everything has been processed
    queue.join()
    out_queue.join()

main()
print "Elapsed Time: %s" % (time.time() - start)

If you run this version of the script, you will get the following output:

# python url_fetch_threaded_part2.py
Google
Yahoo!
Apple
IBM United States
Amazon.com: Online Shopping for Electronics, Apparel, Computers, Books, DVDs & more
Elapsed Time: 3.75387597084

When you look at the code, you can see that we added another queue instance and passed the first queue into ThreadUrl, the first thread-pool class. Next, you replicate almost exactly the same structure for the second thread-pool class, DatamineThread. In its run method, each thread grabs a page chunk off the second queue and processes it with Beautiful Soup. In this case, Beautiful Soup is used simply to extract the title tag of each page and print it. This example could easily be turned into something more useful, because it contains the core of a basic search engine or data-mining tool. One idea is to use Beautiful Soup to extract the links from each page and then follow them, as sketched below.
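As a sketch of that idea (my own, assuming the same BeautifulSoup 3 import used in Listing 4), a hypothetical extract_links helper could pull the href out of every anchor tag in a fetched chunk; the results could then be fed back into the host queue to crawl further pages:

from BeautifulSoup import BeautifulSoup

def extract_links(chunk):
    """Return the href of every anchor tag in a page chunk."""
    soup = BeautifulSoup(chunk)
    links = []
    for anchor in soup.findAll('a'):
        href = anchor.get('href')
        if href:
            links.append(href)
    return links

# e.g., inside DatamineThread.run: for link in extract_links(chunk): queue.put(link)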

This concludes the introduction to "How to use Python for practical thread programming". Thank you for reading.
