How to Analyze Celery, the Python Parallel Distributed Framework


Many newcomers are unsure how to make sense of Celery, the Python parallel distributed framework, so this article summarizes the key concepts and walks through a working setup. I hope it helps you solve the problem.

Celery is a distributed task queue developed in Python. Using task queues, it can schedule tasks across machines, processes, and threads in a distributed system.

architecture design

Celery's architecture consists of three parts: message broker, worker, and task result store.

message middleware

Celery does not provide messaging services itself, but it integrates easily with third-party message brokers: RabbitMQ, Redis, MongoDB (experimental), Amazon SQS (experimental), CouchDB (experimental), SQLAlchemy (experimental), Django ORM (experimental), and IronMQ.
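For illustration, the broker is selected purely by URL when creating the app. A minimal sketch, assuming RabbitMQ and Redis running locally on their default ports:

from celery import Celery

# RabbitMQ as broker (default guest account on the local host):
app = Celery('tasks', broker='amqp://guest@localhost//')

# Or Redis as broker (database 0 on the default port):
# app = Celery('tasks', broker='redis://localhost:6379/0')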

task execution unit

The worker is the task execution unit provided by Celery; workers run concurrently on the nodes of a distributed system.

task result storage

The task result store holds the results of tasks executed by workers. Celery supports several storage options, including AMQP, Redis, memcached, MongoDB, SQLAlchemy, Django ORM, Apache Cassandra, and IronCache.
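The result store is also chosen by URL. A minimal sketch using the Celery 3.x-style setting name that appears later in this article; the Redis address is an assumption for a local setup:

from celery import Celery

app = Celery('tasks', broker='amqp://guest@localhost//')

# Store results in Redis (database 1 on the default port) ...
app.conf.CELERY_RESULT_BACKEND = 'redis://localhost:6379/1'
# ... or in any SQLAlchemy-supported database, as used later in this article:
# app.conf.CELERY_RESULT_BACKEND = 'db+sqlite:///results.sqlite'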

Celery also supports different concurrency and serialization methods.

concurrency

Prefork, Eventlet, gevent, threads/single threaded
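The pool implementation and concurrency level can also be set in configuration (they are more commonly given on the worker command line). A minimal sketch with the old-style Celery 3.x setting names; the values are assumptions for illustration:

from celery import Celery

app = Celery('tasks', broker='amqp://guest@localhost//')

app.conf.CELERYD_POOL = 'gevent'      # or 'prefork', 'eventlet', 'solo'
app.conf.CELERYD_CONCURRENCY = 100    # e.g. many greenlets for I/O-bound tasks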

serialization

pickle, json, yaml, and msgpack; zlib and bzip2 compression; cryptographic message signing; etc.
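Serialization and compression are likewise plain configuration. A minimal sketch with Celery 3.x setting names; choosing json here is an assumption for illustration (pickle was the historical default):

from celery import Celery

app = Celery('tasks', broker='amqp://guest@localhost//')

app.conf.CELERY_TASK_SERIALIZER = 'json'      # serialize task messages as JSON
app.conf.CELERY_RESULT_SERIALIZER = 'json'    # serialize results as JSON too
app.conf.CELERY_ACCEPT_CONTENT = ['json']     # refuse other content types
app.conf.CELERY_MESSAGE_COMPRESSION = 'zlib'  # compress message bodies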

install and run

The Celery installation process is slightly involved. The steps below follow my installation on a Linux AWS EC2 instance; the process may vary from system to system, so refer to the official documentation as needed.

First, I chose RabbitMQ as the message broker, so RabbitMQ is installed first. As preparation, update YUM:

sudo yum -y update

RabbitMQ is built on Erlang, so install Erlang first.

# Add and enable relevant application repositories:
# Note: We are also enabling third party remi package repositories.
wget http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
wget http://rpms.famillecollet.com/enterprise/remi-release-6.rpm
sudo rpm -Uvh remi-release-6*.rpm epel-release-6*.rpm
# Finally, download and install Erlang:
yum install -y erlang

Then install RabbitMQ:

# Download the latest RabbitMQ package using wget:
wget
# Add the necessary keys for verification:
rpm --import
# Install the .RPM package using YUM:
yum install rabbitmq-server-3.2.2-1.noarch.rpm

Start the RabbitMQ service:

sudo service rabbitmq-server start

The RabbitMQ service is now ready. Next, install Celery, assuming you use pip to manage your Python packages:

pip install celery

To verify that Celery works, we run the simplest possible task. Write tasks.py:

from celery import Celery

app = Celery('tasks', backend='amqp', broker='amqp://guest@localhost//')
app.conf.CELERY_RESULT_BACKEND = 'db+sqlite:///results.sqlite'

@app.task
def add(x, y):
    return x + y

Run a worker in the same directory to execute the addition task:

celery -A tasks worker --loglevel=info

The -A argument gives the name of the Celery app. Note that I am using SQLAlchemy as the result store here, so the sqlalchemy Python package must be installed beforehand (for example with pip install sqlalchemy).

In the worker log, we will see information like this:

- ** ---------- [config]
- ** ---------- .> app:         tasks:0x1e68d50
- ** ---------- .> transport:   amqp://guest:**@localhost:5672//
- ** ---------- .> results:     db+sqlite:///results.sqlite
- *** --- * --- .> concurrency: 8 (prefork)

Here we can see that the worker defaults to the prefork pool for concurrency, with the concurrency level set to 8.

The following client code submits a task and waits for its result:

from tasks import add
import time

result = add.delay(4, 4)
while not result.ready():
    print "not ready yet"
    time.sleep(5)
print result.get()

Executing this client code in Python produces output like the following on the client side:

not ready yet
8

The worker log shows:

[2015-03-12 02:54:07,973: INFO/MainProcess] Received task: tasks.add[34c4210f-1bc5-420f-a421-1500361b914f]
[2015-03-12 02:54:08,006: INFO/MainProcess] Task tasks.add[34c4210f-1bc5-420f-a421-1500361b914f] succeeded in 0.0309705100954s: 8

Here we can see that each task has a unique ID, and tasks are executed asynchronously on workers.
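Because each task has an ID, a result can also be looked up later from the ID alone. A minimal sketch, assuming the tasks.py app above and a task ID captured earlier (the ID below is the hypothetical one from the log):

from celery.result import AsyncResult
from tasks import app

task_id = '34c4210f-1bc5-420f-a421-1500361b914f'  # e.g. saved from result.id
result = AsyncResult(task_id, app=app)
if result.ready():
    print result.get()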

Note that if you run the example from the official documentation, you will not get the result on the client side; this is why I use SQLAlchemy to store task results. The official example uses AMQP as the result backend. Possibly the worker itself consumes the result message when it prints the result to its log; since AMQP is a message queue, once the message is taken the queue is empty, so the client can never retrieve the result. I don't know why the official documentation turns a blind eye to such errors.
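If you would rather not override the backend after construction, the database backend URL can be passed directly when creating the app; a minimal sketch equivalent to the tasks.py configuration above:

from celery import Celery

# Passing the SQLAlchemy result backend at construction time avoids the
# consumed-once behaviour of the amqp result backend described above.
app = Celery('tasks',
             backend='db+sqlite:///results.sqlite',
             broker='amqp://guest@localhost//')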

After reading the above, do you know how to analyze Celery, the Python parallel distributed framework? If you want to learn more skills or related content, you are welcome to follow the industry information channel. Thank you for reading!
