2025-02-25 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article explains the advantages of Pulsar. The explanation is simple, clear, and easy to learn; please follow the editor's train of thought to study what the advantages of Pulsar are.
Basic knowledge of Kafka
Kafka is the king of messaging systems. It was created at LinkedIn in 2011 and has spread widely with the support of Confluent.
Confluent has released many new features and add-ons to the open source community, such as Schema Registry for schema evolution, Kafka Connect for easily streaming data from databases and other sources into Kafka, Kafka Streams for distributed stream processing, and more recently KSQL for running SQL-like queries against Kafka topics.
Kafka is fast, easy to install, very popular, and can be used in a wide range of use cases. From the developer's point of view Apache Kafka has always been friendly, but in terms of operation and maintenance it is a mess.
So let's review some of the pain points of Kafka:
Kafka demo [2]
Many of Kafka's pain points are as follows:
Scaling Kafka is tricky because brokers and stored data are coupled in the architecture. Decommissioning a broker means its topic partitions and replicas must be copied elsewhere, which is very time-consuming.
No native multi-tenancy with complete isolation between tenants.
Storage can become very expensive; although data can be stored for a long time, this is rarely done because of cost.
Messages may be lost if replicas fall out of sync.
The number of brokers, topics, partitions, and replicas must be planned and calculated in advance (allowing for planned future growth) to avoid scaling problems, which is very difficult.
If you only need a messaging system, working with offsets can be complex.
Cluster rebalancing affects the performance of connected producers and consumers.
The geo-replication mechanism, MirrorMaker [3], is problematic; companies like Uber have created their own solutions to overcome these problems.
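The offset pain point can be made concrete with a small sketch. This is plain Python with hypothetical in-memory classes, not any real Kafka or Pulsar client API: a Kafka-style consumer tracks a single committed offset per partition, so committing offset N implicitly marks everything before it as done, while an acknowledgment-based model tracks each message individually.

```python
# Hypothetical in-memory sketch: offset-based vs. individual acknowledgment.
# These classes are illustrative only; they are not a real client API.

class OffsetConsumer:
    """Kafka-style: progress is a single committed offset per partition."""
    def __init__(self):
        self.committed = -1  # highest offset committed so far

    def commit(self, offset):
        # Committing offset N implicitly marks 0..N as processed,
        # even if some earlier message actually failed.
        self.committed = max(self.committed, offset)

    def is_processed(self, offset):
        return offset <= self.committed


class AckConsumer:
    """Ack-style: each message is acknowledged individually."""
    def __init__(self):
        self.acked = set()

    def ack(self, msg_id):
        self.acked.add(msg_id)

    def is_processed(self, msg_id):
        return msg_id in self.acked


offset_consumer = OffsetConsumer()
offset_consumer.commit(5)                # skips over message 3, failed or not
print(offset_consumer.is_processed(3))   # True: a failed message 3 is lost

ack_consumer = AckConsumer()
for msg_id in (0, 1, 2, 4, 5):           # message 3 failed and was never acked
    ack_consumer.ack(msg_id)
print(ack_consumer.is_processed(3))      # False: message 3 can be redelivered
```

The sketch shows why a pure messaging workload is simpler with per-message acknowledgments than with offset bookkeeping.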
As you can see, most of the problems are related to operation and maintenance. Although it is relatively easy to install, Kafka is difficult to manage and tune. Moreover, it lacks the flexibility and elasticity it should have.
Basic knowledge of Pulsar
Pulsar was created by Yahoo! in 2013 and donated to the Apache Foundation in 2016. Pulsar is now a top-level project of the Apache Software Foundation.
Yahoo!, Verizon, Twitter and other companies have used it in production to process massive volumes of messages. It features low operating cost and great flexibility. Pulsar was designed to solve most of Kafka's challenges and to be easier to scale.
Pulsar is very flexible: it can be applied to both distributed logging application scenarios such as Kafka and pure messaging system scenarios such as RabbitMQ.
It supports multiple types of subscriptions, multiple delivery guarantees, retention policies, and ways to handle schema evolution, among many other features.
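The subscription types mentioned above can be illustrated with a toy dispatcher. This is plain Python for illustration only; the real routing happens inside the Pulsar broker, and the `dispatch` function and its consumer names are hypothetical:

```python
# Toy model of Pulsar's three classic subscription types.
# Illustrative only -- real dispatching happens inside the broker.

def dispatch(mode, messages, consumers):
    """Return {consumer: [messages]} according to the subscription mode."""
    out = {c: [] for c in consumers}
    if mode == "exclusive":
        # Only one consumer may attach; it receives everything.
        assert len(consumers) == 1, "exclusive allows a single consumer"
        out[consumers[0]] = list(messages)
    elif mode == "failover":
        # Several consumers may attach, but only the primary receives
        # messages; the others take over only if it disconnects.
        out[consumers[0]] = list(messages)
    elif mode == "shared":
        # Messages are spread round-robin across consumers (queue semantics).
        for i, msg in enumerate(messages):
            out[consumers[i % len(consumers)]].append(msg)
    return out

msgs = ["m1", "m2", "m3", "m4"]
print(dispatch("shared", msgs, ["c1", "c2"]))
# round-robin: c1 gets m1 and m3; c2 gets m2 and m4
```

Exclusive gives strict ordering to one consumer, shared gives queue-style load balancing, and failover gives ordering plus a hot standby.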
Pulsar architecture diagram [4]
Pulsar has the following features:
Built-in multi-tenancy: different teams can share the same cluster while remaining isolated from each other, which solves many management problems. It supports isolation, authentication, authorization, and quotas.
Multi-tier architecture: Pulsar stores all topic data in a dedicated storage layer backed by Apache BookKeeper.
The separation of storage and messaging solves many of the problems of scaling, rebalancing, and maintaining clusters. It also improves reliability and makes it almost impossible to lose data.
In addition, BookKeeper can be read directly without affecting real-time ingestion. For example, you can use Presto to run SQL queries over topics, similar to KSQL, but without affecting real-time data processing.
Virtual topics: thanks to the n-tier architecture there is no limit on the number of topics; topics and their storage are separate. Users can also create non-persistent topics.
N-tier storage: one problem with Kafka is that storage can become expensive, so it is rarely used to keep "cold" data, and messages are often deleted. With tiered storage, Apache Pulsar can automatically offload old data to Amazon S3 or other storage systems while still presenting a transparent view to clients; Pulsar clients can read from any point in time as if all messages were still in the log.
Pulsar Functions: easy-to-deploy, lightweight compute processes with a developer-friendly API, with no need to run your own stream-processing engine (such as Kafka Streams).
Security: it has built-in proxies, multi-tenant security, pluggable authentication, and so on.
Quick rebalancing: partitions are divided into segments that are easy to rebalance.
Server-side deduplication and expiry: you do not need to deduplicate data in the client or during compaction.
Built-in Schema registry (Schema Registry): supports multiple policies and is easy to operate.
Geo-replication and built-in Discovery: it is easy to replicate clusters to multiple regions.
Integrated load balancer and Prometheus metrics.
Multiple integrations: Kafka, RabbitMQ, etc.
Support for multiple programming languages, such as Go, Java, Scala, Node.js, Python, and more.
Sharding and data partitioning happen transparently on the server side; the client does not need to know about them.
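The tiered-storage feature in the list above can be sketched as a toy in-memory model. A plain dict stands in for Amazon S3, and the class names are hypothetical; this is not Pulsar's actual offloader API. The point is that old segments move to cheap storage while readers still see one continuous log:

```python
# Toy sketch of tiered storage: older log segments are offloaded to a
# cheaper store (a list standing in for Amazon S3), yet readers still
# see one continuous log. Illustrative only.

class TieredLog:
    def __init__(self, segment_size=3):
        self.segment_size = segment_size
        self.hot = []       # sealed segments on fast (BookKeeper-like) storage
        self.cold = []      # offloaded segments ("S3")
        self.current = []   # segment still being written

    def append(self, msg):
        self.current.append(msg)
        if len(self.current) == self.segment_size:
            self.hot.append(self.current)   # seal the segment
            self.current = []

    def offload(self, keep_hot_segments=1):
        # Move all but the newest sealed segments to cold storage.
        while len(self.hot) > keep_hot_segments:
            self.cold.append(self.hot.pop(0))

    def read_all(self):
        # The client sees one log, regardless of where segments live.
        msgs = []
        for seg in self.cold + self.hot:
            msgs.extend(seg)
        return msgs + self.current

log = TieredLog()
for i in range(7):
    log.append(f"msg-{i}")
log.offload()
print(log.read_all())   # all seven messages, in order
```

In real Pulsar the offload threshold is a namespace policy and the cold tier is S3, GCS, or similar; the transparency to readers is the part this sketch tries to capture.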
Pulsar feature list [5]
Getting started with Pulsar
Getting started with Pulsar is very easy, as long as you have the JDK installed.
① Download Pulsar and decompress it (note: the examples below use 2.6.1; the latest version of Apache Pulsar is 2.7.0):
$ wget https://archive.apache.org/dist/pulsar/pulsar-2.6.1/apache-pulsar-2.6.1-bin.tar.gz
② Download a connector (optional):
$ wget https://archive.apache.org/dist/pulsar/pulsar-2.6.1/connectors/{connector}-2.6.1.nar
③ After downloading the .nar file, copy it to the connectors directory in the Pulsar directory.
④ Start Pulsar:
$ bin/pulsar standalone
Pulsar provides a CLI tool called pulsar-client that we can use to interact with the cluster.
Produce a message:
$ bin/pulsar-client produce my-topic --messages "hello-pulsar"
Consume a message:
$ bin/pulsar-client consume my-topic -s "first-subscription"
Example of Akka flow
As a client-side example, we use Pulsar4s with Akka Streams.
First, we need to create a Source to consume the data stream; all we need is a function that creates a consumer on demand and a message ID to start from:
val topic = Topic("persistent://standalone/mytopic")
val consumerFn = () => client.consumer(ConsumerConfig(topic, subscription))
Then we pass the consumerFn function to create the source:
import com.sksamuel.pulsar4s.akka.streams._
val pulsarSource = source(consumerFn, Some(MessageId.earliest))
The materialized value of the Akka source is an instance of Control, which provides a "stop" method that can be used to stop consuming messages. Now we can use Akka Streams to process the data as usual.
To create a sink:
val topic = Topic("persistent://standalone/mytopic")
val producerFn = () => client.producer(ProducerConfig(topic))
import com.sksamuel.pulsar4s.akka.streams._
val pulsarSink = sink(producerFn)
The complete example is extracted from Pulsar4s [6]:
object Example {
  import akka.actor.ActorSystem
  import akka.stream.ActorMaterializer
  import com.sksamuel.pulsar4s.{ConsumerConfig, MessageId, ProducerConfig, ProducerMessage, PulsarClient, Subscription, Topic}
  import com.sksamuel.pulsar4s.akka.streams._
  import org.apache.pulsar.client.api.Schema

  implicit val system: ActorSystem = ActorSystem()
  implicit val materializer: ActorMaterializer = ActorMaterializer()
  implicit val schema: Schema[Array[Byte]] = Schema.BYTES

  val client = PulsarClient("pulsar://localhost:6650")
  val intopic = Topic("persistent://sample/standalone/ns1/in")
  val outtopic = Topic("persistent://sample/standalone/ns1/out")

  val consumerFn = () => client.consumer(ConsumerConfig(topics = Seq(intopic), subscriptionName = Subscription("mysub")))
  val producerFn = () => client.producer(ProducerConfig(outtopic))

  val control = source(consumerFn, Some(MessageId.earliest))
    .map { consumerMessage => ProducerMessage(consumerMessage.data) }
    .to(sink(producerFn))
    .run()

  Thread.sleep(10000)
  control.stop()
}
Pulsar Function example
A Pulsar Function processes messages from one or more topics, transforms them, and outputs the result to another topic:
Pulsar Function [7]
You can choose between two interfaces to write a function:
Language-native interface: no Pulsar-specific libraries or special dependencies are required; the context cannot be accessed; only Java and Python are supported.
Pulsar Functions SDK: available for Java, Python, and Go, and provides more functionality, such as access to a context object.
You only need to write a simple function to transform messages using the language native interface:
def process(input):
    return "{}!".format(input)
This simple Python function just appends an exclamation mark to every incoming string and publishes the resulting string to a topic.
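Because the native interface needs no Pulsar dependencies, the function above can be exercised directly as plain Python. This is only a local sanity check, not how Pulsar actually invokes it (Pulsar calls process() once per incoming message and publishes the return value):

```python
# The same native-interface function, runnable as plain Python.
# Pulsar would call process() once per message; here we call it
# directly as a local sanity check.

def process(input):
    return "{}!".format(input)

print(process("hello-pulsar"))   # hello-pulsar!
```

Being able to test the function in isolation like this is one of the appeals of the native interface.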
Using the SDK requires importing its dependencies. For example, in Go we can write:
package main

import (
    "context"
    "fmt"

    "github.com/apache/pulsar/pulsar-function-go/pf"
)

func HandleRequest(ctx context.Context, in []byte) error {
    fmt.Println(string(in) + "!")
    return nil
}

func main() {
    pf.Start(HandleRequest)
}
If you want to publish the serverless function and deploy it to a cluster, you can use the pulsar-admin CLI. With Python, we can write:
$ bin/pulsar-admin functions create \
  --py ~/router.py \
  --classname router.RoutingFunction \
  --tenant public \
  --namespace default \
  --name route-fruit-veg \
  --inputs persistent://public/default/basket-items
An important feature of Pulsar Functions is that users can set the delivery guarantee when publishing the function:
$ bin/pulsar-admin functions create \
  --name my-effectively-once-function \
  --processing-guarantees EFFECTIVELY_ONCE
The available options are ATMOST_ONCE, ATLEAST_ONCE, and EFFECTIVELY_ONCE.
Advantages of Pulsar
Compared to Kafka, let's review the main advantages of Pulsar:
More features: Pulsar Functions, multi-tenancy, Schema Registry, n-tier storage, multiple consumption modes and persistence modes, etc.
Greater flexibility: with three subscription types (exclusive, shared, and failover), users can consume multiple topics on a single subscription.
Persistence options: non-persistent (fast), persistent, and compacted (only the last message per key), and users can choose the delivery guarantee. Pulsar offers server-side deduplication, multiple retention policies, and TTL.
There is no need to define extension requirements in advance.
Queue and stream message consumption models are supported, so Pulsar can replace either RabbitMQ or Kafka.
Storage is separated from broker, resulting in better scalability and faster and more reliable rebalancing.
Easy to operate and maintain: architecture decoupling and n-tier storage.
SQL integration with Presto allows you to query storage directly without affecting broker.
With the automatic n-tier storage option, data can be stored at lower cost.
Faster: benchmarks [8] show better performance in a variety of scenarios; Pulsar has lower latency and better scalability.
Pulsar Functions enable serverless computing without deployment management.
Integrate Schema registry.
Integrated load balancer and Prometheus metrics.
Geo-replication works better and is easier to set up; Pulsar has built-in discoverability.
There is no limit to the number of topics created.
Compatible with Kafka, easy to integrate.
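The server-side deduplication advantage listed above works, roughly, by the broker tracking the highest sequence ID it has persisted per producer and dropping replays. The sketch below is an illustrative toy in plain Python, with hypothetical names, not Pulsar's actual broker implementation:

```python
# Toy sketch of broker-side deduplication: remember the highest
# sequence ID persisted per producer and drop any replayed message.
# Illustrative only -- not Pulsar's actual implementation.

class DedupBroker:
    def __init__(self):
        self.log = []        # messages actually persisted
        self.last_seq = {}   # producer name -> highest sequence id stored

    def receive(self, producer, seq_id, payload):
        if seq_id <= self.last_seq.get(producer, -1):
            return False     # duplicate (e.g. a retry after timeout), dropped
        self.last_seq[producer] = seq_id
        self.log.append(payload)
        return True

broker = DedupBroker()
broker.receive("p1", 0, "a")
broker.receive("p1", 1, "b")
broker.receive("p1", 1, "b")    # producer retried: deduplicated server-side
print(broker.log)               # ['a', 'b']
```

Doing this on the broker is what frees Pulsar clients from implementing their own deduplication logic.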
Disadvantages of Pulsar
Pulsar is not perfect; it still has some problems:
There is a relative lack of support, documentation and cases.
The n-tier architecture means more components to operate: BookKeeper.
Plug-ins and clients are relatively scarce compared with Kafka's ecosystem.
There is less cloud support; Confluent, by contrast, has managed cloud offerings for Kafka.
However, the above situation is improving rapidly, and Pulsar is gradually being used by more and more companies and organizations.
StreamNative, the company offering commercial support for Apache Pulsar, has also launched StreamNative Cloud. Apache Pulsar is growing rapidly, and we can all see encouraging changes.
Confluent has published blog posts comparing Pulsar and Kafka, but note that these comparisons may be biased.
Pulsar usage scenarios
Pulsar can be used in a wide range of scenarios:
Publish / subscribe queue messaging.
Distributed logs.
Event sourcing, as permanent event storage.
Microservices.
SQL analytics.
Serverless functions.
When should Pulsar be considered?
You need both queues like RabbitMQ and stream processing like Kafka.
You need easy-to-use geo-replication.
You need multi-tenancy with access control for each team.
You need to retain messages for a long time and do not want to offload them to another storage system.
You need high performance; benchmarks show that Pulsar provides lower latency and higher throughput.
If you are in the cloud, consider cloud-based solutions carefully; cloud providers offer different services that cover some of these scenarios.
For example, for queued messages cloud providers offer many services, such as Google Pub/Sub; for distributed logs there are Confluent Cloud and AWS Kinesis; StreamNative also provides a Pulsar-based cloud service.
Cloud providers also provide very good security. Pulsar's advantage is that it provides many capabilities on one platform.
Some teams may use it as a messaging system for microservices, while others may use it as distributed logs for data processing.
Thank you for reading. That concludes "what are the advantages of Pulsar". After studying this article, I believe you have a deeper understanding of Pulsar's advantages; concrete usage still needs to be verified in practice. The editor will continue to push more articles on related knowledge points for you; welcome to follow!