
What are the common interview questions for big data Kafka?


This article mainly explains common big data Kafka interview questions. Interested readers may wish to have a look: the material introduced here is simple, fast, and practical. Let's work through "what are the common big data Kafka interview questions?" together.

1. What is Kafka?

Apache Kafka is an open-source messaging system written in Scala, developed as a project of the Apache Software Foundation.

Kafka was originally developed by LinkedIn and open-sourced in early 2011; it graduated from the Apache Incubator in October 2012. The goal of the project is to provide a unified, high-throughput, low-latency platform for processing real-time data.

Kafka is a distributed message queue with producer and consumer functionality. It provides features similar to JMS, but with a completely different design, and it is not an implementation of the JMS specification. Kafka classifies messages by Topic when they are stored. The sender is called the Producer and the receiver is called the Consumer. A Kafka cluster consists of multiple Kafka instances, and each instance (server) is called a broker. The Kafka cluster, producers, and consumers all rely on a ZooKeeper cluster to store metadata and to ensure system availability.
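
To make the producer/broker/topic model concrete, here is a minimal Java producer sketch using the kafka-clients library; the broker address localhost:9092 and the topic name test-topic are placeholders, not values from this article.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumed broker address; replace with your own cluster.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Messages are classified by topic; "test-topic" is a placeholder name.
            producer.send(new ProducerRecord<>("test-topic", "key-1", "hello kafka"));
        }
    }
}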

2. Differences between Kafka and traditional messaging systems

(1) In terms of architectural model

RabbitMQ follows the AMQP protocol. The RabbitMQ broker consists of Exchanges, Bindings, and queues, where the exchange and binding together determine a message's routing. The client Producer communicates with the server through a connection channel, and the Consumer obtains messages from a queue for consumption (over a long-lived connection, queue messages are pushed to the consumer, which reads data from the input stream in a loop). RabbitMQ is broker-centric and has an acknowledgement mechanism for messages.
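
To illustrate the exchange/binding/queue routing and the acknowledgement mechanism, here is a rough Java sketch against the RabbitMQ amqp-client (5.x) API; the exchange, queue, and routing-key names and the localhost host are assumptions for illustration.

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class RabbitSketch {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");                       // assumed broker host
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();

        // Exchange + binding determine how the routing key maps messages to queues.
        channel.exchangeDeclare("demo.exchange", "direct");
        channel.queueDeclare("demo.queue", true, false, false, null);
        channel.queueBind("demo.queue", "demo.exchange", "demo.key");

        channel.basicPublish("demo.exchange", "demo.key", null, "hello".getBytes());

        // The broker pushes messages to the consumer; each one is explicitly acknowledged.
        channel.basicConsume("demo.queue", false, (consumerTag, delivery) -> {
            System.out.println(new String(delivery.getBody()));
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        }, consumerTag -> { });
    }
}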

Kafka follows the general MQ structure of producer, broker, and consumer, but is consumer-centric: the client (consumer) stores the consumption state of messages and pulls data from the broker in batches according to its consumption position; there is no per-message acknowledgement mechanism on the broker side.
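
A minimal sketch of this pull model in Java (kafka-clients): the consumer polls batches from the broker and tracks its own position by committing offsets. The group id, topic name, and broker address are placeholders.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PullConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");    // assumed broker address
        props.put("group.id", "demo-group");                 // placeholder consumer group
        props.put("enable.auto.commit", "false");            // the consumer manages its own offsets
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test-topic"));
            while (true) {
                // Pull a batch of records from the broker starting at the current offset.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
                consumer.commitSync();  // record how far this group has consumed
            }
        }
    }
}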

(2) In terms of throughput

Kafka has high throughput: messages are batched internally, a zero-copy mechanism is used, and data storage and retrieval are sequential batch operations on the local disk with O(1) complexity, so message handling is very efficient.
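
The batching behaviour described above is driven by producer settings; a sketch of the commonly tuned properties is shown below (the values and broker address are illustrative, not recommendations).

import java.util.Properties;

public class BatchingConfig {
    public static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("batch.size", "65536");        // bytes collected per partition before a batch is sent
        props.put("linger.ms", "20");            // wait up to 20 ms to fill a batch
        props.put("compression.type", "lz4");    // compress whole batches on the wire
        return props;
    }
}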

RabbitMQ is slightly inferior to Kafka in terms of throughput, because their starting points are different: RabbitMQ focuses on reliable message delivery and supports transactions, but does not support batch operations; storage can be in memory or on disk depending on the reliability requirements.

(3) In terms of availability

RabbitMQ supports mirrored queues: if the main queue fails, a mirror queue takes over. Kafka's brokers likewise support an active-standby model through partition replicas: each partition has a leader, and a follower replica takes over if the leader fails.
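
Replication in Kafka is configured per topic; below is a minimal AdminClient sketch that creates a topic with 3 partitions and a replication factor of 3 (the topic name, counts, and broker address are illustrative).

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, each with 3 replicas: one leader plus two followers on other brokers.
            NewTopic topic = new NewTopic("demo-topic", 3, (short) 3);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}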

(4) In terms of cluster load balancing

Kafka uses ZooKeeper to manage the brokers and consumers in the cluster, and topics can be registered on ZooKeeper. Through ZooKeeper's coordination mechanism, the producer holds the broker information for each topic and can send messages to brokers randomly or in round-robin fashion; the producer can also choose a shard (partition) based on message semantics, so that a message is sent to a particular shard of a broker.
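
Sending to a particular shard corresponds to keyed or explicitly partitioned sends: with the default partitioner, records carrying the same key hash to the same partition. A sketch follows (topic, keys, and broker address are placeholders).

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KeyedSend {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same key -> same partition under the default partitioner.
            producer.send(new ProducerRecord<>("demo-topic", "user-42", "event A"));
            producer.send(new ProducerRecord<>("demo-topic", "user-42", "event B"));
            // A partition can also be picked explicitly (partition 0 here).
            producer.send(new ProducerRecord<>("demo-topic", 0, "user-7", "event C"));
        }
    }
}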

(5) Differences between Kafka and ActiveMQ

Topic: a topic is an identifier, similar to a key in a map, through which messages are classified; messages are grouped by Topic.

Common ground: both have producer and consumer components; producers send messages (each message specifies a topic) to their respective servers, which store them.

Difference:

ActiveMQ: consumers subscribe in advance to the topics they want. When there is a message on a topic, the ActiveMQ server pushes a notification to the consumer, and the consumer then fetches the data it wants from the server.

Kafka: consumers periodically go to the Kafka server and pull data from their specified topics.

(6) Introduction to Kafka components

producer: the producer, used to produce messages; messages are pushed to the Kafka cluster through the producer.

topic: a high-level abstraction for a class of messages, which can be understood as a collection of messages of a certain type. Each topic is divided into multiple partitions, which are configured in the cluster's configuration file.

broker: a Kafka server; one broker represents one server node.

partition: the partitioning concept. The messages in a topic can be split into multiple partitions stored on different servers, achieving horizontal scaling of data storage.

replica: a copy. Each partition can be configured to keep a specified number of replicas, providing data redundancy and ensuring data safety.

segment: each partition consists of multiple segments, and a segment contains two parts, a .log file and a .index file.

.log: stores the log data; all data in the Kafka cluster is ultimately stored in the form of log files.

.index: the index file; the indexes for the .log files are stored here, so that a record in a log file can be located quickly.

consumer: the consumer, which consumes messages from the Kafka cluster. Question: how do we know which messages a consumer has already consumed? By recording the position of each consumption.

The first recording method: committing offsets to ZooKeeper nodes, which is what older Kafka consumers did.

The second recording method: committing offsets to Kafka itself (the internal __consumer_offsets topic), which is the default for newer consumers and scales better, since frequent commits would overload ZooKeeper.

Offset: the offset, i.e. the record of which piece of data we have consumed up to (a seek() sketch is given after the next paragraph).

Messages sent to a topic by a publisher are distributed evenly across its partitions, and the broker appends each received message to the last segment of the corresponding partition.
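
As a companion to the offset description above, here is a sketch of how a consumer can jump to a specific position with seek(); the topic, partition number, offset value, and broker address are placeholders.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class SeekToOffset {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");    // assumed broker address
        props.put("group.id", "demo-group");                 // placeholder consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("demo-topic", 0);
            consumer.assign(Collections.singletonList(tp));
            consumer.seek(tp, 100L);                          // start reading from offset 100
            consumer.poll(Duration.ofMillis(500)).forEach(r ->
                    System.out.println(r.offset() + " " + r.value()));
        }
    }
}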

3. Installation and setup of a Kafka cluster

Step 1: Download the compressed package and upload it to the servers.

Step 2: Decompress it.

Step 3: Distribute the installation package to the other servers.

Step 4: Modify the configuration file on the first, second, and third servers (a sketch of the key settings follows these steps).

Step 5: Start all three servers.
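
As referenced in step 4, here is a rough sketch of the per-broker settings usually edited in config/server.properties; the hostnames, paths, and counts are placeholders, and each broker needs its own unique broker.id.

# Unique id of this broker within the cluster (0, 1, 2 on the three servers).
broker.id=0
# Listener address of this broker; node01 is a placeholder hostname.
listeners=PLAINTEXT://node01:9092
# Directory where the partition segments (.log and .index files) are kept; placeholder path.
log.dirs=/opt/kafka/logs
# ZooKeeper ensemble used by the cluster; placeholder hostnames.
zookeeper.connect=node01:2181,node02:2181,node03:2181
# Defaults for newly created topics; values are illustrative.
num.partitions=3
default.replication.factor=2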

At this point, I believe everyone has a deeper understanding of what the common big data Kafka interview questions are. Let's put it into practice!
