This article introduces how to plan and deploy a production (real-world) environment for Kafka. It should be a useful reference for interested readers, and I hope you gain a lot from it.
Operating system selection
Because the Kafka server code is written in Scala, Kafka is a JVM-based big data framework. The three operating systems it is most commonly deployed on are Linux, OS X and Windows, but by far the largest number of deployments run on Linux. Why? Because of the I/O model and the efficiency of network data transmission.
1: First, the newer Kafka Clients use Java's Selector (select) model for the underlying network library, and on Linux the implementation behind it is epoll. Interested readers can look up the difference between epoll and select. Clearly Kafka is more efficient on Linux, because epoll replaces polling with a callback (event notification) mechanism, which avoids wasting CPU time when there are a large number of underlying socket connections (a small sketch of the Selector API follows this list).
2: Second, the efficiency of network transmission. Kafka moves data between the network and the disk. Most operating systems go through Java's FileChannel.transferTo method, and on Linux this maps to the sendfile system call, i.e. zero copy (Zero Copy), which avoids repeatedly copying data between kernel address space and user space (a zero-copy sketch also follows this list).
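For readers unfamiliar with the Selector API mentioned in point 1, here is a minimal sketch of a non-blocking server loop built on java.nio; on Linux the JDK backs this Selector with epoll. The port number and buffer size are arbitrary illustrations, not anything Kafka-specific.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

// Minimal sketch of Java's Selector (select) model that Kafka's Clients build on.
// On Linux the JDK implements this Selector with epoll, so the loop below only wakes
// up for sockets that actually have events instead of polling every connection.
public class SelectorSketch {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9999));   // port chosen only for illustration
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buffer = ByteBuffer.allocate(4096);
        while (true) {
            selector.select();                       // blocks until some channel is ready
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {            // new connection: register it for reads
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {       // data arrived on an existing connection
                    SocketChannel client = (SocketChannel) key.channel();
                    buffer.clear();
                    if (client.read(buffer) < 0) {
                        client.close();              // peer closed the connection
                    }
                }
            }
        }
    }
}
```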
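And for point 2, a small sketch (not Kafka's actual code) of the FileChannel.transferTo call the article refers to; on Linux the JDK typically implements it with sendfile, so the file bytes never pass through user space. The file name and address below are placeholders.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Zero-copy transfer of a log file to a socket via FileChannel.transferTo.
// On Linux this maps to sendfile(2): the kernel moves the bytes directly from
// the page cache to the socket without copying them into user space first.
public class ZeroCopySketch {
    public static void main(String[] args) throws IOException {
        try (FileChannel file = FileChannel.open(
                 Paths.get("00000000000000000000.log"),        // placeholder segment file
                 StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(
                 new InetSocketAddress("127.0.0.1", 9999))) {   // placeholder peer address
            long position = 0;
            long remaining = file.size();
            while (remaining > 0) {                             // transferTo may send less than asked
                long sent = file.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}
```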
Disk type planning
1: Mechanical disk (HDD)
Generally speaking, the seek time of a mechanical disk is on the order of milliseconds, and heavy random I/O causes severe delays. Kafka, however, reads and writes sequentially, so mechanical disks are not at a great disadvantage and can be considered on cost grounds.
2: Solid-state drive (SSD)
Read and write speeds are excellent; the main thing to weigh is cost.
JBOD (Just a Bunch Of Disks) is an economical option and can be used when the data security requirements are not very high. It is recommended to configure multiple log paths on the Broker, each mounted on a different disk, which greatly improves the throughput of concurrent log writes (a configuration sketch follows at the end of this section).
3: RAID disk array
The most common RAID level is RAID 10 (RAID 1+0). This array combines disk mirroring and disk striping to protect data; because mirroring is used, the usable capacity is only 50%. Note that LinkedIn has used RAID as the storage behind its Kafka service. So what is the drawback? If the Kafka replication factor is set to 3, there are effectively six copies of every message, and the utilization is simply too low. For that reason LinkedIn has been planning to move to JBOD.
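As an illustration of the multi-path JBOD layout suggested above, here is a hedged sketch; the directory paths are placeholders, and generating server.properties from Java is only for demonstration. In practice you would put the log.dirs line into the broker's server.properties file directly.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Sketch of a JBOD layout: several log directories, each mounted on a different
// physical disk, so concurrent log writes are spread across spindles.
public class JbodConfigSketch {
    public static void main(String[] args) throws IOException {
        Properties broker = new Properties();
        // One directory per disk (placeholder mount points).
        broker.setProperty("log.dirs", "/disk1/kafka-logs,/disk2/kafka-logs,/disk3/kafka-logs");
        // Extra recovery threads per data directory speed up startup/shutdown with many disks.
        broker.setProperty("num.recovery.threads.per.data.dir", "2");

        try (Writer out = Files.newBufferedWriter(Paths.get("server.properties"))) {
            broker.store(out, "JBOD example configuration (placeholder paths)");
        }
    }
}
```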
Disk capacity planning
Our company's Internet of Things platform generates about 100 million messages a day. Assume the replication factor is set to 2 (in practice we set it to 3), data is retained for 1 week, and an average reported event message is about 1KB. The data produced per day is then 100 million x 2 x 1KB / 1000 / 1000 = 200GB. Reserving an extra 10% of disk space gives about 220GB per day, or roughly 1.5TB per week. With compression enabled and an average compression ratio of 0.5, the overall disk requirement is about 0.75TB.
The main related factors are listed below (a small calculation sketch follows the list):
Number of new messages
Number of copies
Whether compression is enabled
Message size
Message retention time
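Purely as an illustration, the estimate above can be written as a small calculation. All inputs are the example numbers from this article, not universal constants; substitute your own measurements.

```java
// Rough disk-capacity estimate using the article's example figures.
public class DiskCapacitySketch {
    public static void main(String[] args) {
        long messagesPerDay     = 100_000_000L; // ~100 million new messages per day
        int  replicationFactor  = 2;            // number of copies of each message
        double avgMessageKB     = 1.0;          // average message size in KB
        int  retentionDays      = 7;            // retain data for one week
        double headroom         = 0.10;         // extra disk space reserved
        double compressionRatio = 0.5;          // average size ratio after compression

        double gbPerDay = messagesPerDay * replicationFactor * avgMessageKB
                          / 1000.0 / 1000.0;                              // ~200 GB per day
        double gbPerDayWithHeadroom = gbPerDay * (1 + headroom);          // ~220 GB per day
        double tbPerWeek = gbPerDayWithHeadroom * retentionDays / 1000.0; // ~1.5 TB per week
        double tbCompressed = tbPerWeek * compressionRatio;               // ~0.75 TB compressed

        System.out.printf("Per day: %.0f GB, per week: %.2f TB, compressed: %.2f TB%n",
                gbPerDayWithHeadroom, tbPerWeek, tbCompressed);
    }
}
```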
Memory capacity planning
In terms of memory, Kafka does not rely heavily on JVM heap memory; it relies far more on the operating system's page cache. If consumers hit the page cache, no physical I/O is needed at all. Objects on the Java heap are generally short-lived and are quickly garbage-collected, so the heap normally does not need to exceed 6GB. On a machine with 16GB of memory, that leaves roughly 10-14GB for the file system page cache.
1: How should the page cache be sized? It can be set according to the size of a single log segment. If a log segment is 10GB, the page cache should be designed to be at least 10GB.
2: Heap memory should not exceed 6GB (a rough sizing sketch of both rules follows).
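A minimal arithmetic sketch of these two rules of thumb; the 16GB machine, 10GB segment size and 6GB heap cap are the article's example figures, not hard requirements.

```java
// Rough memory check: with the heap capped at 6 GB, how much page cache is left,
// and does it cover at least one log segment?  All figures are the article's examples.
public class MemoryPlanSketch {
    public static void main(String[] args) {
        double machineRamGB = 16.0;  // total RAM on the broker machine
        double heapGB       = 6.0;   // JVM heap, e.g. set via KAFKA_HEAP_OPTS="-Xms6g -Xmx6g"
        double logSegmentGB = 10.0;  // size of a single log segment (log.segment.bytes)

        double pageCacheGB = machineRamGB - heapGB;  // roughly what the OS can use as page cache
        System.out.printf("Approx. page cache available: %.0f GB%n", pageCacheGB);
        System.out.println(pageCacheGB >= logSegmentGB
                ? "Page cache can hold a full log segment."
                : "Page cache is smaller than one log segment; add RAM or shrink segments.");
    }
}
```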
CPU selection planning
Kafka is not a compute-intensive system, so what matters is having enough CPU cores rather than chasing high clock frequency; choosing a machine with more than 8 cores is a good baseline.
Network bandwidth determines the number of Brokers
The common bandwidths are 1Gb/s and 10Gb/s, which we can call gigabit and 10-gigabit networks. An example follows:
Our Internet of Things system processes 1TB of data per hour throughout the day. If we choose 1Gb/s bandwidth, how many machines do we need?
Assume the network is dedicated to Kafka and 70% of the bandwidth can be allocated to the Kafka service, so a single Broker has about 700Mb/s available. To avoid saturating the NIC under sudden traffic bursts, we further use only about one third of that, roughly 240Mb/s per Broker. Processing 1TB per hour means handling about 292MB, i.e. about 2336Mb, of data per second, so at least 2336 / 240 ≈ 10 Brokers are needed. With redundancy designed in, the final figure can be set at 20 machines.
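Again purely as an illustration of the arithmetic above, with all inputs taken from the article's example (the doubling for redundancy at the end is one reading of the article's final figure of 20 machines):

```java
// Rough broker-count estimate from network bandwidth, using the article's example numbers.
public class BandwidthPlanSketch {
    public static void main(String[] args) {
        double nicMbps       = 1000.0;  // 1Gb/s network interface
        double kafkaShare    = 0.70;    // fraction of bandwidth dedicated to Kafka -> ~700 Mb/s
        double perBrokerMbps = 240.0;   // keep only ~1/3 of that to absorb traffic bursts
        double dataTBPerHour = 1.0;     // data volume to process per hour

        double mbPerSecond   = dataTBPerHour * 1024 * 1024 / 3600;     // ~292 MB per second
        double mbitPerSecond = mbPerSecond * 8;                        // ~2336 Mb per second
        int brokers = (int) Math.ceil(mbitPerSecond / perBrokerMbps);  // ~10 brokers
        int withRedundancy = brokers * 2;                              // doubled for redundancy -> 20

        System.out.printf("Throughput: %.0f Mb/s, brokers needed: %d, with redundancy: %d%n",
                mbitPerSecond, brokers, withRedundancy);
    }
}
```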
Thank you for reading this article carefully. I hope this walkthrough of how to deploy a real-world Kafka environment is helpful to everyone. Please continue to support us and follow our industry information channel, where more related knowledge awaits you!