Suggestions for Cluster Construction in the Initial Stage of CDH

2025-01-16 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/03

Cluster Size Calculation

Cluster size depends on the user's data and application requirements. Calculate the minimum cluster size required by each of the following factors. The final planned size is the largest of these minimums.

·Capacity requirements

- Estimation is relatively easy and accurate

- In most cases, cluster size can be determined by capacity

·Computational requirements

- Computing resources can only be estimated accurately through small-scale testing combined with reasonable extrapolation

·Other resource constraints

- If the user's MapReduce applications have special requirements for resources such as memory, and the resources that can be configured on a single node are limited, the cluster must be at least large enough to meet those requirements.
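
The rule above (take the largest of the per-constraint minimums) can be sketched in Python. All inputs below are hypothetical placeholders, not measurements. In practice the compute figure would come from small-scale testing, and the per-node figures from the chosen server configuration.

```python
import math


def min_nodes(required: float, per_node: float) -> int:
    """Smallest whole number of nodes whose combined resources cover `required`."""
    return math.ceil(required / per_node)


def planned_cluster_size(*minimums: int) -> int:
    """Each constraint gives a lower bound; the plan must satisfy all of them."""
    return max(minimums)


# Hypothetical requirements, for illustration only.
by_capacity = min_nodes(703, 30)    # TB of physical storage / TB per node
by_compute = min_nodes(400, 16)     # cores needed / cores per node
by_memory = min_nodes(5000, 256)    # GB of memory needed / GB per node

print(by_capacity, by_compute, by_memory)                        # 24 25 20
print(planned_cluster_size(by_capacity, by_compute, by_memory))  # 25
```

Here capacity alone would suggest 24 nodes, but the (hypothetical) compute requirement pushes the plan to 25.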

Network Recommendations

·10 Gigabit or faster networks are recommended

- At least a 10 Gigabit network is required to make full use of the aggregate bandwidth of disks operating in parallel

- Even when bandwidth appears adequate, a higher-bandwidth network can still improve performance

·Bandwidth-sensitive scenarios:

- ETL-type or other MapReduce jobs with large input and output data volumes

- Environments with limited space or power, where large-capacity nodes are combined with a high-speed network

- Latency-sensitive applications such as HBase, which also depend on network transmission speed
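
A rough calculation shows why 10 Gigabit is the floor. The per-disk throughput of 150 MB/s is an assumed figure for a SATA hard disk doing sequential I/O, not a number from this article. Substitute measured values for real planning.

```python
def disk_bandwidth_gbps(disks: int, mb_per_s_per_disk: float) -> float:
    """Aggregate sequential disk bandwidth, converted from MB/s to Gbit/s (decimal units)."""
    return disks * mb_per_s_per_disk * 8 / 1000


NICS = [("1 GbE", 1), ("10 GbE", 10), ("2 x 10 GbE bonded", 20)]

for disks in (12, 15):
    gbps = disk_bandwidth_gbps(disks, 150)  # 150 MB/s per disk is an assumption
    for nic_name, nic_gbps in NICS:
        status = "NIC is the bottleneck" if nic_gbps < gbps else "NIC keeps up"
        print(f"{disks} disks = {gbps:.1f} Gbit/s vs {nic_name}: {status}")
```

Under this assumption, 12 disks deliver about 14.4 Gbit/s. A 1 GbE link carries well under a tenth of that, and even a single 10 GbE link is the limiting factor.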

Traditional Tree Network

·Oversubscription

- Extending the network by adding layers causes the following problems:

- The network distance between nodes increases

- Oversubscription gets worse

·Therefore, prefer high-port-count ××× switches, or expand port capacity through the switch backplane

- Small and medium-sized networks can use a two-tier tree architecture

- Interact with external systems only through the uplink ports of the top-level switches

- This keeps bursts of Hadoop network traffic from flooding external networks
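
Oversubscription is the ratio of host-facing bandwidth to uplink bandwidth on a switch. Traffic that crosses several oversubscribed layers sees the product of their ratios, which is why adding layers makes the problem worse. The switch sizes below are hypothetical.

```python
from math import prod


def oversubscription(hosts: int, host_gbps: float, uplinks: int, uplink_gbps: float) -> float:
    """Ratio of downstream (host-facing) bandwidth to upstream bandwidth on one switch."""
    return (hosts * host_gbps) / (uplinks * uplink_gbps)


def end_to_end(ratios: list[float]) -> float:
    """Traffic crossing several oversubscribed layers sees the product of their ratios."""
    return prod(ratios)


# Hypothetical access switch: 40 servers at 10 Gbit/s, 4 x 40 Gbit/s uplinks.
access = oversubscription(40, 10, 4, 40)      # 400 / 160 = 2.5
print(f"one oversubscribed layer: {access}:1")                           # 2.5:1
print(f"with an extra 2:1 layer:  {end_to_end([access, 2.0])}:1")        # 5.0:1
```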

Component Architecture

·Head/Master Node: NameNode, YARN, and Master services

- Provides critical, centralized cluster management services that other nodes cannot take over

- If a management service stops, the corresponding Hadoop service on the cluster stops

- Requires highly reliable hardware

·Data/Worker/Slave Node

- Handles the actual work, such as data storage and subtask execution

- Runs multiple services on the same node to preserve data locality

- If a service on one node stops, other nodes automatically take over its work

- Hardware components may fail, but they are easily replaced

·Edge Node

- Provides Hadoop service proxies and wrappers to external users

- Accesses the actual Hadoop services as a client

- Requires highly reliable hardware

Management Node Hardware Requirements

·Management node roles include NameNode, Secondary NameNode, and YARN ResourceManager

- Hive Metastore Server and HiveServer2 are usually deployed on the management nodes

- ZooKeeper Server and HBase Master can be placed on data node servers. Their load is limited, so they place no special requirements on the node.

- All HA candidate servers (Active and Standby) use the same configuration

- Memory requirements are usually high, but storage requirements are low

·High-end PC servers or even minicomputer servers are recommended for improved performance and reliability

- Dual power supplies, redundant fans, NIC aggregation, RAID…

- The system disk uses RAID 1

- Because management nodes are few in number and highly important, the cost of a high-end configuration is generally not a concern
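
The requirement that all HA candidates share one configuration can be checked mechanically before deployment. The host names and hardware fields below are hypothetical. The checks mirror the guidance above (identical configuration) plus the basic HA requirement that Active and Standby run on different hosts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareSpec:
    cpu: str
    memory_gb: int
    system_disk_raid: str
    power_supplies: int


@dataclass(frozen=True)
class Host:
    name: str
    spec: HardwareSpec


def ha_pair_problems(active: Host, standby: Host) -> list[str]:
    """Return problems with an Active/Standby placement (empty list if none)."""
    problems = []
    if active.name == standby.name:
        problems.append("Active and Standby are on the same host")
    if active.spec != standby.spec:  # dataclass equality compares every field
        problems.append("Active and Standby hardware differs")
    return problems


mgmt = HardwareSpec("2 x E5-2620 v4", 256, "RAID 1", 2)
nn_active = Host("mgmt01", mgmt)      # hypothetical host names
nn_standby = Host("mgmt02", mgmt)
print(ha_pair_problems(nn_active, nn_standby))   # []

weaker = Host("mgmt03", HardwareSpec("2 x E5-2620 v4", 128, "RAID 1", 1))
print(ha_pair_problems(nn_active, weaker))       # ['Active and Standby hardware differs']
```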

Data Node Configuration Strategy Recommendations

·Fewer nodes with high per-node performance versus more nodes with lower per-node performance

- In general, prefer adding more machines over upgrading server configurations

- Buying servers in the most mainstream, cost-effective configuration reduces overall cost

- Spreading data across more nodes gives better scale-out parallelism and reliability

- Physical space, network scale, and other supporting equipment must also be considered

·Consider the number of servers in the cluster

- For compute-intensive applications, consider better CPUs and more memory
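
The reliability argument for more, smaller nodes can be made concrete. Assuming data is spread evenly, losing one of N nodes removes 1/N of the capacity, and HDFS must re-replicate that node's share of the data. The 600 TB of stored data and the node counts are hypothetical.

```python
def failure_impact(stored_tb: float, nodes: int) -> tuple[float, float]:
    """Return (fraction of capacity lost, TB to re-replicate) when one of `nodes` fails.

    Assumes data is spread evenly across all nodes.
    """
    return 1 / nodes, stored_tb / nodes


for nodes in (10, 30):
    fraction, tb = failure_impact(600, nodes)
    print(f"{nodes} nodes: lose {fraction:.1%} of capacity, re-replicate {tb:.0f} TB")
```

With 10 nodes, one failure costs 10% of capacity and 60 TB of re-replication. With 30 nodes, the same failure costs about 3.3% and 20 TB.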

Memory Requirements Calculation

·Roles that require large amounts of memory:

- NameNode, Secondary NameNode, YARN AM, HBase RegionServer

·Per-node memory calculation:

- Add up the memory of all large-memory roles on the node

- Compute-oriented applications need large amounts of memory. For Spark/Impala, at least 256 GB is recommended.
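
The additive rule can be written as a small helper. The per-role sizes and the OS reserve below are hypothetical placeholders, not recommendations. Real allocations depend on metadata volume, region counts, and query concurrency.

```python
# Hypothetical memory allocations in GB, for illustration only.
ROLE_MEMORY_GB = {
    "HBase RegionServer": 32,
    "Impala Daemon": 128,
    "YARN containers (Spark)": 64,
}
OS_RESERVE_GB = 16  # assumed headroom for the OS and small daemons


def node_memory_gb(roles: list[str]) -> int:
    """Sum the memory of the large-memory roles on one node, plus the OS reserve."""
    return sum(ROLE_MEMORY_GB[role] for role in roles) + OS_RESERVE_GB


needed = node_memory_gb(list(ROLE_MEMORY_GB))
print(needed, "GB needed; fits in 256 GB:", needed <= 256)  # 240 GB needed; fits in 256 GB: True
```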

Hard Disk Capacity Selection

·A higher number of hard disks is generally recommended

- Get better parallelism

- Different tasks can access different disks

- Eight 1.5 TB drives perform better than six 2 TB drives, even though both provide 12 TB

- In addition to permanent data storage, it is generally recommended to reserve 20 to 30 percent of the space for temporary data such as MapReduce intermediate data

·Twelve hard disks per server is common in real deployments

- Storage capacity per node of up to 48 TB (for example, 12 × 4 TB)
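
The disk guidance above can be checked with simple arithmetic. The 25% reserve is an assumption chosen from the middle of the 20 to 30 percent range.

```python
def disk_layout(disks: int, tb_per_disk: float, temp_reserve: float) -> dict[str, float]:
    """Raw capacity, and the part left for permanent data after the temporary-data reserve."""
    raw = disks * tb_per_disk
    return {"spindles": disks, "raw_tb": raw, "permanent_tb": raw * (1 - temp_reserve)}


# Same 12 TB raw, but 8 spindles allow more parallel I/O than 6.
print(disk_layout(8, 1.5, 0.25))
print(disk_layout(6, 2.0, 0.25))
# A 12-disk, 48 TB node with a 25% reserve (assumed) leaves 36 TB for permanent data.
print(disk_layout(12, 4.0, 0.25))
```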

Storage Service Requirements

·Raw files, data volume 625 TB

- Physical storage capacity: 625 TB × 3 (replicas) × 0.3 (compression ratio) / 80% (disk utilization) ≈ 703 TB (detail data only; no tables, no MapReduce)

- Number of data nodes: 703 TB / 30 TB per node × 1.05 (redundancy) ≈ 24.6, rounded up to 25

·HBase and Cassandra data services: historical data volume 2.6 TB, daily increment 55 GB, 365-day retention, 3 replicas

- Physical storage capacity, compressed: (2.6 TB + 0.055 TB × 365) × 1.3 × 1.2 (key overhead) / 70% (disk utilization) ≈ 50.5 TB, rounded up to 51 TB

- Number of data nodes, at 30 TB per node: 51 TB / 30 TB × 1.3 (redundancy) ≈ 2.2, rounded up to 3

- If the WAL is enabled, add the RegionServer WAL size (usually less than half of RegionServer memory)
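
The two storage estimates can be reproduced in a few lines, using the factors exactly as given in the formulas. The 1.3 multiplier in the HBase formula is taken as stated for the compressed, 3-replica case; its breakdown is not spelled out in the text.

```python
import math


def hdfs_physical_tb(data_tb, replicas, compression_ratio, disk_utilization):
    """Raw-file estimate: data x replicas x compression ratio / disk utilization."""
    return data_tb * replicas * compression_ratio / disk_utilization


def hbase_physical_tb(history_tb, daily_tb, days, multiplier, key_overhead, disk_utilization):
    """HBase/Cassandra estimate; `multiplier` is the 1.3 factor from the formula, used as given."""
    return (history_tb + daily_tb * days) * multiplier * key_overhead / disk_utilization


def data_nodes(physical_tb, tb_per_node, redundancy):
    """Node count, rounded up to a whole server."""
    return math.ceil(physical_tb / tb_per_node * redundancy)


raw = hdfs_physical_tb(625, 3, 0.3, 0.80)
hbase = hbase_physical_tb(2.6, 0.055, 365, 1.3, 1.2, 0.70)
print(f"raw files: {raw:.1f} TB -> {data_nodes(raw, 30, 1.05)} nodes")     # 703.1 TB -> 25 nodes
print(f"HBase:     {hbase:.1f} TB -> {data_nodes(hbase, 30, 1.3)} nodes")   # 50.5 TB -> 3 nodes
```

The text rounds the capacities to 703 TB and 51 TB. The node counts come out the same either way.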

Server Configuration Recommendations

·CPU: 2 × E5-2620 v4 for management, data, and edge servers

·Hard disk

- Management server: SAS 600 GB × 4, RAID 0+1

- Data server: SAS 600 GB × 2; SATA 2 TB × 15

- Edge server: SAS 600 GB × 2; SATA 2 TB × 15

·Memory: 256 GB ECC

·Network: dual 10 Gigabit NICs

·Number of servers: 3 management, 30 data, 3 edge
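
A quick consistency check ties the recommended data-server configuration back to the storage estimate. Two assumptions are made here: the two SAS disks in each data server hold only the operating system, and the raw-file and HBase/Cassandra workloads share the same data nodes.

```python
# Figures from the configuration recommendations and the storage estimate.
SATA_DISKS_PER_DATA_NODE = 15   # assumption: only the SATA disks store data
TB_PER_SATA_DISK = 2
DATA_SERVERS = 30

REQUIRED_TB = 703 + 51          # raw-file storage + HBase/Cassandra storage
REQUIRED_NODES = 25 + 3         # assumption: both workloads share the data nodes

per_node_tb = SATA_DISKS_PER_DATA_NODE * TB_PER_SATA_DISK   # 30 TB
cluster_tb = per_node_tb * DATA_SERVERS                      # 900 TB

print(f"{per_node_tb} TB per data node, {cluster_tb} TB across {DATA_SERVERS} nodes")
print("capacity covers estimate:", cluster_tb >= REQUIRED_TB)          # True
print("node count covers estimate:", DATA_SERVERS >= REQUIRED_NODES)   # True
```

The 30 TB per node used in the storage formulas matches 15 × 2 TB SATA disks. Under these assumptions, the 30 data servers leave some headroom over the 28 nodes the estimate calls for.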
