
NetApp DataONTAP Cluster Mode Learning Note 1


I. NetApp storage operating system

Data ONTAP is NetApp's most popular storage operating system. It runs on NetApp FAS (Fabric-Attached Storage) systems, which are designed as shared storage systems that support a variety of SAN and NAS protocols and offer flexible functionality.

NetApp also offers the SANtricity operating system, which runs on the E-Series. E-Series systems provide dedicated SAN storage for certain applications, especially applications that need to manage their own data. The E-Series line came from NetApp's acquisition of Engenio in 2011.

Data ONTAP comes in two modes: 7-Mode and cluster mode (also known as Clustered Data ONTAP or cDOT). A FAS system runs either in 7-Mode or in a cluster, but not in both modes at once. In either mode, Data ONTAP operates and controls everything on the storage system.

7-Mode is an evolution of NetApp's original operating system, Data ONTAP 7G. Cluster mode was developed from the technology NetApp acquired with Spinnaker and is far more scalable than 7-Mode.

Early software versions of Clustered ONTAP had some functional limitations compared with 7-Mode. Since version 8.3, NetApp has focused development on cluster mode rather than 7-Mode.

II. 7-Mode scalability limits

In 7-Mode, up to two FAS controllers can be configured for high availability; they form an HA pair but are managed as a pair of separate systems. Disks owned by controller 1 are accessed through controller 1. There are maximum limits on the number of disks and the throughput a single node can handle. Additional HA pairs can be purchased, but to clients they appear as separate systems.

Data can be moved non-disruptively from disk to disk on the same controller, but moving data between controllers disrupts client access and is more complex.

III. Cluster-mode

To overcome the scalability limits of 7-Mode, NetApp completely rewrote the software architecture. Clustered Data ONTAP developed from the acquisition of Spinnaker Networks in 2003.

Clustered Data ONTAP can scale to 24 nodes for the NAS protocols and to 8 nodes in clusters that serve the SAN protocols. A single cluster can scale to 138 PB, and disks, shelves and nodes can be added without interruption.

The entire cluster can be managed as a single system. A cluster can be virtualized into separate virtual storage systems called SVMs (Storage Virtual Machines), also known as Vservers. Each SVM appears to clients as a single system. SVM-level administrators can be created, and an SVM administrator can access only their own SVM. Data can be moved non-disruptively between all nodes in the cluster over the cluster interconnect.

Data processing is spread across the different nodes in the cluster, each with its own CPU, memory and network resources, providing performance scalability and load balancing across the cluster. Data can be mirrored or cached on multiple nodes in the cluster.
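As a small illustration of single-system management, the whole cluster can be inspected from any node with the clustered Data ONTAP CLI (a sketch; svm1 is a hypothetical SVM name):

cluster1::> cluster show

cluster1::> vserver show

cluster1::> volume show -vserver svm1

cluster show lists every node with its health and eligibility, vserver show lists the admin, node and data SVMs, and volume show lists the volumes of one SVM wherever they live in the cluster.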

IV. Disks and disk shelves

NetApp previously used Fibre Channel to connect controllers to disk shelves, but Fibre Channel disk shelves have since been discontinued.

Current models use SAS (Serial Attached SCSI) disk shelves: the controller connects to the shelf through SAS ports and cables. NetApp offers three types of disks, SSD solid-state drives, SAS drives and SATA drives, and all three types fit into SAS shelves.

SSD disks provide the best performance, but have the highest cost per GB.

SATA (high-capacity) disks provide the lowest performance, but the lowest cost per GB.

SAS (performance) disks balance performance and cost per GB.
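To check which disk types a system actually contains, the clustered ONTAP CLI can report them (a sketch; the exact field names can vary slightly between releases):

cluster1::> storage disk show -fields type,usable-size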

V. NetApp storage network

VLANs improve the performance and security of a local area network by dividing the LAN into separate layer-2 broadcast domains. An access VLAN is configured on the switch port that an end host plugs into, and only traffic for that particular VLAN is sent out of the access port. The configuration is done entirely on the switch; the end host does not know which VLAN it is in.

Dot1Q trunking is configured on the links between switches that need to carry traffic for multiple VLANs. When a switch forwards traffic to another switch, it tags it with the correct VLAN in the layer-2 Dot1Q header, and the receiving switch forwards it only within that VLAN.

An end host is usually a member of only one VLAN and is unaware that VLANs exist. A special case is a virtualization host whose virtual machines are connected to different subnets; in that case the VLANs must be trunked down to the host.

NIC teaming combines multiple physical network cards into a single logical interface to provide redundancy and, optionally, load balancing. NIC teaming is also known as bonding or link aggregation. When multiple physical ports are bundled into a logical link on a switch, it is called a port channel, EtherChannel, or link bundle.

With active/standby teaming on the server, all traffic is transmitted through the primary physical port. If that port fails, traffic automatically fails over to the standby port. Active/standby redundancy needs no configuration on the switch.

With active/active NIC teaming, traffic is load-balanced across all physical ports, and if one port fails the remaining ports provide redundancy. For active/active mode, both the server and the connected switch must add the physical ports to a logical link, and the configuration must be consistent on both sides. The link is negotiated between the server and the switch either as static 802.3ad or with LACP, the Link Aggregation Control Protocol; LACP is preferred if both the server and the switch support it.
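On a NetApp controller the same concepts appear as interface groups (ifgrps) and VLAN ports. A minimal sketch, assuming a node named cluster1-01 and physical ports e0c and e0d; the connected switch must be configured to match for LACP:

cluster1::> network port ifgrp create -node cluster1-01 -ifgrp a0a -distr-func ip -mode multimode_lacp

cluster1::> network port ifgrp add-port -node cluster1-01 -ifgrp a0a -port e0c

cluster1::> network port ifgrp add-port -node cluster1-01 -ifgrp a0a -port e0d

cluster1::> network port vlan create -node cluster1-01 -vlan-name a0a-100

The last command creates a tagged port for VLAN 100 on top of the LACP interface group, which matches the trunked special case described above.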

VI. Introduction to SAN storage

SAN terminology:

LUN (logical unit number) is presented as a disk to the host

LUNs are specific to SAN protocols (not NAS)

The client is called the initiator

The storage system is called the target

Fibre Channel is the original SAN protocol and is still very popular. It uses dedicated adapters, cables and switches. It has its own layered model rather than the OSI layering used to describe Ethernet, covering everything from the physical layer upward, and FCP carries SCSI commands over the Fibre Channel network. Fibre Channel is a very stable and reliable protocol; it is lossless, unlike Ethernet transport with TCP and UDP, and it supports bandwidths of 2, 4, 8 and 16 Gbps.

FCP addressing uses WWNs (World Wide Names). A WWN is an 8-byte address written as 16 hexadecimal characters, in the format 15:00:00:f0:8c:08:95:de. A WWNN (World Wide Node Name) is assigned to a node in the storage network; the same WWNN identifies all the network interfaces of that single node.

Each individual port on the node is assigned a different WWPN (World Wide Port Name); a multi-port HBA has a different WWPN on each port. The WWPN is the equivalent of a MAC address in Ethernet: it is burned in by the manufacturer and guaranteed to be globally unique. WWPNs are assigned to the HBAs on the clients and on the storage system, and when configuring a Fibre Channel network we are mainly concerned with WWPNs, not WWNNs.

Aliases can be configured to make configuration and troubleshooting easier; for example, we can create an alias called EXCHANGE-SERVER for an Exchange server whose WWPN is 15:00:00:f0:8c:08:95:de. Aliases can be configured on Fibre Channel switches and on storage systems.

It is important to present the correct LUNs to the correct hosts; if the wrong host can connect to a LUN, it may corrupt it. Zoning, configured on the switches, prevents unauthorized hosts from connecting to the storage system, but it does not stop a host from accessing the wrong LUN once it gets there. LUN masking, configured on the storage system, locks each LUN to the hosts that are authorized to access it. To protect the storage you need both: zoning on the switches and LUN masking on the storage system.
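On a NetApp storage system, LUN masking is implemented with initiator groups (igroups). A minimal sketch, assuming a data SVM named svm1, a volume vol1 and the example WWPN above:

cluster1::> lun create -vserver svm1 -path /vol/vol1/exchange_lun -size 500g -ostype windows

cluster1::> lun igroup create -vserver svm1 -igroup exchange_ig -protocol fcp -ostype windows -initiator 15:00:00:f0:8c:08:95:de

cluster1::> lun map -vserver svm1 -path /vol/vol1/exchange_lun -igroup exchange_ig

Only hosts whose WWPNs are members of the igroup can see the LUN; zoning on the switches is configured separately.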

Each switch in a Fibre Channel fabric is assigned a unique Domain ID. One switch in the fabric is automatically elected as the principal (root) switch and assigns Domain IDs to the other switches. Each switch learns the other switches in the fabric by their Domain IDs and how to route to them.

When a server's or storage system's HBA powers on, it sends a FLOGI (fabric login) request to its locally connected Fibre Channel switch, which assigns it a 24-bit FCID (Fibre Channel ID) address. The FCID assigned to the host is derived from the switch's Domain ID and the switch port the host is plugged into. The FCID is similar to an IP address and is used by the Fibre Channel switches to route traffic between servers and their storage. The switch maintains a FLOGI table that maps FCIDs and WWPN addresses to the ports where the hosts reside.

Fibre Channel switches share their FLOGI database information with each other using the FCNS (Fibre Channel Name Service). Every switch in the fabric learns where each WWPN is and how to route to it.

After the FLOGI fabric login completes, the initiator sends a PLOGI (port login) and, based on the zoning configuration on the switch, learns which target WWPNs are available to it. Finally, the initiator sends a PRLI (process login) to its targets to log in to the storage, and the storage system grants access based on its configured LUN masking.

Access to storage is a critical task for servers in the enterprise, and there must be no single point of failure. Redundant Fibre Channel fabrics are therefore configured, and each server and storage system connects to both fabrics through redundant HBA ports.

Fibre Channel switches distribute shared information to each other (such as Domain IDs, the FCNS database and zoning). If an error occurs on one switch, it can propagate to the other switches and take down both, cutting server-to-storage connectivity. The redundant fabrics should therefore never be cross-connected to each other; the two fabrics remain physically separate. Hosts are connected to both fabrics, but the switches of one fabric are not connected to the switches of the other.

Storage is presented using ALUA (Asymmetric Logical Unit Access), and the storage system tells the client which paths it should prefer. The direct paths to the node that owns the LUN are marked as optimized, and the other paths are marked as non-optimized.

During the process login, the initiator discovers the available ports in the storage target port group, ALUA reports which paths are preferred, and the multipathing software on the initiator chooses which path or paths to use to reach the storage. All popular operating systems include multipathing software and support active/active or active/standby paths. If a port on the client fails, traffic automatically fails over to an alternate path.

The connection between a client and SAN storage works very differently from Ethernet. In Ethernet, all routing and switching decisions are handled by the network infrastructure; in a SAN, the intelligent multipath selection is performed by the client host.

In Fibre Channel, the initiator automatically discovers the available paths through the FLOGI, PLOGI and PRLI processes, and the multipathing software on the initiator chooses which path or paths to use.

iSCSI, the Internet Small Computer System Interface protocol, runs over Ethernet and was originally positioned as a less expensive alternative to Fibre Channel. It has higher per-packet header overhead and lower reliability and performance than Fibre Channel. Because it runs over Ethernet, it can share the data network or have its own dedicated network infrastructure. A TOE (TCP Offload Engine) card is a specialized adapter that reduces the CPU load on the server; one optimized for iSCSI is sometimes called an iSCSI HBA.

Fibre Channel uses WWNs to identify initiators and targets; iSCSI uses IQN (iSCSI Qualified Name) identifiers, or less commonly EUI (Extended Unique Identifier) addresses. An IQN can be up to 255 characters long and has the format iqn.yyyy-mm.naming-authority:unique-name, for example iqn.1991-05.com.microsoft:testHost. The IQN is assigned to the host as a whole, similar to a WWNN in Fibre Channel. iSCSI runs over Ethernet, so each port is addressed with an IP address.

The multipathing software on the initiator chooses which path or paths to take. Although it runs over Ethernet, iSCSI is still a SAN protocol, and the multipathing software on the initiator still performs the intelligent path selection.

LUN masking is configured in the same way as for Fibre Channel, but clients are identified on the storage system by IQN instead of WWPN. iSCSI has no equivalent of zoning. Password-based authentication (CHAP) is usually configured to prevent initiator spoofing, and end-to-end IPsec encryption can also be enabled for additional security.
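A sketch of the storage-side CHAP configuration in clustered ONTAP (the SVM, initiator and user names are examples, and the exact parameters can vary by release):

cluster1::> vserver iscsi security create -vserver svm1 -initiator-name iqn.1991-05.com.microsoft:testHost -auth-type CHAP -user-name chapuser

The command prompts for the CHAP password, and the same user name and password must then be configured on the initiator.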

Fibre Channel over Ethernet (FCoE) is the newest SAN protocol. With the advent of 10 Gbps Ethernet, there is enough bandwidth to carry both data and storage traffic on the same adapter.

FCoE encapsulates the Fibre Channel protocol in Ethernet frames and runs over Ethernet. QoS is used to guarantee the bandwidth required for the storage traffic, so FCoE retains the reliability and performance of Fibre Channel.

FCoE works the same way as native Fibre Channel FCP, except that the traffic is encapsulated in Ethernet so it can cross an Ethernet network. Initiators and targets still use WWPNs, and the FLOGI, PLOGI and PRLI processes are unchanged.

In FCoE, storage and data traffic share the same physical interface. The storage traffic uses FCP, so it needs a WWPN; the Ethernet data traffic needs a MAC address, and the two work completely differently. How can one physical interface support both at the same time? The answer is to virtualize the physical interface into two virtual interfaces: a virtual NIC with a MAC address for the Ethernet data traffic and a virtual HBA with a WWPN for the storage traffic. The storage and data traffic are separated into two different VLANs.

Fibre Channel transmission between initiator and target is lossless: the protocol ensures no frames are lost. Ethernet is not lossless; TCP relies on acknowledgements from the receiver to confirm that data has arrived, and packets that are not acknowledged are resent. FCoE carries FCP, which assumes a lossless network, so we need a way to ensure that storage frames are not lost as they pass through the Ethernet network.

PFC (Priority Flow Control), the Ethernet extension used by FCoE, provides that lossless delivery, and PFC works on a hop-by-hop basis: every NIC and switch in the path between the initiator and the target must support FCoE. A network card with FCoE capability is called a CNA (converged network adapter).

NIC: network interface card. A traditional Ethernet card, used for the NAS protocols and iSCSI.

TOE: TCP Offload Engine. Offloads TCP/IP processing from the server's CPU and can improve performance for the NAS protocols and iSCSI.

HBA: host bus adapter. The Fibre Channel equivalent of a NIC.

iSCSI HBA: an Ethernet TOE card optimized for iSCSI.

CNA: converged network adapter. A 10 Gb Ethernet card that supports FCoE.

UTA: unified target adapter. A NetApp proprietary card that supports FCoE or Fibre Channel.

RAID (Redundant Array of Inexpensive, or Independent, Disks) combines multiple physical disks into a single logical unit to provide redundancy, improved performance, or both. Different RAID levels provide different levels of redundancy and performance compared with individual disks, and RAID can be managed by operating system software or by a hardware RAID controller.

VII. NetApp storage system configuration

Vol0

When the storage system leaves the factory, it already has Data ONTAP installed.

The operating system image is installed on the CompactFlash (CF card)

System configuration information is stored on the hard disk

An existing aggregate and volume are required to hold the system configuration

Aggr0 and Vol0 exist on every node in the cluster

System information, including the replicated database (RDB) and log files, is stored on Vol0

System information is replicated between the nodes of the cluster over the cluster network

Do not store user data on Vol0; it is used only for system information

Replicated database (RDB)

RDB consists of five modules:

-Management gateway

-Volume location database

-Virtual interface manager

-Block configuration and operations management

-Configuration replication service

For each RDB unit, one node in the cluster is elected as the master, and the RDB information is replicated to every node

If that node fails, another node is elected as the new master

Management Gateway

The management gateway provides the management CLI

The cluster is managed by connecting to the cluster management address, using either the GUI or the CLI

Any changes made are replicated to all nodes throughout the cluster

Volume location database (VLDB)

The volume location database lists which aggregates contain which volumes and which nodes own which aggregates

Clients can connect to any node and reach a volume without connecting directly to the node that hosts it; the VLDB lets every node in the cluster track where each volume is located

Administrators can move volumes to different aggregates, which triggers a VLDB update (see the example after this list)

VLDB is cached in the memory of each node to optimize performance
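A non-disruptive volume move that triggers such a VLDB update looks like this (a sketch; svm1, vol1 and aggr2_node02 are example names):

cluster1::> volume move start -vserver svm1 -volume vol1 -destination-aggregate aggr2_node02

cluster1::> volume move show

Clients keep access to the volume throughout the move, and the VLDB on every node is updated when the cutover completes.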

Virtual Interface Manager (VIFMGR)

The virtual interface manager tracks which physical port each logical interface is currently active on

IP addresses live on logical interfaces (LIFs)

If a failover occurs, the logical interface can be moved to a different physical interface

Block configuration and Operations Management (BCOM)

BCOM stores information for the SAN protocols

It contains information about LUNs and igroups (used for LUN masking)

Configuration Replication Service (CRS)

The configuration replication service is used by MetroCluster to replicate configuration and operational data to the remote secondary cluster

Data SVMs

Data SVMs serve data to clients

No data SVMs exist by default (initially)

At least one data SVM containing volumes is required for clients to access data

Multiple data SVMs can be created, for example one per department, to provide separate, securely isolated logical storage systems
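Creating a data SVM is a single command (a sketch; the SVM, root volume and aggregate names are examples, and the protocols, LIFs and data volumes are added afterwards):

cluster1::> vserver create -vserver svm_finance -rootvolume svm_finance_root -aggregate aggr1_node01 -rootvolume-security-style unix

To the finance department this SVM then looks like its own dedicated storage system.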

Management SVM

The management SVM, also known as the cluster management server, provides administrative access to the system

The management SVM is created during system setup

The management SVM does not host user volumes; it is used purely for administrative access

The management SVM owns the cluster management LIF, which can fail over to any physical port anywhere in the cluster

Node SVM

The node SVM is also used purely for administrative access

A node SVM is also created during system setup

Each node SVM owns the node management LIF for its node

The node management LIF can fail over to another physical port on the same node

Just like the cluster management LIF, the node management LIF can also be used to manage the entire cluster, not just the individual node

Service processor (SP)

The service processor provides remote management of the controller

The SP is an independent system inside the controller; it can be used whenever the power cord is connected, even if Data ONTAP is down.

You can view environmental readings such as temperature, fan speed and voltage through the SP's CLI.

If the management IP address is not responding, you can connect to the SP, reach the Data ONTAP console and restart the system.

If environmental thresholds are exceeded, the SP can shut down the controller and notify NetApp support

cluster1::> system node service-processor network modify -node cluster1-01 -address-type IPv4 -enable true -ip-address 172.23.1.14 -netmask 255.255.255.0 -gateway 172.23.1.254

Access to the service processor

There are two ways to access the service processor

By SSH to the service processor's IP address

From a console session on the controller, press Ctrl+G to enter the SP; press Ctrl+D to end the SP session

Logging in to the SP uses the special user name naroot with the admin user's password
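For example, assuming the SP address configured above, an SSH session to the SP looks like this (a sketch; the prompt format and command details vary by model and release):

ssh naroot@172.23.1.14

SP cluster1-01> system sensors

SP cluster1-01> system console

system sensors shows the environmental readings, and system console drops into the Data ONTAP console of the controller; Ctrl+D returns to the SP prompt.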

Hardware-assisted failover

The controllers in an HA pair send keepalives to each other over the HA interconnect cable

If several keepalives in a row are missed, a failover is started

With hardware-assisted failover, if the SP detects that its own controller has failed, it signals the partner controller to take over immediately instead of waiting for keepalives to time out

Licenses. There are three types of licenses.

Standard licenses

A standard license is a node-locked license tied to the serial number of a node. If the node leaves the cluster, the license leaves with it. As long as at least one node in the cluster holds the license, the feature can be used, but every node in the cluster should have its own license.

Site licenses

Site licenses are tied to the cluster serial number, not to a specific node. The feature is available to all nodes in the cluster. If a node leaves the cluster, the license stays with the site.

Evaluation licenses

Evaluation licenses are demo licenses with an expiration date. They are tied to the cluster serial number, not to a specific node. The feature is available to all nodes in the cluster, and if a node leaves the cluster the license remains in place.
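Licenses are installed and verified from the cluster shell (a sketch; the license code shown is only a placeholder, not a real key):

cluster1::> system license add -license-code AAAAAAAAAAAAAAAAAAAAAAAAAAAA

cluster1::> system license show

system license show lists each licensed package together with the serial number (node or cluster) it is tied to.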

VIII. NetApp physical resources and structure

Storage structure

SVM (Vservers) structure

Disk shelves

"you can store up to 10 enclosures in the stack, and it is a best practice not to mix media types in the same stack, and will ID the top enclosure with a 0 end."

Disks (and the aggregates built from them) belong to a node. If a node fails, its HA partner can take ownership of the disks; both partners of the HA pair are cabled to the shelves over SAS, in an active/standby arrangement. Nodes reach aggregates owned by other nodes through the cluster interconnect switches.

Aggr0 and Vol0: when the system powers on, it loads the Data ONTAP system image from the CF card and then loads the system configuration from disk. Data is accessible at the volume level at the lowest, so an aggregate and a volume are needed to hold the system information. The system root aggregate and volume are Aggr0 and Vol0; the system information is replicated to all nodes in the cluster, and every node has its own Aggr0 and Vol0. If the system is factory reset, all disks are wiped, and a new Aggr0 and Vol0 are created on each node.

Ownership: a disk must be assigned to a specific node in the HA pair before it can be used. By default, disks are assigned automatically, and automatic assignment can operate at the stack, shelf or bay level. At the stack level, all disks in a stack are assigned to the controller connected to the stack's IOM-A module. At the shelf level, half of the shelves in the stack are assigned to each node. At the bay level, half of the disks in each shelf are assigned to each node. Automatic disk assignment can be disabled; if it is, all newly added disks must be assigned to a node manually before they can be used. On smaller two-node systems you may want to assign all disks to one node and build a single large aggregate; the system then acts as active/standby (rather than each node owning its own aggregate), and all client read and write requests are handled by one node.
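A sketch of the corresponding commands, where the node name and disk name are examples:

cluster1::> storage disk option modify -node cluster1-01 -autoassign off

cluster1::> storage disk show -container-type unassigned

cluster1::> storage disk assign -disk 1.1.14 -owner cluster1-01

The first command disables automatic assignment on a node, the second lists disks that do not yet have an owner, and the third assigns a disk to a node manually.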

Disks are grouped into RAID groups, and RAID groups are assigned to an aggregate. The RAID group configuration is an attribute of the aggregate; it specifies how many data disks provide capacity and how many parity disks provide redundancy.

A RAID group can be RAID 4 or RAID-DP. RAID 4 uses single parity and tolerates a single disk failure; the minimum size is 2 disks for aggr0 and 3 disks for a normal data aggregate. RAID-DP uses double parity and tolerates two disk failures; the minimum size is 3 disks for aggr0 and 5 disks for a normal data aggregate.

The disks in a RAID group must be of the same type (SAS, SATA or SSD) and of the same size and speed.

If a disk fails, the system automatically replaces it with a spare of the same type, size and speed and rebuilds the data onto it. Performance is degraded until the rebuild completes. The system should keep at least two spare disks of each type, size and speed in use.

An aggregate consists of one or more RAID groups. A small aggregate may have only one RAID group, while larger aggregates have several, to strike a good balance between capacity and redundancy. For example, an aggregate built from a single 16-disk RAID-DP group gives the capacity of 14 data disks with 2 parity disks. For redundancy, do not build a 48-disk aggregate as a single RAID group: that would give 46 data disks and only 2 parity disks, and with that many disks the chance of multiple simultaneous failures is too high (the more disks, the higher the probability of a failure). Use three RAID groups of 16 disks instead, which increases redundancy. All RAID groups in an aggregate should be as close to the same size as possible; the recommended RAID group size is 12 to 20 disks for HDD and 20 to 28 disks for SSD. Performance should also be considered: the more disks in the aggregate, the better the performance, because data can be read from and written to many disks at the same time.
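A sketch of creating such an aggregate, assuming 48 spare disks are available on node cluster1-01 (the names and counts are examples):

cluster1::> storage aggregate create -aggregate aggr1_node01 -node cluster1-01 -diskcount 48 -raidtype raid_dp -maxraidsize 16

With -maxraidsize 16, the 48 disks are laid out as three 16-disk RAID-DP groups instead of one oversized group.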

Advanced Disk Partitioning (ADP) is supported only on entry-level platforms (the FAS2500 series) and on AFF systems; it uses RAID-DP and does not support MetroCluster. New systems ship with ADP, and systems running earlier versions of Data ONTAP can be converted to ADP.
