High availability Cluster of pfSense book (HA)

2025-07-19 Update From: SLTechnology News&Howtos

High availability cluster

Overview of pfsync

Overview of pfSense XML-RPC configuration synchronization

Redundant configuration example

HA and Multi-WAN

Verify failover capabilities

Provide redundancy without NAT

Layer 2 redundancy

High availability and bridging

Use IP aliases to reduce heartbeat traffic

Interface

Troubleshooting

The high availability of pfSense is achieved through a combination of the following features:

CARP is used for IP address redundancy

XML-RPC is used for configuration synchronization

pfsync is used for state table synchronization

With this configuration, the units act as an "active / passive" cluster: the primary node is the master unit, and the secondary node stays in a standby role and takes over as needed if the primary node fails.

Although this setup is often mistakenly called a "CARP cluster", CARP is only one of several technologies pfSense uses to achieve high availability, so two or more redundant pfSense firewalls are more accurately called a "high availability cluster" or "HA cluster". In the future, CARP could be swapped for a different redundancy protocol.

One interface on each cluster node is dedicated to synchronization tasks. This interface is usually called the "Sync" interface and carries both configuration synchronization and pfsync state synchronization. Any available interface can be used for "Sync".

Note

Some people call it the "CARP" interface, which is incorrect and very misleading. CARP heartbeats occur on each interface that has a CARP VIP; CARP traffic and failover operations do not use the Sync interface.

The most common high-availability cluster configuration contains only two nodes. A cluster can have more nodes, but they do not provide a significant advantage.

It is important to distinguish between the three functions (IP address redundancy, configuration synchronization, and state table synchronization) because they occur in different places. Configuration synchronization and state synchronization happen on the Sync interface, directly between the firewall units. CARP heartbeats are sent on every interface that has a CARP VIP. Failover signaling does not happen on the Sync interface, but on each CARP-enabled interface.

Overview of CARP

The Common Address Redundancy Protocol (CARP) was created by OpenBSD developers as a free, open redundancy solution for sharing IP addresses among a group of network devices. Similar solutions already existed, chiefly the IETF standard Virtual Router Redundancy Protocol (VRRP). However, Cisco claimed that VRRP was covered by its patent on the Hot Standby Router Protocol (HSRP) and told the OpenBSD developers that it would enforce that patent. The OpenBSD developers therefore created a new free and open protocol that achieves essentially the same result without infringing Cisco's patent. CARP became available in OpenBSD in October 2003 and was later added to FreeBSD.

A CARP type virtual IP address (VIP) is shared among the nodes of a cluster. One node is the master and receives traffic for the IP address, while the other nodes stay in backup status and monitor heartbeats to determine whether they need to assume the master role if the current master fails. Because only one member of the cluster uses the IP address at a time, a CARP VIP does not cause an IP address conflict.

For failover to work properly, inbound traffic coming to the cluster (for example routed upstream traffic, VPNs, NAT, local client gateway, DNS requests, and so on) must be sent to a CARP VIP, and outgoing traffic such as outbound NAT must be sent from a CARP VIP. If traffic is addressed directly to a node instead of a CARP VIP, the other nodes will not pick up that traffic.

CARP works much like VRRP and HSRP, and may even conflict with them in some cases. Heartbeats are sent on each interface that contains a CARP VIP, one heartbeat per VIP per interface. At the default skew (deviation value) and base values, a VIP sends a heartbeat about once per second. The skew determines which node is master at a given point in time: whichever node transmits heartbeats fastest assumes the master role. A higher skew delays heartbeat transmission, so a node with a lower skew will be the master unless a network or other problem delays or drops its heartbeats.
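
The relationship between base, skew, and the master role can be sketched in a few lines of Python. This is only an illustrative model, not pfSense code: it assumes the heartbeat interval is the base in whole seconds plus the skew counted in 1/256 second steps, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CarpNode:
    name: str
    base: int   # whole seconds between heartbeats
    skew: int   # extra delay, counted in 1/256 second steps


def heartbeat_interval(node: CarpNode) -> float:
    """Seconds between heartbeats under the assumed base + skew/256 model."""
    return node.base + node.skew / 256


def expected_master(nodes: list[CarpNode]) -> CarpNode:
    """The node that advertises most often is expected to hold MASTER."""
    return min(nodes, key=heartbeat_interval)


primary = CarpNode("primary", base=1, skew=0)
secondary = CarpNode("secondary", base=1, skew=100)

for node in (primary, secondary):
    print(f"{node.name}: heartbeat every {heartbeat_interval(node):.3f} s")
print("expected master:", expected_master([primary, secondary]).name)
```

With equal base values, the node with the lower skew advertises more often, which is why the primary node normally holds the master role.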

Note

Do not use a CARP VIP to reach the firewall GUI, SSH, or other management services. For administrative purposes, use only the actual IP address on the interface of each individual node, not the VIP. Otherwise it is impossible to know in advance which node is being accessed.

IP address requirements for CARP

A high-availability cluster using CARP needs three IP addresses in each subnet, plus a separate unused subnet for the Sync interface. For the WAN, this means a /29 subnet or larger is required for an optimal configuration. Each node uses one IP address, and one more is needed for the shared CARP VIP used for failover. The Sync interface requires only one IP address per node.
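
The subnet size requirement can be checked with Python's standard ipaddress module. This is a minimal sketch; the 198.51.100.0 prefixes are simply the documentation range used in the examples of this chapter.

```python
import ipaddress


def usable_hosts(cidr: str) -> int:
    """Count assignable host addresses (network and broadcast excluded)."""
    return len(list(ipaddress.ip_network(cidr).hosts()))


REQUIRED = 3  # primary node + secondary node + shared CARP VIP

for cidr in ("198.51.100.0/30", "198.51.100.0/29", "198.51.100.0/24"):
    hosts = usable_hosts(cidr)
    verdict = "enough" if hosts >= REQUIRED else "too small"
    print(f"{cidr}: {hosts} usable addresses, {verdict}")
```

A /30 offers only two usable addresses, so /29 is the smallest subnet that fits both nodes and the shared VIP.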

It is technically possible to configure an interface with a CARP VIP as the only IP address in a given subnet, but this is generally not recommended. On a WAN, this type of configuration only allows communication from the primary node to the WAN, which complicates tasks such as updates, package installation, gateway monitoring, or anything else that requires external connectivity from the secondary node. It may be a better fit for internal interfaces, but internal interfaces usually do not suffer from the same IP address limitations as a WAN, so it is still best to configure IP addresses on all nodes.

Switch / layer 2 issues

CARP heartbeats use multicast and may require special handling on the switches involved in the cluster. Some switches filter, rate limit, or otherwise interfere with multicast in ways that can cause CARP to fail. In addition, some switches use port security methods that may prevent CARP from working properly.

At a minimum, the switch must:

Allow multicast traffic to be sent and received without interference on ports that use CARP VIPs.

Allow traffic to be sent and received using multiple MAC addresses.

Allow the CARP VIP MAC address to move between ports.

Almost all cases where CARP does not reflect the expected status are caused by the switch or by other layer 2 problems, so make sure the switches are configured correctly before continuing.

Overview of pfsync

Pfsync can keep firewall state tables synchronized between cluster nodes. Changes to the state table on the primary firewall are sent to the secondary firewall through the Sync interface, and vice versa. When pfsync is active and configured correctly, all nodes will know about each connection that flows through the cluster. If the primary node fails, the backup node takes over and the client does not notice that the transition has occurred because both nodes know the connection in advance.

Pfsync uses multicast by default, but an IP address can be defined to force unicast updates in environments with only two firewalls where multicast traffic does not work properly. Any active interface can be used to send pfsync updates, but a dedicated interface is better for security and performance. Pfsync does not support any authentication method, so if anything other than a dedicated interface is used, any user with local network access could insert states into the state table. In low-throughput environments with modest security requirements, using the LAN interface is acceptable. The bandwidth required for state synchronization varies widely from one environment to another, but it can be as high as 10% of the throughput passing through the firewall, depending on the rate of state insertions and deletions in the network.

Failover still works without pfsync, but it will not be seamless. Without pfsync, if one node fails and another takes over, user connections are dropped. Users can reconnect immediately through the other node, but they are interrupted during the transition. Depending on how the environment is used, this may go unnoticed, or it may be a significant but brief outage.

When using pfsync, the pfsync setting must be enabled on all nodes participating in state synchronization, including secondary nodes, or it will not function properly.

Pfsync and firewall rules

Traffic for pfsync must be explicitly passed on the Sync interface. A rule must pass the pfsync protocol from a source of the Sync network to any destination. A rule passing all traffic of any protocol would also allow the required traffic, but a more specific rule is more secure.

Pfsync and physical interface

The state in pfSense is bound to a specific system interface. For example, if WAN is em0, then the state on WAN will be bound to em0. If the cluster node has the same hardware and interface assignments, it works as expected. There may be problems with different hardware. If the WAN on one node is em0, but the WAN on the other node is igb0, then these states will not match and they will not be considered the same.

Using identical hardware is always preferable, but it is not always practical. There is a workaround: add the interface to a LAGG to abstract the underlying physical interface. In the example above, WAN would be lagg0 on both nodes and the states would be bound to lagg0, even though lagg0 contains em0 on one node and igb0 on the other.
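
The effect of interface names on synchronized states can be illustrated with a deliberately simplified Python model. It is not how pf stores states internally; the State tuple, the function name, and the sample addresses are hypothetical and only show why matching interface names matter.

```python
from typing import NamedTuple


class State(NamedTuple):
    interface: str
    src: str
    dst: str


def states_usable_on(peer_interfaces: set[str], synced: list[State]) -> list[State]:
    """Keep only the synced states whose interface name exists on the peer."""
    return [s for s in synced if s.interface in peer_interfaces]


# A state created on the primary node, where WAN is em0.
from_primary = [State("em0", "192.168.1.50:51000", "203.0.113.80:443")]
# The same connection when WAN is lagg0 on both nodes.
from_primary_lagg = [State("lagg0", "192.168.1.50:51000", "203.0.113.80:443")]

mismatched_peer = {"igb0", "igb1"}  # secondary node uses different NICs
lagg_peer = {"lagg0", "igb1"}       # secondary node wraps igb0 in lagg0

print(len(states_usable_on(mismatched_peer, from_primary)))
print(len(states_usable_on(lagg_peer, from_primary_lagg)))
```

In this model the em0 state finds no matching interface on the igb0 peer, while the lagg0 state matches on both nodes.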

Pfsync and upgrades

pfSense normally allows firewalls to be upgraded in place without network disruption. However, this is not true for every upgrade, because the pfsync protocol may change to accommodate additional functionality. Before upgrading, always check the upgrade guide linked in the release announcement for any special considerations for CARP users.

Overview of XML-RPC configuration synchronization

To make firewall nodes easier to maintain, XML-RPC can be used for configuration synchronization. When XML-RPC synchronization is enabled, settings from the supported areas are copied to the secondary node and activated after each configuration change. XML-RPC synchronization is optional, but maintaining a cluster without it takes much more work.

Some areas cannot be synchronized, such as the interface configuration, but many others can: firewall rules, aliases, users, certificates, VPNs, DHCP, routes, gateways, and more. As a general rule, items specific to the hardware or to a particular installation (such as interfaces or values under System > General or System > Advanced) are not synchronized. The list of supported areas may vary with the pfSense version in use. Most packages do not synchronize, but some include their own synchronization settings. Refer to the relevant documentation for details.

Configuration synchronization should use the Sync interface, or if there is no dedicated Sync interface, use the same interface configured for pfsync.

In a two-node cluster, the XML-RPC settings must be enabled only on the primary node and must be left disabled on the secondary node.

For XML-RPC to work properly, both nodes must have GUI running on the same port and protocol, such as HTTPS on port 443 (the default). The administrator account cannot be disabled, and both nodes must have the same administrator account password.

Redundant configuration example

This section describes a simple three-interface HA configuration. The three interfaces are LAN, WAN, and Sync. This is functionally equivalent to a two-interface LAN and WAN deployment; the Sync interface is used only to synchronize the configuration and firewall states between the primary and secondary firewalls.

Note

This example covers only the IPv4 configuration. High availability is compatible with IPv6, but it requires static addressing on the firewall interfaces. When preparing to configure HA, if static IPv6 assignments are not available, set IPv6 to None on all interfaces.

Determine IP address assignment

The first task is to plan the IP address assignment. A good strategy is to use the lowest available IP address in the subnet as the CARP VIP, the next subsequent IP address as the primary firewall interface IP address, and the next IP address as the secondary firewall interface IP address. This strategy is optional and you can use any solution, but we strongly recommend using a consistent and reasonable solution to simplify design and management.

WAN address

The WAN addresses will be selected from those assigned by the ISP. As shown in the WAN IP address assignment table, the WAN subnet of the HA pair is 198.51.100.0/24, and the addresses 198.51.100.200 through 198.51.100.202 will be used as the WAN IP addresses.

WAN IP address assignment table

IP address           Function
198.51.100.200/24    CARP shared IP address
198.51.100.201/24    Primary node WAN IP address
198.51.100.202/24    Secondary node WAN IP address

LAN address

The LAN subnet is 192.168.1.0/24. In this example, the LAN IP addresses will be assigned as shown in the LAN IP address assignment table.

LAN IP address assignment table

IP address        Function
192.168.1.1/24    CARP shared IP address
192.168.1.2/24    Primary node LAN IP address
192.168.1.3/24    Secondary node LAN IP address

Sync interface address

There is no shared CARP VIP on this interface because it is not needed. These IP addresses are used only for communication between the firewalls. In this example, 172.16.1.0/24 is used as the Sync subnet. Only two IP addresses will be used, but the mask (/24) is kept the same as on the other internal interface (LAN). For the last octet of each IP address, use the same last octet as that firewall's LAN IP address for consistency.

Sync IP address assignment table

IP address        Function
172.16.1.2/24     Primary node Sync IP address
172.16.1.3/24     Secondary node Sync IP address
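
Before touching the firewalls, the address plan above can be sanity-checked with a short script. The sketch below is only an illustration: the PLAN dictionary layout and the check_plan function are hypothetical, while the addresses are the ones from the three assignment tables.

```python
import ipaddress

PLAN = {
    "WAN":  {"subnet": "198.51.100.0/24", "vip": "198.51.100.200",
             "primary": "198.51.100.201", "secondary": "198.51.100.202"},
    "LAN":  {"subnet": "192.168.1.0/24", "vip": "192.168.1.1",
             "primary": "192.168.1.2", "secondary": "192.168.1.3"},
    "SYNC": {"subnet": "172.16.1.0/24", "vip": None,
             "primary": "172.16.1.2", "secondary": "172.16.1.3"},
}


def check_plan(plan: dict) -> list[str]:
    """Return a list of problems; an empty list means the plan is consistent."""
    problems = []
    seen = set()
    for name, entry in plan.items():
        net = ipaddress.ip_network(entry["subnet"])
        for role in ("vip", "primary", "secondary"):
            addr = entry[role]
            if addr is None:
                continue  # the Sync interface has no CARP VIP
            if ipaddress.ip_address(addr) not in net:
                problems.append(f"{name} {role} {addr} is outside {net}")
            if addr in seen:
                problems.append(f"{addr} is assigned twice")
            seen.add(addr)
    # Sync addresses reuse the last octet of each node's LAN address.
    for role in ("primary", "secondary"):
        lan_octet = plan["LAN"][role].rsplit(".", 1)[1]
        sync_octet = plan["SYNC"][role].rsplit(".", 1)[1]
        if lan_octet != sync_octet:
            problems.append(f"{role} LAN/Sync last octets differ")
    return problems


print(check_plan(PLAN) or "plan is consistent")
```

The same check can be rerun whenever an interface or VIP is added to the plan.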

The following figure shows the structure of this sample HA. Both the primary and secondary nodes have the same connections as WAN and LAN, and a crossover cable is used to connect the Sync interface between them. In this example, there is still a potential single point of failure for WAN and LAN switches. Switch redundancy is described in layer 2 redundancy later in this chapter.

Sample diagram of HA network

Cluster configuration basics

Each node requires some basic configuration in addition to the actual HA settings. Do not connect both nodes to the same LAN until their LAN settings no longer conflict.

Installation, interface assignment, and basic configuration

Install the operating system on the firewalls and assign the interfaces identically on both nodes. Interfaces must be assigned in the same order on all nodes. If the interfaces are not aligned, configuration synchronization and other tasks will not work properly. If the interface assignments are adjusted later, the same change must be replicated on both nodes.

Then, connect to GUI and use the installation wizard to configure each firewall with a unique hostname and a non-conflicting static IP address.

For example, one node can be "firewall-a.example.com" and the other "firewall-b.example.com", or a more personalized pair of names.

Note

Avoid naming the nodes "master" or "backup", because those are states rather than roles. Name them "primary" and "secondary" instead.

The default LAN IP address is 192.168.1.1. Each node must use its own address, for example, the primary node uses 192.168.1.2 and the secondary node uses 192.168.1.3. The layout is displayed in the LAN IP address assignment. Once each node has a unique LAN IP address, both nodes can be plugged into the same LAN switch.

Set the Sync interface

Before continuing, the Sync interface must be configured on the cluster nodes. The Sync IP address assignment table lists the addresses to use for the Sync interface on each node. Configure it on the primary node first, then on the secondary node.

After the Sync interface is configured, firewall rules must also be added on both nodes to allow synchronization between them.

At a minimum, the firewall rules must pass configuration synchronization traffic (by default HTTPS on port 443) and pfsync traffic between the two nodes. Simple "allow all" style rules can also be used.

The following figure shows a list of configured firewall rules, which also includes a rule allowing ICMP echo (ping) for diagnostic purposes.

Example of Sync interface firewall rules

The secondary node does not need all of these rules at first, only a rule that allows traffic to reach the GUI so that XML-RPC can work. Once XML-RPC is configured, all rules from the primary node are synchronized to the secondary node.
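
The rule requirements above can be expressed as a small check. This is a hedged sketch in Python, not a pfSense API: the Rule class and permits function are hypothetical, and the model only considers protocol and destination port.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    protocol: str               # "tcp", "pfsync", "icmp", or "any"
    port: Optional[int] = None  # None means any port


def permits(rules: list[Rule], protocol: str, port: Optional[int] = None) -> bool:
    """True if at least one pass rule matches the given traffic."""
    for rule in rules:
        protocol_ok = rule.protocol in ("any", protocol)
        port_ok = rule.port is None or rule.port == port
        if protocol_ok and port_ok:
            return True
    return False


primary_sync_rules = [Rule("tcp", 443), Rule("pfsync"), Rule("icmp")]
secondary_sync_rules = [Rule("tcp", 443)]  # GUI only, until the first sync

print("primary passes pfsync:", permits(primary_sync_rules, "pfsync"))
print("secondary passes XML-RPC:", permits(secondary_sync_rules, "tcp", 443))
print("secondary passes pfsync:", permits(secondary_sync_rules, "pfsync"))
```

In this model the secondary node only accepts the GUI traffic needed for XML-RPC until the primary node's rules have been synchronized to it.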

Configure pfsync

State synchronization using pfsync must be configured on the primary and secondary nodes for it to work properly.

Do the following first on the primary node and then on the secondary node:

Navigate to System > High Availability Sync

Check Synchronize States

Set Synchronize Interface to SYNC

Set the pfsync Synchronize Peer IP to the other node. Set it to 172.16.1.3 when configuring the primary node and to 172.16.1.2 when configuring the secondary node

Click Save

Configuration synchronization (XML-RPC) settings

Warning

Configuration synchronization must be configured on the primary node only. Do not configure it on the secondary node.

On the primary node, do the following:

Navigate to System > High Availability Sync

Under the configuration synchronization settings, set the "Synchronize Config to IP" field to the Sync interface IP address of the secondary node, 172.16.1.3

Set the remote system user name to admin.

Note

The user name must be "admin". Other user names will not work!

Set the remote system password to the admin user account password, and enter it again in the confirmation box.

Check the boxes for each area to synchronize to the secondary node. The Toggle All button selects all options at once.

Click Save

As a quick confirmation, go to Firewall > Rules on the secondary node; the rules entered on the primary node should now be present there.

The two nodes are now linked for configuration synchronization! Whenever a change is made on the primary node, it is quickly synchronized to the secondary node.

Warning

Do not change anything on the secondary node in an area that is set to synchronize! Those settings will be overwritten the next time the primary node performs a synchronization.

Configure CARP virtual IP

With configuration synchronization in place, CARP virtual IP addresses only need to be added on the primary node; they are synchronized to the secondary node automatically.

Navigate to Firewall > Virtual IPs on the primary node to manage CARP VIPs

Click the add button at the top of the list to create a new VIP.

Note

One VIP must be added for each interface that handles user traffic; in this example, one for WAN and one for LAN.

Type: defines the type of VIP, in this case CARP.

Interface: defines the interface on which the VIP will reside, such as WAN

Address: the IP address value for the VIP. The subnet mask must also be selected, and it must match the subnet mask on the interface IP address. In this example, enter 198.51.100.200 and 24 (see WAN IP address assignment).

Virtual IP password: sets the password for the CARP VIP. It only needs to match between the two nodes, which synchronization takes care of. Both the password and the confirmation box must be filled in, and they must match.

VHID group: defines the ID for the CARP VIP. A common strategy is to make the VHID match the last octet of the IP address, so select 200 in this case

Advertising frequency: determines how often CARP heartbeats are sent.

Base (base value): controls the number of whole seconds between heartbeats, usually 1. This should match between cluster nodes.

Skew (deviation value): controls fractions of a second, in 1/256 second increments. The primary node is usually set to 0 or 1, and the secondary node to 100 or higher. This adjustment is handled automatically by XML-RPC synchronization.

Description: some text to identify the VIP, such as WAN CARP VIP.

Note

If CARP is too sensitive to latency on a given network, adjust the base value upward one second at a time until it is stable.

The description above uses the WAN VIP as an example. The LAN VIP is configured the same way, except that it is on the LAN interface and its address is 192.168.1.1 (see LAN IP address assignment).
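
The VHID convention can be sketched in Python as follows. The helper names are hypothetical. The virtual MAC format (00:00:5e:00:01 followed by the VHID in hexadecimal) is general CARP behaviour rather than something stated in this text, so treat it as an assumption when comparing against switch MAC tables.

```python
import ipaddress


def vhid_from_ip(address: str) -> int:
    """Use the last octet of the VIP as its VHID (the convention above)."""
    vhid = int(ipaddress.IPv4Address(address)) & 0xFF
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    return vhid


def carp_virtual_mac(vhid: int) -> str:
    """Assumed CARP virtual MAC: 00:00:5e:00:01:<VHID in hex>."""
    return f"00:00:5e:00:01:{vhid:02x}"


for vip in ("198.51.100.200", "192.168.1.1"):
    vhid = vhid_from_ip(vip)
    print(vip, "-> VHID", vhid, "-> MAC", carp_virtual_mac(vhid))
```

Deriving the VHID from the address keeps it unique per VIP within a subnet, which makes the corresponding MAC address easier to recognise on a switch.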

If there are additional IP addresses in the WAN subnet that will be used for 1:1 NAT, port forwards, VPNs, and so on, they can be added here as well.

When you are finished editing, click Apply Changes.

After adding the VIP, check the firewall > virtual IPs on the secondary node to ensure that the VIP is synchronized as expected.

If the synchronization is successful, the virtual IP addresses on both nodes will look like the following figure.

CARP virtual IP address list

Configure outbound NAT for CARP

NAT will be configured next so that clients on LAN use the shared WAN IP to access the WAN.

Navigate to Firewall > NAT, Outbound tab

Click to select Manual Outbound NAT rule generation

Click Save

A set of rules appears that is equivalent to the automatic outbound NAT rules. Adjust the rules for internal subnet sources so that they use the CARP IP address instead.

Click the edit button to the right of the rule

Find the Translation section of the page

Select the WAN CARP VIP address from the Address drop-down list

Change the description to mention that this rule NATs LAN to the WAN CARP VIP address

Warning

If another local interface (such as a second LAN, DMZ, and so on) is added later and that interface uses private IP addresses, additional manual outbound NAT rules must be added at that time.

When complete, the rule changes will look like those shown in the figure LAN outbound NAT rules using the CARP VIP.

LAN outbound NAT rules of CARP VIP
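
The rule adjustment described in the steps above can be modelled in Python. This is only a sketch: the dictionary layout of a rule, the descriptions, and the use_carp_vip helper are hypothetical and do not reflect how pfSense stores its NAT rules. It also assumes that non-internal sources such as localhost are left unchanged.

```python
import ipaddress

WAN_CARP_VIP = "198.51.100.200"
INTERNAL_NETS = [ipaddress.ip_network("192.168.1.0/24")]

# Hypothetical representation of the generated manual rules.
rules = [
    {"source": "127.0.0.0/8", "translation": "interface address",
     "description": "localhost to WAN"},
    {"source": "192.168.1.0/24", "translation": "interface address",
     "description": "LAN to WAN"},
]


def use_carp_vip(rules: list[dict]) -> list[dict]:
    """Return copies of the rules, pointing internal sources at the CARP VIP."""
    updated = []
    for rule in rules:
        rule = dict(rule)  # leave the caller's rule untouched
        source = ipaddress.ip_network(rule["source"])
        if any(source.subnet_of(net) for net in INTERNAL_NETS):
            rule["translation"] = WAN_CARP_VIP
            rule["description"] = "NAT LAN to WAN CARP VIP"
        updated.append(rule)
    return updated


for rule in use_carp_vip(rules):
    print(rule["source"], "->", rule["translation"])
```

Adding a second internal network to INTERNAL_NETS mirrors the warning above: every new private subnet needs its own rule translated to the CARP VIP.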

Modify DHCP server

The DHCP server settings on the cluster nodes need to be adjusted so that the servers can work together. These changes are synchronized from the primary node to the secondary node, so, as with the VIPs and outbound NAT, they only need to be made on the primary node.

Navigate to Services > DHCP Server, LAN tab.

Set the DNS server to LAN CARP VIP, in this case 192.168.1.1

Set the gateway to LAN CARP VIP, in this case 192.168.1.1

Set the failover peer IP to the actual LAN IP address of the secondary node, here 192.168.1.3

Click Save

Setting the DNS server and gateway to CARP VIP ensures that the local client communicates with the failover address rather than directly to either node. If the primary node fails, the local client continues to talk to the secondary node.

The failover peer IP allows the daemon to communicate directly with the peer in this subnet to exchange data such as lease information. When the setting is synchronized with the secondary device, the value is automatically adjusted so that the secondary device points to the primary device.
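
The automatic adjustment of the failover peer can be pictured with a tiny Python sketch. The dictionary keys and the mirror_for_peer function are hypothetical names for illustration; the addresses are those of this example.

```python
def mirror_for_peer(settings: dict, own_ip: str) -> dict:
    """Copy DHCP settings for the other node, pointing the peer back at us."""
    mirrored = dict(settings)
    mirrored["failover_peer_ip"] = own_ip
    return mirrored


primary_lan_ip = "192.168.1.2"
primary_dhcp = {
    "dns_server": "192.168.1.1",        # LAN CARP VIP
    "gateway": "192.168.1.1",           # LAN CARP VIP
    "failover_peer_ip": "192.168.1.3",  # secondary node's LAN address
}

secondary_dhcp = mirror_for_peer(primary_dhcp, primary_lan_ip)
print(secondary_dhcp)
```

Only the peer address changes between the nodes; the DNS server and gateway stay on the shared CARP VIP.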

High availability and multi-WAN

HA can also be used for firewall redundancy in multi-WAN configurations. This section details the VIP and NAT configurations required for dual WAN HA deployment.

Determine IP address assignment

In this example, four IP addresses will be used on each WAN. Each firewall needs an IP address, plus one CARP VIP for outbound NAT and an additional CARP VIP for a 1:1 NAT entry that will be used for an internal mail server in the DMZ segment.

WAN and WAN2 IP addresses

The following tables show the IP addresses for the two WANs. In most environments, these will be public IP addresses.

WAN IP address assignment

IP address        Function
198.51.100.200    Shared CARP VIP for outbound NAT
198.51.100.201    Primary node firewall WAN
198.51.100.202    Secondary node firewall WAN
198.51.100.203    Shared CARP VIP for 1:1 NAT

WAN2 IP address assignment

IP address        Function
203.0.113.10      Shared CARP VIP for outbound NAT
203.0.113.11      Primary node firewall WAN2
203.0.113.12      Secondary node firewall WAN2
203.0.113.13      Shared CARP VIP for 1:1 NAT

LAN address

The LAN subnet is 192.168.1.0/24. In this example, the LAN IP addresses will be assigned as follows.

LAN IP address assignment

IP address        Function
192.168.1.1       CARP shared LAN VIP
192.168.1.2       Primary node firewall LAN
192.168.1.3       Secondary node firewall LAN

DMZ address

The DMZ subnet is 192.168.2.0/24. Address assignments are shown in the table below.

DMZ IP address assignment

IP address        Function
192.168.2.1       CARP shared DMZ VIP
192.168.2.2       Primary node firewall DMZ
192.168.2.3       Secondary node firewall DMZ

pfsync address

There will be no shared CARP VIP on this interface because there is no need for one. These IP addresses are used only for communication between the firewalls. In this example, 172.16.1.0/24 will be the Sync subnet. Two IP addresses are used, with the same subnet mask as the other internal interfaces.

Sync IP address assignment

IP address        Function
172.16.1.2        Primary node firewall Sync
172.16.1.3        Secondary node firewall Sync

NAT configuration

The NAT configuration for HA with multiple WANs is the same as for HA with a single WAN. Make sure that only CARP VIPs are used for inbound traffic or routing. For more information about NAT configuration, see the relevant documentation.

Firewall configuration

With multiple WANs, a firewall rule must be in place to pass traffic to local networks using the default gateway. Otherwise, traffic trying to reach the CARP address, or going from LAN to DMZ, will instead go out a WAN connection.

A rule must be added at the top of the firewall rules on every internal interface to direct traffic for all local networks to the default gateway. The important part is that the gateway for this rule must be the default, not one of the failover or load balancing gateway groups. The destination of this rule is the local LAN network, or an alias containing all locally reachable networks.
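
Why the local-network rule has to sit at the top can be shown with a first-match model in Python. This is a simplified sketch: the rule layout, the gateway_for function, and the gateway group name WAN_failover_group are hypothetical.

```python
import ipaddress

LOCAL_NETS = [ipaddress.ip_network("192.168.1.0/24"),   # LAN
              ipaddress.ip_network("192.168.2.0/24")]   # DMZ

# Rules are evaluated top-down; the first match decides the gateway.
LAN_RULES = [
    {"destinations": LOCAL_NETS, "gateway": "default"},
    {"destinations": [ipaddress.ip_network("0.0.0.0/0")],
     "gateway": "WAN_failover_group"},
]


def gateway_for(destination: str, rules: list[dict]) -> str:
    """Return the gateway of the first rule whose destination matches."""
    addr = ipaddress.ip_address(destination)
    for rule in rules:
        if any(addr in net for net in rule["destinations"]):
            return rule["gateway"]
    raise LookupError("no rule matched; traffic would be blocked")


print(gateway_for("192.168.2.10", LAN_RULES))      # LAN to DMZ
print(gateway_for("203.0.113.80", LAN_RULES))      # Internet host
print(gateway_for("192.168.2.10", LAN_RULES[1:]))  # top rule missing
```

Without the first rule, traffic from LAN to DMZ matches the policy routing rule and is pushed out a WAN connection, which is the failure described above.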

Multi-WAN HA with DMZ diagram

The diagram for this layout is much more complex because of the additional WAN and DMZ elements, as shown in the following figure.

Multi-WAN HA schematic diagram with DMZ

Verify failover capabilities

Since the whole point of HA is high availability, the cluster should be tested thoroughly before it is put into production. The most important part of that testing is making sure the HA peers fail over gracefully during a system outage.

If any of the operations in this section do not work as expected, see High availability troubleshooting.

Check CARP status

On both systems, navigate to Status > CARP (failover). If everything is working properly, all CARP VIPs are shown as MASTER on the primary node and as BACKUP on the secondary node.

If one of them displays DISABLED, click the enable CARP button and refresh the page.

If an interface displays INIT, it means that the interface that contains CARP VIP is not connected. Connect the interface to the switch, or at least to other nodes. If the interface is not in use, remove CARP VIP from the interface, as this interferes with normal CARP operation.
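
The status checks above can be summarized as a small lookup. This Python sketch uses a hypothetical carp_advice function; the status strings are the ones named in this section.

```python
def carp_advice(role: str, status: str) -> str:
    """Suggest a next step for one VIP's status on a node.

    role is "primary" or "secondary"; status is the value shown on the
    CARP status page.
    """
    expected = {"primary": "MASTER", "secondary": "BACKUP"}[role]
    if status == expected:
        return "ok"
    if status == "DISABLED":
        return "enable CARP and refresh the page"
    if status == "INIT":
        return "connect the interface, or remove the unused CARP VIP"
    return f"unexpected {status}; expected {expected}, see troubleshooting"


for role, status in (("primary", "MASTER"), ("secondary", "INIT"),
                     ("secondary", "MASTER")):
    print(role, status, "->", carp_advice(role, status))
```

A secondary node that reports MASTER while the primary node is healthy usually points to a layer 2 problem, which is why that case is sent to troubleshooting.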

Check whether the configuration is synchronized correctly

Navigate to key locations on the secondary node, such as Firewall > Rules and Firewall > NAT, to make sure that the rules created on the primary node are being synchronized to the secondary node.

Check DHCP failover status

If DHCP failover is configured, its status can be checked under Status > DHCP Leases. A new section appears at the top of the page containing the status of the DHCP failover pool, as shown in the following figure.

DHCP failover pool status

Test CARP failover

Now do a real failover test. Before you begin, make sure that the local client behind CARP on LAN can connect to the Internet and that pfSense Firewall is running online. Once it is confirmed that it is working properly, now is a good time to make a backup.

During the actual test, unplug the primary node from the network or temporarily shut down the primary node. The client will be able to continue loading content from the Internet through the secondary node. Check the status > CARP (failover) again on the secondary node, and it will now report that it is the MASTER of LAN and WAN CARP VIP.

Now bring the primary node back online, it will return to the role of MASTER, and the secondary node system will demote itself to BACKUP. At any time in the process, the Internet connection still works.

Test HA in as many failure situations as possible. Other tests include:

Unplug the network cable from the WAN or LAN

Unplug the main power supply

Disable CARP on the primary node using both the temporary disable feature and maintenance mode

Test each system individually (power off the secondary node, then power it back on and power off the primary node)

Download files or attempt to transfer audio / video streams during failover

Run continuous ICMP echo requests (ping) to the Internet host during failover
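
When a continuous ping is used during these tests, the length of the interruption can be measured from the recorded results. The Python sketch below assumes the ping results have already been collected as (timestamp, replied) pairs; the sample values are hypothetical and only illustrate the calculation.

```python
def longest_outage(samples: list[tuple[float, bool]]) -> float:
    """Longest time in seconds from the first lost reply of a run of
    losses to the next successful reply. Losses that never recover
    before the end of the samples are not counted."""
    longest = 0.0
    outage_start = None
    for timestamp, replied in samples:
        if not replied and outage_start is None:
            outage_start = timestamp
        elif replied and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None
    return longest


# Hypothetical one-per-second results recorded while the primary was unplugged.
samples = [(0.0, True), (1.0, True), (2.0, False), (3.0, False),
           (4.0, True), (5.0, True)]
print(f"longest outage: {longest_outage(samples):.1f} s")
```

Comparing this figure across the different failure scenarios shows which kind of failure the cluster handles least gracefully.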

Provide redundancy without NAT

As mentioned earlier, only CARP VIPs provide redundancy for addresses handled directly by the firewall, and they can only be used together with NAT or with services on the firewall itself. With HA, redundancy can also be provided for routed public IP subnets. This section describes that type of configuration, which is common in large networks, ISP and wireless ISP networks, and data center environments.

Public IP allocation

The WAN side of pfSense needs at least a /29 public IP block, which provides six usable IP addresses. Only three are needed for a two-firewall deployment, but /29 is the smallest subnet that holds three usable IP addresses. Each firewall requires one IP address, and at least one CARP VIP is required on the WAN side.

The second public IP subnet is routed to one of the CARP VIPs by the ISP, data center, or upstream router. Because this subnet is routed to a CARP VIP, routing does not depend on a single firewall. For the example configuration described in this chapter, a /24 public IP subnet is used and divided into two /25 subnets.
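
Splitting the routed block can be worked out with Python's ipaddress module. The text does not name the /24 that is used, so 192.0.2.0/24 below is a stand-in from the documentation range and should be replaced with the subnet actually routed to the CARP VIP.

```python
import ipaddress

# Hypothetical routed block; substitute the real routed subnet.
routed = ipaddress.ip_network("192.0.2.0/24")

halves = list(routed.subnets(new_prefix=25))
for net in halves:
    hosts = list(net.hosts())
    print(f"{net}: {len(hosts)} usable, {hosts[0]} - {hosts[-1]}")
```

Each /25 can then be assigned to one internal interface, with its own CARP VIP serving as the gateway for that segment.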

Network Overview

The example network described here is a data center environment consisting of two pfSense firewalls, each with four interfaces: WAN, LAN, DBDMZ, and pfsync. The network contains a number of web and database servers. It is not based on any real network, but there are many real deployments similar to it.

WAN network

The WAN side is connected to the upstream network, that is, the ISP, data center, or upstream router.

WEB network

This network segment uses the "LAN" interface, which has been renamed. It contains the web servers, so it is named WEB, but it could be called DMZ, SERVERS, or anything else that fits.

DBDMZ network

This segment is an OPT interface and contains the database servers. In hosting environments, it is common to separate web servers and database servers into two networks. Database servers usually do not need to be reached directly from the Internet, so they are less exposed to compromise than web servers.

Sync network

The synchronization network in this figure is used to synchronize pfSense configuration changes through XML-RPC, as well as state table changes between two firewalls through pfsync synchronization. It is recommended to use a dedicated interface.

Network topology

The network topology is shown in the following figure, including all routable IP addresses, the WEB network, and the database DMZ.

Diagram of HA with routed IPs

Be careful

Network segments containing database servers usually do not require public access, so it is more common to use private IP subnets, but the examples described here can be used regardless of the functionality of the two internal subnets.

Layer 2 redundancy

This section describes the layer 2 design elements to consider when planning a redundant network. It assumes a two-node deployment, but the design can be extended to more nodes as needed.

If both redundant pfSense firewalls are plugged into the same switch on any interface, that switch becomes a single point of failure. To avoid this, the best option is to deploy two switches for each interface (except for the dedicated pfsync interface).

The diagram of HA with routed IPs is network-centric and does not show the switch infrastructure. The diagram of HA with redundant switches shows how that environment looks with a redundant switch infrastructure.

Diagram of HA with redundant switches

Switch configuration

When using multiple switches, the switches should be interconnected. As long as there is a single connection between the two switches and no bridge on either firewall, this is safe with any type of switch. Where bridging is used, or where there are multiple interconnections between the switches, care must be taken to avoid layer 2 loops. In that case, a managed switch is required that can use Spanning Tree Protocol (STP) to detect and block ports that would otherwise create switch loops. With STP, if an active link goes down, for example because of a switch failure, a backup link can automatically be brought up in its place.

pfSense also supports lagg(4) link aggregation and link failover interfaces, allowing multiple network interfaces to be plugged into one or more switches to improve fault tolerance. For more information about configuring link aggregation, see LAGG (Link aggregation).

Host redundancy

Host redundancy for critical systems behind the firewall is harder to achieve. Each system can have two network cards, with one connected to each group of switches, using Link Aggregation Control Protocol (LACP) or similar vendor-specific features. Servers can also have multiple network connections, and depending on the operating system, it may be possible to run CARP or a similar protocol on a group of servers so that they are redundant to each other.

Other single point of failure

When designing a fully redundant network, many single points of failure are easily overlooked. There is a lot more to consider than a simple switch failure. Here are some more examples of redundancy on a wider scale:

Provide isolated power for each redundant segment.

Use separate circuit breakers for redundant systems.

Use multiple UPS/ generators.

Use multiple power providers, entering opposite sides of the building where possible.

Even a Multi-WAN configuration is no guarantee of Internet uptime.

Use a variety of Internet connection technologies (DSL, cable, fiber, wireless).

If any two carriers use the same fiber, tunnel, or path, they could both be knocked out at the same time.

Backup cooling, redundant chillers or portable / emergency air conditioners.

Consider placing a second set of redundant equipment in another room, another floor, or another building.

Keep a duplicate setup in another part of town or in another city.

I hear hosting on Mars is cheap, but the latency is a killer.

High availability and bridging

High availability is currently not compatible with native interface bridging.

Use IP aliases to reduce heartbeat traffic

If there are a large number of CARP VIPs on a network segment, they generate a lot of multicast traffic. Each CARP VIP sends a heartbeat roughly every second. To reduce this traffic, additional VIPs can be "stacked" on top of one CARP VIP on the interface. First, select one CARP VIP as the "primary" VIP for the interface. Then, change the other CARP VIPs in the same subnet to IP alias type VIPs, and select the "primary" CARP VIP as their interface in the VIP configuration.

This not only reduces the heartbeats seen on a given segment, but also causes all of the IP alias VIPs to change state together with the "primary" CARP VIP, which reduces the chance that a layer 2 problem leaves individual VIPs failing over inconsistently.

IP alias VIPs are not normally synchronized by XML-RPC configuration synchronization, but IP alias VIPs that are set to use a CARP VIP as their interface in this way are synchronized.
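The saving is simple arithmetic, sketched here under the assumption that every CARP VIP advertises about once per second as described above. The VIP count of 20 and the helper name are made up for the example.

```python
def heartbeats_per_second(carp_vips: int, interval_s: float = 1.0) -> float:
    """Approximate advertisements per second on a segment, assuming each
    CARP VIP sends one heartbeat every `interval_s` seconds."""
    return carp_vips / interval_s

total_vips = 20  # hypothetical number of VIPs in one subnet
before = heartbeats_per_second(total_vips)  # every VIP is a CARP VIP
after = heartbeats_per_second(1)            # one CARP VIP, the other 19 stacked as IP aliases
print(f"before: {before:.0f}/s, after: {after:.0f}/s")
```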

Interface

If you need multiple subnets on a single interface with HA, you can use IP aliases to achieve this. As with the primary interface IP addresses, we recommend that each firewall have one IP address in the additional subnet, for a total of at least three IPs per subnet. A separate IP alias entry must be added on each node within the new subnet, making sure that its subnet mask matches the actual subnet mask of the new subnet. IP alias VIPs placed directly on an interface are not synchronized, so this is safe.

Once the IP alias VIPs have been added to both nodes to gain a foothold in the new subnet, you can add CARP VIPs using IP addresses from the new subnet.

If communication between the additional subnet and the individual HA nodes is not required, you can omit the IP aliases and use a CARP VIP directly in the other subnet.

Troubleshooting

High availability configurations are relatively complex, and there are many different ways to configure a failover cluster. This section discusses some common (and not so common) problems, which will hopefully resolve the issue in most cases. If the problem persists after consulting this section, there is a dedicated CARP/VIPs board on the pfSense forum.

Before continuing, take the time to check all members of the HA cluster to ensure that they have consistent configurations. It often helps to walk through the example setup, double-checking every setting. Repeat the process on the secondary node and watch for places where the configuration must differ on the secondary. Be sure to check the CARP status (see Check CARP status) and make sure that CARP is enabled on all cluster members.

Errors related to HA are logged under Status > System Logs, on the System tab. Check the logs on each system involved to see whether there are any messages about XML-RPC synchronization, CARP state transitions, or other related errors.

Common misconfiguration

Several common misconfigurations prevent HA from working properly.

Use a different VHID on each CARP VIP

A different VHID must be used on each CARP VIP created on a given interface or broadcast domain. With a single HA pair, input validation prevents duplicate VHIDs, but it is not always that simple. CARP is a multicast technology, so anything that uses CARP on the same network segment must use a unique VHID. VRRP uses a protocol similar to CARP, so also make sure there is no conflict with VRRP VHIDs, for example when the ISP or another router on the local network is using VRRP.

The best solution is to use a unique set of VHIDs. On a private network that is known to be safe, start numbering from 1. On a network where VRRP or CARP conflicts are possible, consult the administrator of that network to find a free block of VHIDs.
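One way to audit this is to list every CARP or VRRP speaker per broadcast domain and look for repeated VHIDs. The sketch below assumes such an inventory has already been gathered by hand; the tuple layout, segment names, and addresses are all hypothetical.

```python
from collections import defaultdict

def find_vhid_conflicts(vips):
    """Group VIPs by (segment, VHID) and return the groups with more than one entry.

    `vips` is an iterable of (segment, vhid, address) tuples, where `segment`
    names the broadcast domain the VIP lives on.
    """
    seen = defaultdict(list)
    for segment, vhid, address in vips:
        seen[(segment, vhid)].append(address)
    return {key: addrs for key, addrs in seen.items() if len(addrs) > 1}

# Hypothetical inventory collected from every CARP/VRRP speaker on each segment.
inventory = [
    ("WAN", 1, "198.51.100.1"),
    ("WAN", 2, "198.51.100.2"),
    ("WEB", 1, "203.0.113.1"),    # same VHID as a WAN VIP, but a different segment
    ("WAN", 2, "198.51.100.10"),  # reuses VHID 2 on the WAN segment
]
conflicts = find_vhid_conflicts(inventory)
print(conflicts)
```

Only the reused VHID on the same segment is reported; the same number on a different broadcast domain is not treated as a clash.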

Incorrect time

Check that all systems involved are synchronizing their clocks correctly and have valid time zones, especially when running in a virtual machine. If the clocks are too far apart, some synchronization tasks, such as DHCP failover, will not work properly.

Incorrect subnet mask

The real subnet mask must be used for a CARP VIP, not /32. It must match the subnet mask of the IP address on the interface to which the CARP VIP is assigned.
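A minimal sketch of that rule, for the simple case where the VIP lives in the same subnet as the interface address. It relies only on Python's ipaddress module; the addresses are documentation values and vip_mask_ok is a made-up helper name.

```python
import ipaddress

def vip_mask_ok(interface_cidr: str, vip_cidr: str) -> bool:
    """True when the VIP uses the interface's prefix length and lies in its subnet."""
    iface = ipaddress.ip_interface(interface_cidr)
    vip = ipaddress.ip_interface(vip_cidr)
    return vip.network == iface.network

print(vip_mask_ok("198.51.100.2/29", "198.51.100.1/29"))  # VIP mask matches the interface
print(vip_mask_ok("198.51.100.2/29", "198.51.100.1/32"))  # VIP entered with /32
```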

IP address on the CARP interface

The interface on which a CARP VIP resides (VLAN, LAN, WAN, OPT) must already have another IP address defined directly on it before the VIP can be used.

Incorrect hash error

If CARP is not working properly when this error appears, the cause may be a configuration mismatch. Ensure that, for a given VIP, the VHID, password, and IP address/subnet mask all match on every node.
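The comparison can be sketched as a field-by-field diff of the same VIP as entered on each node. The dictionary keys and values below are illustrative only and do not correspond to pfSense's configuration format.

```python
def vip_mismatches(primary: dict, secondary: dict,
                   fields=("vhid", "password", "address", "prefix")) -> list:
    """Return the names of the fields whose values differ between two nodes."""
    return [f for f in fields if primary.get(f) != secondary.get(f)]

# Hypothetical settings for the same VIP as entered on each node.
primary_vip = {"vhid": 1, "password": "s3cret", "address": "198.51.100.1", "prefix": 29}
secondary_vip = {"vhid": 1, "password": "s3cret", "address": "198.51.100.1", "prefix": 32}

print(vip_mismatches(primary_vip, secondary_vip))
```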

If the settings are correct and CARP still does not work when this error message is generated, there may be multiple CARP instances in the same broadcast domain. Disable CARP and monitor the network with tcpdump (packet capture) to look for other CARP or CARP-like traffic, then adjust the VHIDs accordingly.

If CARP is working properly and this message appears in the log when the system boots, you can ignore it. Seeing this message at startup is normal as long as CARP continues to work properly (the primary node shows MASTER and the secondary node shows BACKUP).

Both systems are shown as MASTER

This occurs if the secondary node cannot see the CARP advertisements from the primary node. Check firewall rules, connectivity, and switch configuration. Also check the system logs for error messages that may point to a solution. If you encounter this problem in a virtual machine (VM) product such as ESX, see Virtual machine internal issues (ESX).

Primary node is downgraded to BACKUP

In some cases, this may occur normally for a short period of time after the system comes back online. However, some hardware failures or other error conditions may cause the node to silently take on a high advskew of 240 to signal that it still has a problem and should not be the primary. This can be checked from the GUI, or from the shell or Diagnostics > Command Prompt.

In the GUI, this condition is shown in an error message on Status > CARP.

CARP status when the primary node is demoted

From the shell or Diagnostics > Command Prompt, run the following command to check for demotion:

# sysctl net.inet.carp.demotion

net.inet.carp.demotion: 240

If the value is greater than 0, the node has demoted itself.
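For scripted checks, the same test can be applied to the command's output. This sketch only parses a line in the form shown above; the function names are made up, and how the output is collected from each node is left open.

```python
def parse_demotion(sysctl_output: str) -> int:
    """Extract the integer from a line such as 'net.inet.carp.demotion: 240'."""
    name, _, value = sysctl_output.strip().partition(":")
    if name.strip() != "net.inet.carp.demotion":
        raise ValueError(f"unexpected sysctl output: {sysctl_output!r}")
    return int(value)

def is_demoted(sysctl_output: str) -> bool:
    """A value greater than 0 means the node has demoted itself."""
    return parse_demotion(sysctl_output) > 0

print(is_demoted("net.inet.carp.demotion: 240"))
print(is_demoted("net.inet.carp.demotion: 0"))
```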

In this case, isolate the firewall, check the network connection, and perform further hardware tests.

If the demotion value is 0 and the primary node still demotes itself to BACKUP or keeps flapping, check the network to make sure there are no layer 2 loops. If the firewall receives its own heartbeats back from the switch, that can also trigger a change to BACKUP status.

Virtual machine internal issues (ESX)

When using HA inside a virtual machine, especially VMware ESX, some special configurations are required:

Enable promiscuous mode on vSwitch.

Enable MAC Address Changes.

Enable Forged Transmits.

ESX VDS promiscuous mode workaround

If you are using a virtual distributed switch, you can create port groups for firewall interfaces that enable promiscuous mode, and set separate non-promiscuous port groups for other hosts. Some users on the forum use this approach to balance the need for CARP functionality and the need to protect client ports.

ESX VDS upgrade issue

If you use a VDS (virtual distributed switch) in 4.0 or 4.1 and upgrade it from 4.0 to 4.1 or 5.0, the VDS will not pass CARP traffic correctly. A new VDS created on 4.1 or 5.0 will work, but the upgraded VDS will not.

You can disable promiscuous mode on VDS and then re-enable it to solve this problem.

ESX VDS port mirroring issue

If port mirroring is enabled on VDS, promiscuous mode is broken. To fix it, disable and then re-enable promiscuous mode.

ESX client port issues

If a physical HA cluster is connected to a switch together with an ESX host that uses multiple ports (a lagg group or similar), and the target VM can reach only certain devices or IPs, the port group settings may need to be adjusted in ESX so that load balancing for the group is hashed on IP rather than on the originating interface.

Side effects of incorrect settings include:

Traffic reaches the target VM only when its NIC is in promiscuous mode.

The CARP VIP cannot be reached from the target VM even though the "real" IP address of the primary firewall can.

Port forwards or other inbound connections to the target VM work from some IP addresses but not from others.

ESX physical NIC failure does not trigger failover

Self-demotion in CARP depends on loss of link on the switch port. Therefore, if the primary and secondary firewall instances are on different ESX units, and the primary unit loses its switch port link without exposing that to the virtual machine, CARP on the primary will stay MASTER on all of its VIPs, and the secondary will also believe that it should be MASTER. One way to work around this is to script an event in ESX that takes down the switch port on the VM if the physical port loses link. There may also be other methods in ESX.

KVM+QEMU issues

Use e1000 network cards (em(4)), not ed(4) network cards, or the CARP VIPs will never leave the init state.

VirtualBox issues

Setting "Promiscuous mode: Allow All" on the relevant interfaces of the VM allows CARP to run on any interface type (bridged, host-only, internal).

Other switch and layer 2 issues

If the nodes are plugged into separate switches, make sure that the switches are properly trunked and pass broadcast/multicast traffic.

Some switches have broadcast/multicast filtering, limiting, or "storm control" features that can break CARP.

Some switches have broken firmware that causes features such as IGMP snooping to interfere with CARP.

If you are using the switch on the back of a modem/CPE, try a real switch instead. These built-in switches often fail to handle CARP traffic correctly. Typically, plugging the firewalls into a proper switch and then uplinking that switch to the CPE eliminates the problem.

Configuration synchronization issues

When you encounter problems with configuration synchronization, check the following items carefully:

The user name on all nodes must be admin.

The password in the configuration synchronization settings on the primary server must match the password in the backup.

WebGUI must be on the same port on all nodes.

WebGUI must use the same protocol (HTTP or HTTPS) on all nodes.

Traffic to the WebGUI port must be allowed on the interface that carries the synchronization traffic.

The pfsync interface must be enabled and configured on all nodes.

Verify that only the primary node has the configuration synchronization options enabled.

Ensure that no IP address is specified in Synchronize Config to IP on the secondary node.

Ensure that the clocks on both nodes are current and reasonably accurate.
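A few items from the checklist above lend themselves to a mechanical comparison. The sketch below assumes the relevant settings have been copied by hand into two dictionaries; the key names and sample values are invented for the example and are not pfSense configuration keys.

```python
def sync_problems(primary: dict, secondary: dict) -> list:
    """Check two hand-written node summaries against part of the checklist."""
    problems = []
    for name, node in (("primary", primary), ("secondary", secondary)):
        if node["user"] != "admin":
            problems.append(f"{name}: sync user is not admin")
    if primary["gui_port"] != secondary["gui_port"]:
        problems.append("WebGUI ports differ")
    if primary["gui_proto"] != secondary["gui_proto"]:
        problems.append("WebGUI protocols differ")
    if secondary["sync_to_ip"]:
        problems.append("secondary has a Synchronize Config to IP value set")
    return problems

# Hypothetical summaries of the two nodes.
primary = {"user": "admin", "gui_port": 443, "gui_proto": "https", "sync_to_ip": "172.16.1.3"}
secondary = {"user": "admin", "gui_port": 8443, "gui_proto": "https", "sync_to_ip": ""}
print(sync_problems(primary, secondary))
```

Items such as firewall rules on the sync interface and clock accuracy still need to be verified on the nodes themselves.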

Troubleshooting HA and multi-WAN

If you have trouble reaching CARP VIPs in a Multi-WAN setup, double-check that a rule like the one described in the firewall configuration section is present.

Translated from the pfSense book
