Windows Server 2019 – Clustering tiers

An overarching concept in failover clustering that is important to understand is the different tiers at which clustering can benefit you. There are two levels at which you can use clustering: you can take an either/or approach and use just one of them, or you can combine both to really impress your high availability friends.

Application-layer clustering

Clustering at the application level typically involves installing failover clustering onto VMs. Using VMs is not a firm requirement, but is the most common installation path. You can mix and match VMs with physical servers in a clustering environment, as long as each server meets the installation criteria. This application mode of clustering is useful when you have a particular service or role running within the operating system that you want to make redundant. Think of this as more of a microclustering capability, where you are really digging in and making one specific component of the operating system redundant with another server node that is capable of picking up the slack in the event that your primary server goes down.

Host-layer clustering

If application clustering is micro, clustering at the host layer is more macro. The best example I can give of this is the one that gets most admins started with failover clustering in the first place: Hyper-V. Let’s say you have two physical servers that are both hosting virtual machines in your environment. You want to cluster these servers together, so that all of the VMs hosted on these Hyper-V servers are redundant between the two physical servers. If a whole Hyper-V server goes down, the second one is able to spin up the VMs that had been running on the primary node. After a minimal interruption of service, the VMs that host the actual workloads in your environment are back up and running, available for users and their applications to tap into.

A combination of both

The two modes of failover clustering described earlier can certainly be combined for an even better and more comprehensive high availability story. Let’s let this example speak for itself: you have two Hyper-V servers, each one prepared to run a series of virtual machines. You are using host clustering between these servers, so if one physical box goes down, the other picks up the slack. That in itself is great, but you use SQL a lot, and you want to make sure that SQL is also highly available. You can run two virtual machines, each one a SQL server, and configure application-layer failover clustering between those two VMs for the SQL services specifically. This way, if something happens to a single virtual machine, you don’t have to fail over to the backup Hyper-V server; instead, the second SQL node takes over and resolves the issue. There was no need for a full-scale Hyper-V takeover by the second physical server, yet you utilized failover clustering in order to make sure that SQL was always online. This is a prime example of clustering on top of clustering, and by thinking along those lines, you can start to get pretty creative with all of the different ways that you can make use of clustering in your network.

How does failover work?

Once you have configured failover clustering, the nodes remain in constant communication with each other. This way, when one goes down, the others are immediately aware and can flip services over to another node to bring them back online. Failover clustering uses the registry to keep track of many per-node settings. These settings are kept synced across the nodes, so when one node goes down, the necessary settings are already available to the other servers, and the next node in the cluster is told to spin up whatever applications, VMs, or workloads were being hosted on the primary box that went offline. There can be a slight delay in services as the components spin up on the new node, but this process is all automated and hands-off, keeping downtime to an absolute minimum.
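To make the heartbeat-and-takeover idea above a little more concrete, here is a minimal toy model in Python. It is only a sketch of the general pattern, not how Windows Server failover clustering is actually implemented: the `Cluster` and `Node` classes, the `HEARTBEAT_TIMEOUT` value, and the node and VM names are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: ticks of silence before a node is treated as down.
HEARTBEAT_TIMEOUT = 3


@dataclass
class Node:
    name: str
    last_heartbeat: int = 0
    workloads: list = field(default_factory=list)


class Cluster:
    """Toy cluster: tracks heartbeats and moves workloads off silent nodes."""

    def __init__(self, names):
        self.nodes = {name: Node(name) for name in names}
        self.online = list(names)  # order doubles as failover preference

    def heartbeat(self, name, now):
        # Record that a node checked in at time `now`.
        self.nodes[name].last_heartbeat = now

    def check(self, now):
        """Mark silent nodes as down and move their workloads to the first
        surviving node. Returns a list of (workload, from_node, to_node)."""
        failed = [n for n in self.online
                  if now - self.nodes[n].last_heartbeat >= HEARTBEAT_TIMEOUT]
        self.online = [n for n in self.online if n not in failed]
        moves = []
        for name in failed:
            if not self.online:
                break  # no surviving node to fail over to
            source = self.nodes[name]
            target = self.nodes[self.online[0]]
            for workload in source.workloads:
                target.workloads.append(workload)
                moves.append((workload, source.name, target.name))
            source.workloads = []
        return moves


# Usage: hv1 stops sending heartbeats after tick 1, hv2 keeps going.
cluster = Cluster(["hv1", "hv2"])
cluster.nodes["hv1"].workloads = ["vm-sql", "vm-web"]
cluster.heartbeat("hv1", 1)
for tick in range(1, 5):
    cluster.heartbeat("hv2", tick)
print(cluster.check(now=4))
```

In this sketch the surviving node simply inherits everything the failed node was running. A real cluster would also consult its quorum configuration before acting, which is discussed later in this section.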

When you need to cut services from one node to another as a planned event, such as for patching or maintenance, there is an even better story here. Through a process known as live migration, you are able to flip responsibilities over to a secondary node with zero downtime. This way, you can take nodes out of the cluster for maintenance or security patching, or whatever reason, without affecting the users or system uptime in any way. Live migration is particularly useful for Hyper-V clusters, where you will often have the need to manually decide which node your VMs are being hosted on, in order to accomplish work on the other node or nodes.
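The planned case can be sketched in the same toy style. The function below drains one node by moving each of its VMs to whichever remaining node currently hosts the fewest. The `drain_node` name, the placement dictionary, and the least-loaded rule are assumptions made for this illustration; it models only the bookkeeping of where VMs live, not the memory and state transfer that a real live migration performs.

```python
def drain_node(placement, node):
    """Move every VM off `node` ahead of planned maintenance.

    `placement` maps a host name to the list of VM names it runs and is
    updated in place. Returns a list of (vm, from_host, to_host) moves.
    """
    targets = [host for host in placement if host != node]
    if not targets:
        raise ValueError("no other node available to receive workloads")
    moves = []
    for vm in list(placement[node]):
        # Hypothetical placement rule: pick the host with the fewest VMs.
        dest = min(targets, key=lambda host: len(placement[host]))
        placement[dest].append(vm)
        placement[node].remove(vm)
        moves.append((vm, node, dest))
    return moves


# Usage: empty hv1 before patching it.
placement = {"hv1": ["vm-a", "vm-b", "vm-c"], "hv2": ["vm-d"], "hv3": []}
for move in drain_node(placement, "hv1"):
    print(move)
```

Once the node is empty it can be patched or rebooted, and the VMs can be moved back afterwards using the same approach in reverse.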

In many clusters, there is an idea of quorum. If a cluster is split, for example because a node goes offline or because multiple nodes suddenly become unavailable through a network disconnect of some kind, quorum logic takes over in order to determine which segment of the cluster is the one that is still online. If you have a large cluster that spans multiple subnets inside a network, and something happens at the network layer that breaks cluster nodes away from each other, all the two sides of the cluster know is that they can no longer communicate with the other cluster members. Without quorum, both sides of the cluster would automatically assume that they should now take responsibility for the cluster workloads, and that is exactly the conflict quorum is there to prevent.

Quorum settings determine how many node failures the cluster can sustain while still remaining online. Because the entire cluster knows the quorum configuration, it can answer the question of which section of the cluster is to be primary in the event that the cluster is split. In many cases, clusters provide quorum by relying on a third party, known as a witness. As the name implies, this witness watches the status of the cluster and helps to make decisions about when and where failover becomes necessary. I mention this here as a precursor to our discussion on new clustering capabilities baked into Server 2019, one of which is an improvement in the way that witnesses work in small environments.
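The voting idea behind quorum and witnesses can be sketched with a few lines of arithmetic. The Python below assumes a deliberately simplified model: every node gets one vote, an optional witness gets one vote, and a partition keeps running only if it holds a strict majority of all configured votes. The function names are hypothetical, and real clusters can adjust votes in ways this sketch ignores.

```python
def has_quorum(reachable_nodes, total_nodes,
               witness_configured=False, witness_reachable=False):
    """Return True if this partition holds a strict majority of votes.

    Simplified model: one vote per node, plus one for the witness if one
    is configured. A tie is not a majority.
    """
    total_votes = total_nodes + (1 if witness_configured else 0)
    my_votes = reachable_nodes
    if witness_configured and witness_reachable:
        my_votes += 1
    return my_votes * 2 > total_votes


def tolerated_failures(total_nodes, witness_configured=False):
    """How many votes can be lost before a majority is no longer possible."""
    total_votes = total_nodes + (1 if witness_configured else 0)
    majority = total_votes // 2 + 1
    return total_votes - majority


# Usage: a four-node cluster split evenly down the middle.
print(has_quorum(2, 4))  # neither half has a majority
print(has_quorum(2, 4, witness_configured=True, witness_reachable=True))
print(has_quorum(2, 4, witness_configured=True, witness_reachable=False))
```

In the four-node example, an even split leaves neither half with a majority when there is no witness. With a witness, the half that can still reach it has three of five votes and carries on, while the other half stands down. The same arithmetic shows why a witness matters so much in a two-node cluster: one node plus the witness is two of three votes.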

There is a lot more information to be gained and understood if you intend to create clusters large enough for quorum and witness settings. If you’re interested in learning more, check out https://docs.microsoft.com/en-us/windows-server/storage/storage-spaces/understand-quorum.
