Windows Server 2019 – Network Load Balancing (NLB)

Often, when I hear people discussing redundancy on their servers, the conversation includes many instances of the word cluster, such as, “If we set up a cluster to provide redundancy for those servers…” or “Our main website is running on a cluster…” While it is great that there is some form of resiliency being used on the systems to which these conversations pertain, it is often the case that clustering is not actually involved anywhere. When we boil down the particulars of how their systems are configured, we discover that it is NLB doing this work for them. We will discuss real clustering further along in this chapter, but first I wanted to start with the more common approach to making many services redundant. NLB distributes traffic at the TCP/IP level, meaning that the server operating systems themselves are not completely aware of or relying on each other, with redundancy instead being provided at the network layer. This can be particularly confusing—NLB versus clustering—because sometimes Microsoft refers to something as a cluster, when in fact it is using NLB to make those connections happen. A prime example is DirectAccess. When you have two or more DA servers together in an array, there are TechNet documents and even places inside the console where it is referred to as a cluster. But there is no failover clustering going on here; the technology under the hood that is making connections flow to both nodes is actually Windows NLB.

You’ve probably heard some of the names in the hardware load balancer market—F5, Cisco, Kemp, Barracuda. These companies provide dedicated hardware boxes that can take traffic headed toward a particular name or destination, and split that traffic between two or more application servers. While this is generally the most robust way that you can establish NLB, it is also the most expensive and makes the overall environment more complex. One feature these guys offer that the built-in Windows NLB cannot provide is SSL termination, or SSL offloading, as we often call it. These specialized appliances are capable of receiving website traffic from user computers, that is SSL, and decrypting the packets before sending them on their way to the appropriate web server. This way, the web server itself is doing less work, since it doesn’t have to spend CPU cycles encrypting and decrypting packets. However, today we are not going to talk about hardware load balancers at all, but rather the NLB capabilities that are provided right inside Windows Server 2019.

Not the same as round-robin DNS

I have discovered, over the years, that some people’s idea of NLB is really round-robin DNS. Let me give an example of that: say you have an intranet website that all of your users access daily. It makes sense that you would want to provide some redundancy to this system, and so you set up two web servers, in case one goes down. However, in the case that one does go down, you don’t want to require manual cutover steps to fail over to the extra server, you want it to happen automatically. In DNS, it is possible to create two host A records that have the same name, but point to different IP addresses. If Server01 is running on 10.10.10.5 and Server02 is running on 10.10.10.6, you could create two DNS records both called INTRANET, pointing one host record at 10.10.10.5, and the other host record at 10.10.10.6. This would provide round-robin DNS, but not any real load balancing. Essentially what happens here is that when the client computers reach out to INTRANET, DNS will hand them one or the other IP address to connect. DNS doesn’t care whether that website is actually running, it simply responds with an IP address. So even though you might set this up and it appears to be working flawlessly because you can see that clients are connecting to both Server01 and Server02, be forewarned. In the event of a server failure, you will have many clients who still work, and many clients who are suddenly getting Page cannot be displayed when DNS decides to send them to the IP address of the server that is now offline.
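For reference, round-robin DNS like this is nothing more than two host records sharing one name. On a Windows DNS server it could be set up with a couple of lines of PowerShell; this sketch assumes the DnsServer module is available and the zone is named contoso.local:

```powershell
# Create two host (A) records with the same name but different IPs.
# DNS will rotate between them, but it performs no health checking.
Add-DnsServerResourceRecordA -ZoneName "contoso.local" -Name "INTRANET" -IPv4Address "10.10.10.5"
Add-DnsServerResourceRecordA -ZoneName "contoso.local" -Name "INTRANET" -IPv4Address "10.10.10.6"
```

Even after one server dies, DNS will happily keep handing out both addresses, which is exactly the problem described above.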

NLB is much more intelligent than this. When a node in an NLB array goes down, traffic moving to the shared IP address will only be directed to the node that is still online. We’ll get to see this for ourselves shortly, when we set up NLB on an intranet website of our own.

What roles can use NLB?

NLB is primarily designed for stateless applications, in other words, applications that do not require a long-term memory state or connection status. In a stateless application, each request made from the application could be picked up by Server01 for a while, then swing over to Server02 without interrupting the application. Some applications handle this very well (such as websites), and some do not.

Web services (IIS) definitely benefit the most from the redundancy provided by NLB. NLB is pretty easy to configure, and provides full redundancy for websites that you have running on your Windows Servers, without incurring any additional cost. NLB can additionally be used to enhance FTP, firewall, and proxy servers.

Another role that commonly interacts with NLB is the remote access role. Specifically, DirectAccess can use the built-in Windows NLB to provide your remote access environment with redundant entry-point servers. When setting up DirectAccess to make use of load balancing, it is not immediately obvious that you are using the NLB feature built into the operating system because you configure the load-balancing settings from inside the Remote Access Management console, rather than the NLB console. When you walk through the Remote Access Management wizards in order to establish load balancing, that Remote Access console is actually reaching out into the NLB mechanism within the operating system and configuring it, so that its algorithms and transport mechanisms are the pieces being used by DirectAccess in order to split traffic between multiple servers.

One of the best parts about using NLB is that you can make changes to the environment without affecting the existing nodes. Want to add a new server into an existing NLB array? No problem. Slide it in without any downtime. Need to remove a server for maintenance? No issues here either. NLB can be stopped on a particular node, allowing another node in the array to pick up the slack. In fact, NLB is actually NIC-specific, so you can run different NLB modes on different NICs within the same server. You can tell NLB to stop on a particular NIC, removing that server from the array for the time being. Even better, if you have a little bit of time before you need to take the server offline, you can issue a drainstop command instead of an immediate stop. This allows the existing network sessions that are currently live on that server to finish cleanly. No new sessions will flow to the NIC that you have drain-stopped, and old sessions will evaporate naturally over time. Once all sessions have been dropped from that server, you can then yank it and bring it down for maintenance.
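As a sketch of what that drainstop looks like in practice, the NetworkLoadBalancingClusters PowerShell module exposes it through Stop-NlbClusterNode; the host name WEB2 here is just a placeholder from our example environment:

```powershell
# Stop accepting new sessions on WEB2; let live sessions finish,
# waiting up to 10 minutes before forcing the node down.
Stop-NlbClusterNode -HostName "WEB2" -Drain -Timeout 10

# After maintenance, bring the node back into the array.
Start-NlbClusterNode -HostName "WEB2"
```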

Virtual and dedicated IP addresses

The way that NLB uses IP addresses is an important concept to understand. First of all, any NIC on a server that is going to be part of a load-balanced array must have a static IP address assigned to it. NLB does not work with DHCP addressing. In the NLB world, a static IP address on a NIC is referred to as a Dedicated IP Address (DIP). These DIPs are unique per NIC, obviously meaning that each server has its own DIP. For example, in my environment, WEB1 is running a DIP of 10.10.10.40, and my WEB2 server is running a DIP of 10.10.10.41.

Each server is hosting the same website on their own respective DIP addresses. It’s important to understand that when establishing NLB between these two servers, I need to retain the individual DIPs on the boxes, but I will also be creating a new IP address that will be shared between the two servers. This shared IP is called the Virtual IP Address (VIP). When we walk through the NLB setup shortly, I will be using the IP address of 10.10.10.42 as my VIP, which is so far unused in my network. Here is a quick layout of the IP addresses that are going to be used when setting up my network load-balanced website:

WEB1 DIP = 10.10.10.40 
WEB2 DIP = 10.10.10.41 
Shared VIP = 10.10.10.42 

When establishing my DNS record for intranet.contoso.local, which is the name of my website, I will be creating just a single host A record, and it will point at my 10.10.10.42 VIP.
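Putting those pieces together, here is a minimal sketch of this setup in PowerShell. I am assuming the NLB feature is already installed on both web servers, that the NLB-enabled NIC is named Ethernet, and that the cluster commands run from WEB1 while the DNS record is created on the DNS server:

```powershell
# On WEB1: create the cluster, binding the shared VIP to the NLB NIC.
New-NlbCluster -InterfaceName "Ethernet" -ClusterName "intranet.contoso.local" `
    -ClusterPrimaryIP 10.10.10.42 -SubnetMask 255.255.255.0

# Join WEB2's NIC to the same array.
Get-NlbCluster | Add-NlbClusterNode -NewNodeName "WEB2" -NewNodeInterface "Ethernet"

# On the DNS server: one host record for the website, pointing at the VIP.
Add-DnsServerResourceRecordA -ZoneName "contoso.local" -Name "intranet" -IPv4Address "10.10.10.42"
```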

NLB modes

Shortly, we will find ourselves in the actual configuration of our load balancing, and will have a few decisions to make inside that interface. One of the big decisions is what NLB mode we want to use. Unicast is chosen by default, and is the way that I see most companies set up their NLB, perhaps because it is the default option and they’ve never thought about changing it. Let’s take a minute to discuss each of the available options, to make sure you can choose the one that is most appropriate for your networking needs.

Unicast

Here, we start to get into the heart of how NLB distributes packets among the different hosts. Since we don’t have a physical load balancer that is receiving the traffic first and then deciding where to send it, how do the load-balanced servers decide who gets to take which packet streams?

To answer that question, we need to back up a little bit and discuss how traffic flows inside your network. When you open up a web browser on your computer and visit HTTP://WEB1, DNS resolves that name to an IP address, 10.10.10.40 in our example. When the traffic hits your switches and needs to be directed somewhere, the switches need to decide where the 10.10.10.40 traffic needs to go. You might be familiar with the idea of MAC addresses.

Each NIC has a MAC address, and when you assign an IP address to a NIC, it registers its own MAC address and IP with the networking equipment. These MAC addresses are stored inside an ARP table, which is a table that resides inside most switches, routers, and firewalls. When my WEB1 server was assigned the 10.10.10.40 IP address, it registered its MAC address corresponding to 10.10.10.40. When traffic needs to flow to WEB1, the switches realize that traffic destined for 10.10.10.40 needs to go to that specific NIC’s MAC address, and shoots it off accordingly.
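If you want to see this mapping for yourself from a Windows machine, the local neighbor (ARP) cache shows which MAC address the operating system has learned for a given IP:

```powershell
# Show the MAC address currently cached for WEB1's DIP.
Get-NetNeighbor -IPAddress 10.10.10.40

# The classic command-line equivalent, listing the whole ARP cache:
arp -a
```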

So in the NLB world, when you are sending traffic to a single IP address that is split between multiple NICs, how does that get processed at the MAC level? The answer with unicast NLB is that the physical NIC’s MAC address gets replaced with a virtual MAC address, and this MAC is assigned to all of the NICs within the NLB array. This causes packets flowing to that MAC address to be delivered to all of the NICs, therefore all of the servers, in that array. If you think that sounds like a lot of unnecessary network traffic is moving around the switches, you would be correct. Unicast NLB means that when packets are destined for the virtual MAC address of an array, that traffic is basically bounced through all ports on the switch before finding and landing on their destinations.

The best part about unicast is that it works without having to make any special configurations on the switches or networking equipment in most cases. You set up the NLB configuration from inside the Windows Server tools, and it handles the rest. A downside to unicast is that, because the same MAC address exists on all the nodes, it causes some intra-node communication problems. In other words, the servers that are enabled for NLB will have trouble communicating with each other’s IP addresses. Often, this doesn’t really matter, because WEB1 would rarely have reason to communicate directly with WEB2. But if you really need those web servers to be able to talk with each other consistently and reliably, the easiest solution is to install a separate NIC on each of those servers, and use that NIC for those intra-array communications, while leaving the primary NICs configured for NLB traffic.

The other downside to unicast is that it can create some switch flooding. The switches are unable to learn a permanent route for the virtual MAC address, because we need it to be delivered to all of the nodes in our array. Since every packet moving to the virtual MAC is being sent down all avenues of a switch so that it can hit all of the NICs where it needs to be delivered, it has the potential to overwhelm the switches with this flood of network packets. If you are concerned about that or are getting complaints from your networking people about switch flooding, you might want to check out one of the multicast modes for your NLB cluster.

An alternative method for controlling unicast switch flooding is to get creative with VLANs on your switches. If you plan an NLB server array and want to ensure that the switch traffic being generated by this array will not affect other systems in your network, you could certainly create a small VLAN on your switches and plug only your NLB-enabled NICs into that VLAN. This way, when the planned flood happens, it only hits that small number of ports inside your VLAN, rather than propagating its way across the entire switch.

Multicast

Choosing multicast as your NLB mode comes with some upsides, and some headaches. The positive is that it adds an extra MAC address to each NIC. Every NLB member then has two MAC addresses: the original and the one created by the NLB mechanism. This gives the switches and networking equipment an easier job of learning the routes and sending traffic to its correct destinations, without an overwhelming packet flood. In order to do this, you need to tell the switches which MAC addresses need to receive this NLB traffic; otherwise, you will cause switch flooding, just like with unicast. Telling the switches which MACs need to be contacted is done by logging into your switches and creating some static ARP entries to accommodate this. For any company with a dedicated networking professional, usually proficient in Cisco equipment, this will be no sweat. If you are not familiar with modifying ARP tables and adding static routes, it can be a bit of a nuisance to get it right. In the end, multicast is generally better than unicast, but it can be more of an administrative headache. My personal preference still tends to be unicast, especially in smaller businesses. I have seen it used in many different networks without any issues, and going with unicast means we can leave the switch programming alone.
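Purely as an illustration of the general shape of those static entries on a Cisco IOS switch: in multicast mode, NLB derives the cluster MAC from the VIP (03-BF followed by the VIP's octets in hex, so 10.10.10.42 becomes 03bf.0a0a.0a2a). The VLAN number and port names below are made up for this example, so check your own NLB console and switch documentation before borrowing any of it:

```
! Map the VIP to the NLB multicast cluster MAC
arp 10.10.10.42 03bf.0a0a.0a2a ARPA

! Pin that MAC to only the ports where WEB1 and WEB2 are plugged in
mac address-table static 03bf.0a0a.0a2a vlan 10 interface Gi1/0/1 Gi1/0/2
```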

Multicast IGMP

Better yet, but not always an option, is multicast with Internet Group Management Protocol (IGMP). Multicast IGMP really helps to mitigate switch flooding, but it only works if your switches support IGMP snooping. This means that the switch has the capability to look inside multicast packets in order to determine where exactly they should go. So where unicast creates some amount of switch flooding by design, multicast can help to lower that amount, and IGMP can get rid of it completely.

The NLB mode that you choose will depend quite a bit upon the capabilities of your networking equipment. If your servers have only a single NIC, try to use multicast or you will have intra-array problems. On the other hand, if your switches and routers don’t support multicast, you don’t have a choice—unicast will be your only option for configuring Windows NLB.
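If you are unsure what an existing array is set to, the NLB PowerShell module can report it, and Set-NlbCluster should be able to switch modes, though changing the mode on a production array will interrupt traffic, so treat this as a sketch:

```powershell
# Check the current operation mode of the cluster.
Get-NlbCluster | Select-Object Name, OperationMode

# Switch to IGMP multicast (only if your switches support IGMP snooping).
Get-NlbCluster | Set-NlbCluster -OperationMode IgmpMulticast
```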
