Ubuntu Server 18.04 – Tracing network issues

It’s amazing how important TCP/IP networking is to the world today. Of all the protocols in use in modern computing, it’s by far the most widespread. But it can also be one of the most frustrating things to troubleshoot when it isn’t working well. Thankfully, Ubuntu features really handy utilities you can use to pinpoint what’s going on.

First, let’s look at connectivity. After all, if you can’t connect to a network, your server is essentially useless. In most cases, Ubuntu recognizes network cards without any trouble, and it will automatically connect your server or workstation to your network if it’s within reach of a DHCP server. While troubleshooting, get the obvious stuff out of the way first. The following may seem like a no-brainer, but you’d be surprised how often something obvious gets missed. I’m going to assume you’ve already checked that the network cables are plugged in firmly on both ends. Keep in mind, too, that network cables themselves sometimes develop faults and need to be replaced. A healthy cable should give you a clean signal when you run it through a cable tester.
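Before you reach for a cable tester, you can ask the kernel whether it sees a link at all. The sketch below assumes the ip command from iproute2, which a standard Ubuntu 18.04 install includes. The function name no_carrier_links is hypothetical. The function lists interfaces flagged NO-CARRIER, which usually means nothing is detected on the other end of the cable. Only interfaces that are administratively up report this flag.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of interfaces whose link flags
# include NO-CARRIER (often an unplugged or faulty cable).
# Reads `ip link show` output on stdin. Interface header lines look like:
#   2: ens33: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 ... state DOWN ...
no_carrier_links() {
    # Split header lines on ": " so that field 2 is the interface name
    awk -F': ' '/^[0-9]+: / && /NO-CARRIER/ { print $2 }'
}
```

Run it as ip link show | no_carrier_links. If the ethtool package is installed, running ethtool against a specific interface also reports a "Link detected: yes/no" line for that card.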

Routing issues can sometimes be tricky to troubleshoot, but by testing each destination point one by one, you can generally see where the problem lies. Typical symptoms of a routing issue may include being unable to access a device within another subnet, or perhaps not being able to get out to the internet, despite being able to reach internal devices. To investigate a potential routing issue, first check your routing table. You can do so with the route -n command. This command will print your current routing table information:

Viewing the routing table on an Ubuntu Server

In this example, you can see that the default gateway is 172.16.250.1. It’s the first entry in the table. Its destination is 0.0.0.0, which matches any address, so any traffic that doesn’t match a more specific route leaves via 172.16.250.1. As long as ICMP traffic isn’t blocked, you should be able to ping this default gateway, as well as other nodes within your subnet.
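You may want to pull the default gateway out of the table programmatically, for example to feed it to the ping tests that follow. A small awk filter is enough for that. This is a sketch that assumes the column layout route -n prints (Destination, Gateway, Genmask, and so on). The name default_gateway is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `route -n` output on stdin and print the gateway
# of the default route, i.e. the entry whose destination and genmask are 0.0.0.0.
default_gateway() {
    awk '$1 == "0.0.0.0" && $3 == "0.0.0.0" { print $2; exit }'
}
```

Usage: route -n | default_gateway. The route command comes from the net-tools package. On a system without it, ip route show default prints the same information on a line beginning with "default via" followed by the gateway address.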

To start troubleshooting a routing issue, use the information in your routing table to conduct several ping tests. First, try to ping your default gateway. If you can’t, you’ve found the issue. If you can, try running the traceroute command. It isn’t installed by default; it comes from the traceroute package, so hopefully you installed it while the server still had a working connection. You can then run traceroute against a host, such as an external host name, to find out where the connection drops. The traceroute command shows every hop between you and your target. Each “hop” is basically another gateway: you traverse one gateway after another until you ultimately reach your destination, and traceroute shows you where that chain stops. You may well find that the problem isn’t on your network at all, and that the connection actually drops at your internet service provider.
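The ping-then-traceroute sequence above can be wrapped in a small function. This is a sketch, not a definitive tool. The name check_route is hypothetical. The function assumes the iputils ping that Ubuntu ships and the traceroute package, and it takes the gateway and the target as arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ping the default gateway first; only if it answers,
# trace the path to an external host to see where the chain of hops stops.
# Usage: check_route <gateway-ip> <external-host>
check_route() {
    local gateway="$1" target="$2"
    # -c 3: send three echo requests; -W 2: wait up to 2 seconds for a reply
    if ! ping -c 3 -W 2 "$gateway" > /dev/null 2>&1; then
        echo "Default gateway $gateway is not answering; start troubleshooting there."
        return 1
    fi
    echo "Default gateway $gateway answers; tracing the path to $target."
    # -n prints numeric addresses only, so slow reverse DNS lookups
    # don't muddy a routing test
    traceroute -n "$target"
}
```

For example: check_route 172.16.250.1 www.ubuntu.com. Passing -n to traceroute is a design choice. It keeps a DNS problem from slowing down or confusing what is meant to be a pure routing check.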

DNS issues don’t happen very often, but with a few tricks you should be able to resolve them. A DNS failure usually shows up as a host being unable to access internal or external resources by name. Knowing whether the problem affects internal hosts, external hosts, or both should help you determine whether the culprit is your own DNS server or the DNS server at your ISP.

The first step in pinpointing the source of DNS woes is to ping a known IP address on your network, preferably the default gateway. If you can ping it, but you can’t ping the gateway by name, then you probably have a DNS issue. You can confirm a potential DNS issue by using the nslookup command against the domain, such as:

    nslookup myserver.local

In addition, make sure you try and ping external resources as well, such as a website. This will help you narrow down the scope of the issue.
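You can also script the comparison between pinging by address and looking up by name. In this minimal sketch, the name diagnose_dns and its return codes are hypothetical. It uses getent hosts for the name lookup so that resolution is tested separately from reachability. Note that getent consults /etc/hosts as well as DNS, so a name listed there will resolve even if DNS is broken.

```shell
#!/usr/bin/env bash
# Hypothetical helper: distinguish "no connectivity" from "DNS problem".
# Usage: diagnose_dns <known-ip> <host-name>
diagnose_dns() {
    local ip="$1" name="$2"
    # Step 1: can we reach a known IP address at all?
    if ! ping -c 2 -W 2 "$ip" > /dev/null 2>&1; then
        echo "no-connectivity"   # not (only) a DNS problem
        return 2
    fi
    # Step 2: does the name resolve through the system resolver?
    if ! getent hosts "$name" > /dev/null; then
        echo "dns-suspect"       # the IP answers, but the name does not resolve
        return 1
    fi
    echo "ok"
}
```

For example: diagnose_dns 172.16.250.1 myserver.local. If it prints dns-suspect, follow up with nslookup or dig, which query DNS directly.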

You’ll also want to know which DNS server your host is sending queries to. In the past, finding this out was as simple as inspecting the contents of /etc/resolv.conf. Nowadays, however, this file often points to a local resolver instead and won’t reveal the actual server that requests are being sent to. On Ubuntu 18.04, that local resolver is the systemd-resolved stub at 127.0.0.53. To find out the real DNS server assigned to your host, the following command will do the trick:

    systemd-resolve --status | grep "DNS Servers"
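There is one caveat with the grep one-liner. On Ubuntu 18.04, systemd-resolve --status lists a link’s second or third server on indented continuation lines below the "DNS Servers:" line, so grep shows only the first. A quick workaround is grep -A 2 "DNS Servers". The awk-based sketch below prints every server instead. It assumes that output layout, and the name list_dns_servers is hypothetical. The addresses in its comment are only illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `systemd-resolve --status` output on stdin and
# print every DNS server, one per line. Assumes this layout:
#          DNS Servers: 192.168.1.1
#                       192.168.1.2
#           DNS Domain: localdomain
list_dns_servers() {
    awk '
        /DNS Servers:/ { print $NF; in_list = 1; next }      # first server
        in_list && NF == 1 && /^[ \t]/ { print $1; next }    # continuation lines
        { in_list = 0 }                                      # anything else ends the list
    '
}
```

Usage: systemd-resolve --status | list_dns_servers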

Are they what you expect? If not, you can temporarily fix the problem by removing the incorrect name server entries from /etc/resolv.conf and replacing them with the correct IP addresses. I suggest this only as a temporary fix because the next thing you’ll need to do is investigate how the invalid IP addresses got there in the first place. Normally, name servers are assigned by your DHCP server, so as long as it’s sending out the appropriate name server list, you shouldn’t run into this problem. If you’re using a static IP address, then perhaps there’s an error in your Netplan config file.
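If you’re using a static address, a quick way to see which name servers Netplan is configured to use is to grep its YAML files. This sketch assumes the configuration lives in the default location, /etc/netplan/*.yaml, and uses the nameservers key. The name show_netplan_dns is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the nameservers section of each Netplan file.
# Usage: show_netplan_dns [directory]   (defaults to /etc/netplan)
show_netplan_dns() {
    local dir="${1:-/etc/netplan}"
    # -h drops file-name prefixes; -A 3 prints the three lines after each match,
    # which is where the addresses (and optional search domains) are listed
    grep -h -A 3 'nameservers:' "$dir"/*.yaml
}
```

After correcting the file, sudo netplan apply puts the change into effect.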

A useful method of pinpointing DNS issues involving external sites is to temporarily switch the DNS provider on your local machine. Normally, lookups for external names ultimately go to an external DNS provider, such as the one that comes from your ISP. We went through setting this up in Chapter 7, Setting up Network Services, specifically in the forwarders section of the bind9 configuration. The forwarders are where the bind9 daemon sends a query when it isn’t able to resolve the request from its internal list of hosts.

You could bypass this by changing your local workstation’s DNS name servers to Google’s, which are 8.8.8.8 and 8.8.4.4. If you’re able to reach the external resource after switching name servers, you can be reasonably confident that your forwarders are the culprit. I’ve actually seen a situation in which a website changed its IP address but the ISP’s DNS servers didn’t update quickly enough. As a result, some clients were unable to reach a site they needed to do their job. Switching everyone to alternate name servers was the easiest workaround; this was done by adjusting the forwarders option, as we did in Chapter 7, Setting up Network Services.
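You can also run this comparison without touching your workstation’s configuration. The dig command accepts an @server argument that sends a single query to a specific name server. The sketch below queries your default resolver and 8.8.8.8 side by side. The names compare_resolvers and answers_differ are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare two newline-separated answer lists,
# ignoring order (round-robin DNS may return records in any order).
answers_differ() {
    [ "$(printf '%s\n' "$1" | sort)" != "$(printf '%s\n' "$2" | sort)" ]
}

# Hypothetical helper: query the default resolver and an alternate one.
# Usage: compare_resolvers <name> [alternate-server]   (default 8.8.8.8)
compare_resolvers() {
    local name="$1" alt="${2:-8.8.8.8}"
    local default_ans alt_ans
    default_ans=$(dig +short "$name")            # whatever the host is configured to use
    alt_ans=$(dig +short "@$alt" "$name")        # one query sent straight to $alt
    printf 'Default resolver:\n%s\n' "${default_ans:-<no answer>}"
    printf 'Resolver %s:\n%s\n' "$alt" "${alt_ans:-<no answer>}"
    if answers_differ "$default_ans" "$alt_ans"; then
        echo "Answers differ; your default resolver (or its forwarders) may be returning stale data."
    fi
}
```

For example: compare_resolvers www.example.com. Treat a difference as a hint rather than proof. Sites served from a content delivery network can legitimately return different addresses to different resolvers.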

Some additional tools to consider while checking your server’s ability to resolve DNS entries are dig and host. You can use both commands to test your server’s DNS settings, and both take a host name or domain name as an argument. The dig command presents detailed information about the address (A) record for the host or domain you query, while the host command returns the IP address of the host you’re trying to reach. The dig command is also useful for troubleshooting caching. The first time you run it against a name, you’ll see a response time (in milliseconds). The next time you run it, the response time should be much shorter, since the answer is now cached by your resolver:

Output of the dig and host commands
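You can compare the first and second response times without reading through the full output. To do so, extract the statistics line that dig prints (";; Query time: N msec"). Here is a minimal sketch; query_time is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read dig output on stdin and print the query time
# in milliseconds, taken from the line ";; Query time: N msec".
query_time() {
    awk '/^;; Query time:/ { print $4 }'
}
```

For example, run dig ubuntu.com | query_time twice in a row. A noticeably smaller second number suggests the answer is being served from a cache, such as your bind9 server or the local systemd-resolved stub.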

Hardware support is also critical when it comes to networking. If the Linux kernel doesn’t support your network hardware, the distribution will likely not recognize or react to a network cable when you insert one. With wireless networking, it may not show any nearby networks even though one or more are present. Unlike on Windows, on Linux hardware support is generally built right into the kernel. While there are exceptions, the kernel shipped with a distribution typically supports hardware of the same age or older. Ubuntu 18.04 LTS was released in April 2018, so it supports hardware released up to around the beginning of 2018. Future releases of Ubuntu Server will publish hardware enablement (HWE) updates, which will allow Ubuntu Server 18.04 to support newer hardware and chipsets as they come out. Therefore, it’s always important to use the latest installation media when rolling out a new server. Ubuntu typically publishes several point releases during the life of a supported distribution, such as 18.04.1, 18.04.2, and so on. As long as you’re using the latest one, you’ll have the latest hardware support that Ubuntu has made available at the time.
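To see which point release and kernel a server is actually running, check /etc/os-release and uname -r. The small sketch below does both; show_release_info is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the installed release and the running kernel.
show_release_info() {
    if [ -r /etc/os-release ]; then
        # On Ubuntu, PRETTY_NAME normally includes the point release;
        # sourcing happens in a subshell so no variables leak out
        echo "Release: $(. /etc/os-release && echo "$PRETTY_NAME")"
    fi
    echo "Kernel: $(uname -r)"
}
```

On an existing 18.04 install, Ubuntu provides the newer HWE kernel through the linux-generic-hwe-18.04 metapackage (sudo apt install --install-recommends linux-generic-hwe-18.04). It’s worth considering if a newer network card isn’t recognized.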

In other cases, hardware support may depend on external kernel modules. If a driver is missing, the first thing to try is to look up the unrecognized network hardware with a search engine; searching for <hardware name> Ubuntu will typically do the trick. But what do you search for? To find the hardware string for your network device, try the lspci command:

    lspci | grep -i net  

The lspci command lists hardware connected to your server’s PCI bus. Here, we’re piping its output to a case-insensitive grep search for the word net.

This should return a list of networking components available in your server. On my machine, for example, I get the following output:

    
    01:00.1 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 12)
    02:00.0 Network controller: Intel Corporation Wireless 8260 (rev 3a)

As you can see, I have both a wired and a wireless network card on this machine. If one of them weren’t working, I could search online for the hardware string plus the keyword Ubuntu, which should give me results pertaining to my exact hardware. If a package needs to be installed, the search results will likely give me some clues as to which one. Without network access, though, the worst-case scenario is that I may have to download the package on another computer and transfer it to the server via a flash drive. That’s certainly not fun, but it does work if the latest Ubuntu installation media doesn’t yet offer full support for your hardware.
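To turn the lspci output into search terms directly, a short sed filter can strip the bus address, device class, and revision, then append the keyword Ubuntu. This is a sketch that assumes the lspci line format shown above. The name search_terms is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn lspci lines into "<hardware name> Ubuntu" search terms.
# Input lines look like:
#   02:00.0 Network controller: Intel Corporation Wireless 8260 (rev 3a)
search_terms() {
    # 1) drop the bus address and device class ("02:00.0 Network controller: ")
    # 2) drop a trailing revision such as " (rev 3a)"
    # 3) append the keyword Ubuntu
    sed -E 's/^[0-9a-f:.]+ [^:]+: //; s/ \(rev [0-9a-f]+\)$//; s/$/ Ubuntu/'
}
```

For example: lspci | grep -i net | search_terms. Running lspci -nn also prints each device’s numeric [vendor:device] ID, which often makes an even more precise search term.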

Another potential problem point is DHCP. When it works well, DHCP is a wonderfully magical thing. When it stops working, it can be frustrating. Generally, though, DHCP issues come down to one of four causes: a lack of available IP addresses, the DHCP daemon (isc-dhcp-server) not running, an invalid configuration, or hosts with clocks that are out of sync. All servers should have the ntp package installed.
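Two of those causes, a stopped daemon and an invalid configuration, are quick to rule out on the DHCP server itself. The sketch below assumes the isc-dhcp-server package from Chapter 7, with its configuration in the Ubuntu default location, /etc/dhcp/dhcpd.conf. The name check_dhcp_server is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run on the DHCP server (usually as root).
check_dhcp_server() {
    # 1. Is the daemon running?
    if systemctl is-active --quiet isc-dhcp-server; then
        echo "isc-dhcp-server is running"
    else
        echo "isc-dhcp-server is NOT running"
    fi
    # 2. Does the configuration parse? -t tests the file without starting dhcpd
    dhcpd -t -cf /etc/dhcp/dhcpd.conf
}
```

If the daemon won’t start, journalctl -u isc-dhcp-server shows its own startup messages.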

If a server is unable to obtain an IP address via DHCP and your network uses a Linux-based DHCP server, check the DHCP server’s system log (/var/log/syslog) for events related to dhcpd. Unfortunately, I’ve never found a command that prints how many IP address leases your DHCP server has remaining. If you run out, though, chances are you’ll see log entries about an exhausted pool in the system log. The system log also shows your nodes’ attempts to obtain an IP address as they happen. Feel free to use tail -f against the system log to watch for any events relating to DHCP leases.
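To get a quick count from the log rather than scrolling through it, grep for dhcpd entries. You can also grep for the message ISC dhcpd typically logs when a pool is exhausted, which is a DHCPDISCOVER line ending in "no free leases". This sketch assumes that message text and the default /var/log/syslog location. The name dhcp_log_summary is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize DHCP activity in a syslog-style file.
# Usage: dhcp_log_summary [logfile]   (defaults to /var/log/syslog)
dhcp_log_summary() {
    local log="${1:-/var/log/syslog}"
    # grep -c prints the number of matching lines (0 if none)
    echo "dhcpd entries: $(grep -c 'dhcpd' "$log")"
    echo "no free leases: $(grep -c 'no free leases' "$log")"
}
```

To watch events live with tail -f, filter the stream: tail -f /var/log/syslog | grep --line-buffered dhcpd.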

In some cases, running out of DHCP leases can come down to a very generous lease time. Some administrators give their clients a lease time of up to a week, which is generally unnecessary. A lease time of one day is fine for most networks, but ultimately the lease time is up to you. In Chapter 7, Setting up Network Services, we looked at configuring our DHCP server, so feel free to refer to that chapter if you need a refresher on how to configure the isc-dhcp-server daemon.
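To check which lease times your DHCP server is currently configured with, pull the default-lease-time and max-lease-time statements out of dhcpd.conf. Both values are in seconds. The sketch below assumes the /etc/dhcp/dhcpd.conf location; lease_times is a hypothetical name. For reference, one day is 86,400 seconds and a week is 604,800.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print lease-time settings from dhcpd.conf,
# converted to hours. Usage: lease_times [config-file]
lease_times() {
    local conf="${1:-/etc/dhcp/dhcpd.conf}"
    awk '/^[ \t]*(default|max)-lease-time[ \t]/ {
        value = $2
        sub(/;.*/, "", value)   # drop the trailing semicolon
        printf "%s %d s (%.1f h)\n", $1, value, value / 3600
    }' "$conf"
}
```

Usage: lease_times. The bracket expressions use [ \t] rather than POSIX character classes because the older mawk shipped as the default awk on Ubuntu 18.04 may not support the latter.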

Although it’s probably not the first thing you’ll think of when facing DHCP issues, hosts with out-of-sync clocks can actually contribute to the problem. DHCP requests are timestamped on both the client and the server. If the clock on one of them is off by a large degree, the timestamps will be off as well, which confuses the DHCP server. Surprisingly, I’ve seen this come up fairly often. I recommend standardizing NTP across your network as early as you can. DHCP isn’t the only service that suffers when clocks are out of sync; file synchronization utilities also require accurate time. If you ensure NTP is installed, up to date, and working on all of your clients, you should be in good shape. Using configuration management utilities such as Ansible to ensure NTP is not only configured but also running properly on all the machines in your network will only benefit you.
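A quick way to confirm that a given host’s clock is actually synchronized is timedatectl. On Ubuntu 18.04, its status output includes a "System clock synchronized: yes/no" line; older systemd versions label it "NTP synchronized". The sketch below accepts either label. The name clock_in_sync is hypothetical. If you’ve installed the ntp package, ntpq -p additionally lists the peers the daemon is using.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `timedatectl` output on stdin and succeed
# (exit status 0) only if it reports that the clock is synchronized.
# Matches both "System clock synchronized: yes" and "NTP synchronized: yes".
clock_in_sync() {
    grep -q 'synchronized: yes'
}
```

Usage: timedatectl | clock_in_sync && echo "in sync" || echo "NOT in sync"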

Of course, many things can go wrong when it comes to networking, but the information here should cover the majority of issues. In summary, troubleshooting network issues generally revolves around ping tests. Pinging your default gateway, tracing failed endpoints with traceroute, and troubleshooting DNS and DHCP will take care of most problems. Then again, faulty hardware such as failed network cards and bad cabling will no doubt turn up as well.
