Nginx – Troubleshooting Tips and FAQ

Troubleshooting a web server is not fun, mostly because of the pressure of time ticking by while it is offline, and the pressure mounts the longer the server stays down. No business likes to suffer losses due to hardware or configuration issues. It is imperative that you baseline your servers and know what the traffic looks like when things are going well. That makes it much simpler to troubleshoot when the going gets tough. In this chapter, you will learn about the troubleshooting mindset and how isolation helps in troubleshooting.

First, What You Should Not Do

Often, when the server acts up, a common mistake is to rely on the error message shown in the browser (or other client). The errors that you see in a browser are usually generic messages sent by the server. It is considered good practice to hide detailed error messages from the public, so those generic messages are rarely helpful for troubleshooting a server-side issue. The server normally keeps the details in its logs, which should be your starting point.

Moreover, following instructions that you don’t understand might appear to fix your problem, but in all likelihood it will not give you confidence that you have taken the right steps.

First Commandment of Troubleshooting: Isolate the Issue

While troubleshooting, it is best to start by isolating the issue, identifying the root cause, and then fixing the problem by introducing changes. Depending on the situation this can be a very easy or a very difficult thing to do. A few scenarios should help in learning some basic troubleshooting skills.

Scenario 1: Page Cannot Be Displayed in the Browser

Let’s set up a new server block and troubleshoot the issues one by one until the problem is fixed. The server block in this case is not actually wrong; it just needs additional actions on your side before it starts working.

Start by logging on to the WFE1 server (ssh -p 3026 user1@127.0.0.1) and change the main.conf file (sudo vi /etc/nginx/conf.d/main.conf) like this:

server {
    listen       90;
    server_name  localhost;

    location / {
        root   /usr/share/nginx/html;
        index  index.html index.htm;
    }
}

Save the configuration and reload using sudo nginx -s reload. Ensure that you can browse the website locally after the changes:

$ curl localhost:90
wfe1.localdomain

Now, try http://127.0.0.1:8006/ using your host machine. Does it work? Ideally, it shouldn’t. But why is that so, and how can you ascertain the root cause?

  • You might notice that the request fails almost immediately, as if it is not even reaching the server. To confirm, tail the access log while you make requests. If nothing shows up in the log, the request is not making it to the server.

  • So, if the request is not reaching, could it be that the port is not allowed (the server block is fairly simple and doesn’t really have too many variables)? To test it, you can use telnet like this:

telnet 127.0.0.1 8006
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Connection closed by foreign host.

As you can see, the connection gets closed immediately. The conclusion is that there is something wrong with the connection, and your host is not even getting through to the WFE1 server. A quick look at the network configuration of the virtual machine shows that the guest port is 80, whereas the Nginx configuration now says 90. Change the guest port to 90, as shown in Figure 12-1, and try again.

Figure 12-1. Change the guest port to 90 for WFE1

  • telnet will now work and the connection won’t close automatically.

telnet 127.0.0.1 8006
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

To get out of the telnet session, press Ctrl+] and then type quit at the telnet> prompt.

Try browsing to localhost:8006 from the host again. The behavior is now different: the page takes a long time before it errors out. What does this imply? You have already checked that telnet can connect through to port 90. So why does the page not render? An educated guess about what can sit between the client and the server points to a proxy server or a firewall. Since there is no proxy in this setup, let’s check the firewall.

Just for testing (this is not recommended in production), disable the firewall by running sudo systemctl stop firewalld. Refresh localhost:8006 and this time it should work. Great! You now know that the firewall is the cause, and the issue is isolated. Start the firewall service again by running sudo systemctl start firewalld. Instead of stopping the firewall, a better solution is to create a firewall rule that allows port 90. Do that using sudo firewall-cmd --zone=public --add-port=90/tcp --permanent, apply it with sudo firewall-cmd --reload, and your website will start working as expected.

Isolation, as you can see, has helped tremendously in giving a direction to this troubleshooting session. Not only that, you can remain confident of what you have done, since you have not shot an arrow in the dark after a random search.
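To confirm the firewall fix without guessing, you can check whether the port appears in the list of ports the zone allows. The following is a minimal sketch, not part of the original walkthrough: port_in_list is a hypothetical helper name, and the literal port list in the example is made up for illustration.

```shell
# port_in_list PORT/PROTO "LIST"
# Succeeds if PORT/PROTO (for example 90/tcp) is one of the
# space-separated entries in LIST.
port_in_list() {
    wanted="$1"
    for entry in $2; do
        if [ "$entry" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}

# Example with an illustrative list; prints "90/tcp is still blocked"
if port_in_list 90/tcp "22/tcp 80/tcp"; then
    echo "90/tcp is allowed"
else
    echo "90/tcp is still blocked"
fi

# On the server itself (requires firewalld), feed in the live list:
#   port_in_list 90/tcp "$(sudo firewall-cmd --zone=public --list-ports)"
```

Comparing whole entries rather than using a substring match avoids mistaking 900/tcp for 90/tcp.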

Scenario 2: Conflicting Ports

In this scenario you will learn about troubleshooting conflicting ports. Change your configuration like this:

server {
    listen       3306;
    server_name  localhost;

    location / {
        root   /usr/share/nginx/html;
        index  index.html index.htm;
    }
}

After saving the configuration, execute sudo nginx -s stop to stop nginx. Try starting nginx again and you will see an error message like this:

$ sudo nginx
nginx: [emerg] bind() to 0.0.0.0:3306 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:3306 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:3306 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:3306 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:3306 failed (98: Address already in use)
nginx: [emerg] still could not bind()

The error message tells you that nginx cannot bind to the port because something else is already using it. To figure out which application has grabbed that port, you can run the netstat command:

$ sudo netstat -nlp | grep 3306
tcp6       0      0 :::3306                 :::*                    LISTEN      1264/mysqld      

netstat is a built-in tool that can show you a lot of information about the network connections, routing tables, interface statistics, and more. The output reveals the application (mysqld) that has been listening on 3306. One way to resolve this issue is to change the port in your configuration. Another way would be to stop and remove mysqld from WFE1. Based on your requirement, you can decide which way is better.
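If you check ports often, the grep above can be tightened so that it matches only the listening port and prints only the owning process. This is a rough sketch under the assumption that the input has the column layout of the netstat -nlp output shown earlier; port_owner is a hypothetical helper name.

```shell
# port_owner PORT
# Reads `netstat -nlp` style lines on stdin and prints the PID/program
# column for TCP sockets listening on PORT.
port_owner() {
    awk -v port="$1" '$6 == "LISTEN" && $4 ~ (":" port "$") { print $7 }'
}

# Using the line from the output above; prints "1264/mysqld"
printf '%s\n' 'tcp6       0      0 :::3306                 :::*                    LISTEN      1264/mysqld' \
    | port_owner 3306

# Live usage on the server:
#   sudo netstat -nlp | port_owner 3306
```

Anchoring the match at the end of the local address column means a search for 306 will not accidentally report the process on 3306.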

A key lesson from this scenario is that a good web administrator knows a lot about the tools at their disposal. The more tools and utilities you know, the easier it is to isolate the issue. Explore the tools in advance so that you can use them when needed.

Scenario 3: Bad Permissions

Bad permissions on a folder can lead to a variety of errors that are hard to troubleshoot. Typically, the end result is a 403 (Forbidden) or a 404 (Not Found) response, even though the file exists when you check on the server. In these cases, check your access and error logs (nginx -V shows the default log paths given at build time) and the file permissions of the directory. The following command lists the owner and permissions of every component along a path:

$ namei -om /etc/nginx/conf.d/main.conf
f: /etc/nginx/conf.d/main.conf
 dr-xr-xr-x root root /
 drwxr-xr-x root root etc
 drwxr-xr-x root root nginx
 drwxr-xr-x root root conf.d
 -rw-r--r-- root root main.conf
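On a long path it is easy to miss the one component with the wrong mode. The sketch below scans namei -om output and prints the components that a process running as "other" could not use. It assumes the Nginx worker user is neither the owner nor in the group of the files, which may not hold on your system; blocked_components is a hypothetical helper name.

```shell
# blocked_components
# Reads `namei -om PATH` output on stdin. Prints every directory that
# lacks the execute (traverse) bit for "other" and every regular file
# that lacks the read bit for "other".
blocked_components() {
    awk 'NR > 1 {
        mode = $1
        name = $4
        type = substr(mode, 1, 1)
        if (type == "d" && substr(mode, 10, 1) !~ /[xt]/) {
            print name
        } else if (type == "-" && substr(mode, 8, 1) != "r") {
            print name
        }
    }'
}

# Typical usage against a file that Nginx should serve:
#   namei -om /usr/share/nginx/html/index.html | blocked_components
```

An empty result only means that the "other" bits look fine; ownership, ACLs, and SELinux are outside what this check can see.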

In times of distress, logs are your best friends. Ensure that you are logging at the highest level during your troubleshooting session. Reproduce the error; read the logs; and more often than not, you will have decent pointers to act upon.

Scenario 4: Bad Configuration

The Nginx command line has a -t switch that tests the configuration for syntactical errors. Keep in mind that this switch only takes care of syntax issues. There are a few things that it cannot test. For example, if you have a typo in your hostname, the switch has no way to figure out whether the name is correct.

nginx -t is one of those things that you take with a grain of salt. Run it to ensure that there are no syntactical and other common errors, but don’t bet all you have on it. Certain configuration changes might not take effect when you run nginx -s reload. If you have any doubts, restart Nginx and test your expected output appropriately.
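A common habit that follows from this is to reload only when the syntax test passes. The wrapper below is a minimal sketch of that idea; safe_reload and the NGINX_BIN override are hypothetical names introduced here, and you would normally run it with root privileges.

```shell
# safe_reload
# Runs the configuration test first and reloads only if it succeeds.
# NGINX_BIN is an assumed override for the binary path (default: nginx).
safe_reload() {
    nginx_bin="${NGINX_BIN:-nginx}"
    if "$nginx_bin" -t; then
        "$nginx_bin" -s reload
    else
        echo "configuration test failed; not reloading" >&2
        return 1
    fi
}

# Usage on the server (as root):
#   safe_reload
```

This guards against reloading a broken file, but it does not change the caveat above: a configuration that passes the test can still behave differently from what you intended.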

Scenario 5: Rewrite Rules

Rewrites happen all the time in Nginx and yet they are not logged by default. This can create a lot of confusion while troubleshooting. When you are seeing 404 or unexpected pages, ensure that the rewrite_log directive is set to on.

server {
        #snipped
        # notice level is needed for rewrite_log messages to be written
        error_log    /var/log/nginx/site.com.error.log notice;
        rewrite_log on;
        #snipped
}

The rewrite_log directive just sets a flag. When turned on, it sends rewrite-related messages to the error log at the notice level, which can help you tremendously in understanding what is going on under the hood. Because these messages are notices, the error_log level must be notice or more verbose for them to appear. Once you turn it on, look for messages in the configured log file.
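Since rewrite messages are written at the notice level, filtering the error log by level is a quick way to find them among other entries. The following sketch assumes the usual error log layout in which the level appears in square brackets; log_level_filter and the ERROR_LOG variable are hypothetical names.

```shell
# log_level_filter LEVEL
# Reads an nginx error log on stdin and keeps only the lines logged
# at LEVEL (for example: notice, warn, error).
log_level_filter() {
    grep -F "[$1]"
}

# Usage while reproducing the problem; ERROR_LOG stands for whatever
# path your error_log directive points to:
#   sudo tail -f "$ERROR_LOG" | log_level_filter notice
```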

Scenario 6: Log Only Your Requests

When you set the log level to debug, your error logs will record tremendous amounts of information, and it might become overwhelming to troubleshoot if yours is a public-facing website with a lot of traffic. To avoid this, you can set the debug_connection directive to your public IP address. This way, debug logging applies only to your requests. The directive requires an Nginx binary built with --with-debug. It is configured in the events block and looks like so:

events {
        debug_connection x.x.x.x;
}
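Debug logging only works when the binary was built with debug support, so it is worth checking that before editing the configuration. A small sketch follows; has_debug_build is a hypothetical helper that inspects the configure arguments printed by nginx -V (which writes to standard error).

```shell
# has_debug_build
# Reads the output of `nginx -V` on stdin and succeeds if the
# configure arguments include --with-debug.
has_debug_build() {
    grep -q -e '--with-debug'
}

# Usage on the server (nginx -V prints to stderr, hence 2>&1):
#   if nginx -V 2>&1 | has_debug_build; then
#       echo "debug logging is available"
#   else
#       echo "this binary was built without --with-debug"
#   fi
```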

Important Tools for Web Administrators

As mentioned earlier, a web administrator should explore and learn about as many tools as possible. The tools help in isolating issues more quickly. In this section you will find a list of tools that could prove useful in different scenarios.

ping

Sends ICMP ECHO_REQUEST packets to network hosts. Useful to check if the host is reachable and which IP it is pointing to.

traceroute

It displays the route and measures the delay of packets across a network. To use it, simply type traceroute sitename.com

top

The top program provides a dynamic real-time view of a running system. It can display system summary information as well as a list of processes or threads currently being managed by the Linux kernel.

htop

It is a much more advanced version of top and a lot more configurable. It gives you an overall picture of the system (Figure 12-2). Use sudo yum install htop to install.

Figure 12-2. htop. Notice the function keys available in the bottom row

atop

Similar to top and htop, but has logging functionality for long-term evaluation and analysis. Use sudo yum install atop to install.

uptime

uptime gives a one-line display of the following information: the current time; how long the system has been running; how many users are currently logged on; and the system load averages for the past 1, 5, and 15 minutes.
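When you are baselining a server, it can be handy to pull just the 1-minute load average out of that line so it can be recorded or compared. The following is a sketch only; one_minute_load is a hypothetical helper, and it assumes the English "load average:" label with a dot as the decimal separator.

```shell
# one_minute_load
# Reads `uptime` output on stdin and prints the 1-minute load average.
one_minute_load() {
    sed -n 's/.*load average: \([0-9.]*\),.*/\1/p'
}

# Sample line taken from the w output shown later in this chapter;
# prints "0.00"
printf '%s\n' ' 16:33:16 up  8:03,  1 user,  load average: 0.00, 0.01, 0.05' \
    | one_minute_load

# Live usage:
#   uptime | one_minute_load
```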

free

This command displays the total amount of free and used physical and swap memory in the system, as well as the buffers and caches used by the kernel.

ifconfig or ip addr

These commands show details of the network interfaces and can also be used to configure them.

$ ifconfig
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.2.6  netmask 255.255.255.0  broadcast 10.0.2.255
        inet6 fe80::a00:27ff:fe90:7e9a  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:90:7e:9a  txqueuelen 1000  (Ethernet)
        RX packets 19143  bytes 13860101 (13.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 15411  bytes 3249752 (3.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 0  (Local Loopback)
        RX packets 128  bytes 10250 (10.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 128  bytes 10250 (10.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ulimit

It is unusual, but what if a single user starts so many processes that the system becomes unusable for everyone else? The ulimit command can be helpful in getting and setting the limits of a system.

Use ulimit -a to know the current limits:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 3899
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 3899
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
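For a web server, the open files value is often the one that matters most, because every connection consumes a file descriptor. The sketch below compares the current limit with a number you expect to need; enough_open_files is a hypothetical helper and 4096 is only an illustrative figure, not a recommendation.

```shell
# enough_open_files NEEDED
# Succeeds if the current shell's open-files limit (ulimit -n) is
# "unlimited" or at least NEEDED.
enough_open_files() {
    limit=$(ulimit -n)
    [ "$limit" = "unlimited" ] || [ "$limit" -ge "$1" ]
}

# Illustrative check against an assumed requirement of 4096 descriptors
if enough_open_files 4096; then
    echo "open files limit looks sufficient"
else
    echo "open files limit is $(ulimit -n); consider raising it"
fi
```

The limit reported here belongs to the shell that runs the check, so run it as the same user and in the same way that Nginx is started to get a meaningful answer.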

nslookup

nslookup is a program to query Internet domain name servers. You can use it to get the IP address of a hostname:

$ nslookup google.in
Server:        192.168.1.1
Address:       192.168.1.1#53

Non-authoritative answer:
Name:   google.in
Address: 216.58.197.68

powertop

powertop (Figure 12-3) is a program that helps to diagnose various issues with power consumption and power management. It also has an interactive mode allowing one to experiment with various power management settings. Use sudo yum install powertop to install.

Figure 12-3. powertop. Use tabs to switch between different screens

iotop

iotop (Figure 12-4) helps to diagnose issues with IO. Use sudo yum install iotop to install.

Figure 12-4. iotop is useful in analyzing IO related issues

iptraf

iptraf is an IP LAN monitor that generates various network statistics including TCP info, UDP counts, ICMP and OSPF information, Ethernet load info, node stats, IP checksum errors, and others. You can install it using sudo yum install iptraf. To execute it, use sudo iptraf-ng.

tcpdump

tcpdump prints out a description of the contents of packets on a network interface. It can be run with the -w flag, which causes it to save the packet data to a file for later analysis. Run without any options, as in the following command, it prints all passing packets:

sudo tcpdump

Wireshark

Wireshark is one of the most famous network protocol analyzers and has a GUI that makes visualizing network traffic a lot easier. It can be downloaded from https://www.wireshark.org/

Nagios

Nagios is monitoring software that helps you monitor many servers together. It can also alert you when things go wrong. It is one of the most famous monitoring solutions available, is open source, and has a plethora of plug-ins available.

Zabbix

Zabbix is an open source infrastructure monitoring solution. It can use most databases out there to store the monitoring statistics. The core is written in C and has a front end in PHP. If you don’t like installing an agent, Zabbix might be an option for you.

w

A seemingly simple, but important command is w. It combines the output of uptime with information about everyone logged on to the server.

16:33:16 up  8:03,  1 user,  load average: 0.00, 0.01, 0.05
USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
user1    pts/0    10.0.2.2         08:44    4.00s  0.19s  0.00s w

lsof

Another built-in and very powerful tool is lsof. The name stands for List Open Files and, as it suggests, the tool gives you a list of all open files and network connections. One of the main reasons for using this command is when a disk cannot be unmounted and displays the error that files are being used or opened.

With this command you can easily identify which files are in use. You can use grep or other similar filters to narrow your list to show only files opened by any process or user. You can then kill the process if needed.

Common Pitfalls to Avoid

New and old users alike can run into pitfalls. Nginx administrators often make some of the following mistakes. Read this section carefully to avoid common configuration issues.

Chmod 777

Don’t use 777, ever. It has been mentioned earlier in the book as well, but it is worth cautioning you again. If you ever feel like using it, most likely you are not aware of what’s going on. Try to isolate, identify, and fix the problem instead of doing chmod 777. You can use the following command to check the directory hierarchy for missing permissions:

namei -om /path/of/directory

Having Root Inside Location Block

If you have a configuration file that looks like the following, think again. Syntactically, there is nothing wrong here. But when the root directive is repeated in each location block, any location block that lacks its own root directive will not get the root path you intended.

server {
    server_name www.site.com;
    location / {
        root /var/www/nginx-default/;
        # [...]
      }
    location /foo {
        root /var/www/nginx-default/;
        # [...]
    }
    location /bar {
        root /var/www/nginx-default/;
        # [...]
    }
}

Instead, have a common root directive and override where necessary, like so:

server {
    server_name www.site.com;
    root /var/www/nginx-default/;
    location / {
          # [...]
      }
    location /foo {
          # [...]
    }
    location /bar {
         # [...]
    }
}

This caution also applies to the index directive.

Using if Blocks

The if directive is misused more often than it is used well. An if block creates a nested configuration similar to a location block. When the condition matches, the configuration inside the block is applied to the request. In general, it is better to avoid the if directive. That said, there are a couple of things that are 100 percent safe inside an if block:

  • return ...;

  • rewrite ... last;

A couple of problematic configurations to drive the point home:

# only second header will be present in response
# not really bug, just how it works
location /only-one-if {
    set $true 1;
    if ($true) {
        add_header X-First 1;
    }
    if ($true) {
        add_header X-Second 2;
    }
    return 204;
}

Consider the following configuration. Here, the if condition is evaluated for every request to site.com or *.site.com, which is inefficient:

server {
    server_name site.com *.site.com;
    if ($host ~* ^www\.(.+)) {
        set $raw_domain $1;
        rewrite ^/(.*)$ $scheme://$raw_domain/$1 permanent;
    }
    # [...]
}

To avoid evaluation on every request, you can split the configuration into two like so and get the same result:

server {
    server_name www.site.com;
    return 301 $scheme://site.com$request_uri;
}
server {
    server_name site.com;
    # [...]
}

You should also avoid using if to check for the existence of files or directories. The try_files directive is a more suitable choice in these cases. The following is an example of an if block that you should avoid:

server {
    root /var/www/site.com;
    location / {
        if (!-f $request_filename) {
            break;
        }
    }
}

Replace such blocks with:

server {
    root /var/www/site.com;
    location / {
        try_files $uri $uri/ /index.html;
    }
}

Passing Uncontrolled Requests to PHP

If you pass all your PHP requests directly to the FastCGI back end, you are at risk. This is because the default PHP configuration tries to guess which file you want executed in case the actual file doesn’t exist. For example, a request to /path/to/url/malicious.jpg/file.php might lead to execution of code embedded inside the malicious.jpg file. A lot of sites allow uploading pictures, so it is easy to upload a picture and get your own code to run on the server using this vulnerability. A typical configuration that leads to this looks as follows:

location ~* \.php$ {
   fastcgi_pass backend;
   # more config ...;
}

The preceding code block sends every request ending in .php directly to the FastCGI back end. To avoid this pitfall, you can do the following:

  • Set cgi.fix_pathinfo=0 in php.ini (this tells PHP not to guess at another file when the requested one is not found).

  • Pass only the application’s own PHP files to the FastCGI back end, like the following:

location ~* (file_a|file_b|file_c)\.php$ {
    fastcgi_pass backend;
    # [...]
}

  • Disable execution of any code from the upload directories:

location /uploaddirectory {
    location ~ \.php$ {return 403;}
    # [...]
}

  • Use the try_files directive:

location ~* \.php$ {
    try_files $uri =404;
    fastcgi_pass backend;
    # [...]
}

Rewrite Issues

You should avoid writing complex regular expressions. Try to keep them as neat and clean as possible. Also be aware that rewrites are relative by default, so it becomes important to rewrite using an absolute path. Add http:// wherever necessary and intended.

Using Hostname in Configuration

Never use a hostname in a listen directive since it might not be able to resolve during boot time. It is preferable to use IP addresses that need to be bound. This will help Nginx even more since it will not have to look up the address.
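A quick scan of a configuration file can reveal listen directives that start with a name rather than an address or port. This is a rough sketch, assuming each listen directive sits on its own line; hostname_listens is a hypothetical helper, and it deliberately skips unix: socket paths.

```shell
# hostname_listens [FILE]
# Prints "line number: text" for listen directives whose address
# starts with a letter (a hostname) instead of an IP address or port.
# Reads FILE if given, otherwise stdin.
hostname_listens() {
    awk '$1 == "listen" && $2 ~ /^[A-Za-z]/ && $2 !~ /^unix:/ { print FNR ": " $0 }' "$@"
}

# Usage against the file edited in this chapter:
#   hostname_listens /etc/nginx/conf.d/main.conf
```

Anything it prints is a candidate for replacing the name with the IP address that should be bound.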

Frequently Asked Questions

This section will answer some of the frequently asked questions across popular websites. Instead of replicating the entire content, you will be pointed to those links for further reading. A small summary will be presented with links wherever appropriate.

“Is there an option to compare Nginx and Nginx Plus?”

To summarize: You use open source Nginx for any site or service that is yearning for the best web server. Nginx Plus, in comparison, offers support and extra functionality that is often required by organizations.

You can find a feature matrix at https://www.nginx.com/products/feature-matrix/

“Is there a location for sample configurations?”

Yes. In fact, Nginx has a wiki that contains a plethora of samples that might assist you with various common configurations. These include configuration samples for WordPress, FastCGI, caching, log rotation, and more. Read about it here:

https://www.nginx.com/resources/wiki/start/

Scroll a bit to the Pre-canned Configurations section and you will find a huge list of configurations to get you up to speed instantly. See Figure 12-5.

Figure 12-5. A partial list of pre-canned configurations on Nginx website

“How can I redirect from www to no-www and vice versa?”

This is one of the most common requests and it is an important one. Your SEO depends on this, and web administrators often like to stick with just one of the URLs. There is no right or wrong approach here, since a lot depends on various factors. There are famous examples like http://twitter.com that don’t use the www prefix, and others like http://www.facebook.com that do. To configure it, read the following discussion on StackOverflow:

http://stackoverflow.com/questions/7947030/nginx-no-www-to-www-and-www-to-no-www

“How can I rewrite all HTTP requests to HTTPS while maintaining a sub-domain?”

You can read more about this here:

http://serverfault.com/questions/67316/in-nginx-how-can-i-rewrite-all-http-requests-to-https-while-maintaining-sub-dom                

“How can I find which flags Nginx was compiled with?”

This one is easy and has been discussed throughout the book. Simply execute nginx -V. There are other variations that will help you compare different configuration files:

http://serverfault.com/questions/223509/how-can-i-see-which-flags-nginx-was-compiled-with

“Is there any mechanism for detailed debugging?”

Yes. Nginx provides extensive debugging support. By default, it is turned off, but it can be activated if you have compiled Nginx with the --with-debug argument. Read more about detailed debugging at the following site:

https://www.nginx.com/resources/wiki/start/topics/tutorials/debugging/

“How many third-party modules does Nginx have?”

Plenty! There are a lot of third-party modules listed at the Nginx website, and new ones keep popping up. You can find the detailed list here:

https://www.nginx.com/resources/wiki/modules/

“What happens if I have Nginx Plus and the license expires?”

After your support contract expires, you are no longer licensed to use Nginx Plus or obtain support from Nginx, Inc. Access to Nginx Plus updates will be prohibited, and you must stop and delete your Nginx Plus instances. In short, you should contact them and renew in order to continue using Nginx Plus.

“Is there design or consulting help available?”

Yes. You can seek help in architecture, design or configuration using the Professional Services team at Nginx. Details can be found here:

https://www.nginx.com/services/

Summary

This chapter dealt with some troubleshooting scenarios and a typical troubleshooting approach that you can fall back on in desperate situations. Keep adding tools to your support toolbelt so that you can use them when the time is right. Under pressure, what helps most is your knowledge of the infrastructure and how things fit together overall. The better you know your infrastructure, the better suited you will be to fix the issue. Keep baselining, learn new tools, engage with the community, and push the limits.

Happy learning!
