Sunday, February 11, 2024

Comprehensive Steps to Fix 'Too Many File Descriptors Open' Errors in Linux

 

Introduction

The "Too many file descriptors open" error in Linux often signifies that your system has exceeded its limit on open file descriptors. This can lead to a range of issues, from application crashes to performance degradation. Understanding file descriptors, how they work, and how to manage them effectively can help you resolve this error and prevent it in the future. This guide covers the common causes of this issue, methods for diagnosing and monitoring file descriptor usage, and strategies for increasing file descriptor limits.

Understanding File Descriptors

In Linux, everything is treated as a file, including network sockets, devices, and regular files. A file descriptor is a non-negative integer that uniquely identifies an open file within a process.

Key Concepts:

  • File Descriptor (FD): A non-negative integer that a process uses to refer to an open file. Each process maintains its own table of file descriptors.
  • Default File Descriptors: Every process starts with three default file descriptors (visible in the example shown after this list):
    • 0: Standard Input (stdin)
    • 1: Standard Output (stdout)
    • 2: Standard Error (stderr)
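
As a quick illustration, a process's open descriptors can be inspected under /proc. For the current shell, the listing typically shows the three standard descriptors plus whatever else the shell happens to have open:

# List the open file descriptors of the current shell ($$ expands to its PID)
ls -l /proc/$$/fd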

Common Causes of 'Too Many File Descriptors Open' Errors

1. High Volume of Network Calls

If your application opens a large number of network connections while serving API requests and the downstream systems respond slowly, sockets stay open longer than expected and can exhaust the available file descriptors.
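
To check whether sockets are the main consumers for a given process, one option is to count its open network connections (replace <PID> with the process ID you suspect):

# Count the network sockets currently held by the process (the first line of output is a header)
sudo lsof -a -p <PID> -i | wc -l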

2. Resource Leaks

Resource leaks, where file descriptors are not closed after use, gradually consume the available pool. To avoid this, use language-specific constructs that close resources automatically. For example, in Java, use the try-with-resources statement.
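
A leak typically shows up as a descriptor count that keeps climbing while the workload stays flat. A minimal way to watch this for a suspect process (replace <PID> with the real process ID):

# Re-check the process's descriptor count every 5 seconds; a steadily rising number suggests a leak
sudo watch -n 5 'ls /proc/<PID>/fd | wc -l'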

Diagnosing File Descriptor Usage

Global Usage

To check the total number of file descriptors currently open on the system, use:

awk '{print $1}' /proc/sys/fs/file-nr
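
The file actually reports three values: the number of allocated file handles, the number of allocated-but-unused handles, and the system-wide maximum. Reading all three gives a rough utilisation check:

cat /proc/sys/fs/file-nr
# Allocated handles as a percentage of the system-wide maximum
awk '{printf "%d of %d file handles allocated (%.1f%%)\n", $1, $3, $1/$3*100}' /proc/sys/fs/file-nr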

High File Descriptor Usage

To identify high file descriptor usage or potential issues with sockets:

  • Identify sockets stuck in the CLOSE_WAIT state (an ss-based alternative is shown after this list):
    netstat -tonp | grep CLOSE_WAIT   # CLOSE_WAIT means the peer has closed the connection, but your program has not yet closed its socket
    netstat -p   # lists connections together with the owning PID; as a last resort, kill -9 <PID> terminates that process and releases its CLOSE_WAIT sockets
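
On systems where the net-tools package (netstat) is not installed, ss provides the same view and can filter by TCP state directly; a minimal sketch:

# List sockets stuck in CLOSE_WAIT together with the owning process
sudo ss -tnp state close-wait
# Count them (the first line of output is a header)
ss -tn state close-wait | tail -n +2 | wc -l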

Per-Process Usage
We can use the lsof command to check the file descriptor usage of a process. Let’s take the caddy web server as an example:

sudo lsof -p <caddy_pid>
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
caddy 135 caddy cwd DIR 254,1 4096 1023029 /etc/sv/caddy
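
lsof also lists entries that are not ordinary file descriptors (cwd, txt, memory-mapped files), so for a plain count of descriptors it can be simpler to read /proc directly:

# Number of file descriptors currently held by the process
sudo ls /proc/<caddy_pid>/fd | wc -l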

Closing ("purging") unused sockets frees entries in the file table and makes room for new file descriptors.

Current FD Socket Usage
netstat -tonp | grep CLOSE_WAIT   # sockets still counted against the file table

Per-Session Limit
ulimit -Sn   # soft limit for the current shell
ulimit -Hn   # hard limit (the ceiling the soft limit can be raised to)

Per-Process Limit
grep "Max open files" /proc/$pid/limits   # soft and hard limits for a specific process
cat /proc/sys/fs/file-max   # system-wide maximum across all processes

Increasing File Descriptor Limits
Temporarily (Per-Session)
ulimit -n 4096   # applies only to the current shell session and cannot exceed the hard limit

Per-User
Raise the limit for a specific user (or for all users) by appending a few lines to the /etc/security/limits.conf file and starting a new login session, for example:
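
The values below are illustrative; the format is <domain> <type> <item> <value>, where the domain can be a username, a @group, or * for everyone:

# /etc/security/limits.conf
*    soft    nofile    65535
*    hard    nofile    65535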

Per-Service

For a service managed by systemd, set the limit with a drop-in file (substitute your own unit name for apache.service):

sudo mkdir -p /etc/systemd/system/apache.service.d/

sudo tee /etc/systemd/system/apache.service.d/filelimit.conf <<EOF
[Service]
LimitNOFILE=500000
EOF

sudo systemctl daemon-reload
sudo systemctl restart apache.service
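
After the restart, the effective limit for the unit can be confirmed with systemctl; the property name matches the directive above:

systemctl show apache.service -p LimitNOFILE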

Globally

Add the following line to /etc/sysctl.conf:

fs.file-max = 500000

Then run the sysctl -p command to reload the configuration file.
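
The kernel's view of the new maximum can then be checked with:

sysctl fs.file-max
cat /proc/sys/fs/file-max   # equivalent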
