Systemd Service Keeps Restarting: Understanding Exit Codes, Restart Policies, and Journal Logs

You deploy your application as a systemd service. It starts, runs for a few seconds, crashes, and systemd restarts it. It starts again, crashes again, and this loop continues until systemd gives up and declares the service "failed." Or worse, the restart loop continues indefinitely, consuming resources and flooding your logs without ever producing a stable running service.

Understanding how systemd manages service lifecycle, what exit codes mean, how restart policies work, and how to read journal logs effectively is essential for diagnosing and fixing these issues.

Step 1: Check the Service Status

The first command to run is systemctl status, which shows the current state and the most recent log entries:

sudo systemctl status myapp.service

This output tells you several important things: whether the service is active, inactive, or failed; the main process ID; how long it has been running (or when it stopped); the exit code and signal that caused the last stop; and the last 10 log lines from the service.

Key status indicators: active (running) means the service is operating normally. activating (auto-restart) means the service stopped and systemd is waiting out the restart delay before starting it again. failed means the last run ended with an error, a fatal signal, or a timeout, or the start limit was hit, and systemd is not going to restart it. inactive (dead) means the service is not running and no restart is scheduled.
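For scripts and monitoring checks, the same states can be read in machine-friendly form from systemctl show. The sketch below is a minimal example: the helper name classify_state and the wording of its messages are our own, and it only covers the four states described above.

```shell
# classify_state: read "Key=Value" lines on stdin, in the format printed by
#   systemctl show myapp.service -p ActiveState -p SubState
# and print a one-line interpretation. Name and messages are illustrative.
classify_state() {
    active="" sub=""
    while IFS='=' read -r key value; do
        case "$key" in
            ActiveState) active=$value ;;
            SubState)    sub=$value ;;
        esac
    done
    case "$active/$sub" in
        active/running)          echo "running normally" ;;
        activating/auto-restart) echo "crashed, restart pending" ;;
        failed/*)                echo "failed, no restart scheduled" ;;
        inactive/dead)           echo "stopped, no restart scheduled" ;;
        *)                       echo "other state: $active/$sub" ;;
    esac
}
```

Usage: systemctl show myapp.service -p ActiveState -p SubState | classify_state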

Step 2: Read the Full Journal Logs

The status output only shows the last 10 lines. To see the full log history for your service:

# Show all logs for the service
sudo journalctl -u myapp.service

# Show logs since the last boot
sudo journalctl -u myapp.service -b

# Show logs from the last hour
sudo journalctl -u myapp.service --since "1 hour ago"

# Follow logs in real-time
sudo journalctl -u myapp.service -f

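# Show messages of priority err and more severe (plain stdout/stderr output is logged at info by default, so it may not appear here)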
sudo journalctl -u myapp.service -p err

Look for the last few lines before each restart. The crash reason is almost always logged in the final output before the service stops. Common patterns include uncaught exceptions with stack traces, "port already in use" errors, "permission denied" errors, "file not found" errors for configuration or data files, segmentation faults, and out-of-memory kills.
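When the journal is long, a rough filter for the patterns listed above can narrow the search. This is only a sketch: crash_hints is a hypothetical helper, and its pattern list is illustrative rather than exhaustive, so an empty result does not mean the service is healthy.

```shell
# crash_hints: filter journal text on stdin down to lines that match a few
# common crash signatures. Extend the pattern list for your own application.
crash_hints() {
    grep -iE 'address already in use|EADDRINUSE|permission denied|EACCES|no such file|ENOENT|segfault|segmentation fault|out of memory|oom-kill|uncaught|traceback'
}
```

Usage: sudo journalctl -u myapp.service -b --no-pager | crash_hints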

Step 3: Understand Exit Codes

When a process terminates, it returns an exit code that tells systemd why it stopped. In systemctl status, a normal exit appears as code=exited, status=N (for example status=1/FAILURE), and a process terminated by a signal appears as code=killed, signal=NAME:

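Exit code 0: Success. The process exited cleanly. If your service exits with code 0 but should run continuously, the application is finishing on its own instead of staying in the foreground. A common cause is a Node.js script that runs to completion and exits rather than starting a server that listens for connections. With Restart=on-failure a clean exit is not restarted, so this case ends in inactive (dead) rather than a restart loop.

Exit code 1: General error. The application encountered an error and exited. Check the logs for specific error messages.

Exit code 2: Usage error. By shell convention, and in many command-line programs, this means invalid arguments. It often indicates a typo in the ExecStart command or incorrect command-line flags.

Exit code 126: Permission problem. The binary or script exists but is not executable. Fix with chmod +x /path/to/binary. This code comes from a shell, so you see it when ExecStart runs a wrapper script or sh -c; when systemd runs the binary directly, the same problem is reported as status=203/EXEC.

Exit code 127: Command not found. Like 126, this is a shell code, seen when a wrapper script cannot find a command. When the binary named in ExecStart itself does not exist, systemd reports status=203/EXEC instead. Verify the path with which or type, and use the absolute path in ExecStart.

Exit code 137: Killed by signal 9 (SIGKILL), written shell-style as 128 + 9. For the main process, systemd reports this as code=killed, signal=KILL. The process was forcibly killed, often by the OOM (Out of Memory) killer, although a stop timeout or a manual kill -9 produces the same result. Check dmesg | grep -i kill to confirm. The fix is to increase available memory or reduce the application's memory usage; a limit such as MemoryMax= in the service file contains the damage but does not stop the kills.

Exit code 143: Killed by signal 15 (SIGTERM), written as 128 + 15. systemd reports it as code=killed, signal=TERM. This is normal during systemctl stop but abnormal during regular operation. systemd treats termination by SIGTERM as a clean exit, but a program that catches the signal and then exits with status 143 counts as failed unless the service file sets SuccessExitStatus=143.

Exit codes 200-243: systemd-specific errors, shown with a name such as status=203/EXEC. They mean systemd could not set up or start the process, so the problem is in the service configuration rather than in the application: 200/CHDIR means changing to WorkingDirectory failed, 203/EXEC means the ExecStart binary could not be executed (missing, not executable, or a script without a valid shebang), 216/GROUP and 217/USER mean the configured group or user does not exist, and 226/NAMESPACE means sandboxing setup failed, for example because a ReadWritePaths directory is missing. The full list is in the systemd.exec(5) manual page.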
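The ranges above can be folded into a small lookup for use in scripts. The following is a sketch under our own naming (explain_exit is hypothetical) that only distinguishes the cases discussed in this step and treats statuses from 129 to 192 as the shell-style 128 + signal encoding.

```shell
# explain_exit STATUS: print a short interpretation of a numeric exit status.
explain_exit() {
    code=$1
    case "$code" in
        0)   echo "clean exit" ;;
        1)   echo "general error, check the application log" ;;
        2)   echo "usage error (by shell convention)" ;;
        126) echo "found but not executable" ;;
        127) echo "command not found" ;;
        *)
            if [ "$code" -ge 200 ] && [ "$code" -le 243 ]; then
                echo "systemd setup error, see systemd.exec(5)"
            elif [ "$code" -gt 128 ] && [ "$code" -le 192 ]; then
                # Shell-style encoding: status = 128 + signal number
                echo "terminated by signal $((code - 128))"
            else
                echo "application-defined status $code"
            fi ;;
    esac
}
```

Usage: explain_exit 137 prints "terminated by signal 9". The raw status of the last run can be read with systemctl show myapp.service -p ExecMainStatus.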

Step 4: Review the Service File

A carefully configured service file prevents many restart loops and makes the remaining ones easier to diagnose. Here is a template to adapt; the paths, user name, and port are placeholders:

[Unit]
Description=My Application
Documentation=https://docs.myapp.com
After=network.target
Wants=network.target

# Start limits (prevent infinite restart loops); these settings belong in [Unit]
StartLimitIntervalSec=300
StartLimitBurst=5

[Service]
Type=simple
User=myapp
Group=myapp
WorkingDirectory=/opt/myapp
ExecStart=/usr/bin/node /opt/myapp/server.js
ExecReload=/bin/kill -HUP $MAINPID

# Environment
Environment=NODE_ENV=production
Environment=PORT=3000
EnvironmentFile=-/opt/myapp/.env

# Restart policy
Restart=on-failure
RestartSec=5s

# Security
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/myapp/data /var/log/myapp
PrivateTmp=true

# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=myapp

[Install]
WantedBy=multi-user.target
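Before enabling a unit like this, it can help to confirm that the ExecStart binary exists and is executable, since a wrong path is one of the quickest ways into a restart loop. The helpers below are a sketch with hypothetical names. They look only at the first ExecStart= line, and the -x test runs as the current user, not as the User= account.

```shell
# exec_start_binary UNIT_FILE: print the first word of the first ExecStart= line.
# Prefix characters that systemd allows before the path (-, @, :, +, !) are stripped.
exec_start_binary() {
    sed -n 's/^ExecStart=[-@:+!]*\([^ ]*\).*/\1/p' "$1" | head -n 1
}

# check_exec_start UNIT_FILE: report whether that binary exists and is executable.
check_exec_start() {
    bin=$(exec_start_binary "$1")
    if [ -z "$bin" ]; then
        echo "no ExecStart= line found"; return 1
    elif [ ! -e "$bin" ]; then
        echo "missing: $bin"; return 1
    elif [ ! -x "$bin" ]; then
        echo "not executable: $bin"; return 1
    fi
    echo "ok: $bin"
}
```

Usage, assuming the unit is installed at /etc/systemd/system/myapp.service: check_exec_start /etc/systemd/system/myapp.service. For a fuller syntax check, systemd-analyze verify accepts the same file path, and sudo systemctl daemon-reload is needed after every edit so that systemd picks up the change.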

Step 5: Restart Policies Explained

The Restart= directive controls when systemd restarts the service. The most commonly used options are:

Restart=no: Never restart the service automatically. This is the default. Use it for one-shot tasks or during debugging.

Restart=on-failure: Restart when the service exits with a non-zero exit code, is killed by a signal, times out, or misses a watchdog deadline. This is the most common production setting because it restarts after crashes but does not restart after a clean shutdown.

Restart=always: Restart the service whenever the process exits, whether the exit was clean or not. A service stopped explicitly with systemctl stop is not restarted, whatever the Restart= setting. Use this for critical services that must always be running.

Restart=on-abnormal: Restart on signal kills, timeouts, and watchdog triggers, but not on clean exits (code 0) or non-zero exit codes.

systemd also accepts on-success, on-abort, and on-watchdog, which are less common.
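The decision can be written out as a lookup. The sketch below follows the table in the systemd.service(5) manual page, which also lists the less common values on-success, on-abort, and on-watchdog. The function name and the cause labels are our own, and "signal" here means an unclean signal: SIGHUP, SIGINT, SIGTERM, and SIGPIPE count as clean exits.

```shell
# would_restart POLICY CAUSE
# POLICY: no | always | on-success | on-failure | on-abnormal | on-abort | on-watchdog
# CAUSE:  clean | exit-code | signal | timeout | watchdog   (labels are illustrative)
would_restart() {
    case "$1:$2" in
        always:*)             echo yes ;;
        on-success:clean)     echo yes ;;
        on-failure:exit-code|on-failure:signal|on-failure:timeout|on-failure:watchdog)
                              echo yes ;;
        on-abnormal:signal|on-abnormal:timeout|on-abnormal:watchdog)
                              echo yes ;;
        on-abort:signal)      echo yes ;;
        on-watchdog:watchdog) echo yes ;;
        *)                    echo no ;;
    esac
}
```

Usage: would_restart on-abnormal exit-code prints no, which is why a service that crashes with exit code 1 stays down under Restart=on-abnormal but comes back under Restart=on-failure.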

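The RestartSec= directive sets the delay between the service stopping and systemd starting it again. The default is 100 ms, which lets a crashing service restart several times per second. The 5-second delay in the example keeps a crash loop from hammering the CPU and the journal.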

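The StartLimitIntervalSec= and StartLimitBurst= directives work together as a circuit breaker, and both belong in the [Unit] section of the service file. With the values from the example, once the service has been started 5 times within 300 seconds (5 minutes), systemd refuses the next start and marks the unit as failed. This prevents infinite restart loops from consuming system resources and flooding logs. The limit counts start attempts inside the window, so it only works if RestartSec= is short enough for 5 restarts to fit into 300 seconds.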
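Whether a given combination of settings can ever trip the limit is simple arithmetic. The sketch below uses a deliberately rough model, not systemd's exact accounting: it assumes every cycle lasts RestartSec plus a fixed run time in whole seconds, and it ignores startup overhead. The function name is hypothetical.

```shell
# limit_trips RESTART_SEC RUN_SEC BURST INTERVAL
# Rough model: each cycle lasts RESTART_SEC + RUN_SEC seconds, so the start
# that would exceed the limit comes about BURST cycles after the first start.
limit_trips() {
    elapsed=$(( $3 * ($1 + $2) ))
    if [ "$elapsed" -lt "$4" ]; then
        echo "trips after about ${elapsed}s"
    else
        echo "never trips (${elapsed}s >= ${4}s window)"
    fi
}

limit_trips 5 0 5 300    # template values: RestartSec=5s, burst 5, 300 s window
limit_trips 90 0 5 300   # hypothetical RestartSec=90s with the same limits
```

With the template's values and an application that crashes immediately, the limit trips after about 25 seconds. With the hypothetical 90-second delay, five restarts never fit into the window, so the service would keep restarting indefinitely.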

If you need to reset the failure counter after fixing the issue:

sudo systemctl reset-failed myapp.service
sudo systemctl start myapp.service

Common Causes of Restart Loops

The application is not running in the foreground. Systemd expects the main process to stay running. If your application forks into the background (daemonizes), systemd thinks it has exited and restarts it. Either configure your application to run in the foreground or set Type=forking in the service file and specify the PID file with PIDFile=.

Missing environment variables. The service file runs with a minimal environment. Variables from your shell profile (.bashrc, .profile) are not available. Set every required variable explicitly using Environment= or EnvironmentFile=.
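A quick way to catch this before starting the service is to check the environment file for the variables the application needs. The helper below is a sketch with a hypothetical name. It only looks for plain NAME= lines, and DATABASE_URL in the usage example stands in for whatever your application actually requires.

```shell
# missing_vars ENV_FILE NAME...
# Print each NAME that has no "NAME=" assignment in ENV_FILE.
# Returns non-zero if at least one variable is missing.
missing_vars() {
    file=$1; shift
    rc=0
    for name in "$@"; do
        if ! grep -q "^${name}=" "$file"; then
            echo "$name"
            rc=1
        fi
    done
    return $rc
}
```

Usage: missing_vars /opt/myapp/.env NODE_ENV PORT DATABASE_URL. To see what is set through Environment= lines in the unit itself, run systemctl show myapp.service -p Environment.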

Wrong working directory. If your application uses relative paths for configuration files, data directories, or modules, it needs the correct working directory. Set it with WorkingDirectory=.

File permission issues. The service runs as the user specified by User=. If the application's files are owned by root but the service runs as a non-root user, permission errors cause immediate crashes. Fix ownership: sudo chown -R myapp:myapp /opt/myapp.

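Port conflicts. If the port your application needs is already in use (by another instance of the same service, or by another application), the application crashes on startup. Check with ss -tlnp | grep PORT, replacing PORT with the port number, for example ':3000'. Run it with sudo to see the owning process for sockets that belong to other users.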
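A plain grep for the port number can also match longer ports or unrelated columns. The sketch below (port_in_use is a hypothetical helper) checks only the local address column of ss output and matches the port exactly.

```shell
# port_in_use PORT: read `ss -tln` output on stdin and succeed if any
# listening socket's local address (4th column) ends in :PORT.
port_in_use() {
    awk -v p="$1" '$4 ~ (":" p "$") { found = 1 } END { exit !found }'
}
```

Usage: ss -tln | port_in_use 3000 && echo "port 3000 is already taken"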

Systemd is a powerful service manager that handles complex lifecycle management. Understanding its restart policies, exit code interpretation, and journal logging turns frustrating restart loops into quickly diagnosed and resolved issues.

