Linux Common Issues

Quick reference for resolving common Linux issues encountered in MSP environments.

Package Management Issues

Cannot Acquire Lock (apt/dpkg)

Symptoms

Error: “Could not get lock /var/lib/dpkg/lock-frontend”
Package installation fails
Updates blocked

Troubleshooting Steps

1. Check for Running Package Managers

Lang: bash

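# Check for running apt/dpkg processes ([a] and [d] stop grep matching itself)
ps aux | grep -E '[a]pt|[d]pkg'

# Find processes holding the dpkg locks (run as root to see root-owned processes)
sudo lsof /var/lib/dpkg/lock-frontend /var/lib/dpkg/lock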

2. Wait for Unattended Upgrades

Lang: bash

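# Automatic updates run from these units (Ubuntu/Debian)
systemctl status apt-daily.service apt-daily-upgrade.service

# Preferred: wait for the run to finish. Stopping it mid-run can leave
# packages half-configured (fix with: sudo dpkg --configure -a)
sudo systemctl stop apt-daily-upgrade.service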

3. Remove Stale Locks (Last Resort)

Lang: bash

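# Only if step 1 shows no apt/dpkg process is actually running
sudo rm /var/lib/dpkg/lock-frontend
sudo rm /var/lib/dpkg/lock
sudo rm /var/cache/apt/archives/lock
sudo rm /var/lib/apt/lists/lock

# Finish any interrupted package configuration, then refresh the index
sudo dpkg --configure -a
sudo apt update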
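Instead of deleting lock files, a script can wait for the lock to be released. This is a minimal sketch. It assumes `fuser` (from the psmisc package) is installed and that the functions run as root. `apt_busy` and `wait_for` are hypothetical helper names, not standard commands:

```bash
# apt_busy: succeeds while some process has a dpkg lock file open.
# Assumes fuser (psmisc) is installed and this runs as root, so root-owned
# apt/dpkg processes are visible.
apt_busy() {
  fuser /var/lib/dpkg/lock-frontend /var/lib/dpkg/lock >/dev/null 2>&1
}

# wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND once per second while it succeeds. Returns 0 as soon as it
# fails, or 1 if it is still succeeding after TIMEOUT_SECONDS.
wait_for() {
  local timeout="$1" waited=0
  shift
  while "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

For example, `wait_for 600 apt_busy && apt update` waits up to ten minutes before updating. Recent apt releases (1.9.11 and later) can also wait on their own with `apt-get -o DPkg::Lock::Timeout=600 ...`.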

Broken Packages

Symptoms

Package installation fails with dependency errors
“unmet dependencies” messages
System in inconsistent state

Fix Steps

Lang: bash

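# Finish configuring interrupted installs first
sudo dpkg --configure -a

# Fix broken packages (installs missing dependencies)
sudo apt --fix-broken install

# Clean package cache
sudo apt clean          # removes all cached .deb files
# sudo apt autoclean    # gentler: removes only .debs that can no longer be downloaded

# Update and try again
sudo apt update
sudo apt upgrade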
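Before and after these steps, it helps to see which packages dpkg still considers incomplete. You can filter the status column of `dpkg -l` to find them. This is a sketch with a hypothetical `dpkg_problems` helper. The status letters come from the legend that `dpkg -l` prints in its header:

```bash
# dpkg_problems: read `dpkg -l` output on stdin and print the status code and
# name of every package left in an incomplete state.
# 2nd status letter: U=unpacked, F=half-configured, H=half-installed,
# W=awaiting triggers, t=triggers pending; a 3rd letter "R" = reinstall required.
dpkg_problems() {
  awk '$1 ~ /^[uihrp][nicUFHWt]R?$/ {
         state = substr($1, 2, 1)
         if (state ~ /[UFHWt]/ || length($1) == 3) print $1, $2
       }'
}
```

Usage: `dpkg -l | dpkg_problems`. You can then retry a listed package with `sudo dpkg --configure <package>` or `sudo apt install --reinstall <package>`.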

Repository Issues

Symptoms

“Failed to fetch” errors
GPG key errors
Repository not found

Troubleshooting

Lang: bash

# Check repository configuration
cat /etc/apt/sources.list
ls /etc/apt/sources.list.d/

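# Test repository connectivity
ping -c 4 archive.ubuntu.com
curl -I http://archive.ubuntu.com

# Fix GPG key issues (legacy: apt-key is deprecated on current Debian/Ubuntu)
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys [KEY_ID]

# Newer systems: export the key to its own keyring file (repo-name.gpg is an
# example name), then reference it from the source entry with signed-by
sudo install -d -m 0755 /etc/apt/keyrings
gpg --keyserver keyserver.ubuntu.com --recv-keys [KEY_ID]
gpg --export [KEY_ID] | sudo tee /etc/apt/keyrings/repo-name.gpg > /dev/null
# deb [signed-by=/etc/apt/keyrings/repo-name.gpg] <repo-uri> <suite> <component>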
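A "Failed to fetch" error is easier to trace if you first list every repository URI apt uses, then test each one. This is a sketch with a hypothetical `repo_uris` helper. It reads both one-line `.list` entries and the deb822 `.sources` format (the `URIs:` field) used by newer releases such as Ubuntu 24.04:

```bash
# repo_uris FILE...: print the unique repository URIs found in apt source files.
repo_uris() {
  awk '
    /^[ \t]*#/ { next }                          # skip comments
    $1 == "deb" || $1 == "deb-src" {
      i = 2
      if ($i ~ /^\[/) {                          # skip an [option=value ...] block
        while ($i !~ /\]$/ && i < NF) i++
        i++
      }
      print $i
      next
    }
    $1 == "URIs:" { for (i = 2; i <= NF; i++) print $i }   # deb822 format
  ' "$@" | sort -u
}
```

For example, `for u in $(repo_uris /etc/apt/sources.list.d/*); do curl -fsI "$u" >/dev/null && echo "ok $u" || echo "FAIL $u"; done` checks each repository in turn.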

Service Management (systemd)

Service Won’t Start

Symptoms

Service fails to start
Status shows “failed” or “inactive”
Error messages in logs

Diagnostic Commands

Lang: bash

# Check service status
systemctl status servicename.service

# View recent logs
journalctl -u servicename.service -n 50

# Logs from the current boot (-b); add -p err to show only errors
journalctl -u servicename.service -b

# View real-time logs
journalctl -u servicename.service -f

Common Fixes

Lang: bash

# Reload systemd after config changes
systemctl daemon-reload

# Reset failed state
systemctl reset-failed servicename.service

# Check service file syntax
systemd-analyze verify /etc/systemd/system/servicename.service

# Restart, then show the full (untruncated) status output
systemctl restart servicename.service
systemctl status servicename.service -l --no-pager

Service Keeps Restarting

Symptoms

Service status shows “activating” repeatedly
Rapid restarts in logs
Application crashes immediately

Investigation

Lang: bash

# Check restart configuration
systemctl cat servicename.service | grep -i restart

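# View crash logs (coredumpctl matches the executable name, not the unit name)
coredumpctl list servicename
coredumpctl info [PID]

# Check resource limits
systemctl show servicename.service | grep -i limit

# Count restarts so far
systemctl show servicename.service -p NRestarts

# Test service in foreground: stop the service, then run its ExecStart
# command manually as the unit's User= (both shown by systemctl cat)
systemctl stop servicename.service
systemctl cat servicename.service | grep -E '^(ExecStart|User)='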
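To find the command to run by hand, you can pull the effective `ExecStart=` line out of `systemctl cat` output. This is a sketch with a hypothetical `exec_start_from_unit` helper, and it has several limits. It honours the empty `ExecStart=` that drop-ins use to reset the command. It strips systemd prefixes such as `-` or `+`. It prints only the last command, so `Type=oneshot` units with several `ExecStart=` lines are not fully covered. It does not expand `%` specifiers, environment variables, or backslash continuation lines:

```bash
# exec_start_from_unit: read unit text (e.g. from `systemctl cat`) on stdin
# and print the effective ExecStart command line.
exec_start_from_unit() {
  awk '
    /^[ \t]*ExecStart=/ {
      sub(/^[ \t]*ExecStart=/, "")
      if ($0 == "") { cmd = ""; next }   # empty ExecStart= resets earlier values
      sub(/^[-@:+!]+/, "")               # drop systemd prefixes such as "-" or "+"
      cmd = $0
    }
    END { if (cmd != "") print cmd }
  '
}
```

Usage: `systemctl cat servicename.service | exec_start_from_unit`. Run the printed command as the unit's `User=` (for example with `sudo -u`) so that file permissions match what the service sees.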

Enable Service on Boot

Symptoms

Service doesn’t start after reboot
Manual start required

Fix

Lang: bash

# Enable service
systemctl enable servicename.service

# Enable and start immediately
systemctl enable --now servicename.service

# Verify it's enabled
systemctl is-enabled servicename.service

# List all enabled services
systemctl list-unit-files --state=enabled

Disk Space Issues

Root Filesystem Full

Symptoms

“No space left on device” errors
Services failing to start
Cannot create files

Quick Diagnosis

Lang: bash

# Check disk usage
df -h

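# Find large directories (-x stays on this filesystem, skipping /proc etc.)
du -xh / 2>/dev/null | sort -rh | head -20

# Find large files on the root filesystem
find / -xdev -type f -size +100M -exec ls -lh {} + 2>/dev/null

# Check inode usage (exhausted inodes also cause "No space left on device")
df -i

# Deleted files still held open (space is freed only when the process closes them)
sudo lsof +L1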
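For monitoring or a quick scripted check, you can filter `df -P` output (the POSIX format, one line per filesystem). This is a sketch with a hypothetical `fs_over` helper. It assumes mount points do not contain spaces:

```bash
# fs_over [PERCENT]: read `df -P` (or `df -iP`) output on stdin and print the
# mount point and usage of every filesystem at or above PERCENT (default 90).
fs_over() {
  local pct="${1:-90}"
  awk -v limit="$pct" 'NR > 1 {
      use = $5
      sub(/%$/, "", use)                 # "95%" -> "95"
      if (use + 0 >= limit + 0) print $6, $5
  }'
}
```

Usage: `df -P | fs_over 90` for space and `df -iP | fs_over 90` for inodes. Both formats put the percentage in column 5 and the mount point in column 6.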

Common Culprits

1. Log Files

Lang: bash

# Check log directory size
du -sh /var/log/*

# Find large log files
find /var/log -type f -size +100M

# Truncate (don't delete) active log files
truncate -s 0 /var/log/large-file.log

# Clean old journal logs
journalctl --vacuum-time=7d
journalctl --vacuum-size=500M
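You can apply the truncate step to every oversized log under a directory. This is a sketch with a hypothetical `truncate_large_logs` helper. It touches only files ending in `.log`, so journald's binary files (managed with `journalctl --vacuum-*` above) and compressed rotations are left alone:

```bash
# truncate_large_logs DIR SIZE_MB
# Empties (does not delete) *.log files under DIR larger than SIZE_MB MiB,
# printing each path, so processes holding them open keep a valid file.
truncate_large_logs() {
  local dir="$1" mb="$2"
  find "$dir" -xdev -type f -name '*.log' -size +"${mb}"M -print -exec truncate -s 0 {} +
}
```

Usage (as root): `truncate_large_logs /var/log 500`. Truncating throws away the log contents. Copy anything needed for the investigation first.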

2. Package Cache

Lang: bash

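# Clean apt cache (Ubuntu/Debian)
apt clean            # or apt autoclean to keep .debs that are still downloadable

# Clean yum/dnf cache (RHEL/CentOS/Fedora)
yum clean all        # dnf clean all on dnf-based releases

# Remove old kernels and unused dependencies (Ubuntu)
apt autoremove --purge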

3. Docker/Container Images

Lang: bash

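# Remove stopped containers, unused networks, build cache and ALL unused
# images (not only dangling ones); volumes are kept unless --volumes is added
docker system prune -a

# Remove unused volumes (Docker 23+: anonymous volumes only unless -a is given)
# Warning: volume data is deleted permanently
docker volume prune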

4. Temporary Files

Lang: bash

# Clean /tmp (be careful)
find /tmp -type f -atime +7 -delete

# Clean user cache
rm -rf ~/.cache/*

Partition Mounted Read-Only

Symptoms

Cannot write to filesystem
“Read-only file system” errors
Often follows filesystem or disk errors (the kernel remounts read-only to protect data)

Fix Steps

Lang: bash

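# 1. Find out why it went read-only (I/O or filesystem errors)
dmesg | grep -i error
smartctl -a /dev/sda    # from the smartmontools package

# 2. Check the filesystem; never run fsck on a mounted filesystem
#    (boot into rescue mode or a live USB first; XFS uses xfs_repair instead)
fsck /dev/sda1

# 3. Remount read-write once the underlying problem is fixed
mount -o remount,rw /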
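To check from a script whether a mount point is currently read-only, look at the options field of `/proc/mounts`. This is a sketch with a hypothetical `is_readonly` helper. Interactively, `findmnt -n -o OPTIONS /` shows the same options. The sketch assumes mount points without spaces, which `/proc/mounts` escapes as `\040`:

```bash
# is_readonly MOUNTPOINT [MOUNTS_FILE]: succeed if MOUNTPOINT is mounted "ro".
is_readonly() {
  local mp="$1" mounts="${2:-/proc/mounts}"
  awk -v mp="$mp" '
    $2 == mp { opts = $4 }        # the last match wins (later mounts stack on top)
    END {
      n = split(opts, o, ",")
      for (i = 1; i <= n; i++) if (o[i] == "ro") exit 0
      exit 1
    }' "$mounts"
}
```

Usage: `is_readonly / && echo "root filesystem is read-only"`.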

Permission Issues

Permission Denied

Symptoms

Cannot access files or directories
“Permission denied” errors
Scripts won’t execute

Troubleshooting

Lang: bash

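# Check permissions and ownership
ls -la /path/to/file

# Check permissions on every directory in the path (each needs x to traverse)
namei -l /path/to/file

# Check file type; for scripts this also reveals CRLF (Windows) line endings
file /path/to/file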
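You can check several common causes of "Permission denied" when running a script in one pass. This is a sketch with a hypothetical `check_script` helper. The `noexec` test runs only if `findmnt` (util-linux) is available:

```bash
# check_script FILE: report common reasons a script will not execute.
# Returns 0 if no problem was found, 1 otherwise.
check_script() {
  local f="$1" problems=0
  if [ ! -f "$f" ]; then
    printf 'missing: %s\n' "$f"
    return 1
  fi
  if [ ! -x "$f" ]; then
    printf 'not executable (fix: chmod +x %s)\n' "$f"
    problems=1
  fi
  # A carriage return on line 1 breaks the #! interpreter path
  if head -n 1 "$f" | grep -q "$(printf '\r')"; then
    printf 'CRLF line endings (fix: dos2unix %s)\n' "$f"
    problems=1
  fi
  if [ "$(head -c 2 "$f")" != '#!' ]; then
    printf 'no #! shebang line (run it as: bash %s)\n' "$f"
    problems=1
  fi
  if command -v findmnt >/dev/null 2>&1 &&
     findmnt -n -o OPTIONS --target "$f" | grep -qw noexec; then
    printf 'filesystem holding %s is mounted noexec\n' "$f"
    problems=1
  fi
  return "$problems"
}
```

Usage: `check_script /path/to/script.sh`.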

Common Fixes

Lang: bash

# Fix ownership
chown user:group /path/to/file

# Fix permissions
chmod 644 /path/to/file      # rw-r--r--
chmod 755 /path/to/script    # rwxr-xr-x
chmod 700 /path/to/private   # rwx------

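# Fix directory permissions recursively
chown -R user:group /path/to/directory
# chmod -R 755 would mark every file executable; set dirs and files separately
find /path/to/directory -type d -exec chmod 755 {} +
find /path/to/directory -type f -exec chmod 644 {} +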

# Make script executable
chmod +x /path/to/script.sh
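The octal modes above map to `rwx` triplets one digit at a time (owner, group, others), where 4 = read, 2 = write and 1 = execute. This is a sketch of that conversion with hypothetical helper names. It is useful for checking an intended mode before you apply it. For an existing file, `stat -c '%a %A' file` shows both forms:

```bash
# triplet DIGIT: print the rwx string for one octal digit (0-7).
triplet() {
  local r=- w=- x=-
  [ $(( $1 & 4 )) -ne 0 ] && r=r
  [ $(( $1 & 2 )) -ne 0 ] && w=w
  [ $(( $1 & 1 )) -ne 0 ] && x=x
  printf '%s%s%s' "$r" "$w" "$x"
}

# mode_to_rwx NNN: convert a three-digit octal mode to ls-style notation.
mode_to_rwx() {
  local m
  case $1 in
    [0-7][0-7][0-7]) ;;
    *) echo "usage: mode_to_rwx NNN (three octal digits)" >&2; return 2 ;;
  esac
  m=${1#?}
  printf '%s%s%s\n' "$(triplet "${1%??}")" "$(triplet "${m%?}")" "$(triplet "${1#??}")"
}
```

For example, `mode_to_rwx 750` prints `rwxr-x---`.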

SELinux Blocking Access

Symptoms

Permission denied despite correct ownership/permissions
SELinux in enforcing mode
AVC denial messages in logs

Diagnosis

Lang: bash

# Check SELinux status
getenforce

# Check for denials
ausearch -m avc -ts recent

# Check file context
ls -Z /path/to/file

# Check process context
ps auxZ | grep process_name

Fixes

Lang: bash

# Temporarily set to permissive (for testing only)
setenforce 0

# Restore default context
restorecon -v /path/to/file

# Change context permanently
semanage fcontext -a -t httpd_sys_content_t "/web/content(/.*)?"
restorecon -Rv /web/content

# Re-enable enforcing
setenforce 1

SSH Connection Issues

Connection Refused

Symptoms

“Connection refused” error
Cannot connect remotely
SSH port not responding

Troubleshooting

Lang: bash

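# Check if SSH is running (the unit is ssh.service on Debian/Ubuntu)
systemctl status sshd

# Check if listening on port 22 (ss replaces the legacy netstat)
netstat -tlnp | grep :22
ss -tlnp | grep :22

# Check firewall
iptables -L -n | grep 22
firewall-cmd --list-all    # firewalld (RHEL/CentOS)
ufw status                 # ufw (Ubuntu)

# Test locally
ssh localhost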
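From the client, a quick TCP test can tell "nothing listening" apart from "firewall dropping packets". This is a sketch using bash's built-in `/dev/tcp` redirection and coreutils `timeout`. `port_open` is a hypothetical helper name:

```bash
# port_open HOST PORT [TIMEOUT_SECONDS]: try to open a TCP connection.
# Exit status: 0 = connected, 124 = timed out (from timeout), other = failed
# quickly (typically connection refused or name not resolved).
port_open() {
  timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

Usage: `port_open server.example.com 22; echo $?` (the host is a placeholder). A quick non-zero status usually means sshd is not listening. A status of 124 more often points to a firewall silently dropping the packets.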

Fixes

Lang: bash

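# Start SSH service (named ssh on Debian/Ubuntu)
systemctl start sshd
systemctl enable sshd

# Open firewall port (firewalld)
firewall-cmd --permanent --add-service=ssh
firewall-cmd --reload

# Or with ufw (Ubuntu)
ufw allow ssh

# Or with iptables (-I inserts ahead of any existing DROP/REJECT rule;
# the rule is lost on reboot unless saved)
iptables -I INPUT -p tcp --dport 22 -j ACCEPT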

Permission Denied (publickey)

Symptoms

SSH key authentication fails
Falls back to password (if enabled)
“Permission denied (publickey)” error

Check Server Side

Lang: bash

# Verify SSH directory permissions
ls -la ~/.ssh

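# Should be (run as the user who is logging in):
# ~/.ssh = 700
# ~/.ssh/authorized_keys = 600
# home directory = not writable by group or others (sshd StrictModes)

# Fix permissions
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
chmod go-w ~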

# Check authorized_keys content
cat ~/.ssh/authorized_keys

# Check SSH logs
tail -f /var/log/auth.log  # Ubuntu/Debian
tail -f /var/log/secure    # RHEL/CentOS
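You can script the same checks for a given home directory. This is a sketch with hypothetical helper names. It follows the 700/600 guidance above, although sshd's StrictModes strictly rejects only paths that are writable by group or others. It does not check ownership:

```bash
# _check_mode PATH MASK EXPECTED: print a problem line (and fail) if PATH is
# missing or has any permission bit from the octal MASK set.
_check_mode() {
  local m
  if [ ! -e "$1" ]; then
    printf '%s: missing\n' "$1"
    return 1
  fi
  m=$(stat -c %a "$1")
  if [ $(( 0$m & $2 )) -ne 0 ]; then      # leading 0 makes both values octal
    printf '%s: mode %s, expected %s\n' "$1" "$m" "$3"
    return 1
  fi
}

# ssh_perm_audit [HOME_DIR]: check the modes used for SSH key logins.
ssh_perm_audit() {
  local home="${1:-$HOME}" bad=0
  _check_mode "$home" 022 "no group/world write" || bad=1
  _check_mode "$home/.ssh" 077 "700" || bad=1
  _check_mode "$home/.ssh/authorized_keys" 077 "600" || bad=1
  return "$bad"
}
```

Usage: `ssh_perm_audit /home/username` (as root) or plain `ssh_perm_audit` as the affected user.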

Check Client Side

Lang: bash

# Verify private key permissions (id_ed25519 is common on newer setups)
ls -la ~/.ssh/id_rsa
chmod 600 ~/.ssh/id_rsa

# Test with verbose output (-vvv for more detail)
ssh -v user@hostname

# Specify key explicitly
ssh -i ~/.ssh/specific_key user@hostname

Too Many Authentication Failures

Symptoms

“Received disconnect: Too many authentication failures”
Multiple key attempts failing

Fix

Lang: bash

# Offer only the key you name, not every key loaded in ssh-agent
ssh -o IdentitiesOnly=yes -i ~/.ssh/specific_key user@hostname

# Or add to ~/.ssh/config (ssh_config syntax, not a shell command)
Host hostname
    IdentitiesOnly yes
    IdentityFile ~/.ssh/specific_key

Network Issues

Network Interface Not Starting

Symptoms

No network connectivity
Interface shows “down”
IP address not assigned

Troubleshooting

Lang: bash

# List interfaces
ip link show

# Check interface status (names such as ens33 or enp0s3 are common; see ip link)
ip addr show eth0

# Check NetworkManager status
systemctl status NetworkManager

# Check for errors
journalctl -u NetworkManager -n 50
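`/sys/class/net` exposes each interface's operational state. For physical NICs it also shows whether a link (cable or switch port) is detected, which separates cabling problems from configuration problems. This is a sketch with a hypothetical `list_links` helper:

```bash
# list_links [SYSFS_NET_DIR]: print name, operstate, carrier and MAC address
# for every network interface.
list_links() {
  local base="${1:-/sys/class/net}" dev name state carrier mac
  for dev in "$base"/*; do
    [ -e "$dev/operstate" ] || continue
    name=${dev##*/}
    state=$(cat "$dev/operstate")
    # carrier: 1 = link detected, 0 = no link; reading it typically fails
    # while the interface is administratively down, shown here as "?"
    carrier=$(cat "$dev/carrier" 2>/dev/null) || carrier="?"
    mac=$(cat "$dev/address" 2>/dev/null)
    printf '%-12s %-8s %-8s %s\n' "$name" "$state" "$carrier" "$mac"
  done
}
```

An interface showing `down` with carrier `?` still needs `ip link set ... up`. One showing carrier `0` after being brought up usually has a physical link problem.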

Fixes

Lang: bash

# Bring interface up
ip link set eth0 up

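# Ubuntu Server 18.04+ (netplan): re-apply config or restart systemd-networkd
netplan apply
systemctl restart systemd-networkd

# Restart NetworkManager (desktops, RHEL/CentOS)
systemctl restart NetworkManager

# Renew DHCP lease (dhclient is absent on some newer releases)
dhclient -r eth0
dhclient eth0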

DNS Resolution Failing

Symptoms

Cannot resolve hostnames
“Name or service not known” errors
Ping IP works, ping hostname fails

Diagnosis

Lang: bash

# Check DNS configuration
cat /etc/resolv.conf

# Test DNS resolution
nslookup google.com
dig google.com
host google.com

# Test with specific DNS server
nslookup google.com 8.8.8.8

Fixes

Lang: bash

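# First check whether resolv.conf is a symlink (managed by systemd-resolved
# or NetworkManager); if so, direct edits will be overwritten
ls -la /etc/resolv.conf

# For systemd-resolved systems
resolvectl status
systemctl restart systemd-resolved

# If using NetworkManager (persistent; add ipv4.ignore-auto-dns yes to
# stop DHCP-supplied servers being used as well)
nmcli connection modify "Connection Name" ipv4.dns "8.8.8.8 1.1.1.1"
nmcli connection up "Connection Name"

# Quick temporary test only: overwrite resolv.conf
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf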
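A quick check shows which servers the resolver actually uses. It also shows whether the only entry is the systemd-resolved stub (127.0.0.53), whose real upstream servers appear in `resolvectl status`. This is a sketch with a hypothetical `dns_servers` helper:

```bash
# dns_servers [RESOLV_CONF]: list nameserver entries, flagging the
# systemd-resolved stub address.
dns_servers() {
  awk '$1 == "nameserver" {
      note = ($2 == "127.0.0.53") ? "  (systemd-resolved stub; see resolvectl status)" : ""
      print $2 note
  }' "${1:-/etc/resolv.conf}"
}
```

Usage: `dns_servers`, then test each listed server directly with `nslookup google.com <server>`.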

High CPU/Memory Usage

Identify Resource Hog

Symptoms

System sluggish
High load average
Applications slow or unresponsive

Diagnostic Commands

Lang: bash

# Real-time process monitoring
top
htop

# Sort by CPU
top -o %CPU

# Sort by memory
top -o %MEM

# List top processes
ps aux --sort=-%cpu | head
ps aux --sort=-%mem | head

# Check load average
uptime
cat /proc/loadavg
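Load average only means something relative to the number of CPUs. A sustained 1-minute load above roughly 1.0 per CPU means tasks are queuing. On Linux the load also counts tasks blocked in uninterruptible I/O, so high load with low CPU usage often points at disk or NFS waits. This is a sketch with a hypothetical `load_per_cpu` helper:

```bash
# load_per_cpu [LOADAVG_FILE] [CPU_COUNT]: 1-minute load divided by CPU count.
load_per_cpu() {
  local file="${1:-/proc/loadavg}" cpus="${2:-$(nproc)}"
  awk -v cpus="$cpus" '{ printf "%.2f\n", $1 / cpus }' "$file"
}
```

Usage: `load_per_cpu` on the live system. For example, a load of 4.00 on 8 CPUs gives `0.50`.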

Memory Investigation

Lang: bash

# Memory usage summary
free -h

# Detailed memory info
cat /proc/meminfo

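# Top 20 processes by resident memory (RSS, in MB; header line skipped)
ps aux | awk 'NR > 1 {printf "%.1f MB\t%s\n", $6/1024, $11}' | sort -n | tail -20

# Check a program for memory leaks (runs it under valgrind; test systems only)
valgrind --leak-check=full ./program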
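Low "free" memory is normal on Linux because the page cache uses otherwise idle RAM. `MemAvailable` in `/proc/meminfo` (kernel 3.14 and later) estimates how much memory can be used without swapping. This is a sketch with a hypothetical `mem_available_pct` helper:

```bash
# mem_available_pct [MEMINFO_FILE]: MemAvailable as a percentage of MemTotal.
mem_available_pct() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { if (total > 0) printf "%.1f\n", 100 * avail / total }' "${1:-/proc/meminfo}"
}
```

Only a persistently low value (a few percent), especially alongside swapping or OOM-killer messages in `dmesg`, indicates real memory pressure.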

Common Fixes

Lang: bash

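# Kill misbehaving process
kill -15 [PID]      # Graceful (SIGTERM)
kill -9 [PID]       # Force (SIGKILL; the process cannot clean up)

# Restart service
systemctl restart servicename

# Drop caches: non-destructive but rarely useful, since the kernel reclaims
# cache automatically and disk I/O is slower until the cache refills.
# "sudo echo 1 > file" fails (the redirect runs unprivileged), so use tee:
sync; echo 1 | sudo tee /proc/sys/vm/drop_caches   # page cache
sync; echo 2 | sudo tee /proc/sys/vm/drop_caches   # dentries and inodes
sync; echo 3 | sudo tee /proc/sys/vm/drop_caches   # both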

Boot Issues

System Won’t Boot

Common Scenarios and Fixes

Grub Rescue Mode

Lang: bash

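# List disks and partitions
ls

# Find the partition holding /boot/grub (names may appear as (hd0,msdos1) or (hd0,gpt2))
ls (hd0,1)/
ls (hd0,2)/

# Set root and boot (if /boot is a separate partition, use prefix=(hd0,1)/grub)
set root=(hd0,1)
set prefix=(hd0,1)/boot/grub
insmod normal
normal

# Once booted, reinstall GRUB (see Repair Grub below) to make the fix permanent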

Emergency Mode

Lang: bash

# At the GRUB menu, highlight the entry and press 'e'
# Append ONE of these to the line starting with "linux", then press Ctrl+X (or F10):
systemd.unit=rescue.target

# Or for single user mode:
single

# Or:
systemd.unit=emergency.target

Repair Grub

Lang: bash

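# From a live USB (replace /dev/sda1 with your root partition; check with lsblk)
sudo mount /dev/sda1 /mnt
# UEFI systems: also mount the EFI system partition at /mnt/boot/efi
sudo mount --bind /dev /mnt/dev
sudo mount --bind /proc /mnt/proc
sudo mount --bind /sys /mnt/sys
sudo chroot /mnt

# Reinstall grub (BIOS: target the disk, not a partition)
grub-install /dev/sda
update-grub    # RHEL/Fedora: grub2-mkconfig -o /boot/grub2/grub.cfg

# Exit, unmount, and reboot
exit
sudo umount -R /mnt
sudo reboot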

User Account Issues

User Locked Out

Symptoms

“Account locked” message
Too many failed login attempts
pam_faillock active

Unlock Account

Lang: bash

# Check if account is locked
passwd -S username

# Unlock account
passwd -u username

# For faillock (failed login attempts)
faillock --user username --reset

# Check failed attempts
faillock --user username
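`passwd -S` prints a status code in its second field, and the codes differ by distribution. Debian/Ubuntu use `L`, `NP` and `P`, while RHEL-family `passwd` reports `LK`, `NP` and `PS`. This is a sketch with a hypothetical `passwd_state` helper that translates both. A `faillock` lockout is tracked separately and does not appear in this output:

```bash
# passwd_state: read `passwd -S` output on stdin and describe each account.
passwd_state() {
  awk '{
    s = $2
    if (s == "L" || s == "LK")      print $1 ": locked"
    else if (s == "NP")             print $1 ": no password set"
    else if (s == "P" || s == "PS") print $1 ": usable password"
    else                            print $1 ": unknown status " s
  }'
}
```

Usage: `passwd -S username | passwd_state`.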

Forgotten Root Password

Recovery Steps

Lang: bash

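# 1. Reboot and, at the GRUB menu, press 'e' on the boot entry
# 2. Append to the line starting with "linux", then press Ctrl+X (or F10):
rw init=/bin/bash

# 3. Make sure the root filesystem is writable
mount -o remount,rw /

# 4. Change password
passwd root

# 5. SELinux systems (RHEL/CentOS/Fedora): force a relabel on next boot,
#    otherwise logins can fail after the password change
touch /.autorelabel

# 6. Continue booting
exec /sbin/init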
