Quick reference for resolving common Linux issues encountered in MSP environments.

Package Management Issues

Cannot Acquire Lock (apt/dpkg)

Symptoms

  • Error: “Could not get lock /var/lib/dpkg/lock-frontend”
  • Package installation fails
  • Updates blocked

Troubleshooting Steps

1. Check for Running Package Managers

Lang: bash
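# Check for running apt/dpkg processes (the [a]/[d] brackets keep grep from matching itself)
ps aux | grep -E '[a]pt|[d]pkg'

# Find processes using dpkg lock (needs root to see other users' processes)
sudo lsof /var/lib/dpkg/lock-frontend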

2. Wait for Unattended Upgrades

Lang: bash
# Check if unattended-upgrades is running
systemctl status unattended-upgrades

# Prefer waiting for it to finish; stop it only if it is clearly stuck
sudo systemctl stop unattended-upgrades
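
Waiting can be scripted instead of polling by hand. A minimal sketch, assuming fuser (psmisc package) is installed; the function names wait_until and dpkg_idle are illustrative, not part of apt:

```shell
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-run COMMAND every 2 seconds until it succeeds or the timeout expires.
wait_until() {
    local timeout=$1 start
    shift
    start=$(date +%s)
    until "$@"; do
        if [ $(( $(date +%s) - start )) -ge "$timeout" ]; then
            return 1
        fi
        sleep 2
    done
}

# Succeeds when no process has the dpkg frontend lock open.
# Assumption: fuser is available; run as root to see all processes.
dpkg_idle() {
    command -v fuser >/dev/null 2>&1 || { echo "fuser not found" >&2; return 2; }
    ! fuser /var/lib/dpkg/lock-frontend >/dev/null 2>&1
}

# Example: wait up to 10 minutes, then continue with the update.
#   wait_until 600 dpkg_idle && apt update
```

If the timeout expires, wait_until returns non-zero, so the follow-up command is skipped rather than run against a held lock.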

3. Remove Stale Locks (Last Resort)

Lang: bash
# Only if no package manager is actually running
sudo rm /var/lib/dpkg/lock-frontend
sudo rm /var/lib/dpkg/lock
sudo rm /var/cache/apt/archives/lock
sudo dpkg --configure -a
sudo apt update

Broken Packages

Symptoms

  • Package installation fails with dependency errors
  • “unmet dependencies” messages
  • System in inconsistent state

Fix Steps

Lang: bash
# Fix broken packages
sudo apt --fix-broken install

# Reconfigure packages
sudo dpkg --configure -a

# Clean package cache
sudo apt clean
sudo apt autoclean

# Update and try again
sudo apt update
sudo apt upgrade

Repository Issues

Symptoms

  • “Failed to fetch” errors
  • GPG key errors
  • Repository not found

Troubleshooting

Lang: bash
# Check repository configuration
cat /etc/apt/sources.list
ls /etc/apt/sources.list.d/

# Test repository connectivity
ping archive.ubuntu.com
curl -I http://archive.ubuntu.com

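# Fix GPG key issues (apt-key is deprecated and missing on recent Debian/Ubuntu releases)
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys [KEY_ID]

# Or for newer systems: fetch the key, export it to a dedicated keyring file,
# and point the repository entry at that file with signed-by=
# ([REPO_NAME] is a placeholder; create /etc/apt/keyrings first if it does not exist)
gpg --keyserver keyserver.ubuntu.com --recv-keys [KEY_ID]
gpg --export [KEY_ID] | sudo tee /etc/apt/keyrings/[REPO_NAME].gpg > /dev/null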
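
To test every configured repository rather than just archive.ubuntu.com, the hostnames can be pulled out of the sources files. A rough sketch, assuming classic one-line "deb" entries and that curl is installed; repo_hosts and check_repo_hosts are illustrative names:

```shell
# Print the unique hostnames used by one-line "deb"/"deb-src" entries.
# Note: deb822-style *.sources files (URIs: ...) are not parsed by this sketch.
repo_hosts() {
    grep -hE '^[[:space:]]*deb(-src)?[[:space:]]' "$@" 2>/dev/null |
        grep -oE '[a-z]+://[^/[:space:]]+' |
        sed -E 's#^[a-z]+://##' |
        sort -u
}

# Probe each host with an HTTP HEAD request.
check_repo_hosts() {
    local host
    for host in $(repo_hosts /etc/apt/sources.list /etc/apt/sources.list.d/*.list); do
        if curl -sS -o /dev/null --connect-timeout 5 -I "http://$host/"; then
            echo "ok      $host"
        else
            echo "FAILED  $host"
        fi
    done
}
```

A host that fails here while others succeed points at that repository or its mirror rather than at local networking.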

Service Management (systemd)

Service Won’t Start

Symptoms

  • Service fails to start
  • Status shows “failed” or “inactive”
  • Error messages in logs

Diagnostic Commands

Lang: bash
# Check service status
systemctl status servicename.service

# View recent logs
journalctl -u servicename.service -n 50

# Check for errors since last boot
journalctl -u servicename.service -b

# View real-time logs
journalctl -u servicename.service -f

Common Fixes

Lang: bash
# Reload systemd after config changes
systemctl daemon-reload

# Reset failed state
systemctl reset-failed servicename.service

# Check service file syntax
systemd-analyze verify /etc/systemd/system/servicename.service

# Restart service, then show full (untruncated) status output
systemctl restart servicename.service
systemctl status servicename.service -l --no-pager

Service Keeps Restarting

Symptoms

  • Service status shows “activating” repeatedly
  • Rapid restarts in logs
  • Application crashes immediately

Investigation

Lang: bash
# Check restart configuration
systemctl cat servicename.service | grep -i restart

# View crash logs
coredumpctl list servicename
coredumpctl info [PID]

# Check resource limits
systemctl show servicename.service | grep -i limit

# Test service in foreground
# (Stop the service first, then run the actual command manually)
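
For a quick view of why a unit keeps cycling, the relevant properties can be read from `systemctl show`. A small sketch; unit_prop and unit_summary are illustrative helpers, and the NRestarts property is assumed to be available (it is missing on old systemd versions):

```shell
# Read "Key=Value" lines (as printed by `systemctl show`) on stdin and
# print the value of one property.
unit_prop() {
    awk -F= -v key="$1" '$1 == key { sub(/^[^=]*=/, ""); print; exit }'
}

# One-line summary of a unit's state, last result, exit status and restart count.
unit_summary() {
    local unit=$1 props
    props=$(systemctl show "$unit" -p ActiveState,SubState,Result,ExecMainStatus,NRestarts)
    printf 'state=%s/%s result=%s exit=%s restarts=%s\n' \
        "$(printf '%s\n' "$props" | unit_prop ActiveState)" \
        "$(printf '%s\n' "$props" | unit_prop SubState)" \
        "$(printf '%s\n' "$props" | unit_prop Result)" \
        "$(printf '%s\n' "$props" | unit_prop ExecMainStatus)" \
        "$(printf '%s\n' "$props" | unit_prop NRestarts)"
}

# Usage: unit_summary servicename.service
```

A SubState of auto-restart together with a climbing restart count is the usual signature of a crash loop.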

Enable Service on Boot

Symptoms

  • Service doesn’t start after reboot
  • Manual start required

Fix

Lang: bash
# Enable service
systemctl enable servicename.service

# Enable and start immediately
systemctl enable --now servicename.service

# Verify it's enabled
systemctl is-enabled servicename.service

# List all enabled services
systemctl list-unit-files --state=enabled

Disk Space Issues

Root Filesystem Full

Symptoms

  • “No space left on device” errors
  • Services failing to start
  • Cannot create files

Quick Diagnosis

Lang: bash
# Check disk usage
df -h

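# Find large directories (-x stays on the root filesystem and skips /proc, /sys and other mounts)
du -xh / 2>/dev/null | sort -rh | head -20

# Find large files
find / -xdev -type f -size +100M -exec ls -lh {} + 2>/dev/null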

# Check inode usage (sometimes the problem)
df -i
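
The same check works for both blocks and inodes if the `df` output is filtered by percentage. A minimal sketch, assuming POSIX-format output (`df -P`) and mount points without spaces; full_filesystems is an illustrative name:

```shell
# Read `df -P` (or `df -Pi`) output on stdin and print mount points at or
# above the given usage percentage (default 90).
full_filesystems() {
    local limit=${1:-90}
    awk -v limit="$limit" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit) print $6, $5
    }'
}

# Usage:
#   df -P  | full_filesystems 90    # block usage
#   df -Pi | full_filesystems 90    # inode usage
```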

Common Culprits

1. Log Files

Lang: bash
# Check log directory size
du -sh /var/log/*

# Find large log files
find /var/log -type f -size +100M

# Truncate (don't delete) active log files
truncate -s 0 /var/log/large-file.log

# Clean old journal logs
journalctl --vacuum-time=7d
journalctl --vacuum-size=500M
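
Deleting a log that a process still has open does not free its space until the process closes the file, which is why the step above truncates instead. A sketch for spotting files that were already deleted but are still held open, assuming GNU find and a readable /proc; deleted_open_files is an illustrative name:

```shell
# Print "PID<TAB>path (deleted)" for every open file descriptor that points
# at a deleted file. Run as root to see all processes.
deleted_open_files() {
    find /proc/[0-9]*/fd -lname '* (deleted)' -printf '%p\t%l\n' 2>/dev/null |
        awk -F'\t' '{ split($1, p, "/"); print p[3] "\t" $2 }'
}

# Usage: deleted_open_files | sort -u
```

Restarting the listed process (or truncating the file through /proc/PID/fd/N) releases the space.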

2. Package Cache

Lang: bash
# Clean apt cache (Ubuntu/Debian); clean removes every cached package,
# autoclean only those that can no longer be downloaded
apt clean
apt autoclean

# Clean yum cache (RHEL/CentOS)
yum clean all

# Remove old kernels (Ubuntu)
apt autoremove --purge

3. Docker/Container Images

Lang: bash
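# Remove unused Docker resources (-a deletes every image not used by a container, not just dangling ones)
docker system prune -a

# Remove unused volumes (their data is lost permanently)
docker volume prune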

4. Temporary Files

Lang: bash
# Clean /tmp (be careful)
find /tmp -type f -atime +7 -delete

# Clean user cache
rm -rf ~/.cache/*

Partition Mounted Read-Only

Symptoms

  • Cannot write to filesystem
  • “Read-only file system” errors
  • Usually after filesystem errors

Fix Steps

Lang: bash
# Check for hardware or filesystem errors first
dmesg | grep -i error
smartctl -a /dev/sda

# Check filesystem for errors (never on a filesystem mounted read-write; unmount it or boot into rescue mode)
fsck /dev/sda1

# Remount read-write
mount -o remount,rw /
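
To confirm which filesystems the kernel currently has mounted read-only, the mount options in /proc/mounts can be filtered. A minimal sketch; ro_mounts is an illustrative name, and some entries (squashfs images, certain pseudo-filesystems) are read-only by design:

```shell
# Print "mountpoint device fstype" for every filesystem mounted read-only.
# The pattern matches "ro" only as a whole option, so errors=remount-ro
# on a read-write mount is not reported.
ro_mounts() {
    awk '$4 ~ /(^|,)ro(,|$)/ { print $2, $1, $3 }' "${1:-/proc/mounts}"
}

# Usage: ro_mounts
```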

Permission Issues

Permission Denied

Symptoms

  • Cannot access files or directories
  • “Permission denied” errors
  • Scripts won’t execute

Troubleshooting

Lang: bash
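# Check file permissions and ownership
ls -la /path/to/file

# Check permissions on every directory leading to the file
namei -l /path/to/file

# Check file type (script, binary, data) and whether the execute bit is set
file /path/to/file
test -x /path/to/file && echo "executable"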

Common Fixes

Lang: bash
# Fix ownership
chown user:group /path/to/file

# Fix permissions
chmod 644 /path/to/file      # rw-r--r--
chmod 755 /path/to/script    # rwxr-xr-x
chmod 700 /path/to/private   # rwx------

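# Fix directory permissions recursively
# (capital X adds execute only to directories and already-executable files,
# unlike 755, which would make every file executable)
chown -R user:group /path/to/directory
chmod -R u=rwX,go=rX /path/to/directory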

# Make script executable
chmod +x /path/to/script.sh
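
A "Permission denied" on a file with correct modes is often caused by a parent directory that lacks execute (search) permission. A sketch that prints the mode and owner of every path component, assuming GNU stat and readlink; path_perms is an illustrative name (util-linux's `namei -l` gives similar output where available):

```shell
# Print mode, owner:group and name for each component from / down to the target.
path_perms() {
    local path
    path=$(readlink -f -- "$1") || return 1
    if [ "$path" != / ]; then
        path_perms "$(dirname -- "$path")"
    fi
    stat -c '%A %U:%G %n' -- "$path"
}

# Usage: path_perms /path/to/file
```

Look for the first component whose mode does not grant x to the user that needs access.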

SELinux Blocking Access

Symptoms

  • Permission denied despite correct ownership/permissions
  • SELinux in enforcing mode
  • AVC denial messages in logs

Diagnosis

Lang: bash
# Check SELinux status
getenforce

# Check for denials
ausearch -m avc -ts recent

# Check file context
ls -Z /path/to/file

# Check process context
ps auxZ | grep process_name

Fixes

Lang: bash
# Temporarily set to permissive (for testing only)
setenforce 0

# Restore default context
restorecon -v /path/to/file

# Change context permanently
semanage fcontext -a -t httpd_sys_content_t "/web/content(/.*)?"
restorecon -Rv /web/content

# Re-enable enforcing
setenforce 1
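
When `ausearch` returns many denials, grouping them shows which process and target type to fix first. A rough sketch that assumes the standard audit AVC line layout and command names without spaces; avc_summary is an illustrative name:

```shell
# Read AVC records on stdin and print "count comm permissions tclass tcontext",
# most frequent first.
avc_summary() {
    awk '/avc:[[:space:]]+denied/ {
        perm = ""; comm = "?"; tcontext = "?"; tclass = "?"
        for (i = 1; i <= NF; i++) {
            if ($i == "{") {
                # Collect the permission list between the braces.
                for (j = i + 1; j <= NF && $j != "}"; j++)
                    perm = perm (perm == "" ? "" : ",") $j
            }
            if ($i ~ /^comm=/)     { comm = substr($i, 6); gsub(/"/, "", comm) }
            if ($i ~ /^tcontext=/) { tcontext = substr($i, 10) }
            if ($i ~ /^tclass=/)   { tclass = substr($i, 8) }
        }
        count[comm " " perm " " tclass " " tcontext]++
    }
    END { for (k in count) print count[k], k }' | sort -rn
}

# Usage: ausearch -m avc -ts recent | avc_summary
```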

SSH Connection Issues

Connection Refused

Symptoms

  • “Connection refused” error
  • Cannot connect remotely
  • SSH port not responding

Troubleshooting

Lang: bash
# Check if SSH is running (the unit is named ssh on Debian/Ubuntu, sshd on RHEL/CentOS)
systemctl status sshd

# Check if listening on port (ss replaces the deprecated netstat)
ss -tlnp | grep :22
netstat -tlnp | grep :22

# Check firewall
iptables -L -n | grep 22
firewall-cmd --list-all

# Test locally
ssh localhost

Fixes

Lang: bash
# Start SSH service
systemctl start sshd
systemctl enable sshd

# Open firewall port
firewall-cmd --permanent --add-service=ssh
firewall-cmd --reload

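# Or with iptables (-I inserts at the top so the rule is evaluated before any DROP/REJECT;
# the rule is lost on reboot unless saved)
iptables -I INPUT -p tcp --dport 22 -j ACCEPT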
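
From the client side, an immediate refusal usually means the host is reachable but nothing is listening (or a firewall rejects the port), while a timeout suggests packets are being dropped. A sketch that tells the two apart, assuming bash with /dev/tcp support and coreutils timeout; port_state is an illustrative name:

```shell
# Print whether a TCP port is open, closed, or silently filtered.
port_state() {
    local host=$1 port=${2:-22} rc
    # Inside bash -c, $0 is the host and $1 is the port.
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
    rc=$?
    case $rc in
        0)   echo "open" ;;
        124) echo "filtered (timed out)" ;;
        *)   echo "closed (refused or unreachable)" ;;
    esac
}

# Usage: port_state hostname 22
```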

Permission Denied (publickey)

Symptoms

  • SSH key authentication fails
  • Falls back to password (if enabled)
  • “Permission denied (publickey)” error

Check Server Side

Lang: bash
# Verify SSH directory permissions
ls -la ~/.ssh

# Should be:
# ~/.ssh = 700
# ~/.ssh/authorized_keys = 600

# Fix permissions
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys

# Check authorized_keys content
cat ~/.ssh/authorized_keys

# Check SSH logs
tail -f /var/log/auth.log  # Ubuntu/Debian
tail -f /var/log/secure    # RHEL/CentOS
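
The permission check can be scripted for use across many accounts. A minimal sketch that compares against the 700/600 modes listed above, assuming GNU stat; check_ssh_perms is an illustrative name:

```shell
# Print one line per problem found; print nothing when the modes match.
check_ssh_perms() {
    local dir=${1:-$HOME/.ssh} mode
    mode=$(stat -c '%a' -- "$dir") || return 1
    [ "$mode" = 700 ] || echo "$dir is $mode, expected 700"
    if [ -e "$dir/authorized_keys" ]; then
        mode=$(stat -c '%a' -- "$dir/authorized_keys")
        [ "$mode" = 600 ] || echo "$dir/authorized_keys is $mode, expected 600"
    else
        echo "$dir/authorized_keys is missing"
    fi
    return 0
}

# Usage: check_ssh_perms /home/username/.ssh
```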

Check Client Side

Lang: bash
# Verify private key permissions
ls -la ~/.ssh/id_rsa
chmod 600 ~/.ssh/id_rsa

# Test with verbose output
ssh -v user@hostname

# Specify key explicitly
ssh -i ~/.ssh/specific_key user@hostname

Too Many Authentication Failures

Symptoms

  • “Received disconnect: Too many authentication failures”
  • Multiple key attempts failing

Fix

Lang: bash
# Offer only the specified key instead of every key loaded in the agent
ssh -o IdentitiesOnly=yes -i ~/.ssh/specific_key user@hostname

# Or add to ~/.ssh/config
Host hostname
    IdentitiesOnly yes
    IdentityFile ~/.ssh/specific_key

Network Issues

Network Interface Not Starting

Symptoms

  • No network connectivity
  • Interface shows “down”
  • IP address not assigned

Troubleshooting

Lang: bash
# List interfaces
ip link show

# Check interface status
ip addr show eth0

# Check NetworkManager status
systemctl status NetworkManager

# Check for errors
journalctl -u NetworkManager -n 50

Fixes

Lang: bash
# Bring interface up
ip link set eth0 up

# Restart networking (Ubuntu 18.04+)
systemctl restart systemd-networkd

# Restart NetworkManager
systemctl restart NetworkManager

# Renew DHCP lease (release, then request again)
dhclient -r eth0
dhclient eth0
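
For a compact view of every interface's link state, the kernel exposes it under /sys/class/net. A minimal sketch; iface_states is an illustrative name, and the optional argument exists only so the function can be pointed at another directory:

```shell
# Print "interface<TAB>operstate" for each network interface.
# Typical values are up, down and unknown (loopback reports unknown).
iface_states() {
    local root=${1:-/sys/class/net} dev
    for dev in "$root"/*; do
        [ -r "$dev/operstate" ] || continue
        printf '%s\t%s\n' "${dev##*/}" "$(cat "$dev/operstate")"
    done
}

# Usage: iface_states
```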

DNS Resolution Failing

Symptoms

  • Cannot resolve hostnames
  • “Name or service not known” errors
  • Ping IP works, ping hostname fails

Diagnosis

Lang: bash
# Check DNS configuration
cat /etc/resolv.conf

# Test DNS resolution
nslookup google.com
dig google.com
host google.com

# Test with specific DNS server
nslookup google.com 8.8.8.8

Fixes

Lang: bash
# Overwrite resolv.conf (temporary: replaces all existing entries and is regenerated by NetworkManager or systemd-resolved)
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf

# For systemd-resolved systems
systemctl restart systemd-resolved

# Check if resolv.conf is a symlink
ls -la /etc/resolv.conf

# If using NetworkManager
nmcli connection modify "Connection Name" ipv4.dns "8.8.8.8 1.1.1.1"
nmcli connection up "Connection Name"
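
To see which configured resolver is failing, each nameserver can be queried in turn. A sketch assuming dig is installed; nameservers and check_resolvers are illustrative names, and example.com is only a placeholder query:

```shell
# Print the nameserver addresses listed in a resolv.conf-style file.
nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Query every configured nameserver once and report whether it answered.
check_resolvers() {
    local name=${1:-example.com} ns
    for ns in $(nameservers); do
        if dig +time=2 +tries=1 +short "@$ns" "$name" | grep -q .; then
            echo "$ns ok"
        else
            echo "$ns FAILED"
        fi
    done
}

# Usage: check_resolvers google.com
```

If the only entry is 127.0.0.53, the system uses the systemd-resolved stub and the upstream servers are configured there instead.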

High CPU/Memory Usage

Identify Resource Hog

Symptoms

  • System sluggish
  • High load average
  • Applications slow or unresponsive

Diagnostic Commands

Lang: bash
# Real-time process monitoring
top
htop

# Sort by CPU
top -o %CPU

# Sort by memory
top -o %MEM

# List top processes
ps aux --sort=-%cpu | head
ps aux --sort=-%mem | head

# Check load average
uptime
cat /proc/loadavg
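
Load average only means something relative to the number of CPU cores. A minimal sketch that divides the 1-minute load by the core count; load_per_core is an illustrative name, and the arguments exist only to allow overriding the defaults:

```shell
# Print the 1-minute load average divided by the number of cores.
load_per_core() {
    local file=${1:-/proc/loadavg} cores=${2:-$(nproc)}
    awk -v cores="$cores" '{ printf "%.2f\n", $1 / cores }' "$file"
}

# Usage: load_per_core
```

A value that stays well above 1.00 suggests the system is saturated; on Linux the figure also counts tasks blocked on I/O, so check disk activity as well as CPU.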

Memory Investigation

Lang: bash
# Memory usage summary
free -h

# Detailed memory info
cat /proc/meminfo

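# Per-process memory (resident set size, largest last)
ps aux --no-headers | awk '{printf "%.1f MB\t%s\n", $6/1024, $11}' | sort -n | tail -20

# Check for memory leaks (starts the program under valgrind; it cannot attach to a running process)
valgrind --leak-check=full ./program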

Common Fixes

Lang: bash
# Kill misbehaving process
kill -15 [PID]      # Graceful
kill -9 [PID]       # Force

# Restart service
systemctl restart servicename

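# Clear page cache (non-destructive, but it only discards clean cache that the kernel
# would reclaim on demand anyway, and it slows I/O until the cache refills)
sync; echo 1 | sudo tee /proc/sys/vm/drop_caches

# Clear dentries and inodes
sync; echo 2 | sudo tee /proc/sys/vm/drop_caches

# Clear page cache, dentries and inodes
sync; echo 3 | sudo tee /proc/sys/vm/drop_caches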
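
The graceful-then-force sequence can be wrapped so that SIGKILL is sent only when the process ignores SIGTERM. A minimal sketch; stop_pid is an illustrative name and the 10-second default grace period is an arbitrary choice:

```shell
# stop_pid PID [GRACE_SECONDS]
# Send SIGTERM, wait up to GRACE_SECONDS for the process to exit, then SIGKILL.
stop_pid() {
    local pid=$1 grace=${2:-10} waited=0
    # If the signal cannot be delivered, the process is gone or not ours to signal.
    kill -TERM "$pid" 2>/dev/null || return 0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$grace" ]; then
            kill -KILL "$pid" 2>/dev/null
            return 0
        fi
        sleep 1
        waited=$(( waited + 1 ))
    done
}

# Usage: stop_pid 12345 15
```

For processes managed by systemd, prefer `systemctl restart`, since a killed service may simply be started again by its restart policy.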

Boot Issues

System Won’t Boot

Common Scenarios and Fixes

Grub Rescue Mode

Lang: bash
# List partitions
ls

# Find root partition
ls (hd0,1)/
ls (hd0,2)/

# Set root and prefix, then load normal mode
set root=(hd0,1)
set prefix=(hd0,1)/boot/grub
insmod normal
normal

Emergency Mode

Lang: bash
# At grub menu, edit entry (press 'e')
# Add to linux line:
systemd.unit=rescue.target

# Or for single user mode:
single

# Or:
systemd.unit=emergency.target

Repair Grub

Lang: bash
# From live USB (replace /dev/sda1 with the installed system's root partition)
sudo mount /dev/sda1 /mnt
sudo mount --bind /dev /mnt/dev
sudo mount --bind /proc /mnt/proc
sudo mount --bind /sys /mnt/sys
sudo chroot /mnt

# Reinstall grub
grub-install /dev/sda
update-grub

# Exit and reboot
exit
sudo reboot

User Account Issues

User Locked Out

Symptoms

  • “Account locked” message
  • Too many failed login attempts
  • pam_faillock active

Unlock Account

Lang: bash
# Check if account is locked
passwd -S username

# Unlock account
passwd -u username

# For faillock (failed login attempts)
faillock --user username --reset

# Check failed attempts
faillock --user username
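
The second field of `passwd -S` is a short status code that differs slightly between distributions. A sketch that translates it, assuming the usual shadow-utils codes (L/NP/P on Debian and Ubuntu, LK/NP/PS on RHEL and CentOS); lock_state is an illustrative name:

```shell
# Read `passwd -S` output on stdin and print a readable password status.
lock_state() {
    awk '{
        if ($2 == "L" || $2 == "LK")      s = "locked"
        else if ($2 == "NP")              s = "no password set"
        else if ($2 == "P" || $2 == "PS") s = "usable password"
        else                              s = "unknown (" $2 ")"
        print $1 ": " s
    }'
}

# Usage: passwd -S username | lock_state
```

This reports only the password lock; lockouts from failed logins are tracked separately by faillock.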

Forgotten Root Password

Recovery Steps

Lang: bash
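# 1. Reboot and interrupt the boot at the grub menu
# 2. Press 'e' on the boot entry and append to the linux line:
rw init=/bin/bash

# 3. Boot the edited entry; if / is still read-only, remount it
mount -o remount,rw /

# 4. Change password
passwd root

# 5. On SELinux systems, force a relabel on next boot so the changed shadow file keeps a valid context
touch /.autorelabel

# 6. Continue booting
exec /sbin/init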