Linux Common Issues
Common Linux problems and solutions for MSP environments
Quick reference for resolving common Linux issues encountered in MSP environments.
Package Management Issues
Cannot Acquire Lock (apt/dpkg)
Symptoms
- Error: “Could not get lock /var/lib/dpkg/lock-frontend”
- Package installation fails
- Updates blocked
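When the lock error comes from a package manager that is legitimately running, waiting and retrying is usually safer than touching the lock files. A minimal sketch of a retry wrapper follows; the function name `retry` and the attempt and delay values are illustrative assumptions, not part of apt.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Harmless demonstration. For the lock case the command would be something
# like: retry 10 30 sudo apt-get install -y PACKAGE
retry 3 1 true && echo "command succeeded"
```

Recent apt versions may also offer a `DPkg::Lock::Timeout` option that waits for the lock natively; check apt.conf(5) on the target release before relying on it.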
Troubleshooting Steps
1. Check for Running Package Managers
# Check for running apt/dpkg processes
ps aux | grep -E 'apt|dpkg'
# Find processes using dpkg lock
lsof /var/lib/dpkg/lock-frontend
2. Wait for Unattended Upgrades
# Check if unattended-upgrades is running
systemctl status unattended-upgrades
# Wait for it to complete or stop it
systemctl stop unattended-upgrades
3. Remove Stale Locks (Last Resort)
# Only if no package manager is actually running
sudo rm /var/lib/dpkg/lock-frontend
sudo rm /var/lib/dpkg/lock
sudo rm /var/cache/apt/archives/lock
sudo dpkg --configure -a
sudo apt update
Broken Packages
Symptoms
- Package installation fails with dependency errors
- “unmet dependencies” messages
- System in inconsistent state
Fix Steps
# Fix broken packages
sudo apt --fix-broken install
# Reconfigure packages
sudo dpkg --configure -a
# Clean package cache
sudo apt clean
sudo apt autoclean
# Update and try again
sudo apt update
sudo apt upgrade
Repository Issues
Symptoms
- “Failed to fetch” errors
- GPG key errors
- Repository not found
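These errors usually trace back to one specific repository entry, so it helps to list only the entries that are actually active. A minimal sketch, assuming the classic one-line `deb ...` format; the function name `active_sources` is hypothetical, and releases that use deb822 `.sources` files are not covered.

```shell
# Print active (uncommented) one-line apt entries from the given files,
# each prefixed with the name of the file it came from. Hypothetical helper.
active_sources() {
    for f in "$@"; do
        # Skip files that do not exist (including an unmatched glob pattern)
        [ -r "$f" ] || continue
        awk -v file="$f" '$1 == "deb" || $1 == "deb-src" { print file ": " $0 }' "$f"
    done
}

active_sources /etc/apt/sources.list /etc/apt/sources.list.d/*.list
```

Prefixing each entry with its file name makes it easier to find and comment out the one entry that fails to fetch.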
Troubleshooting
# Check repository configuration
cat /etc/apt/sources.list
ls /etc/apt/sources.list.d/
# Test repository connectivity
ping archive.ubuntu.com
curl -I http://archive.ubuntu.com
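# Fix GPG key issues (apt-key is deprecated on newer releases)
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys [KEY_ID]
# Or for newer systems: fetch the key, then export it to its own keyring file
# (repo-name.gpg is a placeholder; reference it with signed-by= in the repository entry)
sudo gpg --keyserver keyserver.ubuntu.com --recv-keys [KEY_ID]
sudo gpg --export [KEY_ID] | sudo tee /usr/share/keyrings/repo-name.gpg > /dev/null
Service Management (systemd)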
Service Won’t Start
Symptoms
- Service fails to start
- Status shows “failed” or “inactive”
- Error messages in logs
Diagnostic Commands
# Check service status
systemctl status servicename.service
# View recent logs
journalctl -u servicename.service -n 50
# Check for errors since last boot
journalctl -u servicename.service -b
# View real-time logs
journalctl -u servicename.service -f
Common Fixes
# Reload systemd after config changes
systemctl daemon-reload
# Reset failed state
systemctl reset-failed servicename.service
# Check service file syntax
systemd-analyze verify /etc/systemd/system/servicename.service
# Restart the service, then show full (untruncated) status output
systemctl restart servicename.service
systemctl status -l servicename.service
Service Keeps Restarting
Symptoms
- Service status shows “activating” repeatedly
- Rapid restarts in logs
- Application crashes immediately
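To put a number on the restart loop, note that `systemctl show` prints unit properties as KEY=VALUE lines. A minimal sketch that extracts one property from that output; `show_prop` is a hypothetical helper, and the `NRestarts` property is assumed to be available (older systemd versions do not have it).

```shell
# Read `systemctl show` output (KEY=VALUE lines) on stdin and print the
# value of a single property. Hypothetical helper.
show_prop() {
    awk -v key="$1" 'index($0, key "=") == 1 { print substr($0, length(key) + 2) }'
}

# Demonstration with canned input. On a live host, pipe in the output of
# `systemctl show servicename.service` instead.
printf 'Restart=on-failure\nNRestarts=7\nRestartUSec=100ms\n' | show_prop NRestarts
```

Matching on the full `KEY=` prefix keeps `Restart` from also matching `RestartUSec`.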
Investigation
# Check restart configuration
systemctl cat servicename.service | grep -i restart
# View crash logs
coredumpctl list servicename
coredumpctl info [PID]
# Check resource limits
systemctl show servicename.service | grep -i limit
# Test service in foreground
# (Stop the service first, then run the actual command manually)
Enable Service on Boot
Symptoms
- Service doesn’t start after reboot
- Manual start required
Fix
# Enable service
systemctl enable servicename.service
# Enable and start immediately
systemctl enable --now servicename.service
# Verify it's enabled
systemctl is-enabled servicename.service
# List all enabled services
systemctl list-unit-files --state=enabled
Disk Space Issues
Root Filesystem Full
Symptoms
- “No space left on device” errors
- Services failing to start
- Cannot create files
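When several hosts need checking, filtering `df` output for filesystems above a threshold is quicker than reading the full table. A minimal sketch; the function name `over_threshold` and the 90 percent limit are assumptions, and mount points containing spaces are not handled.

```shell
# Read `df -P` style output on stdin and print "MOUNTPOINT USE%" for every
# filesystem at or above the given percentage. Hypothetical helper.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

df -P 2>/dev/null | over_threshold 90
```

The same filter should also work on `df -Pi` output for inode exhaustion, since the usage percentage sits in the same column, but verify that on the target system.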
Quick Diagnosis
# Check disk usage
df -h
# Find large directories (stay on the root filesystem, hide permission errors)
du -xh / 2>/dev/null | sort -rh | head -20
# Find large files
find / -xdev -type f -size +100M -exec ls -lh {} + 2>/dev/null
# Check inode usage (sometimes the problem)
df -i
Common Culprits
1. Log Files
# Check log directory size
du -sh /var/log/*
# Find large log files
find /var/log -type f -size +100M
# Truncate (don't delete) active log files
truncate -s 0 /var/log/large-file.log
# Clean old journal logs
journalctl --vacuum-time=7d
journalctl --vacuum-size=500M
2. Package Cache
# Clean apt cache (Ubuntu/Debian)
apt clean
apt autoclean
# Clean yum cache (RHEL/CentOS)
yum clean all
# Remove old kernels (Ubuntu)
apt autoremove --purge
3. Docker/Container Images
# Remove unused Docker resources (with -a, this includes every image not used by a container)
docker system prune -a
# Remove unused volumes (their data is lost)
docker volume prune
4. Temporary Files
# Clean /tmp (be careful)
find /tmp -type f -atime +7 -delete
# Clean user cache
rm -rf ~/.cache/*
Partition Mounted Read-Only
Symptoms
- Cannot write to filesystem
- “Read-only file system” errors
- Usually after filesystem errors
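To confirm which filesystems are currently read-only, the mount options in `/proc/mounts` can be filtered directly. A minimal sketch; `ro_mounts` is a hypothetical helper, and some read-only mounts (squashfs images, for example) are expected and harmless.

```shell
# Read /proc/mounts style lines on stdin and print every mount whose option
# list contains the exact option "ro". Hypothetical helper.
ro_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2 " (" $1 ", " $3 ")"; break }
    }'
}

if [ -r /proc/mounts ]; then
    ro_mounts < /proc/mounts
fi
```

Comparing whole options avoids false matches on settings such as `errors=remount-ro`, which appear on healthy read-write mounts too.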
Fix Steps
# Check filesystem for errors (requires unmount or single-user mode)
fsck /dev/sda1
# Remount read-write
mount -o remount,rw /
# Check for hardware issues
dmesg | grep -i error
smartctl -a /dev/sda
Permission Issues
Permission Denied
Symptoms
- Cannot access files or directories
- “Permission denied” errors
- Scripts won’t execute
Troubleshooting
# Check file permissions
ls -la /path/to/file
# Check ownership
ls -l /path/to/file
# Check file type (the execute bit itself shows as x in the ls -l output)
file /path/to/file
Common Fixes
# Fix ownership
chown user:group /path/to/file
# Fix permissions
chmod 644 /path/to/file # rw-r--r--
chmod 755 /path/to/script # rwxr-xr-x
chmod 700 /path/to/private # rwx------
# Fix directory ownership and permissions recursively
# (capital X adds execute only to directories and already-executable files)
chown -R user:group /path/to/directory
chmod -R u=rwX,go=rX /path/to/directory
# Make script executable
chmod +x /path/to/script.sh
SELinux Blocking Access
Symptoms
- Permission denied despite correct ownership/permissions
- SELinux in enforcing mode
- AVC denial messages in logs
Diagnosis
# Check SELinux status
getenforce
# Check for denials
ausearch -m avc -ts recent
# Check file context
ls -Z /path/to/file
# Check process context
ps auxZ | grep process_name
Fixes
# Temporarily set to permissive (for testing only)
setenforce 0
# Restore default context
restorecon -v /path/to/file
# Change context permanently
semanage fcontext -a -t httpd_sys_content_t "/web/content(/.*)?"
restorecon -Rv /web/content
# Re-enable enforcing
setenforce 1
SSH Connection Issues
Connection Refused
Symptoms
- “Connection refused” error
- Cannot connect remotely
- SSH port not responding
Troubleshooting
# Check if SSH is running (the unit is named ssh on Debian/Ubuntu)
systemctl status sshd
# Check if listening on port
netstat -tlnp | grep :22
ss -tlnp | grep :22
# Check firewall
iptables -L -n | grep 22
firewall-cmd --list-all
# Test locally
ssh localhost
Fixes
# Start SSH service (use ssh instead of sshd on Debian/Ubuntu)
systemctl start sshd
systemctl enable sshd
# Open firewall port
firewall-cmd --permanent --add-service=ssh
firewall-cmd --reload
# Or with iptables (use -I instead of -A if an earlier rule already rejects the traffic)
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
Permission Denied (publickey)
Symptoms
- SSH key authentication fails
- Falls back to password (if enabled)
- “Permission denied (publickey)” error
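A frequent cause is file permissions: with `StrictModes` enabled (the OpenSSH default), sshd can refuse key login when `~/.ssh` or `authorized_keys` is too open. A minimal sketch of a mode check, assuming GNU `stat`; `check_mode` is a hypothetical helper.

```shell
# check_mode PATH EXPECTED_OCTAL_MODE
# Hypothetical helper: compares the octal mode of PATH with the expected value.
check_mode() {
    actual=$(stat -c '%a' "$1" 2>/dev/null) || { echo "missing $1"; return 2; }
    if [ "$actual" = "$2" ]; then
        echo "ok $1 ($actual)"
    else
        echo "BAD $1 is $actual, expected $2"
        return 1
    fi
}

# Run as the affected user on the server
check_mode ~/.ssh 700 || true
check_mode ~/.ssh/authorized_keys 600 || true
```

The non-zero return codes make the function usable in scripts that audit many accounts, while the `|| true` keeps an interactive run going after the first finding.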
Check Server Side
# Verify SSH directory permissions
ls -la ~/.ssh
# Should be:
# ~/.ssh = 700
# ~/.ssh/authorized_keys = 600
# Fix permissions
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
# Check authorized_keys content
cat ~/.ssh/authorized_keys
# Check SSH logs
tail -f /var/log/auth.log # Ubuntu/Debian
tail -f /var/log/secure # RHEL/CentOS
Check Client Side
# Verify private key permissions
ls -la ~/.ssh/id_rsa
chmod 600 ~/.ssh/id_rsa
# Test with verbose output
ssh -v user@hostname
# Specify key explicitly
ssh -i ~/.ssh/specific_key user@hostname
Too Many Authentication Failures
Symptoms
- “Received disconnect: Too many authentication failures”
- Multiple key attempts failing
Fix
# Offer only the explicitly specified key, not every key held by the agent
ssh -o IdentitiesOnly=yes -i ~/.ssh/specific_key user@hostname
# Or add to ~/.ssh/config
Host hostname
    IdentitiesOnly yes
    IdentityFile ~/.ssh/specific_key
Network Issues
Network Interface Not Starting
Symptoms
- No network connectivity
- Interface shows “down”
- IP address not assigned
Troubleshooting
# List interfaces
ip link show
# Check interface status
ip addr show eth0
# Check NetworkManager status
systemctl status NetworkManager
# Check for errors
journalctl -u NetworkManager -n 50
Fixes
# Bring interface up
ip link set eth0 up
# Restart networking (Ubuntu 18.04+)
systemctl restart systemd-networkd
# Restart NetworkManager
systemctl restart NetworkManager
# Renew DHCP lease
dhclient -r
dhclient eth0
DNS Resolution Failing
Symptoms
- Cannot resolve hostnames
- “Name or service not known” errors
- Ping IP works, ping hostname fails
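A quick first check is which resolvers the host is actually configured to use. A minimal sketch that lists them; `list_nameservers` is a hypothetical helper that defaults to `/etc/resolv.conf`.

```shell
# Print the nameserver addresses from a resolv.conf style file.
# Hypothetical helper; the file argument is optional.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

if [ -r /etc/resolv.conf ]; then
    list_nameservers
fi
```

If the only entry is 127.0.0.53, the host is using the systemd-resolved stub listener, and the real upstream servers are shown by `resolvectl status` instead.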
Diagnosis
# Check DNS configuration
cat /etc/resolv.conf
# Test DNS resolution
nslookup google.com
dig google.com
host google.com
# Test with specific DNS server
nslookup google.com 8.8.8.8
Fixes
# Edit resolv.conf (temporary - will be overwritten)
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
# For systemd-resolved systems
systemctl restart systemd-resolved
# Check if resolv.conf is a symlink
ls -la /etc/resolv.conf
# If using NetworkManager
nmcli connection modify "Connection Name" ipv4.dns "8.8.8.8 1.1.1.1"
nmcli connection up "Connection Name"
High CPU/Memory Usage
Identify Resource Hog
Symptoms
- System sluggish
- High load average
- Applications slow or unresponsive
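A load average only means something relative to the number of CPUs. A minimal sketch that normalizes the 1-minute load; `load_per_cpu` is a hypothetical helper, and treating values persistently above roughly 1.0 per CPU as saturation is a rule of thumb, since Linux load also counts tasks waiting on I/O.

```shell
# load_per_cpu LOAD CPUS: print LOAD divided by CPUS with two decimals.
# Hypothetical helper.
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

if [ -r /proc/loadavg ]; then
    # First field of /proc/loadavg is the 1-minute load average
    read -r load1 rest < /proc/loadavg
    load_per_cpu "$load1" "$(nproc)"
fi
```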
Diagnostic Commands
# Real-time process monitoring
top
htop
# Sort by CPU
top -o %CPU
# Sort by memory
top -o %MEM
# List top processes
ps aux --sort=-%cpu | head
ps aux --sort=-%mem | head
# Check load average
uptime
cat /proc/loadavg
Memory Investigation
# Memory usage summary
free -h
# Detailed memory info
cat /proc/meminfo
# Per-process memory (resident set size, largest last)
ps aux --no-headers | awk '{print $6/1024 " MB\t" $11}' | sort -n
# Check for memory leaks
valgrind --leak-check=full ./program
Common Fixes
# Kill misbehaving process
kill -15 [PID] # Graceful
kill -9 [PID] # Force
# Restart service
systemctl restart servicename
# Clear page cache (run as root; non-destructive, but I/O is slower until caches refill)
sync; echo 1 > /proc/sys/vm/drop_caches
# Clear dentries and inodes
sync; echo 2 > /proc/sys/vm/drop_caches
# Clear page cache, dentries and inodes
sync; echo 3 > /proc/sys/vm/drop_caches
Boot Issues
System Won’t Boot
Common Scenarios and Fixes
Grub Rescue Mode
# List partitions
ls
# Find root partition
ls (hd0,1)/
ls (hd0,2)/
# Set root and boot
set root=(hd0,1)
set prefix=(hd0,1)/boot/grub
insmod normal
normal
Emergency Mode
# At grub menu, edit entry (press 'e')
# Add to linux line:
systemd.unit=rescue.target
# Or for single user mode:
single
# Or:
systemd.unit=emergency.target
Repair Grub
# From live USB
sudo mount /dev/sda1 /mnt
sudo mount --bind /dev /mnt/dev
sudo mount --bind /proc /mnt/proc
sudo mount --bind /sys /mnt/sys
sudo chroot /mnt
# Reinstall grub
grub-install /dev/sda
update-grub
# Exit and reboot
exit
sudo reboot
User Account Issues
User Locked Out
Symptoms
- “Account locked” message
- Too many failed login attempts
- pam_faillock active
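Two different mechanisms can produce these symptoms: a password lock, which is visible in `passwd -S`, and a pam_faillock lockout, which is not. A minimal sketch that decodes the status column of `passwd -S` output; `pw_state` is a hypothetical helper, and the code mapping (L/NP/P on Debian-family systems, LK/NP/PS on RHEL-family systems) is an assumption to verify against passwd(1) on the target host.

```shell
# Read `passwd -S` style lines on stdin and print "USER STATE".
# Hypothetical helper; the status codes are assumed as described above.
pw_state() {
    awk '{
        state = "unknown"
        if ($2 == "L" || $2 == "LK") state = "locked"
        else if ($2 == "NP") state = "no-password"
        else if ($2 == "P" || $2 == "PS") state = "usable"
        print $1, state
    }'
}

# Demonstration with a canned line. On a live host, pipe in the output of
# `passwd -S username` instead.
printf 'alice L 01/15/2024 0 99999 7 -1\n' | pw_state
```

If the state is reported as usable but logins still fail, a faillock lockout is the more likely cause.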
Unlock Account
# Check if account is locked
passwd -S username
# Unlock account
passwd -u username
# For faillock (failed login attempts)
faillock --user username --reset
# Check failed attempts
faillock --user username
Forgotten Root Password
Recovery Steps
# 1. Reboot and enter single user mode at grub
# 2. At grub, press 'e' and add to linux line:
rw init=/bin/bash
# 3. Mount filesystem
mount -o remount,rw /
# 4. Change password
passwd root
# 5. Continue booting (on SELinux systems, run "touch /.autorelabel" first)
exec /sbin/init