Server Documentation Standards
Standards and templates for server infrastructure documentation
Comprehensive guide for documenting server infrastructure in MSP and enterprise environments.
Why Document Servers?
Proper server documentation:
- Reduces mean time to repair (MTTR)
- Simplifies onboarding of new team members
- Enables consistent configurations
- Supports compliance and audits
- Facilitates capacity planning
- Underpins disaster recovery
Server Inventory
Essential Server Information
Minimum Documentation per Server:
Hostname: WEBSRV-01
Purpose: Production web server
Environment: Production
OS: Ubuntu Server 22.04 LTS
IP Address: 10.10.10.10
Subnet Mask: 255.255.255.0
Gateway: 10.10.10.1
DNS Servers: 10.10.0.10, 10.10.0.11
VLAN: 10 (Servers)
Hardware:
Type: Physical / Virtual / Cloud
Make/Model: Dell PowerEdge R740 / VMware / AWS EC2 t3.large
Serial Number: SVC1234567
CPU: 2x Intel Xeon Gold 6248R (48 cores total)
RAM: 128GB DDR4
Storage:
- OS Drive: 2x 480GB SSD RAID1
- Data Drive: 6x 1.8TB SAS 10K RAID10
Network: 2x 10GbE (bonded)
Location:
Data Center: HQ-DC1
Rack: A-12
Rack Units: 24-25
Virtualization:
Hypervisor: VMware ESXi 8.0 / Hyper-V / N/A
Host: ESX-HOST-03
Datastore: SAN-PROD-01
Purchase Info:
Vendor: Dell
Purchase Date: 2023-06-15
Warranty Expiration: 2028-06-14
Support Contract: Dell ProSupport 24x7
Asset Tag: ASSET-12345
PO Number: PO-2023-0456
Access:
Console: iDRAC at https://10.10.0.210 / iLO / IPMI
SSH: Yes, port 22
RDP: No
Management URL: https://websrv-01.company.local
Backup:
Method: Veeam Backup & Replication
Schedule: Daily incremental, weekly full
Retention: 30 days daily, 12 months monthly
Last Backup: 2024-11-01 23:00
Backup Size: 120GB
Services Running:
- Nginx 1.24.0 (ports 80, 443)
- PHP-FPM 8.2
- MySQL 8.0.34 (port 3306)
Dependencies:
- Database: DBSRV-01 (MySQL replica)
- Storage: SAN-PROD-01 via iSCSI
- Authentication: DC-01 (LDAP)
- Monitoring: Zabbix server at 10.10.0.100
Monitoring:
- Zabbix agent installed
- SNMP enabled (community string: [stored in vault])
- Alerts sent to: ops@company.com
Change Log:
- 2024-10-15: Upgraded Nginx to 1.24.0
- 2024-09-01: Increased RAM from 64GB to 128GB
- 2024-08-10: Migrated to new SAN storage
Notes:
- SSL cert expires 2025-03-01
- Requires monthly OS patches (2nd Tuesday)
- Database connection pooling configured
- Rate limiting enabled on Nginx
Configuration Documentation
Operating System Configuration
Document all OS-level configurations:
OS Configuration:
Hostname: WEBSRV-01
Domain: company.local
Timezone: America/New_York
NTP Servers:
- 10.10.0.10
- 10.10.0.11
Firewall:
Status: Enabled
Rules:
- Allow 80/tcp from 0.0.0.0/0
- Allow 443/tcp from 0.0.0.0/0
- Allow 22/tcp from 10.10.0.0/24
- Allow 3306/tcp from 10.10.20.0/24
Users and Groups:
Local Admins: admin, backup_user
Service Accounts: nginx_svc, mysql_svc
SSH Keys: /root/.ssh/authorized_keys (ops team)
File Systems:
- /dev/sda1: / (root) - 200GB ext4
- /dev/sda2: /var - 100GB ext4
- /dev/sdb1: /data - 2TB xfs
- /dev/sdc1: /backup - 500GB ext4
Network Interfaces:
eth0: 10.10.10.10/24 (Production)
eth1: 10.10.20.10/24 (Backup network)
bond0: eth2+eth3 (10GbE bonded)
Application Configuration
Web Application Stack:
Application: Customer Portal
Version: 3.2.1
Installation Path: /var/www/portal
Components:
Web Server:
Software: Nginx 1.24.0
Config: /etc/nginx/nginx.conf
Sites: /etc/nginx/sites-enabled/
SSL Cert: /etc/ssl/certs/portal.company.com.crt
SSL Key: /etc/ssl/private/portal.company.com.key
Application:
Runtime: PHP 8.2-FPM
Config: /etc/php/8.2/fpm/php.ini
Pool Config: /etc/php/8.2/fpm/pool.d/www.conf
Max Workers: 50
Memory Limit: 256M
Database:
Type: MySQL 8.0.34
Config: /etc/mysql/my.cnf
Data Dir: /var/lib/mysql
Port: 3306
Max Connections: 200
Buffer Pool Size: 16GB
Dependencies:
- Redis 7.0 (session cache) - port 6379
- Memcached 1.6 (object cache) - port 11211
- Elasticsearch 8.10 (search) - port 9200
Environment Variables:
APP_ENV: production
DB_HOST: 10.10.10.11
DB_NAME: portal_prod
DB_USER: portal_app
REDIS_HOST: 10.10.10.12
API_KEY: [stored in vault]
Cron Jobs:
- 0 2 * * * /var/www/portal/scripts/daily_cleanup.sh
- */15 * * * * /var/www/portal/scripts/queue_worker.php
- 0 0 * * 0 /var/www/portal/scripts/weekly_report.sh
Security Documentation
Security Hardening
Document all security configurations:
Patch Management:
- OS patches: Monthly (2nd Tuesday)
- Application patches: As needed, tested in staging first
- Last patched: 2024-10-08
Access Control:
- SSH: Key-based auth only, no password login
- Sudo: Limited to ops team members
- Service accounts: No interactive login
- MFA: Required for all admin access
Encryption:
- Data at rest: LUKS encryption on /data partition
- Data in transit: TLS 1.2+ only, strong ciphers
- Database: Encrypted connections required
Logging and Auditing:
- Syslog forwarding: To SIEM at 10.10.0.100
- Audit logs: /var/log/audit/
- Retention: 90 days local, 1 year in SIEM
- Monitored events: Login attempts, sudo usage, file changes
Vulnerability Scanning:
- Tool: Nessus
- Schedule: Weekly
- Last scan: 2024-11-01
- Critical vulns: 0
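The access-control entries above (key-based SSH only, no password login) can be checked against the running configuration rather than only recorded. The following is a minimal sketch, assuming the effective settings are dumped with `sshd -T`, which prints one lowercase `keyword value` pair per line. The helper name `check_ssh_hardening` and the choice of which two settings to check are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: verify "key-based auth only" from `sshd -T` style output on stdin.
# check_ssh_hardening is a hypothetical helper name, not part of OpenSSH.
check_ssh_hardening() {
  local config required failures=0
  config="$(cat)"   # read the whole effective config from stdin

  # Each required setting must appear as an exact "keyword value" line.
  for required in "passwordauthentication no" "pubkeyauthentication yes"; do
    if ! printf '%s\n' "$config" | grep -qix -- "$required"; then
      echo "FAIL: expected '$required'"
      failures=$((failures + 1))
    fi
  done

  if [ "$failures" -eq 0 ]; then
    echo "OK: password login disabled, key login enabled"
  fi
  return "$failures"   # 0 when every required setting was found
}
```

On a live host this would typically be run as root with `sshd -T | check_ssh_hardening`, and the result recorded next to the "Last scan" entry.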
Compliance Requirements
Industry Standards:
- SOC 2 Type II: Yes
- PCI DSS: N/A
- HIPAA: No
- GDPR: Yes (EU customer data)
Required Controls:
- Access logging enabled
- Data encryption at rest and in transit
- Regular vulnerability scanning
- Incident response procedures
- Backup verification
Maintenance and Operations
Maintenance Schedule
Daily Tasks:
- 00:00: Incremental backup starts
- 02:00: Database optimization
- 03:00: Log rotation
- 04:00: Cleanup temp files
Weekly Tasks:
- Sunday 01:00: Security scan
- Sunday 02:00: Weekly report generation
- Sunday 23:00: Full system backup
Monthly Tasks:
- 2nd Tuesday: OS patches (maintenance window)
- Last Sunday: Certificate renewal check
- 1st Monday: Capacity review
Quarterly Tasks:
- Disaster recovery test
- Access review
- Documentation review
- Vulnerability assessment
Performance Baselines
Normal Operating Metrics:
CPU Usage:
Average: 25-35%
Peak: 60-70% (business hours)
Alert Threshold: >80% for 15 min
Memory Usage:
Average: 60GB/128GB (47%)
Alert Threshold: >90% (115GB)
Disk I/O:
Read: 50-100 MB/s average
Write: 20-50 MB/s average
IOPS: 1000-2000 average
Alert Threshold: >80% capacity
Network:
Inbound: 100-200 Mbps average
Outbound: 50-100 Mbps average
Connections: 500-1000 concurrent
Alert Threshold: >8 Gbps sustained
Application Metrics:
Response Time: <200ms (95th percentile)
Request Rate: 1000-2000 req/sec
Error Rate: <0.1%
Active Sessions: 500-1500
Disaster Recovery
Recovery Procedures
Recovery Time Objective (RTO): 4 hours
Recovery Point Objective (RPO): 1 hour
Disaster Recovery Steps:
Server Failure:
- Restore from last full backup to standby hardware
- Update DNS to point to standby server
- Verify application functionality
- Restore incremental backups if needed
- Monitor for issues
Data Corruption:
- Identify last known good backup
- Restore to staging environment
- Verify data integrity
- Promote to production during maintenance window
Ransomware:
- Isolate affected server (disconnect network)
- Preserve evidence for forensics
- Rebuild server from clean image
- Restore data from backup (verify backup is clean)
- Restore services incrementally
- Conduct security review
Backup Restoration Test:
- Frequency: Quarterly
- Last Test: 2024-10-01
- Result: Success - 3.5 hour recovery time
- Issues: None
- Next Test: 2025-01-01
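The restoration test result is only meaningful when compared with the RTO. A small sketch of that comparison follows, using the figures recorded above (3.5 hours measured, 4 hours allowed). The helper name `rto_margin` is hypothetical, and `awk` is used only because the measured time may be fractional.

```shell
#!/usr/bin/env bash
# Sketch: compare a measured restore time against the documented RTO.
# rto_margin is a hypothetical helper; hours may be fractional (e.g. 3.5).
rto_margin() {
  # $1 = measured recovery time in hours, $2 = RTO in hours
  awk -v actual="$1" -v rto="$2" 'BEGIN {
    margin_min = (rto - actual) * 60
    if (margin_min >= 0) {
      printf "PASS: %d minutes inside RTO\n", margin_min
      exit 0
    }
    printf "FAIL: %d minutes over RTO\n", -margin_min
    exit 1
  }'
}

rto_margin 3.5 4   # prints: PASS: 30 minutes inside RTO
```

A non-zero exit status makes the check easy to wire into whatever records the quarterly test result.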
Runbooks and Procedures
Common Procedures
Server Restart Procedure:
# 1. Notify stakeholders (30 min advance notice)
# 2. Stop application gracefully
systemctl stop nginx
systemctl stop php8.2-fpm
# 3. Stop database with proper shutdown
systemctl stop mysql
# 4. Reboot server
reboot
# 5. Verify services after boot
systemctl status mysql
systemctl status php8.2-fpm
systemctl status nginx
# 6. Test application access
curl -I https://portal.company.com
# 7. Monitor logs for errors
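# A single tail follows both logs; two separate "tail -f" commands would
# block on the first and never reach the second
tail -f /var/log/nginx/error.log /var/log/mysql/error.log
SSL Certificate Renewal: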
# 1. Generate CSR
openssl req -new -key /etc/ssl/private/portal.company.com.key \
-out /tmp/portal.company.com.csr
# 2. Submit CSR to CA (DigiCert)
# 3. Download new certificate files
# 4. Backup old certificates
cp /etc/ssl/certs/portal.company.com.crt \
/etc/ssl/certs/portal.company.com.crt.backup
# 5. Install new certificate
cp new_cert.crt /etc/ssl/certs/portal.company.com.crt
cp intermediate.crt /etc/ssl/certs/portal_intermediate.crt
# 6. Test certificate
openssl x509 -in /etc/ssl/certs/portal.company.com.crt -text -noout
# 7. Reload nginx
nginx -t && systemctl reload nginx
# 8. Verify in browser and with SSL checker
Application Deployment:
- Update code from Git repository
- Run database migrations if needed
- Clear application cache
- Restart PHP-FPM
- Verify no errors in logs
- Smoke test critical paths
- Monitor error rates for 30 minutes
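The deployment checklist above could be captured as a script so the steps always run in the same order. This is a sketch under stated assumptions: `scripts/migrate.sh` and `scripts/clear_cache.sh` are hypothetical placeholders for whatever the application really uses, and the `run` wrapper defaults to a dry run that only prints each command. The smoke test and the 30-minute error-rate watch are left as manual steps.

```shell
#!/usr/bin/env bash
# Sketch of the deployment checklist as a script. The migrate and cache
# scripts are hypothetical placeholders; adapt them to the real application.
APP_DIR="${APP_DIR:-/var/www/portal}"
DRY_RUN="${DRY_RUN:-1}"   # default: print commands instead of running them

run() {
  # Print the command in dry-run mode, otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

deploy() {
  # Each step runs only if the previous one succeeded.
  run git -C "$APP_DIR" pull --ff-only &&      # update code from Git
  run "$APP_DIR/scripts/migrate.sh" &&         # hypothetical migration script
  run "$APP_DIR/scripts/clear_cache.sh" &&     # hypothetical cache clear
  run systemctl restart php8.2-fpm &&          # restart PHP-FPM
  run tail -n 50 /var/log/nginx/error.log      # review recent errors
}

deploy
```

Setting `DRY_RUN=0` executes the commands for real. Keeping the dry run as the default means a mistyped invocation prints the plan instead of touching production.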
Database Backup Verification:
# Weekly backup verification script
# 1. Restore backup to test database
mysql -h test-db-server -u root -p < /backup/portal_backup.sql
# 2. Verify table counts
mysql -h test-db-server -u root -p -e \
"SELECT COUNT(*) FROM portal_test.users;"
# 3. Check data integrity
mysql -h test-db-server -u root -p -e \
"SELECT MAX(created_at) FROM portal_test.orders;"
# 4. Document results in backup log
Network Documentation
Network Connectivity
Network Segments:
Production VLAN:
VLAN ID: 10
Subnet: 10.10.10.0/24
Gateway: 10.10.10.1
Purpose: Production servers
Database VLAN:
VLAN ID: 20
Subnet: 10.10.20.0/24
Gateway: 10.10.20.1
Purpose: Database servers
Backup VLAN:
VLAN ID: 30
Subnet: 10.10.30.0/24
Gateway: 10.10.30.1
Purpose: Backup traffic
Firewall Rules:
- Production VLAN → Database VLAN: port 3306
- Production VLAN → Internet: ports 80, 443 (outbound)
- Ops Network → Production VLAN: port 22
- Backup VLAN → Storage: iSCSI ports
Load Balancer:
VIP: 10.10.10.100
Backend Servers:
- WEBSRV-01: 10.10.10.10
- WEBSRV-02: 10.10.10.11
Algorithm: Least connections
Health Check: GET /health HTTP/1.1
SSL Offloading: Enabled
Diagrams and Visual Documentation
Required Diagrams
Network Diagram:
- Physical topology showing switches, routers, firewalls
- Logical topology showing VLANs and subnets
- External connections (internet, VPN, site-to-site)
Application Architecture:
- Load balancer configuration
- Web tier (Nginx servers)
- Application tier (PHP-FPM)
- Data tier (MySQL, Redis, Elasticsearch)
- Integration points (APIs, external services)
Data Flow:
- User request flow through load balancer
- Database replication topology
- Backup data flow
- Log aggregation flow
Rack Elevation:
- Physical server locations in rack
- Network switch locations
- PDU connections
- Cable management
Change Management
Change Documentation Template
Change Request: CR-2024-1234
Date: 2024-11-01
Requested By: John Doe
Implemented By: Jane Smith
Priority: Medium
Description:
Upgrade MySQL from 8.0.33 to 8.0.34 for security patches
Impact Analysis:
Systems Affected: WEBSRV-01, DBSRV-01
Downtime Required: 15 minutes
Risk Level: Low
Rollback Plan: Restore from snapshot
Testing:
- Tested in staging environment
- Backup verified before change
- Rollback procedure documented
Implementation Steps:
- Stop application
- Backup database
- Upgrade MySQL packages
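- Restart MySQL (since 8.0.16 the server performs the upgrade steps itself at startup; the separate mysql_upgrade client is deprecated)
- Check the MySQL error log to confirm the upgrade completed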
- Verify replication status
- Start application
- Monitor for 1 hour
Verification:
- mysql --version shows 8.0.34
- All services running normally
- No errors in logs
- Application responding correctly
- Replication lag: 0 seconds
Post-Implementation:
- Update documentation
- Notify stakeholders
- Schedule follow-up review in 1 week
Documentation Best Practices
Maintenance Guidelines
- Keep Current: Review monthly, update immediately after changes
- Version Control: Use Git for documentation files
- Access Control: Sensitive info (passwords, keys) in separate vault
- Automation: Generate inventory from monitoring tools where possible
- Templates: Use consistent templates across all servers
- Validation: Quarterly audit to verify accuracy
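Part of the quarterly validation can be automated by checking that every server record still contains the fields the template requires. The sketch below assumes one plain-text file per server in the `Key: value` layout of the inventory template. The helper name `check_inventory` and the particular list of required fields are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: report required fields missing from one server record.
# check_inventory is a hypothetical helper; the one-file-per-server layout
# and the field list are assumptions, not a fixed standard.
check_inventory() {
  # $1 = path to a server record in "Key: value" layout
  local file="$1" field missing=0
  for field in Hostname Purpose Environment OS "IP Address" Backup Monitoring; do
    # A field counts as present if some line starts with "<field>:".
    if ! grep -q "^[[:space:]]*${field}:" "$file"; then
      echo "MISSING: $field"
      missing=$((missing + 1))
    fi
  done
  return "$missing"   # 0 when the record is complete
}
```

Running it over every record, for example in a loop over a hypothetical `inventory/` directory, gives the audit a list of incomplete records to start from. It checks presence only, not whether the values are still accurate.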
Information to NEVER Document in Plain Text
- Passwords or API keys
- Private encryption keys
- Personal data
- Credit card information
- Social Security numbers
Instead use:
- Password manager (1Password, LastPass)
- Secrets management (HashiCorp Vault, AWS Secrets Manager)
- Environment variables
- Encrypted configuration files
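A simple scan can catch secrets that slip into documentation files despite these rules. The following sketch flags lines that look like plain-text credentials. The patterns are illustrative assumptions rather than a complete ruleset, `scan_for_secrets` is a hypothetical helper name, and lines already using a `[stored in vault]` placeholder are skipped.

```shell
#!/usr/bin/env bash
# Sketch: flag lines in a documentation file that look like plain-text secrets.
# The patterns are illustrative assumptions, not an exhaustive ruleset, and
# scan_for_secrets is a hypothetical helper name.
scan_for_secrets() {
  # $1 = file to scan; prints "line-number:text" for each suspicious line.
  # Exit status is 0 when something was flagged, 1 when the file looks clean.
  grep -nEi '(password|passwd|api_?key|secret|community)[[:space:]]*[:=]|BEGIN [A-Z ]*PRIVATE KEY' "$1" \
    | grep -vi 'stored in vault'
}
```

Used as a pre-commit check on the documentation repository, a hit would block the commit until the value is moved to the vault. Expect false positives; the scan is a prompt for review, not a verdict.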
Documentation Storage
Recommended Tools:
- Wiki: Confluence, BookStack, Gitea Wiki
- Version Control: GitLab, GitHub, Bitbucket
- Diagrams: Draw.io, Lucidchart, Visio
- Password Vault: 1Password, Bitwarden, KeePass
- CMDB: ServiceNow, Device42, Netbox